好友
阅读权限10
听众
最后登录1970-1-1
|
利用 Python 快速抓取链家二手房信息;使用前请将代码中的 URL 替换为你需要抓取的链家页面地址。
代码如下:
import traceback
import requests
from bs4 import BeautifulSoup
import pandas as pd
# Browser-like request headers; Lianjia rejects obviously non-browser clients.
# BUG FIX: the original key was "User - Agent" (with spaces), which is not a
# valid HTTP header name, so the intended User-Agent string was never sent
# under the "User-Agent" header. The value's stray spaces are normalized too.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
    )
}
def parse_html(url):
    """Fetch one Lianjia (链家) second-hand-housing listing page and parse it.

    :param url: full URL of one listing page, e.g. an sz.lianjia.com/ershoufang page
    :return: list of dicts, one per listing, keyed by the Chinese column names
             written to Excel (address, layout, size, orientation, renovation,
             floor, total/unit price, follow info)
    :raises requests.HTTPError: if the server responds with an error status
    """
    listings = []
    resp = requests.get(url, headers=headers, timeout=10)
    # Fail loudly on HTTP errors instead of silently parsing an error page.
    resp.raise_for_status()
    soup = BeautifulSoup(resp.content, 'lxml')
    # Each listing is an <li class="LOGCLICKDATA"> under .sellListContent.
    for sell in soup.select('.sellListContent li.LOGCLICKDATA'):
        # div.houseInfo renders as one '|'-separated string, e.g.
        # "<layout> | <size> | <orientation> | <renovation> | <floor> ..."
        # NOTE(review): field order assumed from the original index mapping —
        # confirm against the live page markup.
        house_info = list(sell.select('div.houseInfo')[0].stripped_strings)
        fields = [f.strip() for f in house_info[0].split('|')]
        # Pad so listings with fewer fields don't raise IndexError.
        fields += [''] * (5 - len(fields))
        layout, size, orientation, renovation, floor = fields[:5]
        # Remaining cells: join all stripped strings so text split across
        # child nodes is concatenated intact.
        position = ''.join(sell.select('div.positionInfo')[0].stripped_strings)
        total_price = ''.join(sell.select('div.totalPrice')[0].stripped_strings)
        unit_price = list(sell.select('div.unitPrice')[0].stripped_strings)[0]
        follow_info = ''.join(sell.select('div.followInfo')[0].stripped_strings)
        listings.append({
            "房屋地址": position,
            "房子类型": layout,
            "面积大小": size,
            "房间朝向": orientation,
            "装修类型": renovation,
            "楼层": floor,
            "房屋总价": total_price,
            "房屋单价": unit_price,
            "关注发布": follow_info,
        })
    return listings
def export_excel(datas, filename="链家龙华3室2手房.xlsx"):
    """Write the scraped listings to an Excel workbook.

    :param datas: list of per-listing dicts (as produced by ``parse_html``);
                  dict keys become the Excel column headers
    :param filename: output path; defaults to the original hard-coded name
                     so existing callers are unchanged
    :return: None
    """
    df = pd.DataFrame(datas)
    # index=False: the row number carries no information for this dataset.
    df.to_excel(filename, index=False)
# --- entry point: scrape Longhua 3-bedroom listings and export to Excel ---
# (variables renamed: the original "movie data" names were copy-paste leftovers)
all_listings = []  # accumulated listing dicts across all pages
# range(1, 2) fetches only page 1; widen it (e.g. range(1, 11)) for more pages.
for page in range(1, 2):
    # URL pattern: pg{n} selects the page number, l3 filters 3-bedroom listings.
    url = 'https://sz.lianjia.com/ershoufang/longhuaqu/pg{}l3/'.format(page)
    page_listings = parse_html(url)
    print(page_listings)
    all_listings += page_listings
export_excel(all_listings)
|
|
发帖前要善用【论坛搜索】功能,那里可能会有你要找的答案或者已经有人发布过相同内容了,请勿重复发帖。 |
|
|
|
|