Baidu Wenku crawler for batch downloads

hksnow · posted 2019-8-15 13:12

Last edited by hksnow on 2019-8-15 13:13

Preface:

I needed to check my workbook against its answer key, found the answers through Baidu, and wanted to download them.
Code:
[Python]
import requests
import json
import os
#from concurrent import futures

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}
# https://wk.baidu.com/bigque/jsonp/naapi/answer/getwapflowanswerview?na_uncheck=1&pn=1&rn=5&answer_id=fa3ab7c30c22590102029d3f&sign=meow

def download_file(file_url, download_path):
    # Save one image, named by a running counter (1.jpg, 2.jpg, ...)
    global file_num
    html = requests.get(file_url, headers=headers)
    file_name = os.path.join(download_path, str(file_num) + '.jpg')
    with open(file_name, 'wb') as code:
        code.write(html.content)
    file_num = file_num + 1

def start(link):
    #global thread
    # First request: fetch the basic info (page count and title)
    answer_id = link.split('/')[-1]
    url = 'https://wk.baidu.com/bigque/jsonp/naapi/answer/getwapflowanswerview?na_uncheck=1&pn=1&rn=5&answer_id=' + answer_id + '&sign=meow'
    html = requests.get(url, headers=headers)
    json_data = json.loads(html.text)
    img_totals = json_data['data']['answer_info']['pages']
    title = json_data['data']['answer_info']['title']
    page_nums = int(img_totals) // 5
    last_page_img_totals = int(img_totals) % 5
    # Second pass: fetch the image URLs, 5 per request
    path = os.path.join(os.getcwd(), title)
    os.makedirs(path, exist_ok=True)  # do not fail if the folder already exists
    for n in range(0, page_nums + 1):
        url = 'https://wk.baidu.com/bigque/jsonp/naapi/answer/getwapflowanswerview?na_uncheck=1&pn=' + str(n) + '&rn=5&answer_id=' + answer_id + '&sign=meow'
        html = requests.get(url, headers=headers)
        json_data = json.loads(html.text)
        answer_urls_list = json_data['data']['answer_urls']
        #print(answer_urls_list)
        if n == page_nums:
            # The last page holds only the remainder
            num_lists = range(0, last_page_img_totals)
            print('Last page!')
        else:
            num_lists = range(0, 5)
        for x in num_lists:
            img_url = answer_urls_list[x]
            #print(img_url)
            #thread.submit()
            download_file(img_url, path)

if __name__ == "__main__":
    #thread = futures.ThreadPoolExecutor(max_workers = 5)
    file_num = 1
    url = ''  # put a wk.baidu.com/bigque/book/... link here
    start(url)
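The paging arithmetic in the script (5 images per request, with a shorter last page) can be pulled out into a small helper. This is only a rough sketch: `plan_pages` is a hypothetical name that does not appear in the original script, and it assumes, as the script's loop does, that `pn` counts from 0. Unlike the loop, it does not issue an extra empty request when the total is an exact multiple of 5.

```python
def plan_pages(total_images, page_size=5):
    """Return (page_index, images_on_page) pairs covering total_images.

    Hypothetical helper; assumes pages are numbered from 0.
    """
    if total_images < 0 or page_size <= 0:
        raise ValueError("total_images must be >= 0 and page_size > 0")
    # divmod gives the number of full pages and the size of the last, partial page
    full_pages, remainder = divmod(total_images, page_size)
    plan = [(n, page_size) for n in range(full_pages)]
    if remainder:
        plan.append((full_pages, remainder))
    return plan


if __name__ == "__main__":
    # Show which pages would be requested for a 12-image answer set
    for page, count in plan_pages(12):
        print(page, count)
```

Each pair could then drive one request with `pn` set to the page index, taking only the first `images_on_page` entries of `answer_urls`.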

Only links of the form https://wk.baidu.com/bigque/book/xxxxxxxxxx22590102029d3f are supported.
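Since only this link shape works, it may help to check a link before sending any request. The sketch below is an assumption-laden illustration, not part of the original script: `extract_answer_id` is a hypothetical name, and the rule it enforces (host `wk.baidu.com`, path `/bigque/book/<id>`) is inferred from the example link alone.

```python
from urllib.parse import urlparse


def extract_answer_id(link):
    """Return the last path segment of a wk.baidu.com/bigque/book/<id> link.

    Hypothetical helper; the accepted shape is inferred from the example link.
    """
    parsed = urlparse(link)
    # Drop empty segments so a trailing slash does not matter
    parts = [p for p in parsed.path.split('/') if p]
    if (parsed.netloc != 'wk.baidu.com'
            or len(parts) != 3
            or parts[:2] != ['bigque', 'book']):
        raise ValueError('unsupported link: ' + repr(link))
    return parts[2]


if __name__ == "__main__":
    print(extract_answer_id('https://wk.baidu.com/bigque/book/xxxxxxxxxx22590102029d3f'))
```

Compared with `link.split('/')[-1]`, this also tolerates a trailing slash or a query string, and it fails early on links from other pages.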

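KobeBryantmentu · posted 2019-8-15 13:24

I'll give it a try. Thumbs up for sharing the source.

www1678 · posted 2019-8-15 13:25

Is there an Android version?

薄荷叶1996 · posted 2019-8-15 13:34

Awesome work!

cdwdz · posted 2019-8-15 13:38

Thanks for sharing.

淮左名都 · posted 2019-8-15 13:48

Bookmarked, will study this...

virgo915 · posted 2019-8-15 13:53

Good stuff, you have my support.

fudashuai · posted 2019-8-15 13:57

Thanks for sharing with everyone!

wty1641 · posted 2019-8-15 14:10

Bookmarked...

s98 · posted 2019-8-15 14:12

Impressive, well done!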
