This post was last edited by zheshen on 2018-8-25 13:07
This post uses regular expressions. I'm still learning, so please go easy on me. Thanks.

import os
import re
from urllib import request

import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
}


def parse_page(url):
    res = requests.get(url, headers=HEADERS)
    text = res.content.decode('utf-8')
    # Thumbnail URLs embedded in the page source.
    photo_urls = re.findall(r'"thumbURL":"(.*?)"', text, re.DOTALL)
    # Titles contain inline tags; capture the three text pieces around them.
    titles = re.findall(r'"fromPageTitle":"(.*?)<.*?>(.*?)<.*?>(.*?)"', text, re.DOTALL)
    # urlretrieve fails if the target folder does not exist.
    os.makedirs('imgs', exist_ok=True)
    # zip() stops at the shorter list, so no image is skipped and no IndexError occurs.
    for photo_url, parts in zip(photo_urls, titles):
        # Keep the result of re.sub (it was discarded before) and drop
        # characters that are unsafe in file names.
        title = re.sub(r'[\\/:*?"<>|？.!。\-+]', '', ''.join(parts))
        ext = os.path.splitext(photo_url)[1]
        name = title + ext
        request.urlretrieve(photo_url, os.path.join('imgs', name))
        print('%s downloaded' % name)
    print('All downloads finished')


def main():
    url = 'https://image.baidu.com/search/flip?tn=baiduimage&ie=utf-8&word=%E5%B0%8F%E5%A7%90%E5%A7%90&pn=0&gsm=50&ct=&ic=0&lm=-1&width=0&height=0'
    parse_page(url)


if __name__ == '__main__':
    main()
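
To try the regex step without hitting the network, here is a minimal sketch that runs the same kind of extraction on a made-up sample string. SAMPLE, THUMB_RE, TITLE_RE, TAG_RE, UNSAFE_RE, clean_title and extract are all hypothetical names for illustration, and the sample only imitates the "thumbURL" / "fromPageTitle" fields used in the script above; the real page source may look different. It also strips the inline tags with a separate regex rather than capturing three groups around them, which is a swapped-in variation and not what the script above does.

```python
import re

# Hypothetical sample of the page source; the real response is much larger.
SAMPLE = (
    '{"thumbURL":"https://example.com/a.jpg","fromPageTitle":"cute <strong>cat</strong> photo?"},'
    '{"thumbURL":"https://example.com/b.png","fromPageTitle":"big <strong>dog</strong>: v1.0!"}'
)

THUMB_RE = re.compile(r'"thumbURL":"(.*?)"')
TITLE_RE = re.compile(r'"fromPageTitle":"(.*?)"')
TAG_RE = re.compile(r'<[^>]+>')                 # inline tags such as <strong>
UNSAFE_RE = re.compile(r'[\\/:*?"<>|？.!。\-+]')  # characters to drop from file names


def clean_title(raw):
    """Strip inline tags, then drop characters that are unsafe in file names."""
    without_tags = TAG_RE.sub('', raw)
    return UNSAFE_RE.sub('', without_tags).strip()


def extract(text):
    """Return (thumbnail URL, cleaned title) pairs found in text."""
    urls = THUMB_RE.findall(text)
    titles = [clean_title(t) for t in TITLE_RE.findall(text)]
    # zip() pairs entries in order and stops at the shorter list.
    return list(zip(urls, titles))


if __name__ == '__main__':
    for url, title in extract(SAMPLE):
        print(url, '->', title)
```

Checking the patterns on a small string like this makes it easier to see what each one captures before downloading anything.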
If this helped, please give it a like. Thanks.