This post was last edited by panpanpan on 2021-8-15 15:34
Updated 2021-06-09: brand-new version
Original thread: https://www.52pojie.cn/thread-1394757-1-1.html
@culprit
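Unlike the modified version, this version first saves all the article links locally and then iterates over them to download. Each approach has its advantages; I find this one handy, so I'm sharing it with everyone.
The images on 唯美图库 (vmgirls.com) really are high quality, good enough to use as wallpapers.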
[Python]
from bs4 import BeautifulSoup
import requests, re, os

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
    'referer': 'https://www.vmgirls.com/'
}


def loadDatas(datas):
    # Each entry is a page name such as "16615.html"
    for data in datas:
        data = data.strip()
        if not data:
            continue  # skip blank lines, e.g. a trailing newline in datas.txt
        url = "https://www.vmgirls.com/" + data
        print(url)
        print('-----------------------------------------------')
        Down_Image(url)
        print('-----------------------------------------------')


def Down_Image(url):
    response = requests.get(url, headers=headers).text
    soup = BeautifulSoup(response, 'html.parser')
    image_url = soup.find_all('img')
    for data in image_url:
        url_data = data.get('src')
        if not url_data:
            continue  # <img> tag without a src attribute
        image_type = url_data.split('.')[-1]
        if image_type == 'jpg' or image_type == 'jpeg' or image_type == 'png':
            # print(url_data)
            # Use the post title as the download folder name
            dir_name = soup.find(class_='post-title h1').string
            if not os.path.exists(dir_name):
                os.mkdir(dir_name)
            # print(dir_name)
            # Fix for the request error: add a scheme when src does not start with http
            str_url_data = str(url_data)
            if not re.match(r'^http', str_url_data):
                str_url_data = "https:" + str_url_data
            image = requests.get(str_url_data, headers=headers).content
            file_name = url_data.split('/')[-1]
            # print(file_name)
            with open(dir_name + '/' + file_name, 'wb') as f:
                print('Writing -----> ' + dir_name + '/' + file_name)
                f.write(image)


if __name__ == '__main__':
    print(' ---------------------------------------------------------------------')
    print('| |')
    print('| Author:culprit --- 52pojie |')
    print('| Modified by panpanpan(1277936431) --- 52pojie |')
    print('| |')
    print(' ---------------------------------------------------------------------')
    # datas.txt holds one page name per line
    with open(r'datas.txt') as f:
        content = f.read()
    datas = content.split('\n')
    input('Press Enter to start!')
    loadDatas(datas)
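The script above decides whether to download an <img> by splitting its src on '.' and fixes scheme-less links by prepending "https:". As a rough alternative sketch (not what the original script does), the same two checks could be folded into one helper built on the standard library's urllib.parse. The name normalize_image_url and the example hosts in the comments are made up for illustration.

```python
import os
from urllib.parse import urljoin, urlparse

# Same three extensions the script accepts
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}


def normalize_image_url(src, page_url='https://www.vmgirls.com/'):
    """Return an absolute image URL, or None if src is not a jpg/jpeg/png.

    Hypothetical helper: handles protocol-relative links (//host/a.jpg),
    site-relative links (/a.jpg) and links that already carry a scheme.
    """
    if not src:
        return None  # <img> without a usable src
    absolute = urljoin(page_url, src)
    # Look at the path only, so a query string such as ?v=2 is ignored
    ext = os.path.splitext(urlparse(absolute).path)[1].lower()
    return absolute if ext in IMAGE_EXTENSIONS else None


# Example with a made-up host:
# normalize_image_url('//img.example.com/a/b.jpeg')
```

Compared with split('.')[-1], this also accepts upper-case extensions and URLs with a query string, which may or may not matter for this site.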
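*** Take it if you need it; if you don't, please don't maliciously tie up the site's resources! ***
These are the links added as of June 9. Append them to datas.txt (or overwrite it) and run the script again.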
16615.html
16490.html
16500.html
16483.html
16470.html
16442.html
16431.html
16137.html
16398.html
16604.html
16454.html
16585.html
16377.html
16364.html
16356.html
16345.html
16560.html
16331.html
16544.html
16316.html
16305.html
16292.html
16284.html
16388.html
16264.html
16255.html
16238.html
16224.html
16216.html
16209.html
16196.html
16181.html
16115.html
16144.html
16121.html
16101.html
16092.html
16076.html
16065.html
16053.html
16533.html
16015.html
16008.html
16001.html
15990.html
15976.html
15969.html
15959.html
15952.html
15945.html
16274.html
15938.html
15931.html
15984.html
15925.html
15918.html
15911.html
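If you append the list above to an existing datas.txt instead of overwriting it, some entries may end up in the file twice and get downloaded twice. Here is a minimal sketch of merging two lists while dropping duplicates and blank lines. The function names and the file name new_links.txt are assumptions for illustration; they are not part of the original package.

```python
def merge_links(existing_text, new_text):
    """Merge two newline-separated link lists.

    Keeps first-seen order and drops blank lines and duplicates.
    """
    merged = []
    seen = set()
    for line in (existing_text + '\n' + new_text).splitlines():
        name = line.strip()
        if name and name not in seen:
            seen.add(name)
            merged.append(name)
    return '\n'.join(merged)


def update_datas_file(datas_path='datas.txt', new_links_path='new_links.txt'):
    """Rewrite datas_path so it also contains the links in new_links_path.

    new_links.txt is a hypothetical file holding the newly posted links.
    """
    with open(datas_path) as f:
        existing_text = f.read()
    with open(new_links_path) as f:
        new_text = f.read()
    with open(datas_path, 'w') as f:
        f.write(merge_links(existing_text, new_text))
```

Calling update_datas_file() once before running the downloader should be enough; the downloader itself does not need to change.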
2021-06-09:
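I haven't had time to visit the forum lately. I found that the code was scraping nothing but blank characters, so I reworked it, and it now runs with basically no problems.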
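The post does not say what caused the blank output or exactly how it was fixed. One way to notice this kind of problem early, sketched here under that caveat, is to check that the downloaded bytes actually begin with a JPEG or PNG file signature before writing them to disk. The helper name looks_like_image is made up for illustration.

```python
# Standard file signatures (the first bytes of every JPEG / PNG file)
JPEG_MAGIC = b'\xff\xd8\xff'
PNG_MAGIC = b'\x89PNG\r\n\x1a\n'


def looks_like_image(payload):
    """Return True if payload starts with a JPEG or PNG signature.

    Empty responses, whitespace and HTML error pages all return False.
    """
    return payload.startswith(JPEG_MAGIC) or payload.startswith(PNG_MAGIC)


# Possible use inside a download loop (image holds the response bytes):
# if not looks_like_image(image):
#     print('Skipping, not an image: ' + file_name)
#     continue
```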
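If you find it useful, please leave a rating to show your support. I do wonder how this thread ended up with more favorites than ratings.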
Data + code link: https://pan.baidu.com/s/1yGCpMFIi1yDuVs8_7bjgbQ
Extraction code: 52pj