吾爱破解 - 52pojie.cn

[Python Original] [Original Source] [Python] Crawling the AiPan tool repository with Python

MikoSec posted on 2019-10-20 18:41
I just installed Windows to start learning reverse engineering and wanted to set up some tools. Being fairly lazy, I didn't want to hunt down and download every tool one by one. I could have used the 52pojie virtual machine, but it isn't as comfortable as a physical machine (I ended up installing it anyway).
AiPan happens to host plenty of tools, so:
Start Python.
Write code.
Run the Python code.
Get the tools.

AiPan states in big letters that multi-threaded downloading is not allowed (if I understand correctly), so I didn't write any multi-threading.
The site notice reads: "AiPan restricts multi-threaded download access. Please download with a single thread; concurrent access will be banned."
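
Everything below therefore runs strictly sequentially. If you want to be extra polite, spacing the requests out with a small random pause (the approach chomosuke's improved version further down also takes) is enough; a minimal sketch, with a placeholder URL list:

[Python]
# Single-threaded, politely spaced requests (the URL list is just a placeholder).
import random
import time

import requests

file_urls = ["https://down.52pojie.cn/Tools/example.zip"]  # placeholder

for file_url in file_urls:
    time.sleep(random.uniform(1, 5))  # random pause so requests stay spaced out
    resp = requests.get(file_url, stream=True, timeout=30)
    print(file_url, resp.status_code)
    resp.close()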

Libraries you need to install:
requests -- HTTP request library
bs4 -- HTML parsing library (the script also passes 'lxml' to BeautifulSoup, so lxml is needed as well)
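
If they aren't installed yet, the following should cover everything the script imports (assuming pip is available):
pip install requests beautifulsoup4 lxml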

Here is the source code (for those who want to play with the code):

[Python]
#-*- coding: utf-8 -*-
import os
import time
 
import requests
 
from bs4 import BeautifulSoup
 
def download(downurl, path, filename): # download helper: fetch downurl and save it as path/filename
    start = time.time() # start time
     
    if not os.path.exists(path):
        os.makedirs(path)
    if path[-1] != os.sep:
        path += os.sep
    file = path+filename
    size = 0
     
    response = requests.get(downurl, stream=True)
    if response.status_code != 200:
        print(f"[Erroe] url => {url}\tstatus_code => {response.status_code}")
        return
       
    chunk_size = 1024
    content_size = int(response.headers["content-length"])
     
    print("[File Size]: %0.2f MB" % (content_size / chunk_size / 1024))
    with open(file, "wb") as f:
        for data in response.iter_content(chunk_size):
            f.write(data)
            size += len(data)
            print("\r[Downloading]: %s>%.2f%%" % ("="*int(size*50/content_size), float(size/content_size*100)), end="")
     
    end = time.time() # end time
    print("Using Time: %.2fs"%(end-start))
 
def main():
    url = "https://down.52pojie.cn/Tools/" # 爱盘Tools URL
    if not os.path.exists("Tools"):
        os.mkdir("Tools")
    os.chdir("Tools")
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77me/77.0.3865.120 Safari/537.36"}
    req = requests.get(url, headers=headers)
    soup = BeautifulSoup(req.text,'lxml')
    for i in soup.find_all("td", class_="link")[1:]: # 获取目录
        tooldir = i.text
        dir_url = url+tooldir
        print(dir_url) # directory URL
        req = requests.get(dir_url, headers=headers)
        req.encoding = "utf-8"
        soup1 = BeautifulSoup(req.text,'lxml')
        for j in soup1.find_all("td", class_="link")[1:]: # files in this directory; [1:] skips "Parent directory/"
            path = tooldir
            filename = j.text
            downurl = dir_url+filename
            print(f"[Downloading] Path => {path}\tFileName => {filename}")
            download(downurl, path, filename)
 
main()
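
Running it as-is creates a Tools/ folder next to the script and mirrors every sub-directory of https://down.52pojie.cn/Tools/ into it, one file at a time. If you want to preview what it would fetch before committing to the full download, a minimal sketch (same selectors as above, placeholder User-Agent) that only lists the directories could look like this:

[Python]
# Dry-run sketch: list the tool directories without downloading anything.
import requests
from bs4 import BeautifulSoup

url = "https://down.52pojie.cn/Tools/"
headers = {"User-Agent": "Mozilla/5.0"}  # placeholder; use a real browser UA string

req = requests.get(url, headers=headers, timeout=30)
soup = BeautifulSoup(req.text, "lxml")
for td in soup.find_all("td", class_="link")[1:]:  # [1:] skips the "Parent directory/" entry
    print(td.text)  # directory names, each ending in "/"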


And here is the packaged exe (for those who just want to run it):

Link: https://pan.baidu.com/s/11xc6ENUELIWaQNIJsbxojA
Extraction code: d3ig

Online parser for Baidu Netdisk share links:
https://www.baiduwp.com/


www.52pojie.cn posted on 2019-10-21 11:06
Coming to 52pojie to crawl 52pojie — now that's boss-level spirit.

chomosuke posted on 2019-10-25 18:15
This post was last edited by chomosuke on 2019-10-28 20:34

I've improved it a bit; code:

https://github.com/Maemo8086/Python_AiPan_Crawler

[Python]
#!/usr/bin/python
# -*- coding: utf-8 -*-
 
'''
MIT License
 
Copyright (c) 2019 Maemo8086
Copyright (c) 2019 MikoSec
 
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
 
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
'''
 
import os
import random
import requests
import sys
import time
from bs4 import BeautifulSoup
from collections import deque
 
 
def download(file_url, path, filename):  # download helper: fetch file_url and save it under path/filename
    global headers
    global download_exception
 
    display_max = 64
 
    if path:
        if len(path) > display_max:
            display_path = '...' + path[-display_max:]
        else:
            display_path = path
    else:
        display_path = '/'
 
    if len(filename) > display_max:
        display_filename = '...' + filename[-display_max:]
    else:
        display_filename = filename
 
    print()
    print(f'\r[Downloading] Path => {display_path}\tFile Name => {display_filename}')
    sys.stdout.flush()
 
    delay = False
    if delay:
        wait = round(random.uniform(0, 5), 2)
        print(f'\r[Info] Waiting {wait} seconds...')
        sys.stdout.flush()
        time.sleep(wait)
 
    start = time.time()  # start time
 
    if path:
        if not os.path.exists(path):
            os.makedirs(path)
        if path[-1] != os.sep:
            path += os.sep
    full_path = path + filename
 
    try:
        response = requests.get(file_url, headers=headers, stream=True, timeout=30)
    except:
        download_exception.append((file_url, path, filename))
        print(f'\r[Error] Download request for *{display_filename}* has failed.')
        return
 
    if response.status_code != 200:
        response.close()
        download_exception.append((file_url, path, filename))
        print(f'\r[Error] Download request for *{display_filename}* has failed.\tstatus_code => {response.status_code}')
        return
 
    try:
        content_size = int(response.headers['content-length'])
    except:
        response.close()
        download_exception.append((file_url, path, filename))
        print(f'\r[Error] Download request for *{display_filename}* has failed.\tMissing or invalid content-length.')
        return
 
    if content_size < 0:
        response.close()
        download_exception.append((file_url, path, filename))
        print(f'\r[Error] Download request for *{display_filename}* has failed.\tInvalid content-length range.')
        return
 
    print('[File Size] %0.2f MB' % (content_size / 1024 ** 2))
    sys.stdout.flush()
 
    if os.path.exists(full_path):
        if os.path.getsize(full_path) == content_size:  # compare local file size against content-length
            response.close()
            print('[Info] Same sized file exists, skipping...')
            return
        else:
            print('[Warning] Overwriting existing copy.')
 
    chunk_size = 1024
    size = 0
    try:
        with open(full_path, 'wb') as f:  # binary mode; line buffering (buffering=1) is not supported here
            for data in response.iter_content(chunk_size):
                f.write(data)
                size += len(data)
                print(
                    '\r[Downloading] %s>%.2f%%' % (
                        '=' * int(size * 50 / content_size), float(size / content_size * 100)), end='')
    except:
        download_exception.append((file_url, path, filename))
        if os.path.exists(full_path):
            os.remove(full_path)
        print(f'\r[Error] Download *{display_filename}* has failed.')
        return
    finally:
        response.close()
        end = time.time()  # end time
        print('\rTime elapsed: %.2fs' % (end - start))
 
 
def recursive_fetch(soup, part_url):
    global url
    global headers
 
    for i in soup.find_all('td', class_='link'):  # iterate files and sub-directories
        if i.text == 'Parent directory/':
            continue
 
        if i.text[-1] != '/':
            path = part_url[len(url):]
            filename = i.text
            file_url = part_url + filename
            download(file_url, path, filename)
        else:
            dir_url = part_url + i.text
            print()
            print(f'\r[Info] Searching under {dir_url}')
 
            execute = True
            while execute:
                wait = round(random.uniform(0, 5), 2)
                print(f'\r[Info] Waiting {wait} seconds...')
                sys.stdout.flush()
                time.sleep(wait)
 
                execute = False
                try:
                    with requests.get(dir_url, headers=headers, timeout=30) as req:
                        req.encoding = req.apparent_encoding
                        soup1 = BeautifulSoup(req.text, 'lxml')
                except:
                    execute = True
                    print(f'\r[Error] URL request *{dir_url}* has failed, retrying...')
 
            recursive_fetch(soup1, dir_url)
 
 
def main():
    global url
    global headers
    global download_exception
 
    print(
        '''
        Python AiPan Crawler

        Authors: Maemo8086, MikoSec
        Source: https://github.com/Maemo8086/Python_AiPan_Crawler

        A Python-based downloader for the 52pojie AiPan file repository.
        This tool uses the requests and bs4 libraries.
        Changing the User-Agent before use is recommended.
        '''
    )
 
    directory = 'AiPan'
    if not os.path.exists(directory):
        os.mkdir(directory)
    os.chdir(directory)
 
    try:
        with requests.get(url, headers=headers, timeout=30) as req:
            req.encoding = req.apparent_encoding
            soup = BeautifulSoup(req.text, 'lxml')
    except:
        print(f'\r[Error] URL request *{url}* has failed.')
        return
 
    recursive_fetch(soup, url)
 
    while download_exception:
        print()
        print(f'\r[Info] Retrying {len(download_exception)} failed downloads...')
 
        wait = round(random.uniform(10, 30), 2)
        print(f'\r[Info] Waiting {wait} seconds...')
        sys.stdout.flush()
        time.sleep(wait)
 
        download_exception_copy = download_exception.copy()
        download_exception.clear()
        while download_exception_copy:
            file_url, path, filename = download_exception_copy.pop()
            file_url = file_url.strip('\\')
            path = path.strip('\\')
            filename = filename.strip('\\')
            download(file_url, path, filename)
 
 
url = 'https://down.52pojie.cn/'  # AiPan root URL
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36'}
 
download_exception = deque()
main()
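
This version crawls the whole of https://down.52pojie.cn/ into an AiPan/ folder, waits a random few seconds between directory requests, skips files that already exist locally with the same size, and retries failed downloads at the end. If you only want the Tools tree, as in the original post, pointing the url global at it before main() runs should be enough — a hypothetical tweak, not part of the published script:

[Python]
# hypothetical tweak near the bottom of the script: restrict the crawl to Tools/
url = 'https://down.52pojie.cn/Tools/'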


It's been years since I last wrote Python, so pointers from the experts here are welcome.

18603867890 posted on 2020-8-1 16:29
"And here is the packaged exe (for those who just want to run it):
Link: https://pan.baidu.com/s/11xc6ENUELIWaQNIJsbxojA
Extraction code: d3ig"

This link has expired.