BBC app (com.iyuba.bbcstub) 6 Minute English scraper v2.0

三滑稽甲苯 · Posted 2020-8-14 23:17

Last edited by 三滑稽甲苯 on 2020-9-30 20:04

BBC app screenshot ⬇️

[Image: app screenshot]

Script screenshot ⬇️

[Image: script screenshot]

Code (Python):

from os import listdir, makedirs, remove
from os.path import join
from time import sleep

from requests import get  # third-party: pip install requests

# {} is the smallest id fetched so far; use 0 to get the newest episodes
url = 'http://apps.iyuba.cn/minutes/titleNewApi.jsp?maxid={}&pages=1&pageNum=20&parentID=1&type=android&format=json'
# Characters that are not allowed in Windows file names
banned = {'/', '\\', ':', '?', '*', '"', '<', '>', '|'}


class Category:
    """One episode entry from the list API."""

    def __init__(self, n, item):
        self.num = n
        title = item['Title']
        for b in banned:
            title = title.replace(b, '')
        self.title = title
        self.title_cn = item['Title_cn']
        self.sound = item['Sound']
        self.pic = item['Pic']
        self.readcount = item['ReadCount']
        self.id = item['BbcId']

    def show(self):
        """Print a summary line and save the cover image to Homepage/."""
        print(f'#{self.num}/{self.readcount}/{self.title}/{self.title_cn}')
        with open(join('Homepage', f'#{self.num}-{self.title}.jpg'), 'wb') as f:
            f.write(get(self.pic).content)

    def download(self):
        """Save the episode audio to Download/."""
        with open(join('Download', f'{self.title}.mp3'), 'wb') as f:
            f.write(get('http://static.iyuba.cn/sounds/minutes/' + self.sound).content)


def load_page(maxid, target):
    """Fetch one page from the API, show each entry, and append it to target."""
    for item in get(url.format(maxid)).json()['data']:
        epi = Category(len(target), item)
        epi.show()
        target.append(epi)


for name in ('Homepage', 'Download'):
    makedirs(name, exist_ok=True)

target = []
print('#/Read Count/Title_EN/Title_CN  (Picture in Homepage/)')
load_page(0, target)
print('Input number to get one, and "next" to get a next page.')
while True:
    t = input('I want #')
    if t == 'next':
        # The id of the last entry shown becomes maxid for the next page
        load_page(target[-1].id, target)
        continue
    try:
        n = int(t)
    except ValueError:
        break  # any other non-numeric input ends the session
    if 0 <= n < len(target):
        target[n].download()
    else:
        print(f'No episode #{n}')

print('Cleaning cache...')
for item in listdir('Homepage'):
    remove(join('Homepage', item))
sleep(1)  # pause briefly before exiting
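
The title clean-up step in the script can also be written as a small standalone helper. The sketch below is not part of the original script: `sanitize_title` is a hypothetical name, and it uses `str.translate` instead of repeated `replace` calls, but it removes the same nine characters.

```python
# Characters the script strips from titles (they are invalid in Windows file names).
BANNED = '/\\:?*"<>|'

# Translation table that maps every banned character to "delete".
_DELETE_BANNED = str.maketrans('', '', BANNED)


def sanitize_title(title):
    """Return title with every banned character removed (hypothetical helper)."""
    return title.translate(_DELETE_BANNED)


if __name__ == '__main__':
    # A made-up title, used only to show the effect.
    print(sanitize_title('Is AI "smart"? Part 1/2'))
```

Doing the clean-up in one place means the same rule applies to both the cover image name and the .mp3 name.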

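Note: requires the third-party requests library.
How to use: run the code with Python and enter the number of the audio you want (you can check the matching cover image in the 'Homepage' folder that the script creates). The file is downloaded automatically into the generated 'Download' directory. (The script scrapes the latest releases in 'BBC 6 Minute English'.)
Demo video (I was lazy and recorded it in Termux), the .py file, and future updates:
https://www.lanzoux.com/b00zqpndi
Password: 4yhk
LOG
9.30 [v2.0] Added the 'next page' feature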
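
The script writes whatever the server returns straight into the .mp3 file, so a failed request can leave a broken file in 'Download'. The following is a rough sketch of a more defensive download, not the author's code: `save_chunks` and `download_mp3` are hypothetical names, the 30-second timeout and 64 KiB chunk size are arbitrary assumptions, and only the base URL comes from the script.

```python
import os


def save_chunks(chunks, path):
    """Write an iterable of bytes objects to path via a temporary .part file."""
    tmp = path + '.part'
    with open(tmp, 'wb') as f:
        for chunk in chunks:
            f.write(chunk)
    os.replace(tmp, path)  # only a completely written file gets the final name


def download_mp3(sound, title, folder='Download'):
    """Stream one episode to folder/title.mp3 (hypothetical helper)."""
    from requests import get  # third-party; imported here so save_chunks works without it

    os.makedirs(folder, exist_ok=True)
    # Base URL taken from the script; timeout and chunk size are assumptions.
    r = get('http://static.iyuba.cn/sounds/minutes/' + sound, stream=True, timeout=30)
    r.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    path = os.path.join(folder, title + '.mp3')
    save_chunks(r.iter_content(chunk_size=65536), path)
    return path
```

Streaming with `iter_content` avoids holding the whole audio file in memory, which may matter when running on a phone under Termux.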
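
For anyone curious how 'next page' works: the list URL takes a `maxid` parameter, where 0 asks for the newest entries and the smallest id fetched so far asks for the following page. A minimal sketch of that idea, under a few assumptions: `page_url` and `next_maxid` are hypothetical names, `BbcId` is assumed to be numeric (an int or a digit string), and taking the minimum is a variation on the script, which simply uses the id of the last entry it received.

```python
from urllib.parse import urlencode

API = 'http://apps.iyuba.cn/minutes/titleNewApi.jsp'


def page_url(maxid=0):
    """Build the list URL used by the script; maxid=0 requests the newest page."""
    params = {'maxid': maxid, 'pages': 1, 'pageNum': 20, 'parentID': 1,
              'type': 'android', 'format': 'json'}
    return API + '?' + urlencode(params)


def next_maxid(items):
    """Return the smallest BbcId in a page, to pass as maxid for the next request.

    Assumes BbcId values are numeric (int or digit string).
    """
    return min(int(item['BbcId']) for item in items)


if __name__ == '__main__':
    # Made-up records for illustration; real ones come from r.json()['data'].
    page = [{'BbcId': '103'}, {'BbcId': '102'}, {'BbcId': '101'}]
    print(page_url(next_maxid(page)))
```

Using the minimum rather than the last entry does not depend on the order in which the server returns the list.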

flypds · Posted 2020-8-15 08:05

Is this a mobile app?

三滑稽甲苯 · Posted 2020-8-15 08:19

flypds wrote on 2020-8-15 08:05:
Is this a mobile app?

It's a Python script that targets the BBC app.

Lowarex · Posted 2020-8-15 08:47

Thanks for sharing, OP.

天空宫阙 · Posted 2020-8-15 08:56

What is this app called? Can I find it in an app store?

偶尔平凡 · Posted 2020-8-15 09:08

Notice: the author has been banned or deleted; the content is hidden automatically.

三滑稽甲苯 · Posted 2020-8-15 09:35

天空宫阙 wrote on 2020-8-15 08:56:
What is this app called? Can I find it in an app store?

It's simply called BBC; you can find it in Huawei's app market.

Screenshot_20200815_093508_com.huawei.appmarket.jpg

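YiQiu · Posted 2020-8-16 23:26

This reminds me of VOA; DW has plenty of good material too. It's free to download, but you need to get around the firewall, which can be a hassle.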

深水夜藏 · Posted 2020-8-17 20:13

Thanks for sharing, expert. I learned a lot.
