吾爱破解 - LCG - LSG | Android cracking | Virus analysis | www.52pojie.cn

[Python, reposted] Capturing Douyu live danmaku (bullet comments) in real time with Python

obeina posted on 2019-4-17 14:03
Last edited by obeina on 2019-4-17 15:33


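This program was tested successfully on Ubuntu 18.04 with Python 3.
Attachment: 斗鱼弹幕服务器第三方接入协议v1.6.2.txt (Douyu danmaku server third-party access protocol v1.6.2; 829.84 KB, 118 downloads)
After downloading 斗鱼弹幕服务器第三方接入协议v1.6.2.txt, change its extension to .pdf to open it.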

Three libraries need to be installed (requests, BeautifulSoup4, lxml):
pip install requests BeautifulSoup4 lxml
[Python]
'''
File name: 爬取斗鱼直播间信息到jsonline文件.py
(Scrape Douyu live-room danmaku into a JSON Lines file)
'''
from __future__ import unicode_literals
import multiprocessing
import socket
import time
import re
import requests
from bs4 import BeautifulSoup
import json

# Configure the socket's IP address and port
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = socket.gethostbyname("openbarrage.douyutv.com")
port = 8601
client.connect((host, port))

# Regular expression that extracts the user's nickname and the danmaku fields
danmu = re.compile(b'type@=chatmsg.*?/nn@=(.*?)/txt@=(.*?)/.*?/level@=(.*?)/.*?/bnn@=(.*?)/bl@=(.*?)/')


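def sendmsg(msgstr):
    '''
    Send a request from the client to the server, prepending the protocol header.
    msgHead: the header sent before the data. It holds the message length
    written twice, followed by the message type together with the encryption
    and reserved fields.
    The body is sent in a while loop so that all of the data goes out.
    '''
    msg = msgstr.encode('utf-8')
    data_length = len(msg) + 8
    code = 689
    msgHead = int.to_bytes(data_length, 4, 'little') \
              + int.to_bytes(data_length, 4, 'little') + int.to_bytes(code, 4, 'little')
    # sendall keeps writing until the whole header is out; send may write only part of it
    client.sendall(msgHead)
    sent = 0
    while sent < len(msg):
        tn = client.send(msg[sent:])
        sent = sent + tn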


def start(roomid):
    '''
    Send the login request, then receive the danmaku data returned by the
    server and extract the nickname and message text.
    The login and join-group messages must end with \0.
    '''
    msg = 'type@=loginreq/roomid@={}/\0'.format(roomid)
    sendmsg(msg)
    msg_more = 'type@=joingroup/rid@={}/gid@=-9999/\0'.format(roomid)
    sendmsg(msg_more)

    # Look the name up once; the original fetched the web page again on every recv
    name = get_name(roomid)
    print('---------------Welcome to the live room of {}---------------'.format(name))
    filename = name + time.strftime('%Y.%m.%d') + '_live_danmaku'
    # The original never initialised this counter, so its first increment raised
    # UnboundLocalError and the bare except silently skipped the remaining messages
    danmuNum = 0
    with open(filename, 'a', encoding='utf-8') as f:
        while True:
            data = client.recv(1024)
            if not data:
                break
            for i in danmu.findall(data):
                try:
                    dmDict = {}
                    dmDict['nickname'] = i[0].decode(encoding='utf-8', errors='ignore')
                    dmDict['text'] = i[1].decode(encoding='utf-8', errors='ignore')
                    dmDict['level'] = i[2].decode(encoding='utf-8', errors='ignore')
                    dmDict['badge_name'] = i[3].decode(encoding='utf-8', errors='ignore')
                    dmDict['badge_level'] = i[4].decode(encoding='utf-8', errors='ignore')
                    dmJsonStr = json.dumps(dmDict, ensure_ascii=False) + '\n'
                    print(dmDict['text'])
                    f.write(dmJsonStr)
                    danmuNum = danmuNum + 1
                except Exception:
                    # Skip a message that cannot be printed or written
                    continue
            # Flush after each batch so the file stays current while running
            f.flush()

def keeplive():
    '''
    Send heartbeat messages to keep the long-lived TCP connection open.
    The heartbeat message must end with \0.
    '''
    while True:
        msg = 'type@=mrkl/\0'
        sendmsg(msg)
        time.sleep(45)


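def get_name(roomid):
    '''
    Use BeautifulSoup to read the streamer name from the live-room page.
    '''
    r = requests.get("http://www.douyu.com/" + roomid)
    soup = BeautifulSoup(r.text, 'lxml')
    # attrs must be a dict; the original passed a set literal ({'class', '...'})
    tag = soup.find('a', {'class': 'Title-anchorName'})
    # Fall back to the room ID if the element is missing (e.g. the page layout changed)
    return tag.string if tag is not None and tag.string else roomid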

# Start the program
if __name__ == '__main__':
    room_id = input('Enter room ID: ')
    p1 = multiprocessing.Process(target=start, args=(room_id,))
    p2 = multiprocessing.Process(target=keeplive)
    p1.start()
    p2.start()
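
The header that sendmsg builds can be looked at in isolation. The following is a minimal sketch, assuming the layout used in the script above: the length written twice as 4-byte little-endian integers, then the 4-byte value 689, then the NUL-terminated body. It also assumes that server replies are framed the same way, which the post does not state. The names pack_packet and unpack_packets are hypothetical and do not come from the original post or the protocol document.

```python
import struct

CLIENT_MSG_TYPE = 689  # value the script above writes into the last header field


def pack_packet(body: str) -> bytes:
    """Frame one message with the same 12-byte header that sendmsg builds."""
    payload = body.encode('utf-8')
    # The length covers everything after the first length field:
    # the second length field (4 bytes) + the type field (4 bytes) + the payload.
    length = len(payload) + 8
    return struct.pack('<III', length, length, CLIENT_MSG_TYPE) + payload


def unpack_packets(buffer: bytes):
    """Split a receive buffer into complete message bodies.

    Returns (bodies, rest); rest holds the bytes of an incomplete trailing
    packet and should be prepended to the next chunk read from the socket.
    Assumption: incoming packets use the same framing as outgoing ones.
    """
    bodies = []
    while len(buffer) >= 4:
        (length,) = struct.unpack_from('<I', buffer, 0)
        end = 4 + length
        if len(buffer) < end:
            break  # the packet is not complete yet
        # Skip the 12-byte header and drop the trailing NUL terminator.
        bodies.append(buffer[12:end].rstrip(b'\0'))
        buffer = buffer[end:]
    return bodies, buffer
```

Keeping the leftover bytes between recv calls would avoid losing a message that happens to be split across two 1024-byte reads, which the regular-expression approach in the script cannot recover.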


Danmaku messages scroll by in the terminal:
[Image: 131644sa8ao2axqq28aa8a.png]
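
The regular expression in the script only matches when the fields arrive in one fixed order (nn, txt, level, bnn, bl). As an alternative, a message body can be split into key/value pairs regardless of order. This is a rough sketch, not the author's method; it assumes the key@=value/ serialization visible in the script's own messages, and it assumes the escaping rule ('/' written as '@S', '@' written as '@A') that should be verified against the attached protocol document. The names parse_message and unescape are hypothetical.

```python
def unescape(value: str) -> str:
    """Undo the assumed escaping: '@S' stands for '/', '@A' stands for '@'."""
    # '@S' must be handled first so that '@AS' (an escaped literal '@S')
    # is not mistaken for an escaped slash.
    return value.replace('@S', '/').replace('@A', '@')


def parse_message(body: bytes) -> dict:
    """Turn b'type@=chatmsg/nn@=alice/txt@=hi/' into a dict of fields."""
    fields = {}
    text = body.decode('utf-8', errors='ignore')
    for item in text.split('/'):
        if '@=' not in item:
            continue  # skip the empty piece after the final '/'
        key, value = item.split('@=', 1)
        fields[unescape(key)] = unescape(value)
    return fields
```

With this approach a chat message would be selected by checking fields.get('type') == 'chatmsg' and then reading fields.get('nn') and fields.get('txt'), so a missing or reordered field no longer prevents a match.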


A file named after the streamer is also created in the current directory:
[Image: 131809gdfxf0ztjljjj54x.png]
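
Because the script writes one JSON object per line, the output file can be read back line by line. The sketch below is only an illustration of that; the function name count_by_field is hypothetical, and the field name must be whichever key the script actually wrote for the value of interest (for example the nickname key).

```python
import json
from collections import Counter


def count_by_field(path, field):
    """Count how many records in a JSON Lines file share each value of `field`."""
    counts = Counter()
    with open(path, encoding='utf-8') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts[record.get(field)] += 1
    return counts
```

Calling count_by_field(path, key).most_common(10) would then list the ten most frequent values, for instance the most active chatters.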

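Ratings

7 participants; 吾爱币 (forum coins) +8; helpfulness +6. Each entry: user, points awarded, reason.
bigrose +1 +1 I strongly agree!
lovelive +1 +1 Has the new keeplive format stopped working too? Danmaku can only be captured for 40 seconds.
隰则有泮 +1 Enthusiastic reply!
wushaominkk +3 +1 Thanks for posting original work; the 吾爱破解 forum is better because of you!
dioderen +1 +1 Thanks!
李小木 +1 Thoughtful discussion helps everyone improve!
lilips +1 +1 Thoughtful discussion helps everyone improve!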


Licoy posted on 2019-4-17 14:29
[Python]
Enter room ID: 5096857
Process Process-1:
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/licoy/dev/languang/python/demo/douyu.py", line 51, in start
    print('---------------Welcome to the live room of {}---------------'.format(get_name(roomid)))
  File "/Users/licoy/dev/languang/python/demo/douyu.py", line 94, in get_name
    soup = BeautifulSoup(r.text, 'lxml')
  File "/usr/local/lib/python3.7/site-packages/bs4/__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
陈家丶妖孽 posted on 2019-4-17 14:21
Licoy posted on 2019-4-17 14:42
Quoting Licoy (2019-4-17 14:29):
Enter room ID: 5096857
Process Process-1:
Traceback (most recent call las ...

It works once the call to get_name is removed.
uumesafe posted on 2019-4-17 15:11
Quoting Licoy (2019-4-17 14:29):
Enter room ID: 5096857
Process Process-1:
Traceback (most recent call las ...

The bs4 library is not installed.
uumesafe posted on 2019-4-17 15:25
Quoting uumesafe (2019-4-17 15:11):
The bs4 library is not installed.

Install the lxml library.
Original poster | obeina posted on 2019-4-17 15:32
Quoting Licoy (2019-4-17 14:29):
Enter room ID: 5096857
Process Process-1:
Traceback (most recent call las ...

You also need to install three libraries (requests, BeautifulSoup4, lxml):
pip install requests BeautifulSoup4 lxml
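
The FeatureNotFound error above means BeautifulSoup could not find the lxml parser. Besides installing lxml, a script could fall back to Python's built-in 'html.parser' when lxml is unavailable. This is a small sketch of that idea, not part of the original program; pick_parser is a hypothetical helper name, and whether the built-in parser handles the Douyu page equally well has not been checked.

```python
import importlib.util


def pick_parser() -> str:
    """Return 'lxml' if the lxml package can be imported, else 'html.parser'."""
    if importlib.util.find_spec('lxml') is not None:
        return 'lxml'
    # 'html.parser' ships with the Python standard library.
    return 'html.parser'
```

In get_name, the call would then read BeautifulSoup(r.text, pick_parser()) instead of hard-coding 'lxml'.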
wan66d posted on 2019-4-17 18:00
Can this be made to work on Windows?
wan66d posted on 2019-4-17 18:39
Tested on Windows; it works.
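
One point to keep in mind on Windows: multiprocessing starts each child by launching a fresh interpreter that re-imports the script, so the module-level socket is created again in every process and the heartbeat process ends up with its own connection rather than the one receiving danmaku. Whether this affects the Douyu server's behaviour was not verified in this thread. A possible alternative is to run the heartbeat in a thread so that both loops share one socket object. The sketch below shows the pattern only; start_heartbeat and its parameters are hypothetical, and payload stands for the already framed heartbeat bytes.

```python
import socket
import threading


def start_heartbeat(sock, payload, interval, stop_event, send_lock):
    """Send `payload` on `sock` every `interval` seconds until stop_event is set."""
    def loop():
        while True:
            # The lock keeps a heartbeat from interleaving with other sends.
            with send_lock:
                sock.sendall(payload)
            # Event.wait returns True as soon as stop_event is set.
            if stop_event.wait(interval):
                break

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


if __name__ == '__main__':
    # Local demonstration with a connected socket pair instead of a real server.
    left, right = socket.socketpair()
    stop = threading.Event()
    worker = start_heartbeat(left, b'ping', 0.1, stop, threading.Lock())
    print(right.recv(4))
    stop.set()
    worker.join()
    left.close()
    right.close()
```

In the script above, the receive loop would stay in the main thread and any other code that writes to the socket would take the same lock.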
newpowersky posted on 2019-4-17 19:18
The protocol text file can't be read, can it?