52pojie (吾爱破解) - LCG - LSG | Android cracking | Malware analysis | www.52pojie.cn

[Python, repost] Scraping Douyin videos without the watermark

Hswyc posted on 2022-5-10 21:46
Last edited by Hswyc on 2022-5-10 21:48

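I wrote this program while learning; the code is for learning and discussion only.

The approach for getting the page source is described in this post: https://www.52pojie.cn/thread-1612413-1-1.html

Why I wrote it: while browsing Douyin I came across videos I really liked (you know what I mean) and wanted to save them to watch later. Some authors disable downloading, and downloaded videos all carry a watermark, so I wrote code to resolve the address and download them.
The input can be either a video's share link or a video id.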


Getting the share link

Getting the id
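
A minimal sketch of how the two input forms could be told apart before any network call. The helper name `classify_input` is hypothetical, and the sketch assumes that a video id is exactly 19 digits (as the script below does) and that share text contains one http(s) link.

```python
import re

# Hypothetical helper, not part of the original script.
SHARE_URL_RE = re.compile(r"https?://\S+")
VIDEO_ID_RE = re.compile(r"\d{19}")  # assumption: ids are 19 digits


def classify_input(text):
    """Return ('id', video_id) or ('url', share_url); raise ValueError otherwise."""
    text = text.strip()
    if VIDEO_ID_RE.fullmatch(text):
        return "id", text
    match = SHARE_URL_RE.search(text)
    if match:
        return "url", match.group(0)
    raise ValueError("input is neither a 19-digit video id nor a share link")
```

Searching for the link with a regular expression means the helper works whether the user pastes the bare link or the whole share message around it.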

Source code
[Python]
import os
import re

import requests
from selenium import webdriver

# Request headers shared by every requests call
header = {
    "user-agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/101.0.4951.54 Safari/537.36 Edg/101.0.1210.39'
}


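def get_video_id(video_url):
    """
    Get the video id from the video link.
    """
    option = webdriver.ChromeOptions()
    option.add_argument("--headless")
    option.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                        'like Gecko) Chrome/96.0.4664.110 Safari/537.36')
    driver = webdriver.Chrome(options=option)
    try:
        driver.get(video_url)
        # Grab the rendered page source
        html = driver.page_source
    finally:
        # Always close the browser so headless Chrome processes do not pile up
        driver.quit()

    # The class name looks generated, so this pattern may stop matching if the page changes
    id_pattern = r'<div class="IsE_azet">.*?from_gid=(.*?)&'
    video_id = re.findall(id_pattern, html)
    if not video_id:
        raise ValueError('No video id found in the page source.')

    return video_id[0]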


def get_video_info(video_id):
    """
    Use video_id to fetch details about the video: author, watermark-free link, and so on.
    """
    video_info_url = f'https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids={video_id}'

    # Fetch the video metadata
    resp = requests.get(video_info_url, headers=header)
    resp.raise_for_status()
    video_info_json = resp.json()
    resp.close()

    item = video_info_json['item_list'][0]
    # Author name
    author_name = item['author']['nickname']
    # Video title: drop hashtags and @mentions from the share title
    video_title_tmp = item['share_info']['share_title']
    if video_title_tmp.startswith('#'):
        video_title = video_title_tmp.split('#')[1].split('@')[0]
    else:
        video_title = video_title_tmp.split('#')[0].split('@')[0]
    # Name used for the saved file
    title = author_name + '-' + video_title + '-' + video_id
    # Watermark-free address (wm = watermark)
    video_true_url = item['video']['play_addr']['url_list'][0].replace('/playwm/', '/play/')

    return title, video_true_url


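def download(title, video_true_url):
    """
    Download the video.
    """
    # Create the output directory if it does not exist yet
    path = '../dy_video'
    os.makedirs(path, exist_ok=True)

    # Replace characters that are not allowed in file names
    file_name = re.sub(r'[\\/:*?"<>|\r\n]', '_', title).strip() + '.mp4'

    # Fetch the video
    resp = requests.get(video_true_url, headers=header)
    content = resp.content
    resp.close()
    # Write the video to disk
    with open(os.path.join(path, file_name), 'wb') as f:
        f.write(content)

    print("Video download complete!")
    print(f"Video file name: {file_name}")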


def main():
    """
    Entry point.
    """
    url_or_id = input("Enter a share link or a video id: ").strip()
    if len(url_or_id) == 19 and url_or_id.isdigit():
        # A 19-digit number is treated as a video id
        print('The input is a video id.')
        video_id = url_or_id
    else:
        # Otherwise treat it as share text and pull the link out of it
        print('The input is a video share link.')
        match = re.search(r'https?://\S+', url_or_id)
        if match is None:
            raise SystemExit('No link found in the input.')
        video_id = get_video_id(match.group(0))
    title, video_true_url = get_video_info(video_id)
    download(title, video_true_url)


if __name__ == '__main__':
    main()
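
To try the title-building and watermark-stripping step from get_video_info without touching the network, something like the sketch below might help. The function name `build_title_and_url` and the sample dictionary are made up for illustration; the sample only mimics the fields the script reads and is not a real API response. Stripping the title and replacing file-name-unsafe characters are additions of this sketch.

```python
import re


def build_title_and_url(video_info_json, video_id):
    """Hypothetical offline version of the parsing done in get_video_info."""
    item = video_info_json["item_list"][0]
    author_name = item["author"]["nickname"]
    share_title = item["share_info"]["share_title"]
    # Same rule as the script: drop hashtags and @mentions from the share title
    if share_title.startswith("#"):
        video_title = share_title.split("#")[1].split("@")[0]
    else:
        video_title = share_title.split("#")[0].split("@")[0]
    title = f"{author_name}-{video_title.strip()}-{video_id}"
    # Addition: replace characters that are not allowed in file names
    title = re.sub(r'[\\/:*?"<>|\r\n]', "_", title)
    # wm = watermark: the script swaps the watermarked path segment for the plain one
    play_url = item["video"]["play_addr"]["url_list"][0].replace("/playwm/", "/play/")
    return title, play_url


# Made-up sample data shaped like the fields used above
sample = {
    "item_list": [{
        "author": {"nickname": "demo_author"},
        "share_info": {"share_title": "A nice clip #tag1 @someone"},
        "video": {"play_addr": {"url_list": ["https://example.com/aweme/v1/playwm/?video_id=abc"]}},
    }]
}
```

Calling `build_title_and_url(sample, "6921631371878337800")` gives a file-name-safe title and the address with `/playwm/` replaced by `/play/`.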


Download result



knian posted on 2022-5-11 18:05
I made one too, so I'll share it. Please go easy on me, I'm a beginner.
[Python]
# -*- coding: UTF-8 -*-
import requests
import random
import re
import json
import time


# Desktop user agents
PCUA=[
	# safari 5.1 – MAC
	"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
	# safari 5.1 – Windows
	"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
	# IE 9.0
	"Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)",
	# IE 8.0
	"Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)",
	# IE 7.0
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)",
	# IE 6.0
	"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
	# Firefox 4.0.1 – MAC
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
	# Firefox 4.0.1 – Windows
	"Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1",
	# Opera 11.11 – MAC
	"Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11",
	# Opera 11.11 – Windows
	"Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11",
	# Chrome 17.0 – MAC
	"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11",
	# Maxthon
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)",
	# Tencent TT
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)",
	# The World 2.x
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
	# The World 3.x
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)",
	# Sogou Browser 1.x
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)",
	# 360 Browser
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)",
	# Avant
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Avant Browser)",
	# Green Browser
	"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
	# chrome
	"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36", 
	# Firefox
	"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0"
]
# Mobile user agents
mobileUA = [
	# safari iOS 4.33 – iPhone
	"Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5",
	# safari iOS 4.33 – iPod Touch
	"Mozilla/5.0 (iPod; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5",
	# safari iOS 4.33 – iPad
	"Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5",
	# Android N1
	"Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
	# Android QQ Browser for Android
	"MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
	# Android Opera Mobile
	"Opera/9.80 (Android 2.3.4; Linux; Opera Mobi/build-1107180945; U; en-GB) Presto/2.8.149 Version/11.10",
	# Android Pad Moto Xoom
	"Mozilla/5.0 (Linux; U; Android 3.0; en-us; Xoom Build/HRI39) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13",
	# BlackBerry
	"Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en) AppleWebKit/534.1+ (KHTML, like Gecko) Version/6.0.0.337 Mobile Safari/534.1+",
	# WebOS HP Touchpad
	"Mozilla/5.0 (hp-tablet; Linux; hpwOS/3.0.0; U; en-US) AppleWebKit/534.6 (KHTML, like Gecko) wOSBrowser/233.70 Safari/534.6 TouchPad/1.0",
	# Nokia N97
	"Mozilla/5.0 (SymbianOS/9.4; Series60/5.0 NokiaN97-1/20.0.019; Profile/MIDP-2.1 Configuration/CLDC-1.1) AppleWebKit/525 (KHTML, like Gecko) BrowserNG/7.1.18124",
	# Windows Phone Mango
	"Mozilla/5.0 (compatible; MSIE 9.0; Windows Phone OS 7.5; Trident/5.0; IEMobile/9.0; HTC; Titan)",
	# UC (none)
	"UCWEB7.0.2.37/28/999",
	# UC (standard)
	"NOKIA5700/ UCWEB7.0.2.37/28/999",
	# UCOpenwave
	"Openwave/ UCWEB7.0.2.37/28/999",
	# UC Opera
	"Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999",
	# UC
	"Mozilla/5.0 (Linux; U; Android 10; zh-CN; Redmi K20 Pro Build/QKQ1.190825.002) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/57.0.2987.108 UCBrowser/12.0.4.988 Mobile Safari/537.36"
]
# Pick a random UA
headers ={"User-Agent":random.choice(PCUA)}
# print("Desktop:",random.choice(PCUA))
# print("Mobile:",random.choice(mobileUA))

def getVideoUrl(url):
	# Resolve the short link to the original address
	response = requests.get(url, headers=headers, allow_redirects=False)
	share_url = response.headers['Location']
	# Extract the videoID
	videoID = re.search(r'\d{19}/', share_url)[0].replace("/", "")
	# Build the item-info URL from the videoID
	url_temp1 = 'https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids=' + videoID
	# Fetch the vid
	response = requests.get(url_temp1, headers=headers, allow_redirects=False)
	json_data = json.loads(response.text)
	item = json_data.get('item_list')[0]
	video_vid = item.get('video').get('vid')

	if not video_vid:
		print("\nThis is not a video; it may be an image post\n")
		# Sample-page filter, off by default
		print("Enter 0 to enable the sample-page filter and output only even pages;\nEnter 1 to enable it and output only odd pages;\nLeave empty or enter anything else to show every page\n")
		switch_temp = input("Enable filter: ").strip()
		# Compare as strings so non-numeric input cannot raise ValueError
		switch0 = switch_temp == "0"
		switch1 = switch_temp == "1"

		# Post details
		author = item.get("author")
		# Author name
		au_nickname = author.get("nickname")
		# Douyin ID
		dy_ID = author.get("unique_id")
		# High-resolution avatar
		au_head_img = author.get("avatar_larger").get("url_list")[0]
		# Signature
		au_signature = author.get("signature").replace("\n", "---")
		# Publish date, e.g. 2013-10-10 23:40:00
		create_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(item.get("create_time")))
		# Caption
		copywriting = item.get("desc")
		# Watermark-free image addresses
		img_noFlag = item.get("images") or []

		print(f"[Author name:]\t{au_nickname}")
		print(f"[Douyin ID:]\t{dy_ID}")
		print(f"[HD avatar:]\t{au_head_img}")
		print(f"[Author signature:]\t{au_signature}")
		print(f"[Published:]\t{create_time}")
		print(f"[Caption:]\t{copywriting}")
		print(f"Watermark-free images: {len(img_noFlag)}\n")

		for page, image in enumerate(img_noFlag, start=1):
			image_url = image.get("url_list")[0]
			if switch0:
				# Even pages only
				if page % 2 == 0:
					print(image_url + "\n")
			elif switch1:
				# Odd pages only
				if page % 2 != 0:
					print(image_url + "\n")
			else:
				# Every page
				print(image_url + "\n")
		return "Parsing complete!\n"
	else:
		# Build the watermark-free address
		url_video = "https://aweme.snssdk.com/aweme/v1/play/?video_id=" + video_vid + "&ratio=1080p&line=0"
		return url_video

# Extract the URL from the share text
def getUrl(text):
	match = re.search(r'https://v\.douyin\.com/[A-Za-z0-9]{6,10}/', text)
	if match is None:
		raise ValueError("No v.douyin.com share link found in the input")
	return match[0]

# text = "video  https://v.douyin.com/J39oEEK/"
# text = "image https://v.douyin.com/RcykpdP/"
text = input("Paste the share text: ")
print(getVideoUrl(getUrl(text)))
input("Copy the address, then close this window.")






'''
How it works
	https://www.daimadog.com/douyin
Short address:
	https://v.douyin.com/J3arcH7/
Original address:
	https://www.iesdouyin.com/share/video/6921631371878337800/?region=CN&mid=6921631627051993870&u_code=g7lc5ik2&titleType=title&did=2128307037941863&iid=2058703107526539&utm_source=copy_link&utm_campaign=client_share&utm_medium=android&app=aweme

Extract the videoID and build the item-info URL:
	https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids=6921631371878337800

Get the vid:
	v0300f7d0000c0793pu43pnr7fc6hti0

Get the watermark-free address:
	https://aweme.snssdk.com/aweme/v1/play/?video_id=v0300f7d0000c0793pu43pnr7fc6hti0&ratio=1080p&line=0

'''
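
The URL steps in the notes above are plain string handling, so they can be sketched and checked without any request. The function names here are hypothetical, and the `/share/video/<19 digits>/` path is an assumption taken from the single example address in the notes; whether the endpoints still respond is not something this sketch checks.

```python
import re

ITEMINFO_API = "https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids="
PLAY_API = "https://aweme.snssdk.com/aweme/v1/play/"


def extract_video_id(original_url):
    """Pull the 19-digit videoID out of the redirected (original) address."""
    match = re.search(r"/share/video/(\d{19})/", original_url)
    if match is None:
        raise ValueError("no 19-digit video id in URL")
    return match.group(1)


def iteminfo_url(video_id):
    """Address of the item-info request for a videoID."""
    return ITEMINFO_API + video_id


def play_url(vid):
    """Watermark-free address for a vid taken from the item-info response."""
    return f"{PLAY_API}?video_id={vid}&ratio=1080p&line=0"
```

Fed the example values from the notes, these three functions reproduce the three addresses listed there.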
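Tauruslsj posted on 2022-5-10 23:15
Doesn't this raise copyright issues?
ytfsxxy posted on 2022-5-10 23:16
Is there one that can download from Bilibili?
小坏丶 posted on 2022-5-10 23:24
Impressive.
OP | Hswyc posted on 2022-5-10 23:25
Tauruslsj posted on 2022-5-10 23:15
Doesn't this raise copyright issues?

It's for learning and discussion only. Keep the downloaded videos for your own viewing and don't use them for anything else.
yjn866y posted on 2022-5-10 23:27
I'd just like to know where the OP's video_info_url = f'https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids=,,, API came from.
OP | Hswyc posted on 2022-5-10 23:38
yjn866y posted on 2022-5-10 23:27
I'd just like to know where the OP's video_info_url = f'https://www.iesdouyin.com/web/api/v2/aweme/iteminfo/?item_ids=, ...

Open the shared video link, press F12, switch device emulation to a mobile phone, refresh, and then look at the Network tab; the request shows up there.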
OP | Hswyc posted on 2022-5-10 23:39
ytfsxxy posted on 2022-5-10 23:16
Is there one that can download from Bilibili?

Nothing for Bilibili at the moment.