[Python Original] 【某趣阁】Novel downloader, pure Python script, no selenium or other automation needed

PangXiaoBin posted on 2024-11-23 15:52

Script description

A novel downloader for 某趣阁: a pure Python script, no selenium or other automation tools needed.

The command line supports both searching for novels and downloading them.

The only dependencies are requests and bs4 (the bs4 package on PyPI pulls in beautifulsoup4):

pip install requests bs4 -i https://pypi.tuna.tsinghua.edu.cn/simple

Full code

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @file    : main.py
import time
import urllib.parse

import requests
import copy
import os
from bs4 import BeautifulSoup

HEADERS = {
    "authority": "www.biqg.cc",
    "accept": "application/json",
    "accept-language": "zh,en;q=0.9,zh-CN;q=0.8",
    "cache-control": "no-cache",
    "pragma": "no-cache",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-origin",
    "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1",
    "x-requested-with": "XMLHttpRequest",
}

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
DOWNLOAD_PATH = os.path.join(BASE_DIR, "result")

def get_hm_cookie(url):
    # Hit the hm.html endpoint first so the session picks up the site's
    # cookies; later requests reuse this primed session. This is the trick
    # that makes a browser/selenium unnecessary.
    session = requests.Session()
    session.get(url=url, headers=HEADERS, timeout=10)
    return session

def search(key_word):
    # Query the site's JSON search endpoint; returns (result list, session).
    new_header = copy.deepcopy(HEADERS)
    new_header["referer"] = urllib.parse.quote(
        f"https://www.biqg.cc/s?q={key_word}", safe="/&=:?"
    )

    hm_url = urllib.parse.quote(
        f"https://www.biqg.cc/user/hm.html?q={key_word}", safe="/&=:?"
    )
    session = get_hm_cookie(hm_url)

    params = {
        "q": f"{key_word}",
    }
    try:
        response = session.get(
            "https://www.biqg.cc/user/search.html",
            params=params,
            headers=new_header,
            timeout=10,
        )
    except Exception as e:
        print(f"搜索{key_word}时失败,错误信息:{e}")
        return [], session
    data = response.json()
    return data, session

def download_by_tag(tag, href, result_path, session):
    title = f"{tag.text}"
    url = f"https://www.biqg.cc{href}"
    print(f"Downloading {title} url: {url}")

    result_file_name = os.path.join(result_path, f"{title}.txt")
    with open(result_file_name, "w+", encoding="utf-8") as f:
        content_response = session.get(url, headers=HEADERS)
        content_soup = BeautifulSoup(content_response.content, "html.parser")
        text = content_soup.find(id="chaptercontent")
        for i in text.get_text().split("  ")[1:-2]:
            f.write(f"{i}\n")
    time.sleep(0.2)

def download_txt(download_url, path_name, session):
    """
    Download the novel, one text file per chapter.
    :param download_url: chapter-list URL
    :param path_name: output directory name
    :return:
    """
    result_path = os.path.join(DOWNLOAD_PATH, path_name)
    if not os.path.exists(result_path):
        os.makedirs(result_path, exist_ok=True)
    try:
        response = session.get(download_url, headers=HEADERS, timeout=10)
        soup = BeautifulSoup(response.content, "html.parser")
        down_load_url = soup.select("div[class='listmain'] dl dd a")
        for tag in down_load_url:
            href = tag["href"]
            if href == "javascript:dd_show()":
                hide_dd = soup.select("span[class='dd_hide'] dd a")
                for hide_tag in hide_dd:
                    href = hide_tag["href"]
                    download_by_tag(hide_tag, href, result_path, session)
            else:
                download_by_tag(tag, href, result_path, session)
    except Exception as e:
        import traceback

        print(traceback.format_exc())
        print(f"下载{download_url}失败,错误信息:{e}")

def run():
    while True:
        keyword = input("Enter a novel title to search, or q to quit: ")
        if keyword.replace(" ", "").lower() == "q":
            break
        if not keyword:
            continue
        data_list, session = search(keyword)
        if not data_list or data_list == 1:
            print("请重试.......")
            continue
        for i in range(len(data_list)):
            item = data_list[i]
            articlename = item.get("articlename")
            author = item.get("author")
            print(f"编号:{i} 书名:{articlename}----->{author}")
        while True:
            try:
                num_book = int(input("Enter the number to download: "))
            except Exception:
                print("请输入正确的编号")
                continue
            try:
                item = data_list[num_book]
            except Exception:
                print("编号超出了预期,请请重新输入")
                continue
            break

        url_list = f"https://www.biqg.cc{item.get('url_list')}"
        print(f"开始下载{url_list}")
        path_name = f"{item.get('articlename', '')}___{item.get('author', '')}"

        download_txt(url_list, path_name, session)

if __name__ == "__main__":
    run()

Running

python main.py

# Enter the novel title
# Pick the number of the book to download
# Wait for the download to finish

FAQ

1. Occasional failures can happen due to network issues; if a run fails, just rerun the script.
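
If you want the script to shrug off those transient failures on its own, a small retry wrapper around session.get is enough. Below is a minimal sketch; get_with_retry is a hypothetical helper, not part of the posted script:

import time
import requests

def get_with_retry(session, url, retries=3, backoff=1.0, **kwargs):
    # Hypothetical helper: retry a GET a few times before giving up.
    for attempt in range(1, retries + 1):
        try:
            return session.get(url, timeout=10, **kwargs)
        except requests.RequestException as e:
            if attempt == retries:
                raise
            print(f"Request failed ({e}), retry {attempt}/{retries}...")
            time.sleep(backoff * attempt)  # simple linear backoff

Calls such as session.get(url, headers=HEADERS) in download_by_tag would then become get_with_retry(session, url, headers=HEADERS).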

Run screenshot

(attached image: WX20241123-154927@2x.png)

Attachment: 源码.zip (source code, 3 KB, 302 downloads, costs 1 CB)



qzwsa posted on 2024-11-24 11:09
Based on my own preferences I made a few changes to the original: 1) save each book as a single text file; 2) add multithreading to speed up downloads; 3) allow selecting everything for download after a search.


#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @file    : main.py
import time
import urllib.parse

import requests
import copy
import os
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, as_completed

HEADERS = {
    "authority": "www.biqg.cc",
    "accept": "application/json",
    "accept-language": "zh,en;q=0.9,zh-CN;q=0.8",
    "cache-control": "no-cache",
    "pragma": "no-cache",
    "sec-fetch-dest": "empty",
    "sec-fetch-mode": "cors",
    "sec-fetch-site": "same-origin",
    "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1",
    "x-requested-with": "XMLHttpRequest",
}

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
DOWNLOAD_PATH = os.path.join(BASE_DIR, "result")

def get_hm_cookie(url):
    # Hit the hm.html endpoint first so the session picks up the site's
    # cookies; later requests reuse this primed session.
    session = requests.Session()
    session.get(url=url, headers=HEADERS, timeout=10)
    return session

def search(key_word):
    new_header = copy.deepcopy(HEADERS)
    new_header["referer"] = urllib.parse.quote(
        f"https://www.biqg.cc/s?q={key_word}", safe="/&=:?"
    )

    hm_url = urllib.parse.quote(
        f"https://www.biqg.cc/user/hm.html?q={key_word}", safe="/&=:?"
    )
    session = get_hm_cookie(hm_url)

    params = {
        "q": f"{key_word}",
    }
    try:
        response = session.get(
            "https://www.biqg.cc/user/search.html",
            params=params,
            headers=new_header,
            timeout=10,
        )
    except Exception as e:
        print(f"搜索{key_word}时失败,错误信息:{e}")
        return [], session
    data = response.json()
    return data, session

def download_chapter(args):
    """
    Download a single chapter.
    Returns an (index, title, content) tuple.
    """
    tag, href, session, index = args
    title = f"{tag.text}"
    url = f"https://www.biqg.cc{href}"
    print(f"Downloading chapter: {title} url: {url}")

    try:
        content_response = session.get(url, headers=HEADERS)
        content_soup = BeautifulSoup(content_response.content, "html.parser")
        text = content_soup.find(id="chaptercontent")

        # Collect the chapter text
        content = []
        content.append(f"\n\n{title}\n\n")
        for i in text.get_text().split("  ")[1:-2]:
            content.append(f"{i}\n")

        return (index, title, "".join(content))
    except Exception as e:
        print(f"Failed to download chapter {title}: {e}")
        return (index, title, f"\n\n{title}\n\nDownload failed: {str(e)}\n\n")

def download_txt(download_url, path_name, session):
    """
    Download the novel into a single text file.
    :param download_url: chapter-list URL
    :param path_name: output file name (without extension)
    :return:
    """
    if not os.path.exists(DOWNLOAD_PATH):
        os.makedirs(DOWNLOAD_PATH, exist_ok=True)
        
    result_file_path = os.path.join(DOWNLOAD_PATH, f"{path_name}.txt")
    
    try:
        # Fetch the chapter list
        response = session.get(download_url, headers=HEADERS, timeout=10)
        soup = BeautifulSoup(response.content, "html.parser")
        down_load_url = soup.select("div[class='listmain'] dl dd a")

        # Collect every chapter that needs downloading
        chapters_to_download = []
        index = 0

        for tag in down_load_url:
            href = tag["href"]
            if href == "javascript:dd_show()":
                hide_dd = soup.select("span[class='dd_hide'] dd a")
                for hide_tag in hide_dd:
                    chapters_to_download.append((hide_tag, hide_tag["href"], session, index))
                    index += 1
            else:
                chapters_to_download.append((tag, href, session, index))
                index += 1

        # Download concurrently with a thread pool
        chapter_contents = {}
        max_workers = min(20, len(chapters_to_download))  # cap at 20 threads
        print(f"Starting concurrent download with {max_workers} threads...")
        
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Submit all download tasks
            future_to_chapter = {
                executor.submit(download_chapter, args): args
                for args in chapters_to_download
            }

            # Collect results as they complete
            for future in as_completed(future_to_chapter):
                index, title, content = future.result()
                chapter_contents[index] = content

        # Write the chapters to file in order
        with open(result_file_path, "w", encoding="utf-8") as result_file:
            # Write the book title
            result_file.write(f"《{path_name}》\n\n")

            # Write chapter contents in index order
            for i in range(len(chapter_contents)):
                result_file.write(chapter_contents[i])

        print(f"《{path_name}》finished! Saved to: {result_file_path}")
            
    except Exception as e:
        import traceback
        print(traceback.format_exc())
        print(f"Download of {download_url} failed: {e}")

def run():
    while True:
        keyword = input("Enter a novel title to search, or q to quit: ")
        if keyword.replace(" ", "").lower() == "q":
            break
        if not keyword:
            continue
        data_list, session = search(keyword)
        if not data_list or data_list == 1:
            print("Please try again...")
            continue

        # Show the search results
        print("\nSearch results:")
        print("-" * 50)
        for i in range(len(data_list)):
            item = data_list[i]
            articlename = item.get("articlename")
            author = item.get("author")
            print(f"No.{i} Title: {articlename} -----> {author}")
        print("-" * 50)
        print("Tips:")
        print("1. Enter a single number to download one book")
        print("2. Enter several numbers separated by commas to download multiple books, e.g. 0,1,2")
        print("3. Enter all to download everything")

        while True:
            try:
                choice = input("Enter the number(s) to download: ").strip().lower()

                # Download everything
                if choice == 'all':
                    selected_indices = list(range(len(data_list)))
                    break

                # Single or multiple selections
                selected_indices = []
                for num in choice.split(','):
                    num = int(num.strip())
                    if 0 <= num < len(data_list):
                        selected_indices.append(num)
                    else:
                        raise ValueError(f"Number {num} out of range")

                if not selected_indices:
                    raise ValueError("No valid number selected")
                break

            except ValueError as e:
                print(f"Invalid input: {str(e)}")
                continue

        # Download the selected books
        print(f"\nPreparing to download {len(selected_indices)} book(s)...")
        for num_book in selected_indices:
            try:
                item = data_list[num_book]
                url_list = f"https://www.biqg.cc{item.get('url_list')}"
                articlename = item.get('articlename', '')
                author = item.get('author', '')
                path_name = f"{articlename}___{author}"

                print(f"\nDownloading 《{articlename}》 by {author}")
                print(f"Download link: {url_list}")

                download_txt(url_list, path_name, session)

            except Exception as e:
                print(f"Error downloading book {num_book}: {str(e)}")
                continue

        print("\nAll selected books downloaded!")

if __name__ == "__main__":
    run()
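
One caveat on this threaded version: every worker reuses the single requests.Session created during search, and requests does not guarantee that a Session is safe to share across threads. If you hit sporadic connection errors under load, a per-thread session that copies over the primed cookies is an easy workaround. A minimal sketch, not part of the posted code; get_thread_session is a hypothetical helper:

import threading
import requests

_local = threading.local()

def get_thread_session(base_session):
    # Hypothetical helper: lazily create one Session per worker thread,
    # carrying over the cookies primed by get_hm_cookie().
    if not hasattr(_local, "session"):
        _local.session = requests.Session()
        _local.session.cookies.update(base_session.cookies)
    return _local.session

download_chapter would then call get_thread_session(session).get(url, headers=HEADERS) instead of session.get(url, headers=HEADERS).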


yy67283080 posted on 2024-11-23 18:45
jiji2024 posted on 2024-11-23 18:51
hu123123123 posted on 2024-11-23 19:07
Impressive, now I can read novels.
ghoob321 posted on 2024-11-23 19:40
pip is barely usable right now, painfully slow.
xxc99 posted on 2024-11-23 20:45
Thanks for sharing, bookmarked.
anorith posted on 2024-11-23 21:00
A bookworm's delight.
doodong posted on 2024-11-23 21:10
This is great stuff, thanks for sharing!
Beginners posted on 2024-11-23 21:33
Nice, thanks for sharing, OP.
zs6342133 posted on 2024-11-23 21:40
I can get behind this, thanks for sharing~