吾爱破解 - 52pojie.cn

[Help] Beginner question: how do I download dynamically loaded data:image images?

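miracle1989 posted on 2024-8-17 00:01
The HTML I get back from my request never contains any data:image value. I suspect the front end fills it in dynamically, which is why Python's BS4 cannot see it. Is there any way to handle this other than Selenium? (Attachment: 微信截图_20240816234714.png)
The HTML is as follows: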
<div class="post-content">
<img src="data:image/jpeg;base64,/9j/2wCEABALDZ" data-xuid="1" data-xkrkllgl="https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg" alt="photo_2024-08-16_11-52-27.jpg" title="photo_2024-08-16_11-52-27.jpg" data-action="zoom">
</div>
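
Judging from the markup above, the real image URL sits in the `data-xkrkllgl` attribute, so it may be readable without a browser, assuming that attribute is present in the HTML returned by a plain request (the thread does not confirm this). A minimal sketch using only the standard library; `ImgAttrCollector` and `collect_img_attr` are hypothetical helper names, and the sample string is the snippet from this post with its truncated base64 payload:

```python
from html.parser import HTMLParser

# Sample markup copied from the post; the base64 payload is truncated there.
SAMPLE_HTML = '''<div class="post-content">
<img src="data:image/jpeg;base64,/9j/2wCEABALDZ" data-xuid="1" data-xkrkllgl="https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg" alt="photo_2024-08-16_11-52-27.jpg" title="photo_2024-08-16_11-52-27.jpg" data-action="zoom">
</div>'''


class ImgAttrCollector(HTMLParser):
    """Collect the value of one attribute from every <img> tag (hypothetical helper)."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag != 'img':
            return
        for name, value in attrs:
            if name == self.attr_name and value:
                self.values.append(value)


def collect_img_attr(html, attr_name='data-xkrkllgl'):
    """Return the attr_name values of all <img> tags in html, in document order."""
    parser = ImgAttrCollector(attr_name)
    parser.feed(html)
    return parser.values


print(collect_img_attr(SAMPLE_HTML))
```

The same lookup works with BS4; the standard-library parser is used here only to keep the sketch dependency-free.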

[Python]
import base64
import sys

import requests
from bs4 import BeautifulSoup


url = 'xxxx'

# Fetch the page
response = requests.get(url)
html_content = response.text

# Parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Find the first img tag whose src contains a base64 payload
img_tag = soup.find('img', src=lambda src: src and 'base64' in src)
print(img_tag)

# Make sure the img tag exists
if img_tag and 'src' in img_tag.attrs:
    # Take the base64 string that follows the comma in the data: URI
    encrypted_base64 = img_tag['src'].split(',')[1]
else:
    print('No img tag with a base64 src was found')
    sys.exit()

# Output file name; the rest of the snippet was not included in the post
output_path = 'path_to_output_image.jpg'
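
For reference, when a `data:image/...;base64,` value is actually present in the fetched HTML, turning it into image bytes needs no network request at all. A minimal sketch under that assumption; `parse_data_uri` is a hypothetical helper, and the example payload is just the first four bytes of a JPEG stream, not a real image:

```python
import base64


def parse_data_uri(uri):
    """Split a base64 data: URI into (mime_type, raw_bytes)."""
    if not uri.startswith('data:'):
        raise ValueError('not a data: URI')
    header, _, payload = uri[len('data:'):].partition(',')
    if not header.endswith(';base64'):
        raise ValueError('only base64-encoded data: URIs are handled here')
    mime_type = header[:-len(';base64')] or 'text/plain'
    return mime_type, base64.b64decode(payload)


# Hypothetical payload: four bytes that look like the start of a JPEG.
example_uri = 'data:image/jpeg;base64,' + base64.b64encode(b'\xff\xd8\xff\xe0').decode('ascii')
mime_type, raw = parse_data_uri(example_uri)
print(mime_type, raw)
```

In this thread the `src` value is missing from the fetched HTML, so this only covers the simpler case where it is not.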


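十万菠萝拍黄瓜 posted on 2024-8-17 02:39
It is encrypted: AES-CBC with PKCS7 padding. The key is f5d965df75336270 and the IV is 97b60394abc2fbe1. For ab2b64, just ask an AI. Request 2024081616535950939.jpeg, convert the response to base64 first, then decrypt it.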
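
To make the PKCS7 part of that description concrete, here is a pure-Python sketch of the padding scheme. It is for illustration only: `pkcs7_pad` and `pkcs7_unpad` are hypothetical names, and in practice a library routine such as pycryptodome's `Crypto.Util.Padding.unpad` does this step.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data, block_size=BLOCK_SIZE):
    """Append N bytes of value N so the length becomes a multiple of block_size."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len


def pkcs7_unpad(padded, block_size=BLOCK_SIZE):
    """Validate and strip PKCS7 padding."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError('input length is not a multiple of the block size')
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError('invalid padding length')
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError('invalid padding bytes')
    return padded[:-pad_len]


sample = b'JPEG bytes'  # 10 bytes, so 6 padding bytes of value 6 are added
padded = pkcs7_pad(sample)
print(len(padded), padded[-1])
print(pkcs7_unpad(padded) == sample)
```

A padding error after decryption usually means the key, the IV, or the ciphertext bytes are not what the cipher expects.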
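三滑稽甲苯 posted on 2024-8-17 07:26
[OP] miracle1989 posted on 2024-8-17 08:46
Quote: 十万菠萝拍黄瓜 posted on 2024-8-17 02:39
It is encrypted: AES-CBC with PKCS7 padding. The key is f5d965df75336270 and the IV is 97b60394abc2fbe1. For ab2b64, just ask an AI. Request 202408161653 ...

I tried following your hints, but it still does not work. Could you take a look when you have time?
[Python]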
import os

from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64
import requests
from bs4 import BeautifulSoup


def get_response(url, timeout=10):
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Connection": "Keep-Alive",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
    }
    try:
        response = requests.get(url, headers=headers,timeout=timeout)
        response.raise_for_status()  # Raise on 4xx/5xx responses
        response.encoding = 'utf-8'
        return response
    except requests.exceptions.Timeout:
        print(f"Request timed out: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error: {e.response.status_code}, {url}")
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}, {url}")
    return None

def fetch_pic_urls(url):
    # Only fetch absolute http(s) URLs, and stop early if the request failed
    if not url.startswith('http'):
        return []
    response = get_response(url)
    if response is None:
        return []
    html_content = response.text
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find every img tag that has a data-xkrkllgl attribute
    img_tags = soup.find_all('img', attrs={'data-xkrkllgl': True})

    # Extract and return the data-xkrkllgl values
    pic_urls = [img['data-xkrkllgl'] for img in img_tags]
    return pic_urls



def decrypt_image_url(encrypted_url, key, iv):
    # Convert the key and IV to bytes
    key = bytes.fromhex(key)
    iv = bytes.fromhex(iv)

    # Base64-decode the encrypted URL
    encrypted_data = base64.b64decode(encrypted_url)

    # Create an AES decryptor in CBC mode
    cipher = AES.new(key, AES.MODE_CBC, iv)

    # Decrypt the data
    decrypted_padded = cipher.decrypt(encrypted_data)

    # Remove the padding
    decrypted_data = unpad(decrypted_padded, AES.block_size)

    # Convert the decrypted data to a string
    decrypted_url = decrypted_data.decode('utf-8')

    return decrypted_url

def download_image(url, save_dir='.', timeout=10):
    response = get_response(url, timeout)
    if response and response.status_code == 200:
        # Take the file name from the URL
        filename = os.path.basename(url)
        # Make sure the target directory exists
        os.makedirs(save_dir, exist_ok=True)
        # Build the full file path
        file_path = os.path.join(save_dir, filename)
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f'Image downloaded successfully to {file_path}.')
    else:
        print('Failed to download image.')


def main():
    key = 'f5d965df75336270'
    iv = '97b60394abc2fbe1'
    url = 'xxxx'
    encrypted_urls = fetch_pic_urls(url)

    for encrypted_url in encrypted_urls:
        decrypted_url = decrypt_image_url(encrypted_url, key, iv)
        download_image(decrypted_url)


if __name__ == '__main__':
    main()
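
One likely reason the attempt above fails can be checked without any crypto library: the key and IV strings happen to be valid hex, so `bytes.fromhex` accepts them but yields 8 bytes each, while AES keys must be 16, 24 or 32 bytes and a CBC IV must be 16 bytes. The working reply later in the thread passes them as 16-byte ASCII literals instead. A small sketch of the difference; `describe` is a hypothetical helper:

```python
key_text = 'f5d965df75336270'
iv_text = '97b60394abc2fbe1'

VALID_AES_KEY_LENGTHS = (16, 24, 32)  # AES-128, AES-192, AES-256


def describe(label, raw):
    """Return (label, byte length, whether that length is a valid AES key size)."""
    return label, len(raw), len(raw) in VALID_AES_KEY_LENGTHS


key_from_hex = bytes.fromhex(key_text)      # hex pairs collapse into single bytes
key_from_ascii = key_text.encode('ascii')   # one byte per character
iv_from_ascii = iv_text.encode('ascii')

print(describe('key via fromhex', key_from_hex))
print(describe('key via ascii', key_from_ascii))
print(describe('iv via ascii', iv_from_ascii))
```

A second difference from the working reply is the input: the `data-xkrkllgl` value is an ordinary URL, and it is the bytes downloaded from that URL that get decrypted, not the URL string itself.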
十万菠萝拍黄瓜 posted on 2024-8-17 09:20
Last edited by 十万菠萝拍黄瓜 on 2024-8-17 09:22

Here is an example for a single image; adapt it to your case.
[Python]
from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64
from io import BytesIO
import requests

def decrypt_image(encrypted_base64):
    # The key and IV are used as 16-byte ASCII strings, not decoded from hex
    key = b"f5d965df75336270"
    iv = b"97b60394abc2fbe1"
    # Undo the base64 step to get the raw ciphertext back
    encrypted_data = base64.b64decode(encrypted_base64)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    decrypted_padded = cipher.decrypt(encrypted_data)
    # Strip the PKCS7 padding
    decrypted_data = unpad(decrypted_padded, AES.block_size)
    return decrypted_data

def ab2b64(t):
    # Base64-encode the raw bytes (the ab2b64 step mentioned earlier in the thread)
    binary_data = BytesIO(t)
    data = binary_data.read()
    b64encoded = base64.b64encode(data)
    return b64encoded

def main():
    # The URL taken from the data-xkrkllgl attribute; the response body is the ciphertext
    url = 'https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg'
    res = requests.get(url).content
    b64 = ab2b64(res)
    s = decrypt_image(b64)
    # Write the decrypted image bytes to disk
    with open('1.jpg', 'wb') as f:
        f.write(s)
    print('111')


if __name__ == '__main__':
    main()
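
When adapting the example to many images, it can help to confirm that each decrypted result really looks like an image before saving it. A minimal sketch that checks well-known file signatures; `sniff_image_type` is a hypothetical helper and is not part of the code posted above:

```python
def sniff_image_type(data):
    """Guess an image extension from leading magic bytes, or return None."""
    signatures = [
        (b'\xff\xd8\xff', 'jpg'),
        (b'\x89PNG\r\n\x1a\n', 'png'),
        (b'GIF87a', 'gif'),
        (b'GIF89a', 'gif'),
    ]
    for magic, ext in signatures:
        if data.startswith(magic):
            return ext
    # WebP files are RIFF containers: 'RIFF', a 4-byte size, then 'WEBP'
    if data[:4] == b'RIFF' and data[8:12] == b'WEBP':
        return 'webp'
    return None


print(sniff_image_type(b'\xff\xd8\xff\xe0' + b'\x00' * 8))
print(sniff_image_type(b'not an image'))
```

If the helper returns `None` for a decrypted payload, the key, IV, or input bytes are probably wrong for that file.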


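wasm2023 posted on 2024-8-17 09:47
If the image is generated by WASM and the element only has a canvas id, how do I locate where it is generated?
puz_zle posted on 2024-8-17 12:04
Quote: wasm2023 posted on 2024-8-17 09:47
If the image is generated by WASM and the element only has a canvas id, how do I locate where it is generated?

Analyzing the API endpoint is less work than that.
wasm2023 posted on 2024-8-17 14:34
Quote: puz_zle posted on 2024-8-17 12:04
Analyzing the API endpoint is less work than that.

I could not find an endpoint.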