Newbie asking for help: how do I download dynamically loaded images like data:image?

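miracle1989 · Posted on 2024-8-17 00:01

The HTML I get back from my request never contains any data:image value. I suspect the front end loads it dynamically, so Python's BS4 cannot see it. Is there any way to handle this other than selenium? [Attachment: 微信截图_20240816234714.png]

The HTML is as follows: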
<div class="post-content">
<img src="data:image/jpeg;base64,/9j/2wCEABALDZ" data-xuid="1" data-xkrkllgl="https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg" alt="photo_2024-08-16_11-52-27.jpg" title="photo_2024-08-16_11-52-27.jpg" data-action="zoom">
</div>

[Python]

import requests
from bs4 import BeautifulSoup
import base64
import imghdr


url = 'xxxx'

# Fetch the page
response = requests.get(url)
html_content = response.text

# Parse the HTML
soup = BeautifulSoup(html_content, 'html.parser')

# Find the img tag whose src contains base64 data
img_tag = soup.find('img', src=lambda src: src and 'base64' in src)
print(img_tag)

# Check that the img tag exists
if img_tag and 'src' in img_tag.attrs:
    # Take the (encrypted) base64 string after the comma
    encrypted_base64 = img_tag['src'].split(',')[1]
else:
    print('No img tag containing base64 was found')
    exit()
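
For reference, once an `img` tag with a real `data:` URI has been found, the payload can be decoded with the standard library alone. The sketch below is not from the original post: `decode_data_uri` and `guess_image_type` are hypothetical helper names, and the magic-byte check stands in for `imghdr`, which the snippet above imports but which was removed from the standard library in Python 3.13. It assumes the part after the comma is plain, unencrypted base64.

```python
import base64


def decode_data_uri(src):
    """Split a data: URI into (mime_type, raw_bytes). Hypothetical helper."""
    if not src.startswith('data:') or ',' not in src:
        raise ValueError('not a data: URI')
    header, payload = src[len('data:'):].split(',', 1)
    if not header.endswith(';base64'):
        raise ValueError('only base64-encoded data URIs are handled here')
    mime_type = header[:-len(';base64')]
    # Tolerate payloads whose trailing "=" padding was stripped.
    payload += '=' * (-len(payload) % 4)
    return mime_type, base64.b64decode(payload)


def guess_image_type(data):
    """Identify a few common formats from their leading magic bytes."""
    if data.startswith(b'\xff\xd8\xff'):
        return 'jpeg'
    if data.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    if data.startswith((b'GIF87a', b'GIF89a')):
        return 'gif'
    return None
```

If `guess_image_type` returns `None` for bytes that were supposed to be an image, that is a hint the data is a placeholder or has been transformed in some way before display.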

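十万菠萝拍黄瓜 · Posted on 2024-8-17 02:39

It's encrypted: AES-CBC with Pkcs7 padding. The key is f5d965df75336270 and the iv is 97b60394abc2fbe1. For ab2b64, just ask an AI. Request 2024081616535950939.jpeg, convert the response to b64 first, then decrypt it.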
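
One detail in this reply is easy to misread: the key and iv look like hex, but AES needs a 16, 24, or 32 byte key, so these 16-character strings presumably have to be used as 16 ASCII bytes (decoding them as hex gives only 8). The sketch below is an illustration using only the standard library, not code from the thread; `pkcs7_unpad` is a hypothetical helper that shows what the Pkcs7 step removes after decryption.

```python
KEY_TEXT = 'f5d965df75336270'
IV_TEXT = '97b60394abc2fbe1'
AES_BLOCK_SIZE = 16  # bytes

# Interpreted as ASCII text, each value is exactly one AES block long,
# which is a valid AES-128 key size and CBC IV size.
key = KEY_TEXT.encode('ascii')
iv = IV_TEXT.encode('ascii')

# Interpreted as hex digits, the same string shrinks to 8 bytes,
# which is not a valid AES key length.
key_if_hex = bytes.fromhex(KEY_TEXT)


def pkcs7_unpad(padded, block_size=AES_BLOCK_SIZE):
    """Strip PKCS7 padding: the last byte says how many bytes to drop."""
    if not padded or len(padded) % block_size:
        raise ValueError('length is not a multiple of the block size')
    pad_len = padded[-1]
    if not 1 <= pad_len <= block_size:
        raise ValueError('invalid padding length')
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError('invalid padding bytes')
    return padded[:-pad_len]


print(len(key), len(iv), len(key_if_hex))  # prints: 16 16 8
```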

三滑稽甲苯 · Posted on 2024-8-17 07:26

Analyze the JS scripts and find where the decryption happens.
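
One low-tech way to start is to save the page's script files and search them for crypto-related identifiers. The sketch below only illustrates that idea: the keyword list and the `find_crypto_hints` name are assumptions, not something taken from the site in question, and minified bundles may need to be run through a formatter first.

```python
import re

# Assumed keywords; adjust after looking at what the site actually ships.
CRYPTO_HINTS = re.compile(r'CryptoJS|\bAES\b|decrypt|\biv\b', re.IGNORECASE)


def find_crypto_hints(js_source, context=40):
    """Yield (offset, snippet) for each place a hint keyword appears."""
    for match in CRYPTO_HINTS.finditer(js_source):
        start = max(match.start() - context, 0)
        end = min(match.end() + context, len(js_source))
        yield match.start(), js_source[start:end]
```

Each offset can then be opened in the browser's Sources panel to set a breakpoint near the suspected decryption call.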

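miracle1989 · Posted on 2024-8-17 08:46

十万菠萝拍黄瓜 posted on 2024-8-17 02:39
It's encrypted: AES-CBC-Pkcs7, key is f5d965df75336270, iv is 97b60394abc2fbe1, for ab2b64 just ask an AI, request 202408161653 ...

I tried following your hint, but it still doesn't work. Could you take a look when you have time?

[Python]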

import os

from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64
import requests
from bs4 import BeautifulSoup


def get_response(url, timeout=10):
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Connection": "Keep-Alive",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "zh-CN,zh;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.102 Safari/537.36"
    }
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # Raise if the request was not successful
        response.encoding = 'utf-8'
        return response
    except requests.exceptions.Timeout:
        print(f"Request timed out: {url}")
    except requests.exceptions.HTTPError as e:
        print(f"HTTP error: {e.response.status_code}, {url}")
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}, {url}")
    return None

def fetch_pic_urls(url):
    # Only fetch real HTTP(S) URLs, and bail out if the request failed
    response = get_response(url) if url.startswith('http') else None
    if response is None:
        return []
    html_content = response.text
    # Parse the HTML with BeautifulSoup
    soup = BeautifulSoup(html_content, 'html.parser')

    # Find all img tags that carry a data-xkrkllgl attribute
    img_tags = soup.find_all('img', attrs={'data-xkrkllgl': True})

    # Extract and return the data-xkrkllgl attribute values
    pic_urls = [img['data-xkrkllgl'] for img in img_tags]
    return pic_urls



def decrypt_image_url(encrypted_url, key, iv):
    # Convert key and iv to bytes
    key = bytes.fromhex(key)
    iv = bytes.fromhex(iv)

    # Decode the encrypted URL from base64
    encrypted_data = base64.b64decode(encrypted_url)

    # Create an AES decryptor in CBC mode
    cipher = AES.new(key, AES.MODE_CBC, iv)

    # Decrypt the data
    decrypted_padded = cipher.decrypt(encrypted_data)

    # Remove the padding
    decrypted_data = unpad(decrypted_padded, AES.block_size)

    # Convert the decrypted data to a string
    decrypted_url = decrypted_data.decode('utf-8')

    return decrypted_url

def download_image(url, save_dir='.', timeout=10):
    response = get_response(url, timeout)
    if response and response.status_code == 200:
        # Extract the filename from the URL
        filename = os.path.basename(url)
        # Make sure the save directory exists
        os.makedirs(save_dir, exist_ok=True)
        # Build the full file path
        file_path = os.path.join(save_dir, filename)
        with open(file_path, 'wb') as f:
            f.write(response.content)
        print(f'Image downloaded successfully to {file_path}.')
    else:
        print('Failed to download image.')


def main():
    key = 'f5d965df75336270'
    iv = '97b60394abc2fbe1'
    url = 'xxxx'
    encrypted_urls = fetch_pic_urls(url)

    for encrypted_url in encrypted_urls:
        decrypted_url = decrypt_image_url(encrypted_url, key, iv)
        download_image(decrypted_url)


if __name__ == '__main__':
    main()

十万菠萝拍黄瓜 · Posted on 2024-8-17 09:20

Last edited by 十万菠萝拍黄瓜 on 2024-8-17 09:22

Here is an example for a single image; just adapt it.

[Python]

from Crypto.Cipher import AES
from Crypto.Util.Padding import unpad
import base64
from io import BytesIO
import requests

def decrypt_image(encrypted_base64):
    key = b"f5d965df75336270"
    iv = b"97b60394abc2fbe1"
    encrypted_data = base64.b64decode(encrypted_base64)
    cipher = AES.new(key, AES.MODE_CBC, iv)
    decrypted_padded = cipher.decrypt(encrypted_data)
    decrypted_data = unpad(decrypted_padded, AES.block_size)
    return decrypted_data

def ab2b64(t):
    binary_data = BytesIO(t)
    data = binary_data.read()
    b64encoded = base64.b64encode(data)
    return b64encoded

def main():
    url = 'https://pic.uzsofv.cn/upload_01/xiao/20240816/2024081616535950939.jpeg'
    res = requests.get(url).content
    b64 = ab2b64(res)
    s = decrypt_image(b64)
    with open('1.jpg', 'wb') as f:
        f.write(s)
    print('111')


if __name__ == '__main__':
    main()
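
Adapting the single-image example to a whole page mostly means collecting every `data-xkrkllgl` URL and running the same fetch-then-decrypt step on each. The sketch below is one possible arrangement, not the poster's code: `collect_encrypted_urls` and `save_all` are hypothetical names, the attribute name is taken from the HTML in the first post, and the network and AES steps are passed in as callables so the loop itself needs only the standard library.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlparse


class _ImgUrlCollector(HTMLParser):
    """Collect the value of one attribute from every <img> tag."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            value = dict(attrs).get(self.attr_name)
            if value:
                self.urls.append(value)


def collect_encrypted_urls(html, attr_name='data-xkrkllgl'):
    """Return the encrypted-image URLs found in the page HTML."""
    collector = _ImgUrlCollector(attr_name)
    collector.feed(html)
    return collector.urls


def save_all(urls, fetch, decrypt, save_dir='.'):
    """Fetch each URL, decrypt the response body, and write it to save_dir.

    fetch(url) -> bytes and decrypt(ciphertext) -> bytes are supplied by
    the caller, e.g. an HTTP GET and an AES-CBC routine.
    """
    os.makedirs(save_dir, exist_ok=True)
    saved = []
    for url in urls:
        # Name the output file after the last path segment of the URL.
        filename = os.path.basename(urlparse(url).path)
        path = os.path.join(save_dir, filename)
        with open(path, 'wb') as f:
            f.write(decrypt(fetch(url)))
        saved.append(path)
    return saved
```

With the example above, `fetch` could be `lambda u: requests.get(u).content`. Because `decrypt_image` base64-decodes its input, passing the raw response bytes straight to the AES step should be equivalent to the `ab2b64` round trip.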

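wasm2023 · Posted on 2024-8-17 09:47

If the image is generated by wasm and the element contains nothing but a canvas id, how do I locate where it is generated?

puz_zle · Posted on 2024-8-17 12:04

wasm2023 posted on 2024-8-17 09:47
If the image is generated by wasm and the element contains nothing but a canvas id, how do I locate where it is generated?

Analyzing the API endpoints is less work than that.

wasm2023 · Posted on 2024-8-17 14:34

puz_zle posted on 2024-8-17 12:04
Analyzing the API endpoints is less work than that.

I couldn't find an endpoint.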
