52pojie (吾爱破解) - LCG - LSG | Android cracking | Virus analysis | www.52pojie.cn

Views: 2443 | Replies: 62
[Original tool] A simple program for compressing scanned PDFs

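duskdust posted on 2024-3-23 19:48
I'm new to Python and wanted to write a small tool that makes scanned, text-heavy PDFs easier for me to read.
  • It packages pypdf's PDF size-reduction features with PyInstaller; see https://pypdf.readthedocs.io/en/stable/user/file-size.html
  • Put the exe in the same folder as the PDF and run it there. It can remove duplicate objects, compress images, and binarize images; binarization works particularly well on scanned PDFs.
  • Ready-to-use exe: https://penguin-a.lanzoue.com/ibnD11scx99i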
[Python]
"""
Compress a PDF while keeping its bookmarks.
"""
import os
from pypdf import PdfReader, PdfWriter
from tqdm import tqdm
from PIL import Image, ImageEnhance
from io import BytesIO

def get_page_number_from_indirect(reader, indirect_ref):
    for i, page in enumerate(reader.pages):
        if page.indirect_ref == indirect_ref:
            return i
    return None

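def add_bookmarks_to_writer(writer, reader, outlines, parent=None):
    last_item = None
    for item in outlines:
        if isinstance(item, list):
            # A nested list holds the children of the bookmark just before it,
            # so recurse with that bookmark as the parent to keep the hierarchy.
            add_bookmarks_to_writer(writer, reader, item, parent=last_item)
        else:
            title = item.get('/Title')
            indirect_ref = item.get('/Page')
            page_num = get_page_number_from_indirect(reader, indirect_ref)
            if page_num is not None:
                last_item = writer.add_outline_item(title, page_num, parent=parent)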

def blacky(im):
    # Convert to grayscale
    im = im.convert('L')
    # Boost contrast
    im = ImageEnhance.Contrast(im).enhance(3)
    # Boost brightness
    im = ImageEnhance.Brightness(im).enhance(1.5)
    # Sharpen
    im = ImageEnhance.Sharpness(im).enhance(2)
    # (The saturation step is dropped: it has no effect on a grayscale image.)

    # Binarize: levels below the threshold become black, the rest white
    threshold = 128
    table = [0 if i < threshold else 255 for i in range(256)]
    new_image = im.point(table, '1')
    # Re-encode as TIFF with CCITT T.6 (Group 4) compression
    imgbuffer = BytesIO()
    new_image.save(imgbuffer, format="TIFF", compression='group4', dpi=(300, 300))
    imgbuffer.seek(0)
    return Image.open(imgbuffer)

# List all PDF files in the current directory (extension matched case-insensitively)
pdf_files = [f for f in os.listdir('.') if f.lower().endswith('.pdf')]
for idx, file in enumerate(pdf_files):
    print(f"{idx}: {file}")

# User selects a PDF file
file_index = int(input("Enter the number of the PDF to compress: "))
pdf_file = pdf_files[file_index]

# Options for reducing file size
print("Choose how to compress the PDF")
print("1: Remove duplicate objects")
print("2: Remove images")
print("3: Reduce image quality")
print("4: Use lossless compression")
print("5: Binarize images and compress as TIFF")
choice = int(input("Enter your choice (1-5): "))

reader = PdfReader(pdf_file)
writer = PdfWriter()

for page in tqdm(reader.pages, desc="Reading pages"):
    writer.add_page(page)

# Apply the chosen method
if choice == 1:
    # Following the pypdf docs linked above, the pages are simply copied into
    # a fresh writer; carry the metadata over if the source file has any.
    if reader.metadata is not None:
        writer.add_metadata(reader.metadata)
elif choice == 2:
    writer.remove_images()
elif choice == 3:
    for page in tqdm(writer.pages, desc="Compressing images"):
        for img in page.images:
            img.replace(img.image, quality=80)
elif choice == 4:
    for page in tqdm(writer.pages, desc="Compressing content streams"):
        # Note: this has to be done on the writer, not the reader!
        page.compress_content_streams()  # This is CPU intensive!
elif choice == 5:
    for page in tqdm(writer.pages, desc="Binarizing images"):
        for img in page.images:
            img.replace(blacky(img.image))

# Copy the bookmarks, then write the output file
print("Writing bookmarks")
outlines = reader.outline
add_bookmarks_to_writer(writer, reader, outlines)


output_file = "reduced_" + pdf_file
with open(output_file, "wb") as f:
    writer.write(f)

print(f"Output written to {output_file}")
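
The binarization step in blacky() comes down to a 256-entry lookup table applied to a grayscale image. A minimal standalone sketch of just that step is shown here; the helper names threshold_table and binarize are made up for illustration and are not part of the script above or of Pillow.

```python
def threshold_table(threshold=128):
    """Build a 256-entry lookup table: 0 below the threshold, 255 at or above it."""
    return [0 if level < threshold else 255 for level in range(256)]


def binarize(image, threshold=128):
    """Turn a Pillow image into a 1-bit black-and-white image.

    `image` is expected to be a PIL.Image.Image; it is converted to
    grayscale first so the lookup table sees one value per pixel.
    """
    gray = image.convert("L")
    return gray.point(threshold_table(threshold), "1")
```

With Pillow installed, the result of binarize() can be saved as a Group 4 TIFF the same way blacky() does it. Lowering the threshold keeps faint gray marks white, while raising it turns more of the page black, so the right value depends on the scan.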

[Image: the size reduction is fairly noticeable]

[Image: a page after compression]


ianlcc posted on 2024-3-29 11:25
duskdust posted on 2024-3-29 10:19
Does PyInstaller report a specific error when you package it?

I'm not sure whether I've set something up wrong…
D:\111\0612\123\0>Pyinstaller -F -w pdf2small.py
505 INFO: PyInstaller: 6.3.0
505 INFO: Python: 3.11.0
523 INFO: Platform: Windows-10-10.0.19045-SP0
525 INFO: wrote D:\111\0612\123\0\pdf2small.spec
531 INFO: Extending PYTHONPATH with paths
['D:\\111\\0612\\123\\0']
1034 INFO: checking Analysis
1034 INFO: Building Analysis because Analysis-00.toc is non existent
1034 INFO: Initializing module dependency graph...
1037 INFO: Caching module graph hooks...
1062 INFO: Analyzing base_library.zip ...
3199 INFO: Loading module hook 'hook-heapq.py' from 'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\PyInstaller\\hooks'...
3315 INFO: Loading module hook 'hook-encodings.py' from 'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\PyInstaller\\hooks'...
5663 INFO: Loading module hook 'hook-pickle.py' from 'C:\\Users\\Administrator\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\PyInstaller\\hooks'...
7709 INFO: Caching module dependency graph...
7869 INFO: Running Analysis Analysis-00.toc
7869 INFO: Looking for Python shared library...
7901 INFO: Using Python shared library: C:\Users\Administrator\AppData\Local\Programs\Python\Python311\python311.dll
7901 INFO: Analyzing D:\111\0612\123\0\pdf2small.py
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Scripts\pyinstaller.exe\__main__.py", line 7, in <module>
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\__main__.py", line 214, in _console_script_run
    run()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\__main__.py", line 198, in run
    run_build(pyi_config, spec_file, **vars(args))
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\__main__.py", line 69, in run_build
    PyInstaller.building.build_main.main(pyi_config, spec_file, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\building\build_main.py", line 1071, in main
    build(specfile, distpath, workpath, clean_build)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\building\build_main.py", line 1011, in build
    exec(code, spec_namespace)
  File "D:\111\0612\123\0\pdf2small.spec", line 4, in <module>
    a = Analysis(
        ^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\building\build_main.py", line 470, in __init__
    self.__postinit__()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\building\datastruct.py", line 184, in __postinit__
    self.assemble()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\building\build_main.py", line 608, in assemble
    priority_scripts.append(self.graph.add_script(script))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\depend\analysis.py", line 268, in add_script
    self._top_script_node = super().add_script(pathname)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python311\Lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 1153, in add_script
    contents = importlib.util.decode_source(contents)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 770, in decode_source
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 5: invalid start byte
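
Byte 0xc0 can never occur in valid UTF-8, so the traceback suggests that pdf2small.py was saved in a legacy code page rather than UTF-8. On a Chinese-language Windows system that is often GBK, but that is an assumption and not something the log confirms. A minimal sketch of re-saving the script as UTF-8 before running PyInstaller again; reencode_to_utf8 is a hypothetical helper name:

```python
from pathlib import Path


def reencode_to_utf8(path, source_encoding="gbk"):
    """Rewrite a text file as UTF-8 unless it already decodes as UTF-8.

    `source_encoding` is a guess (GBK here); returns True if the file
    was converted and False if it was left untouched.
    """
    raw = Path(path).read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, nothing to do
    except UnicodeDecodeError:
        text = raw.decode(source_encoding)
    Path(path).write_bytes(text.encode("utf-8"))
    return True
```

Calling reencode_to_utf8("pdf2small.py") in the script's folder would convert it in place, so keeping a backup copy first is sensible. Re-saving the file as UTF-8 from a text editor should have the same effect.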
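bfkeyi posted on 2024-4-5 16:48
XnView Classic is an image viewer. When I compress images with it at 50-60% quality, the results look about the same as the originals. Is there a way to batch-process PDF documents so that their images go through XnView Classic and are then rebuilt into a PDF?
I exported the images from a 500 MB color PDF (using PDF-XChange Editor), compressed them to 60% quality with XnView Classic, and merged them back into a PDF. The final file was 100 MB with roughly the same sharpness.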
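
One way to approximate this workflow without exporting the images by hand is the same pypdf image-replacement call used by the script in the first post, applied to every PDF in a folder at quality 60. This is a sketch rather than a substitute for XnView: the function names and the _q60 output suffix are assumptions, it needs pypdf with Pillow installed, and the size reduction will depend on the file.

```python
from pathlib import Path


def output_path(pdf_path, quality):
    """Name the result next to the source, e.g. scan.pdf -> scan_q60.pdf."""
    source = Path(pdf_path)
    return source.with_name(f"{source.stem}_q{quality}{source.suffix}")


def recompress_pdf(pdf_path, quality=60):
    """Re-encode every image in one PDF at the given quality and save a copy."""
    # Imported here so output_path() stays usable without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)
    # As in the first post, images are replaced on the writer's pages.
    for page in writer.pages:
        for img in page.images:
            img.replace(img.image, quality=quality)
    target = output_path(pdf_path, quality)
    with open(target, "wb") as handle:
        writer.write(handle)
    return target


def recompress_folder(folder=".", quality=60):
    """Process every PDF in a folder, skipping files this sketch already produced."""
    marker = f"_q{quality}"
    sources = [p for p in sorted(Path(folder).glob("*.pdf"))
               if not p.stem.endswith(marker)]
    return [recompress_pdf(p, quality) for p in sources]
```

Running recompress_folder() in the folder that holds the PDFs writes one _q60 copy per file and leaves the originals untouched, so the two can be compared side by side before anything is deleted.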
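why110609 posted on 2024-3-24 18:24
jori posted on 2024-3-25 00:09
On my 64-bit Windows 7 machine the program window flashes and closes by itself as soon as I open it. Why?
hwiori posted on 2024-3-25 00:28
Bookmarking this. Thanks for sharing.
sxzswx posted on 2024-3-25 05:05
jori posted on 2024-3-25 00:09
On my 64-bit Windows 7 machine the program window flashes and closes by itself as soon as I open it. Why?

Upgrading to Windows 10 is the way to go.
lx5012012 posted on 2024-3-25 06:44
Quite useful for older files.
David1000 posted on 2024-3-25 07:49
The compression works well.
qxpqxz posted on 2024-3-25 07:57
Thanks for sharing, OP.
qdllss posted on 2024-3-25 08:29
Thanks a lot.
LXWY2K posted on 2024-3-25 09:49
The results are really good!!!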