Batch-strip timestamps from SRT subtitles and convert them to TXT

wesley1224 · Posted 2024-1-5 17:45

Last edited by wesley1224 on 2024-3-28 08:40

Thank you very much for the support!
Thanks to everyone who liked my previous post. I'll keep at it!

TXT text extraction tool
https://www.52pojie.cn/thread-1876964-1-1.html
(Source: 吾爱破解 forum)

In that thread, a forum member left a request (screenshot: 回复.jpg).

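They wanted to batch-convert SRT files into clean TXT, so here it is! I had some free time this afternoon to work on it, and I'm sharing it for anyone who can use it.

I am currently teaching myself Python, and this is a small practice tool I wrote myself. Language: Python.
Correction: the tool currently handles SRT files encoded as UTF-8 as well as other common encodings. If an SRT file fails to process, open it, convert its encoding to UTF-8, and run the tool on it again.

Function: strips the timestamps from SRT subtitle files and keeps only the text, producing clean, plain TXT.
(Example screenshot: srt文件.jpg)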
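The tool's source is not posted in this thread, so the following is only a minimal sketch of the idea described above, not the author's implementation. The helper name `srt_to_text` and the `TIMING` pattern are hypothetical. It assumes a cue number is a bare integer sitting directly above a timing line, so a dialogue line that happens to be all digits is kept.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"^\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{1,3}"
)

def srt_to_text(srt_content):
    """Return only the dialogue lines of an SRT document (hypothetical helper)."""
    lines = srt_content.splitlines()
    kept = []
    for idx, line in enumerate(lines):
        # Drop surrounding whitespace and a possible UTF-8 byte order mark.
        stripped = line.strip().lstrip("\ufeff")
        if not stripped or TIMING.match(stripped):
            continue  # skip blank separators and timing lines
        # A cue number is a bare integer directly followed by a timing line.
        next_line = lines[idx + 1].strip() if idx + 1 < len(lines) else ""
        if stripped.isdigit() and TIMING.match(next_line):
            continue
        kept.append(stripped)
    return "\n".join(kept)
```

Because blank lines are skipped, the result has one dialogue line per row with no empty rows in between.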

Archive extraction password: 52pojie

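20240108 - Update 1.0

1. Walks through the SRT files in every folder under the directory, removes the timestamps, and writes the plain text into the corresponding folder.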
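How the tool walks the folders is not shown in the thread. One plausible way to do it with the standard library is sketched below. The function names `find_srt_files` and `txt_path_for` are made up for illustration, and the choice to match the `.srt` extension case-insensitively is an assumption.

```python
import os

def find_srt_files(root):
    """Collect every .srt path under root, including all subfolders."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".srt"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)

def txt_path_for(srt_path):
    """Output path next to the source: same folder, same stem, .txt extension."""
    return os.path.splitext(srt_path)[0] + ".txt"
```

Deriving the output path from the source path keeps each TXT file in the same folder as its SRT file, which matches the behavior described in the update note.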

20240114 - Update 2.0

Improved the file-encoding detection logic and added prompts.
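The detection logic itself is not published here. A common approach, shown only as a sketch, is to try a short list of candidate encodings in order. The function name `read_text_with_fallback` and the candidate list (`utf-8-sig`, `gb18030`, `utf-16`) are assumptions, not the tool's actual list.

```python
def read_text_with_fallback(path, encodings=("utf-8-sig", "gb18030", "utf-16")):
    """Try each candidate encoding in turn; return (text, encoding_used)."""
    with open(path, "rb") as fh:
        raw = fh.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeError:
            continue  # this encoding does not fit, try the next one
    # Mirrors the advice in the post: convert the file to UTF-8 and retry.
    raise ValueError(f"Could not decode {path}; convert it to UTF-8 first")
```

Order matters in this sketch: strict UTF-8 goes first because it rejects most non-UTF-8 input, while the more permissive codecs come later. A fallback chain like this can still mis-detect a file, so reporting the encoding that was used is a useful prompt.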

20240226 - Update 3.0

Removed the blank lines from the final TXT output.

https://wwch.lanzoul.com/b00s26jri
Password: 77pe

wesley1224 · Posted 2024-1-5 17:53

Last edited by wesley1224 on 2024-3-28 08:40

20240226 - Update 3.0

Removed the blank lines from the final TXT output.

https://wwch.lanzoul.com/b00s26jri
Password: 77pe

聊无知己 · Posted 2024-1-5 18:06

Is there a tool that does the reverse and adds timestamps in bulk?

死月 · Posted 2024-1-6 16:18

Last edited by 死月 on 2024-1-6 16:27

Code for merging:

import os
import sys

def merge_srt_files(srt_file_path):
    # Extract the directory and the file name without its extension
    directory = os.path.dirname(srt_file_path)
    filename = os.path.splitext(os.path.basename(srt_file_path))[0]

    # Build the paths of the matching txt files
    time_file_path = os.path.join(directory, f"shijian{filename}.txt")
    text_file_path = os.path.join(directory, f"yiwen{filename}.txt")

    # Check that both txt files exist
    if not (os.path.isfile(time_file_path) and os.path.isfile(text_file_path)):
        print(f"Matching txt file not found: {time_file_path} or {text_file_path}")
        return

    # Build the path of the merged srt file (the "合并" prefix is part of the file name)
    merged_srt_file_path = os.path.join(directory, f"合并{filename}.srt")

    with open(time_file_path, 'r', encoding='utf-8') as f1, \
         open(text_file_path, 'r', encoding='utf-8') as f2, \
         open(merged_srt_file_path, 'w', encoding='utf-8') as f3:
        # Read the contents of both files
        time_lines = f1.readlines()
        text_lines = f2.readlines()

        # The time file alternates: even index = cue number line, odd index = timestamp line
        for i in range(len(time_lines)):
            if i % 2 == 0:  # cue number line: write a fresh sequential number
                f3.write(str((i + 2) // 2) + '\n')
            else:  # timestamp line, followed by the matching text line
                f3.write(time_lines[i].strip() + '\n')
                f3.write(text_lines[(i - 1) // 2].strip() + '\n')
                if i != len(time_lines) - 1:  # no blank line after the last cue
                    f3.write('\n')  # a blank line separates one cue from the next

    print(f"Merge finished, result saved to: {merged_srt_file_path}")

# Paths of the files dragged onto the script
dragged_files = sys.argv[1:]

# Process every dragged file
for file_path in dragged_files:
    if file_path.lower().endswith('.srt'):
        merge_srt_files(file_path)

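Put the script in the same directory as the subtitle files. For both splitting and merging, drag the original SRT file onto the .py script; it finds the matching files in that directory automatically and merges them.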

Splitting leaves you with three files (the original SRT plus two TXT files):

幸乃小姐.srt
shijian幸乃小姐.txt
yuanwen幸乃小姐.txt
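The splitting script is not included in this post. The sketch below is a guess at a compatible splitter, inferred only from what the merge code reads: a `shijian` file with alternating cue-number and timestamp lines, and a text file with one line per cue. The function name `split_srt` is hypothetical, and joining a multi-line cue with a space is an assumption made to keep one text line per cue.

```python
import os
import re

def split_srt(srt_path):
    """Write shijian<name>.txt (number + timing) and yuanwen<name>.txt (text)."""
    directory = os.path.dirname(srt_path)
    name = os.path.splitext(os.path.basename(srt_path))[0]
    with open(srt_path, "r", encoding="utf-8-sig") as fh:
        # Cues are separated by one or more blank lines.
        blocks = re.split(r"\n\s*\n", fh.read().strip())

    time_lines, text_lines = [], []
    for block in blocks:
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if len(lines) < 2:
            continue  # not a complete cue
        time_lines += lines[:2]                  # cue number, then timestamp
        text_lines.append(" ".join(lines[2:]))   # keep one text line per cue

    time_path = os.path.join(directory, f"shijian{name}.txt")
    text_path = os.path.join(directory, f"yuanwen{name}.txt")
    with open(time_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(time_lines) + "\n")
    with open(text_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(text_lines) + "\n")
    return time_path, text_path
```

Keeping exactly one text line per cue matters because the merge code pairs the n-th timestamp with the n-th line of the text file.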

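Merging needs one more file.
I split the subtitles in order to translate them,
so the merge step expects the translated text in a file named
yiwen幸乃小姐.txt
Rename your translated file to match this pattern before merging.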

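Once you understand how it works, you can even conjure a subtitle out of nothing: merge any unrelated plain text with a completely unrelated timeline and you get a fake subtitle file.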

Building on the merge step, you can also
combine two files into bilingual subtitles:

import sys
import os

def read_srt(filename):
    with open(filename, 'r', encoding='utf-8') as file:
        lines = file.readlines()

    # Group consecutive non-blank lines into one cue: [number, timecode, text...]
    subtitles = []
    current_sub = []
    for line in lines:
        if line.strip() == '':
            if current_sub:
                subtitles.append(current_sub)
                current_sub = []
        else:
            current_sub.append(line.strip())
    if current_sub:
        subtitles.append(current_sub)
    return subtitles

def merge_subtitles(files):
    subtitles_list = [read_srt(file) for file in files]

    merged_subs = []
    # zip stops at the shortest file, so extra cues in a longer file are dropped
    for sub_parts in zip(*subtitles_list):
        merged_sub = [sub_parts[0][0], sub_parts[0][1]]  # cue number and timecode from the first file
        for sub in sub_parts:
            merged_sub.extend(sub[2:])  # every text line of the cue, not only the first
        merged_subs.append(merged_sub)
    return merged_subs

def write_merged_srt(merged_subs, output_file):
    with open(output_file, 'w', encoding='utf-8') as file:
        for i, sub in enumerate(merged_subs, 1):
            file.write(f'{i}\n')
            file.write(f'{sub[1]}\n')
            for text in sub[2:]:
                file.write(f'{text}\n')
            file.write('\n')

def find_matching_file(dragged_file):
    base_name = os.path.basename(dragged_file)
    # Fall back to the current directory when the path has no directory part
    directory = os.path.dirname(dragged_file) or '.'
    for file in os.listdir(directory):
        # Look for the file written by the merge script, e.g. "合并<name>.srt"
        if file.startswith('合并') and file.endswith(base_name):
            return os.path.join(directory, file)
    return None

def main():
    if len(sys.argv) != 2:
        print("Please drag one SRT file onto this script.")
        return

    dragged_file = sys.argv[1]
    matching_file = find_matching_file(dragged_file)

    if not matching_file:
        print("No matching file found to merge with.")
        return

    # Save the result next to the dragged file, with the "双语 " prefix
    output_file = os.path.join(os.path.dirname(dragged_file),
                               f'双语 {os.path.basename(dragged_file)}')
    merged_subs = merge_subtitles([dragged_file, matching_file])
    write_merged_srt(merged_subs, output_file)
    print(f"Merged subtitle file created: {output_file}")

if __name__ == "__main__":
    main()
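As before, drag the original SRT onto the .py script. It looks for the merged, translated file produced in the previous step and combines it with the dragged file into a new bilingual subtitle file.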

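justfly99 · Posted 2024-1-5 18:01

This is nice. Could you share the source code?

jr001 · Posted 2024-1-5 18:23

Thanks for sharing.

zhang510141 · Posted 2024-1-5 18:25

I can't figure out how to use it, but it looks pretty advanced.

sdieedu · Posted 2024-1-5 18:50

String splitting in Python.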

塞北的雪 · Posted 2024-1-5 19:49

^\d+\n[0-9:,]+? --> [0-9:,]+?\n

Just use a regex replace and substitute every match with an empty string.
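Applied in Python, that suggestion could look like the sketch below. The pattern is the one from the post; the function name `strip_cue_headers` is made up, and normalizing `\r\n` to `\n` first is an added assumption so that files with Windows line endings also match. The `re.MULTILINE` flag is needed so that `^` matches at the start of every line and not only at the start of the string.

```python
import re

# Pattern from the post: a cue number line followed by a timing line.
CUE_HEADER = re.compile(r"^\d+\n[0-9:,]+? --> [0-9:,]+?\n", flags=re.MULTILINE)

def strip_cue_headers(srt_content):
    """Delete every 'number + timestamp' header, leaving the text lines."""
    text = srt_content.replace("\r\n", "\n")  # normalize Windows line endings
    return CUE_HEADER.sub("", text)
```

This removes only the two header lines of each cue. The blank separator lines between cues stay in the output, so a second pass is needed if the result should contain no empty lines.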

yaojianjun · Posted 2024-1-5 20:03

Thanks for sharing.

zzgwyyx · Posted 2024-1-5 20:07

Is there a tool that does the reverse and adds timestamps in bulk?
