Scripting

Removing line breaks and paragraph breaks from an srt file

  • November 12, 2020

I use this script to remove the timestamps from subtitles.

awk '/-->/{for(i=1;i<d;i++){print a[i]};delete a;d=0;next}{a[++d]=$0}
   END{for(i in a)print a[i]}' xxxxx.srt > xxx.txt

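Then I paste the result into a web page that removes the line breaks and paragraph breaks, leaving a single paragraph with a space in place of each break. This is the page I use: https://www.textfixer.com/tools/remove-line-breaks.php

I have been looking for a way to combine all of this into a single command, but I can't work out how to do it. I know there are alternatives to awk; anything that gets this done easily from the Mac terminal works for me!

Can anyone help?

Here is a sample subtitle file that I want to format, but it doesn't work. I see that some files do work... which is strange.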

Subtitle file

Expected output:

Welcome to our program! This month’s theme is “Are You Paying Attention?” Strained relationships, illnesses, careers, entertainment —we’ll learn how to stay focused on Jehovah despite these potential distractions. We’ll see how our ministry is more effective when we focus on reaching the hearts of people. And our new song was written especially for you young adults to help you keep your eyes on the prize of life.

But this is what I get from your script:

   Welcome to our program!

2
00:00:06,089 --> 00:00:08,624
This month’s theme is

3
00:00:08,625 --> 00:00:11,126
“Are You Paying Attention?”

4
00:00:11,127 --> 00:00:13,595
Strained relationships,

5
00:00:13,596 --> 00:00:16,131
illnesses,

Use awk in "paragraph mode":

awk -v RS= '{
   for (i=5;i<=NF;i++){
     printf "%s%s", (sep ? " " : ""), $i
     sep=1
   }
 }
 END{ print "" }
' file.srt > file.txt

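This sets the record separator to the empty string, which makes awk treat each block of lines separated by blank lines as one record. In this mode a newline always acts as a field separator in addition to spaces and tabs, so the first four fields of each record are the subtitle number (field 1) and the display time (fields 2-4: start time, "-->", end time). The loop skips those four fields and prints every remaining field, each preceded by a space character except the very first one printed.

At the end, a single newline is printed.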
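
To see why the loop starts at field 5, it can help to print the numbered fields of one record. This is only a small sketch: the one-record input is made up for illustration, and the variable name `fields` is not part of the original answer.

```shell
# Hypothetical one-record input, fed to awk in paragraph mode (RS set to empty).
# Each field is printed with its index to show what fields 1-4 contain.
fields=$(printf '1\n00:00:06,453 --> 00:00:10,579\nWhen one chooses\n' |
  awk -v RS= '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }')
printf '%s\n' "$fields"
```

Fields 1 to 4 come out as the subtitle number, the start time, the arrow and the end time; the subtitle text begins at field 5.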

Input file:

1
00:00:06,453 --> 00:00:10,579
When one chooses to walk
the Way of the Mandalore,

2
00:00:10,581 --> 00:00:14,095
you are both hunter and prey.

3
00:00:17,935 --> 00:00:20,076
There is one job.

4
00:00:20,078 --> 00:00:21,945
Underworld?

5
00:00:21,947 --> 00:00:26,118
How uncharacteristic of
one of your reputation.

Output:

When one chooses to walk the Way of the Mandalore, you are both hunter and prey. There is one job. Underworld? How uncharacteristic of one of your reputation.
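
If this works for some files but not others, one possible cause (an assumption on my part, nothing in the question confirms it) is that the failing file uses Windows-style CRLF line endings. The "blank" lines then contain a carriage return, so paragraph mode never sees an empty line and treats the whole file as one record. A minimal sketch that strips carriage returns first, using a made-up two-subtitle sample and an illustrative variable name `result`:

```shell
# Hypothetical CRLF sample; tr -d '\r' removes the carriage returns
# before the same paragraph-mode awk program is applied.
result=$(printf '1\r\n00:00:01,000 --> 00:00:02,000\r\nHello\r\nthere\r\n\r\n2\r\n00:00:03,000 --> 00:00:04,000\r\nworld\r\n' |
  tr -d '\r' |
  awk -v RS= '{
    for (i = 5; i <= NF; i++) {
      printf "%s%s", (sep ? " " : ""), $i
      sep = 1
    }
  }
  END { print "" }')
printf '%s\n' "$result"
```

With a real file, the same pipeline would read `tr -d '\r' < file.srt | awk ... > file.txt`.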

Source: https://unix.stackexchange.com/questions/617375