Out of memory when using sed with multiline expressions on a giant file

March 24, 2022

I'm currently trying to remove every newline that is not preceded by a closing parenthesis, so I came up with this expression:
sed -r -i -e ":a;N;$!ba;s/([^\)])\n/\1/g;d" reallyBigFile.log
It does the job on smaller files, but on the large file I'm working with (3 GB) it runs for a while and then fails with an out-of-memory error:
sed: Couldn't re-allocate memory
Is there any way to do this without running into that problem? Using sed itself is not mandatory; I just want to get it done.

Your first three commands are the culprit:

:a
N
$!ba

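Together they loop over the input, appending each line to the pattern space until the last line is reached, so the whole file ends up in memory at once. The following script instead keeps only one segment (a run of lines ending in a closing parenthesis) in memory at a time: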

% cat test.sed
#!/usr/bin/sed -nf

# Append this line to the hold space. 
# To avoid an extra newline at the start, replace instead of append.
1h
1!H

# If we find a paren at the end...
/)$/{
   # Bring the hold space into the pattern space
   g
   # Remove the newlines
   s/\n//g 
   # Print what we have
   p
   # Delete the hold space
   s/.*//
   h
}
% cat test.in
a
b
c()
d()
e
fghi
j()
% ./test.sed test.in
abc()
d()
efghij()
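
One case the script above does not cover: if the final lines of the file do not end with a closing parenthesis, they stay in the hold space and are never printed. Below is a minimal sketch of a variant that also flushes whatever is left when the last line is reached. The function name join_until_paren is made up for illustration, and the script has only been reasoned through against the small sample here, so treat it as a starting point rather than a drop-in replacement.

```shell
#!/bin/sh
# Hypothetical helper: joins lines until one ends with ")",
# and also prints any leftover lines at end of input.
join_until_paren() {
  sed -n '
    # Append every line to the hold space (H adds a leading newline).
    H
    # Unless this is the last line, skip lines that do not end in ")".
    $!{
      /)$/!d
    }
    # Swap the accumulated segment into the pattern space.
    x
    # Remove the newlines and print the joined segment.
    s/\n//g
    p
    # Empty the pattern space and swap it back, clearing the hold space.
    s/.*//
    x
  ' "$@"
}
```

It reads standard input when called without arguments, or takes file names the way sed does, for example: join_until_paren reallyBigFile.log > joined.log. As with the original script, memory use is bounded by the longest segment, not by the file size.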

This awk solution prints each line as it reads it, so it only ever has one line in memory at a time:

% awk '/)$/{print;nl=1;next}{printf "%s",$0;nl=0}END{if(!nl)print ""}' test.in
abc()
d()
efghij()
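
The question used sed -i to edit the file in place, which POSIX awk cannot do directly. A rough sketch of one way to get the same effect is to write the awk output to a temporary file next to the original and then move it over. The function name join_lines_in_place is hypothetical. This assumes there is enough free disk space for a second copy of the file and that a mktemp command accepting a template is available.

```shell
#!/bin/sh
# Hypothetical helper: rewrites a file in place using the awk one-liner.
join_lines_in_place() {
  file=$1
  # Create the temporary file beside the original so mv stays on one filesystem.
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  if awk '/)$/{print;nl=1;next}{printf "%s",$0;nl=0}END{if(!nl)print ""}' \
      "$file" > "$tmp"; then
    # Replace the original only after awk finished successfully.
    mv "$tmp" "$file"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Usage would be join_lines_in_place reallyBigFile.log. Note that the replaced file takes on the permissions of the temporary file, so they may need to be restored afterwards.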

Source: https://unix.stackexchange.com/questions/63354
