Using comm with NUL-terminated records

December 27, 2019

In an answer to another question, I wanted to use a construct like this to find the files that appear in list2 but not in list1:
( cd dir1 && find . -type f -print0 ) | sort -z > list1
( cd dir2 && find . -type f -print0 ) | sort -z > list2
comm -13 list1 list2
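However, I hit a wall because my version of comm cannot handle NUL-terminated records. (Some background: I am passing a computed list to rm, so I particularly want to be able to handle file names that may contain an embedded newline.)
If you want a simple worked example, try this: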
mkdir dir1 dir2
touch dir1/{a,b,c} dir2/{a,c,d}
( cd dir1 && find . -type f ) | sort > list1
( cd dir2 && find . -type f ) | sort > list2
comm -13 list1 list2
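Without NUL-terminated lines, the output here is the single entry ./d, which appears only in list2.
I would like to be able to generate the lists with find ... -print0 | sort -z.
How can I best reimplement an equivalent of comm that outputs the NUL-terminated records that appear in list2 but not in list1?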

GNU comm (as of GNU coreutils 8.25) now has a -z/--zero-terminated option.
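As a rough illustration of that option, here is a minimal sketch that rebuilds the example from the question with NUL-terminated lists. It assumes GNU find, sort, and comm (coreutils 8.25 or later). The scratch directory `demo_dir` and the file name containing a newline are made up for this example, and `LC_ALL=C` is set so that sort and comm compare bytes the same way.

```shell
# Sketch only: assumes GNU find, GNU sort and GNU comm >= 8.25 (for -z).
# demo_dir is a hypothetical scratch directory used just for this example.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/dir1" "$demo_dir/dir2"
touch "$demo_dir/dir1/a" "$demo_dir/dir1/b" "$demo_dir/dir1/c"
touch "$demo_dir/dir2/a" "$demo_dir/dir2/c" "$demo_dir/dir2/d"
# A file name with an embedded newline, present only in dir2.
touch "$demo_dir/dir2/new
line"

# Build NUL-terminated, byte-wise sorted lists of relative paths.
( cd "$demo_dir/dir1" && find . -type f -print0 ) | LC_ALL=C sort -z > "$demo_dir/list1"
( cd "$demo_dir/dir2" && find . -type f -print0 ) | LC_ALL=C sort -z > "$demo_dir/list2"

# Records that appear only in list2, still NUL-terminated.
LC_ALL=C comm -z -13 "$demo_dir/list1" "$demo_dir/list2" > "$demo_dir/only_in_list2"

# Count the records by counting the NUL terminators.
record_count=$(tr -cd '\0' < "$demo_dir/only_in_list2" | wc -c)
echo "records only in list2: $record_count"
```

The result stays NUL-terminated, so it can be handed to a tool that reads NUL-delimited input (for instance `xargs -0`) without the embedded newline splitting a file name in two.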
With older versions of GNU comm, you should be able to swap NUL and NL:
comm -13 <(cd dir1 && find . -type f -print0 | tr '\n\0' '\0\n' | sort) \
         <(cd dir2 && find . -type f -print0 | tr '\n\0' '\0\n' | sort) |
  tr '\n\0' '\0\n'
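That way comm still works on newline-delimited records, but the actual newlines in the input are encoded as NUL, so file names containing newlines are still handled safely.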
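To make the encoding step concrete, here is a small sketch of the same swap using temporary files instead of process substitution, so it also runs in a plain POSIX shell. It assumes GNU tr, sort, and comm, which tolerate NUL bytes inside lines. The scratch directory `work` and the file names are invented for the illustration, and `LC_ALL=C` is an added assumption to keep the comparison byte-wise.

```shell
# Sketch only: assumes GNU tr, sort and comm (NUL bytes allowed inside lines).
# work is a hypothetical scratch directory used just for this example.
work=$(mktemp -d)
mkdir "$work/dir1" "$work/dir2"
touch "$work/dir1/a" "$work/dir2/a" "$work/dir2/d"
# A file name with an embedded newline, present only in dir2.
touch "$work/dir2/two
lines"

# After the swap each record ends in NL and any embedded NL is a NUL byte.
( cd "$work/dir1" && find . -type f -print0 ) | tr '\n\0' '\0\n' | LC_ALL=C sort > "$work/list1"
( cd "$work/dir2" && find . -type f -print0 ) | tr '\n\0' '\0\n' | LC_ALL=C sort > "$work/list2"

# comm sees ordinary lines; the final tr restores NUL-terminated records.
LC_ALL=C comm -13 "$work/list1" "$work/list2" | tr '\n\0' '\0\n' > "$work/only_in_dir2"

# Two NUL terminators (two records) and one literal newline inside a name.
nul_count=$(tr -cd '\0' < "$work/only_in_dir2" | wc -c)
nl_count=$(tr -cd '\n' < "$work/only_in_dir2" | wc -c)
echo "records: $nul_count, embedded newlines: $nl_count"
```

Because the swap is its own inverse, the same `tr '\n\0' '\0\n'` call is used both to encode the input and to decode the output.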
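You may also want to set the locale to C, because at least on GNU systems and in most UTF-8 locales there are distinct strings that sort the same, which would cause problems here¹.
This is a very common trick (see Invert matching lines, NUL-separated for another example with comm), but it needs utilities that support NUL in their input, which is relatively rare outside of GNU systems.
¹ Example: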
$ touch dir1/{①,②} dir2/{②,③}
$ comm -12 <(cd dir1 && find . -type f -print0 | tr '\n\0' '\0\n' | sort) \
           <(cd dir2 && find . -type f -print0 | tr '\n\0' '\0\n' | sort)
./③
./②
$ (export LC_ALL=C
   comm -12 <(cd dir1 && find . -type f -print0 | tr '\n\0' '\0\n' | sort) \
            <(cd dir2 && find . -type f -print0 | tr '\n\0' '\0\n' | sort))
./②
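(2019 edit: the relative order of ①②③ has been fixed in newer versions of GNU libc, but you can use 🧙 🧚 🧛 instead, for instance, in newer versions (2.30 at least), where 95% of Unicode code points still have the problem.)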

Source: https://unix.stackexchange.com/questions/446939
