Filter a CSV file to delete lines where Field 1 matches and the date/time in Field 3 is less than 5 minutes from the first Field 1 match

March 8, 2021

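Using a bash shell script on Ubuntu on a Raspberry Pi, I am trying to remove lines from a (comma-delimited) CSV list where {Field 1 matches and Field 3 is less than 5 minutes (300 seconds) from the first match of Field 1}.
Here is a sample input file. I have annotated the desired output with # comments to explain why a line is kept or deleted. I do not want the comments in the result, only the lines marked "Delete" removed. The actual input and filtered output files will look like this: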
A11EEA,@N171WT,2021/03/06 12:37:25,700,0.1
A0FC0A,@N1624K,2021/03/06 13:37:33,1975,2.0
...et cetera
Input file annotated with the desired output:
A11EEA,@N171WT,2021/03/06 12:37:25,700,0.1     # Keep - 1st occurrence of Field-1
A0FC0A,@N1624K,2021/03/06 13:37:33,1975,2.2    # Keep - 1st occurrence of Field-1
AB8C37,@AAL2386,2021/03/06 13:45:43,4500,1.3   # Keep - 1st occurrence of Field-1
A55325,@N442MG,2021/03/06 15:28:06,600,0.4     # Keep - 1st occurrence of Field-1
AB8C37,@AAL2386,2021/03/06 13:50:46,4500,1.5   # Keep - more than 5 mins from line 3
AB0ED6,@UAL1470,2021/03/06 13:51:23,4925,1.6   # Keep - 1st occurrence of Field-1
AB8C37,@AAL2386,2021/03/06 13:52:48,4500,1.7   # Delete - less than 5 mins from line 5
AB0ED6,@UAL1470,2021/03/06 13:56:30,4925,1.8   # Keep - more than 5 mins from line 6
AB0ED6,@UAL1470,2021/03/06 13:56:40,4925,1.9   # Delete - less than 5 mins from line 8
AB8C37,@AAL2386,2021/03/06 13:56:49,4500,1.0   # Delete - less than 5 mins from line 5**

** Line 7 of the original record is not considered because it is slated for deletion
Ideally, I would like a solution that uses awk/sed/sort/uniq rather than recursively doing something like this:
while IFS= read -r line
do
  IFS=, read -ra record <<< "$line"
  # ... do a bunch of stuff
done < "inputfile.csv"
I tried this in awk, but I quickly got stuck because of the complexity of the task and the potential recursion.
Help? Pretty please?

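You can create a function in awk that returns the difference in seconds between two dates. Then you only need to store the last "valid" (printed) date in an awk array indexed by the first field, so that you can use it in the comparison, for example: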
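awk '
# Return the number of seconds between two "YYYY/MM/DD HH:MM:SS" timestamps.
# mktime() is not part of POSIX awk; it is available in gawk and expects
# its argument in the form "YYYY MM DD HH MM SS".
function getDateDifference(a, b,    startDate, endDate) {
   gsub(/[:\/]/, " ", a)
   startDate = mktime(a)
   gsub(/[:\/]/, " ", b)
   endDate = mktime(b)
   return int(endDate - startDate)
}

BEGIN { FS=OFS="," }

# Print a line if its first field has not been seen yet, or if it is more than
# 300 seconds after the last printed line with the same first field.
dates[$1]=="" || getDateDifference(dates[$1],$3) > 300 {
   print $0
   dates[$1] = $3
}' input.txt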
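Note that before comparing dates, you have to check whether the array already holds a value for that particular first field, so that the first occurrence is always printed.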
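If the awk on your system does not provide mktime(), the same idea can be sketched with plain arithmetic instead. The following is only a sketch under assumptions: the names filter_csv and to_seconds are made up for this illustration and are not part of the answer above, timestamps are assumed to always look like YYYY/MM/DD HH:MM:SS, and time zones and daylight saving are ignored because only differences between timestamps matter here.

```shell
#!/bin/sh
# Hypothetical helper: filter CSV from the given files (or stdin), keeping a
# line only if its first field is new or it is more than 300 seconds after
# the last kept line with the same first field.
filter_csv() {
  awk '
  # Convert "YYYY/MM/DD HH:MM:SS" to a running count of seconds.
  # Only differences between two results are meaningful.
  function to_seconds(ts,    p, y, m, d, days) {
      split(ts, p, "[/: ]")
      y = p[1]; m = p[2]; d = p[3]
      if (m <= 2) { y--; m += 12 }   # count Jan/Feb as months 13/14 of the previous year
      days = 365*y + int(y/4) - int(y/100) + int(y/400) \
             + int((153*(m - 3) + 2) / 5) + d
      return days*86400 + p[4]*3600 + p[5]*60 + p[6]
  }

  BEGIN { FS = OFS = "," }

  !($1 in last) || to_seconds($3) - to_seconds(last[$1]) > 300 {
      print
      last[$1] = $3                  # remember the last kept timestamp for this key
  }' "$@"
}

# Made-up sample data, including a pair that crosses midnight.
filter_csv <<'EOF'
A,x,2021/03/06 13:45:43,1
A,x,2021/03/06 13:50:46,2
A,x,2021/03/06 13:52:48,3
B,y,2021/03/06 23:58:00,4
B,y,2021/03/07 00:02:00,5
B,y,2021/03/07 00:03:01,6
EOF
```

With this sample, the rows ending in 3 and 5 are dropped (122 and 240 seconds after the last kept row for their key), while the rows ending in 2 and 6 are kept (303 and 301 seconds). Using `$1 in last` for the existence test avoids creating an empty array entry just by looking it up.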

Source: https://unix.stackexchange.com/questions/637970
