
Create a new column in a CSV file based on the values of two other columns

  • September 1, 2022

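I have a CSV with 23 columns containing data from a network scan. I need to create a new column based on the data in the last two columns (22 and 23). The output I want is as follows: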

New column header = Labelled

if column 22 = Malicious and column 23 = C&C-FileDownload then new column 24 = 1

Can anyone help me do this on Ubuntu? I have been researching this and I can see that awk is the tool to use, but I am new to it.

So far I have tried:

awk 'NR==1{$24="merge";print;next} \
$22 == "Malicious" || $23 == "C&C-FileDownload" {$24=1}1' Malware-44-1.csv > test1.csv

but it did not add the new column with "1". It did add "merge" as a column, but without a comma separating it from the other columns.

I am now using the following:

awk -F, -v OFS=',' 'NR==1{ $24="merge"; print; next }
{ $24=($22 == "Malicious" && $23 == "C&C-FileDownload") }1
' master.csv > output1.csv

It now looks like this:

ts,uid,id.orig_h,id.orig_p,id.resp_h,id.resp_p,proto,service,duration,orig_bytes,resp_bytes,conn_state,local_orig,local_resp,missed_bytes,history,orig_pkts,orig_ip_bytes,resp_pkts,resp_ip_bytes,tunnel_parents,label,detailed-label
,merge
1547150789.067208,CzsY0D4B96NTr8m7ld,192.168.1.199,59222,46.101.251.172,80,tcp,http,1.686784,149,171750,SF,-,-,11584,ShADadttfF,122,7741,122,178102,-,Malicious,C&C-FileDownload
,0
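The output above, where ",merge" and ",0" land on lines of their own and a row with Malicious and C&C-FileDownload still gets 0, matches the usual symptom of a file with DOS/Windows (CRLF) line endings: awk keeps the trailing carriage return as part of the last field, so $23 is really "C&C-FileDownload\r" and never equals the plain string. This is an inference from the output shown, not something confirmed in the thread. A minimal sketch that strips the carriage return before the fields are compared (the function name flag_download is hypothetical):

```shell
# Hypothetical helper: reads a 23-column CSV on stdin (LF or CRLF line endings)
# and writes it to stdout with a 24th column "merge" holding 1 or 0.
flag_download() {
  awk -F, -v OFS=, '
    # Drop a trailing carriage return; changing $0 makes awk re-split the fields.
    { sub(/\r$/, "") }
    # Header row: append the new column name.
    NR == 1 { $24 = "merge"; print; next }
    # Data rows: the && expression evaluates to 1 (both match) or 0.
    { $24 = ($22 == "Malicious" && $23 == "C&C-FileDownload"); print }
  '
}
```

For example, flag_download < master.csv > output1.csv. Converting the file once with the dos2unix tool before running the original command should have the same effect.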

This now almost works, but only the result of the last condition in the list shows up:

awk -F, -v OFS=, 'NR==1{ $24="label1"; print; next }
{ $24=($22 == "Malicious" && $23 == "C&C")?0:"" }
{ $24=($22 == "Malicious" && $23 == "C&C-FileDownload")?1:"" } 
{ $24=($22 == "Malicious" && $23 == "C&C-HeartBeat")?2:"" } 
{ $24=($22 == "Malicious" && $23 == "C&C-HeartBeat-Attack")?3:"" } 
{ $24=($22 == "Malicious" && $23 == "C&C-HeartBeat-FileDownload")?4:"" } 
{ $24=($22 == "Malicious" && $23 == "C&C-Mirai")?5:"" } 
{ $24=($22 == "Malicious" && $23 == "C&C-Torii")?6:"" } 
{ $24=($22 == "Malicious" && $23 == "DDoS")?7:"" }
{ $24=($22 == "Malicious" && $23 == "FileDownload")?8:"" } 
{ $24=($22 == "Malicious" && $23 == "Okiru")?9:"" } 
{ $24=($22 == "Malicious" && $23 == "Okiru-Attack")?10:"" } 
{ $24=($22 == "Malicious" && $23 == "PartOfAHorizontalPortScan")?11:"" } 
{ $24=($22 == "Malicious" && $23 == "PartOfAHorizontalPortScan-Attack")?12:"" } 
{ $24=($22 == "Malicious" && $23 == "C&C-PartOfAHorizontalPortScan")?13:"" } 
{ $24=($22 == "Malicious" && $23 == "Attack")?14:"" } 
{ $24=($22 == "Benign" && $23 == "-")?15:"" } 
1' master.csv > masteroutput1.csv

I removed the parenthesis after the "" because I was getting a syntax error.

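You need to tell awk what the input field separator is; with -F, we tell it that it is a comma character. You also need to tell it what the output field separator is; we specified -v OFS=, so that it is a comma character as well.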
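To see what each option changes, here is a made-up three-field line run through awk with no separator options, with -F, only, and with both -F, and -v OFS=,. The first case is also why the question's first attempt appended "merge" after a run of spaces instead of a comma:

```shell
# Toy input line (made up for illustration).
line='a,b,c'

# 1) No -F: awk splits on blanks, so the whole CSV line is one field ($1).
printf '%s\n' "$line" | awk '{ print NF }'                  # prints: 1
# Assigning $4 pads $2 and $3 with empty strings and joins with spaces.
printf '%s\n' "$line" | awk '{ $4 = "x"; print }'           # prints: a,b,c   x

# 2) -F, only: fields split on commas, but the rebuilt record uses the
#    default OFS, which is a single space.
printf '%s\n' "$line" | awk -F, '{ $4 = "x"; print }'       # prints: a b c x

# 3) -F, and -v OFS=,: split on commas and rejoin with commas.
printf '%s\n' "$line" | awk -F, -v OFS=, '{ $4 = "x"; print }'   # prints: a,b,c,x
```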

awk -F, -v OFS=, 'NR==1{ $24="merge"; print; next }
{ $24=($22 == "Malicious" && $23 == "C&C-FileDownload") }1
' Malware-44-1.csv > output.csv

I also updated the command so that column #24 is 0 when the condition is not met and 1 otherwise, so all records have the same number of columns.
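The 0/1 value comes straight from the comparison itself: in awk, a comparison (or an && of comparisons) evaluates to the number 1 when true and 0 when false. A small sketch on a made-up two-column input, where the new value goes into column 3:

```shell
# Comparisons yield the numbers 1 (true) and 0 (false).
awk 'BEGIN { t = ("Malicious" == "Malicious"); f = ("Benign" == "Malicious"); print t, f }'
# prints: 1 0

# Assigning that value to a new field gives every row the extra column.
printf 'x,Malicious\ny,Benign\n' | awk -F, -v OFS=, '{ $3 = ($2 == "Malicious"); print }'
# prints:
# x,Malicious,1
# y,Benign,0
```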

If you want to leave those columns empty instead of filling them with 0, then:

awk -F, -v OFS=, 'NR==1{ $24="merge"; print; next }
{ $24=(($22 == "Malicious" && $23 == "C&C-FileDownload")?1:"") }1
' Malware-44-1.csv > output.csv

To define multiple rules, do the following:

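awk -F, -v OFS=, 'NR==1{ $24="merge"; print; next }
{ $24="" }    # default value for rows that match none of the rules below
$22 == "Malicious" && $23 == "C&C-FileDownload" { $24=1 }
$22 == "Malicious" && $23 == "C&C-HeartBeat"    { $24=2 }
# ... add one "pattern { $24=value }" line per further rule ...
1' Malware-44-1.csv > output.csv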
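With many categories, such as the fifteen labels in the question, a lookup table keeps the rules in one place. Note that a chain of unconditional blocks like { $24=(cond)?N:"" }, as in the question, keeps only the last block's result, because every block assigns $24 on every row. The sketch below assumes the column positions and label numbers from the question's list; the helper name label_by_category and the header "label1" are taken as illustrative choices, and rows with no matching pair get an empty column:

```shell
# Hypothetical helper: reads a 23-column CSV on stdin and writes it to stdout
# with a 24th column holding the numeric label from the question's list.
label_by_category() {
  # Malicious sub-categories in the question's order: position k gets label k-1.
  local cats='C&C,C&C-FileDownload,C&C-HeartBeat,C&C-HeartBeat-Attack,C&C-HeartBeat-FileDownload,C&C-Mirai,C&C-Torii,DDoS,FileDownload,Okiru,Okiru-Attack,PartOfAHorizontalPortScan,PartOfAHorizontalPortScan-Attack,C&C-PartOfAHorizontalPortScan,Attack'
  awk -F, -v OFS=, -v cats="$cats" '
    BEGIN {
      # Build the lookup table keyed on the (column 22, column 23) pair.
      n = split(cats, names, ",")
      for (i = 1; i <= n; i++) label["Malicious" SUBSEP names[i]] = i - 1
      label["Benign" SUBSEP "-"] = 15
    }
    # Drop a DOS carriage return, if present, so $23 compares cleanly.
    { sub(/\r$/, "") }
    NR == 1 { $24 = "label1"; print; next }
    {
      key = $22 SUBSEP $23
      # Exactly one assignment per row, so nothing gets overwritten.
      $24 = (key in label) ? label[key] : ""
      print
    }
  '
}
```

For example, label_by_category < master.csv > masteroutput1.csv. Because the numbering follows list order, inserting a new category in the middle of the list would renumber every category after it.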

Or you can print the new value separately, right after printing the current record:

awk -F, 'NR==1{ print $0 ",merge" }
NR>1{ print $0 "," (($22 == "Malicious" && $23 == "C&C-FileDownload")?1:"") }
' Malware-44-1.csv > output.csv
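In this print-based variant, the parentheses around the whole ?: expression matter, because string concatenation binds more tightly than the conditional operator. A toy demonstration using BEGIN blocks, so no input file is needed:

```shell
# Without outer parentheses awk reads this as ("row" "," (0)) ? 1 : "".
# The concatenated string "row,0" is non-empty, so it counts as true and only 1 is printed.
awk 'BEGIN { print "row" "," (0) ? 1 : "" }'      # prints: 1

# With the conditional wrapped in parentheses, its result is appended as intended.
awk 'BEGIN { print "row" "," ((0) ? 1 : "") }'    # prints: row,
```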

Source: https://unix.stackexchange.com/questions/715743