Awk
awk script to prepare a csv file
I have been writing an awk script that prepares a csv file before analysis. I need to create an output file containing columns 1-2, 10, 13-15 and 19-21. In addition, I need to replace the number in column 2 with the day of the week (so 1 = Monday, 2 = Tuesday, ...), convert column 21 from nautical miles to kilometers, and remove the "" from columns 10, 13 and 14. Input:
"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,
Output:
"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61
So far, I have the command that extracts the required columns:
cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'
and the code that replaces the numbers in column 2 with the corresponding day of the week:
awk 'BEGIN {FS = OFS = ","} $2 == 1 {$2 = "Monday"} $2 == 2 {$2 = "Tuesday"} $2 == 3 {$2 = "Wednesday"} $2 == 4 {$2 = "Thursday"} $2 == 5 {$2 = "Friday"} $2 == 6 {$2 = "Saturday"} $2 == 7 {$2 = "Sunday"} {print}' file.csv
What I am still missing is a way to wrap all of this into a single script that I can run later.
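#!/bin/awk -f
BEGIN {
    # Map DAY_OF_WEEK numbers to names (1 = Monday ... 7 = Sunday).
    dow[1] = "Monday"
    dow[2] = "Tuesday"
    dow[3] = "Wednesday"
    dow[4] = "Thursday"
    dow[5] = "Friday"
    dow[6] = "Saturday"
    dow[7] = "Sunday"
    # Read and write comma-separated fields.
    FS = OFS = ","
}
# Header row: print the wanted column names unchanged.
NR == 1 { print $1, $2, $10, $13, $14, $15, $19, $20, $21 }
# Data rows: convert, strip quotes, then print the wanted columns.
NR != 1 {
    $2 = dow[$2]          # day number to day name
    $21 *= 1.852          # nautical miles to kilometers
    gsub(/"/, "", $10)
    gsub(/"/, "", $13)
    gsub(/"/, "", $14)
    print $1, $2, $10, $13, $14, $15, $19, $20, $21
}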
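Save it in a file, e.g. sample.awk, and make it executable:
chmod +x sample.awk
Then run it on your data file:
./sample.awk data
To save the output in another file, add an output redirection operator, like so: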
./sample.awk data > out.csv
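If ./sample.awk fails with a "bad interpreter" error, the shebang path probably does not match your system (awk is often installed as /usr/bin/awk rather than /bin/awk). One way around that, sketched below, is to pass the script file to awk explicitly with -f, which needs neither the shebang nor chmod. The script and the two-line data file here are hypothetical, cut-down stand-ins for sample.awk and the real input (only columns 1 and 2), written to a temporary directory so that nothing existing is overwritten.

```shell
# Throwaway directory for the demo files.
workdir=$(mktemp -d)

# Cut-down, hypothetical version of sample.awk: handles columns 1 and 2 only.
cat > "$workdir/sample.awk" <<'EOF'
BEGIN {
    split("Monday Tuesday Wednesday Thursday Friday Saturday Sunday", dow, " ")
    FS = OFS = ","
}
NR == 1 { print $1, $2 }
NR != 1 { print $1, dow[$2] }
EOF

# Hypothetical two-line input: a header row and one data row.
printf '%s\n' '"DAY_OF_MONTH","DAY_OF_WEEK"' '1,2' > "$workdir/data"

# awk reads the program from the file named after -f.
awk -f "$workdir/sample.awk" "$workdir/data" > "$workdir/out.csv"
cat "$workdir/out.csv"
```

The same form works for the full script: awk -f sample.awk data > out.csv.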
awk '
BEGIN {
    split("Monday Tuesday Wednesday Thursday Friday Saturday Sunday",days)
    FS=OFS=","
}
NR > 1 {
    gsub(/"/,"")
    $2 = days[$2]
    $21 *= 1.852
}
{ print $1, $2, $10, $13, $14, $15, $19, $20, $21 }
' file

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61
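Three details of the script above are easy to miss, and each can be checked in isolation. split numbers its elements from 1, so days[2] is Tuesday. gsub with no third argument works on the whole record, after which awk re-splits the fields. The distance is printed with awk's default number format, %.6g (the default for both OFMT and CONVFMT), which is why 631.00 * 1.852 = 1168.612 appears as 1168.61. The variable names and the one-line sample record below are made up for illustration, and the printf-style format at the end is just one possible way to keep three decimals, not something the original script does.

```shell
# split returns the element count and fills days[1]..days[7].
days_check=$(awk 'BEGIN {
    n = split("Monday Tuesday Wednesday Thursday Friday Saturday Sunday", days)
    print n, days[1], days[2], days[7]
}')
echo "$days_check"

# gsub without a target edits $0; the fields are re-split afterwards.
quotes_check=$(printf '%s\n' '1,2,"ORD","GRB"' |
    awk 'BEGIN { FS = OFS = "," } { gsub(/"/, ""); print $3, $4 }')
echo "$quotes_check"

# print uses OFMT ("%.6g" by default); sprintf gives explicit control.
round_check=$(awk 'BEGIN { km = 631.00 * 1.852; print km, sprintf("%.3f", km) }')
echo "$round_check"
```

If more digits are wanted in the output file, replacing $21 *= 1.852 with $21 = sprintf("%.3f", $21 * 1.852) would be one option.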