Post-processing a multi-column log file
I am post-processing a multi-column log file with the following format:
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,6, -5.3300, 201.2781, 0,, 26, 8, 1, -0.2132
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,5, -5.2300, 230.0910, 0,, 26, 8, 1, -0.2092
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,4, -5.1500, 222.2095, 0,, 26, 8, 1, -0.2060
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,5, -5.0500, 201.1757, 0,, 26, 8, 1, -0.2020
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,2, -5.0200, 233.0833, 0,, 26, 8, 1, -0.2008
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,5, -4.9500, 203.5671, 0,, 26, 8, 1, -0.1980
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,4, -4.9500, 227.0462, 0,, 26, 8, 1, -0.1980
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,14, -4.7700, 231.9237, 0,, 26, 8, 1, -0.1908
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,5, -4.7200, 194.9009, 0,, 26, 8, 1, -0.1888
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,3, -4.6700, 217.3995, 0,, 26, 8, 1, -0.1868
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,1, -4.6400, 200.7227, 0,, 26, 8, 1, -0.1856
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -4.5900, 184.7898, 0,, 26, 8, 1, -0.1836
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,3, -4.5500, 215.7487, 0,, 26, 8, 1, -0.1820
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,3, -4.4500, 198.2857, 0,, 26, 8, 1, -0.1780
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,1, -4.4200, 204.6418, 0,, 26, 8, 1, -0.1768
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,6, -4.3700, 199.5359, 0,, 26, 8, 1, -0.1748
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,6, -4.3500, 232.3248, 0,, 26, 8, 1, -0.1740
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,3, -4.2700, 234.3468, 0,, 26, 8, 1, -0.1708
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,1, -4.2500, 195.9439, 0,, 26, 8, 1, -0.1700
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,7, -4.2400, 198.9363, 0,, 26, 8, 1, -0.1696
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,1, -4.1600, 208.6377, 0,, 26, 8, 1, -0.1664
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_01_lig_cne_420,3, -4.1500, 179.4341, 0,, 26, 8, 1, -0.1660
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,4, -4.1300, 233.9607, 0,, 26, 8, 1, -0.1652
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -4.1200, 189.5660, 0,, 26, 8, 1, -0.1648
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,1, -4.1100, 209.8679, 0,, 26, 8, 1, -0.1644
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,5, -4.1000, 213.5573, 0,, 26, 8, 1, -0.1640
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,1, -4.0700, 227.6124, 0,, 26, 8, 1, -0.1628
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,3, -4.0400, 209.6345, 0,, 26, 8, 1, -0.1616
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,4, -3.9700, 233.5914, 0,, 26, 8, 1, -0.1588
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,4, -3.9500, 223.9189, 0,, 26, 8, 1, -0.1580
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,1, -3.9000, 180.8133, 0,, 26, 8, 1, -0.1560
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,1, -3.9000, 224.1828, 0,, 26, 8, 1, -0.1560
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_02_lig_cne_420,1, -3.8800, 204.1735, 0,, 26, 8, 1, -0.1552
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -3.8500, 195.5399, 0,, 26, 8, 1, -0.1540
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,2, -3.8400, 227.9037, 0,, 26, 8, 1, -0.1536
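Note that columns 1 and 2 are separated by a bare comma (,), while the remaining columns are separated by a comma followed by a space (, ). From this log file I need to: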
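- replace all the data in the first column (the long Unix path, /Users/gleb/Desktop/scripts/...) with the corresponding line number (just N);
- remove columns 6-9 (the last four columns) entirely.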
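The resulting log should contain the same number of lines, but only columns 1 (with the substitution!) through 5 (the last one, which contains 0,). What I have managed so far is the substitution in the first column with sed, but it only strips the path and does not put the corresponding line number in its place: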
sed -i '' -e 's|\/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/*.*/||' log.txt
With GNU awk (gawk):
gawk -F'^[^,]*,|, ' '{ print NR, $2, $3, $4, $5; }' OFS=', ' infile
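The gawk one-liner relies on how gawk treats the anchored ^ alternative inside FS, which other awk implementations may not handle the same way. Below is a minimal sketch of a variant that sticks to POSIX awk features, assuming every record has the layout shown in the question: split on ", " only, then use sub() to strip the path and its trailing comma from $1. The helper name trim_log and the shortened /path/to/ placeholder paths are hypothetical.

```shell
#!/bin/sh
# trim_log: hypothetical helper name. Reads the log on stdin and prints
# "N, col2, col3, col4, col5" for every line, using only POSIX awk features.
trim_log() {
    # Split on ", " only. Field $1 is then "path,col2", so sub() strips
    # everything up to and including its first comma, leaving col2.
    # $4 is "0," (column 5 of the log); $5..$8 (columns 6-9) are not printed.
    awk -F', ' -v OFS=', ' '{
        sub(/^[^,]*,/, "", $1)
        print NR, $1, $2, $3, $4
    }'
}

# Usage with two records from the question (paths shortened to a
# hypothetical /path/to/ placeholder for readability):
printf '%s\n' \
    '/path/to/7000_07_lig_cne_420,6, -5.3300, 201.2781, 0,, 26, 8, 1, -0.2132' \
    '/path/to/7000_10_lig_cne_420,5, -5.2300, 230.0910, 0,, 26, 8, 1, -0.2092' |
    trim_log
# prints:
# 1, 6, -5.3300, 201.2781, 0,
# 2, 5, -5.2300, 230.0910, 0,
```

Because only $1 through $4 are printed, columns 6-9 are dropped without having to name them, and the sketch assumes the path itself contains no commas.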
To skip the first N lines, add NR>N in front of the awk action, so the first N lines are skipped; for example, to skip the first line:
gawk -F'^[^,]*,|, ' 'NR>1 { print NR, $2, $3, $4, $5; }' OFS=', ' infile
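The numbering would then start at 2, so you would need to change NR to NR-1 (NR-N in general) so that it starts at 1, or simply use a separate counter variable instead, for example:
gawk -F'^[^,]*,|, ' 'NR>1 { print ++lineNumber, $2, $3, $4, $5; }' OFS=', ' infile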
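Skipping and renumbering can be combined for any number of header lines: pass N into awk with -v, keep only records with NR > N, and print NR - N so the numbering restarts at 1. This is a minimal sketch, assuming the header lines come first and every following record has the layout from the question. It uses POSIX awk only (split on ", ", strip the path from $1 with sub()) instead of gawk's anchored FS. The helper name trim_log_skip and the header line in the usage example are hypothetical.

```shell
#!/bin/sh
# trim_log_skip N: hypothetical helper. Drops the first N lines of stdin
# and prints "number, col2, col3, col4, col5" for the rest, numbering from 1.
trim_log_skip() {
    # The shell's "$1" (N) is passed in as the awk variable skip; inside the
    # single-quoted awk program, $1 is awk's first field, not the shell's.
    awk -F', ' -v OFS=', ' -v skip="$1" 'NR > skip {
        sub(/^[^,]*,/, "", $1)          # "path,6" -> "6"
        print NR - skip, $1, $2, $3, $4
    }'
}

# Usage: one hypothetical header line followed by two records from the
# question (paths shortened to a hypothetical /path/to/ placeholder).
printf '%s\n' \
    '# hypothetical header line' \
    '/path/to/7000_07_lig_cne_420,6, -5.3300, 201.2781, 0,, 26, 8, 1, -0.2132' \
    '/path/to/7000_10_lig_cne_420,5, -5.2300, 230.0910, 0,, 26, 8, 1, -0.2092' |
    trim_log_skip 1
# prints:
# 1, 6, -5.3300, 201.2781, 0,
# 2, 5, -5.2300, 230.0910, 0,
```

With N = 0 the function behaves like plain numbering from the first line.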
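Here ^[^,]*, matches everything from the start of the line up to and including the first comma, and ", " matches a comma followed by a space. Both are defined as field separators (alternated with |), and the corresponding fields are printed based on that split. Because the first separator match starts at the very beginning of the line, $1 is empty and the wanted values start at $2. In awk, NR holds the current line (record) number.
Another option is to use cut and nl:
<infile cut -d',' -f2-6 | nl -w1 -s', '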
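cut extracts fields 2 to 6, splitting on a bare comma (so field 6 is the empty string between the two adjacent commas after 0, which is why the output ends in "0,"), and nl numbers the lines, with -s', ' setting the separator between the number and the line; -w1 sets the width of the number column to 1 (the default width is 6, which would pad the numbers with leading spaces).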
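A quick way to see the cut and nl behavior on a single record: with a bare comma as the delimiter, field 6 is the empty string between the two adjacent commas, so fields 2 to 6 end in "0,"; and nl's default number format is right-aligned with width 6, which -w1 removes. The sketch below prints each stage. The path is shortened to a hypothetical /path/to/ placeholder.

```shell
#!/bin/sh
# One record from the question, path shortened to a hypothetical placeholder.
line='/path/to/7000_07_lig_cne_420,6, -5.3300, 201.2781, 0,, 26, 8, 1, -0.2132'

# Splitting on ",": field 5 is " 0" and field 6 is empty, so -f2-6
# stops right after "0," and drops " 26, 8, 1, -0.2132".
printf '%s\n' "$line" | cut -d',' -f2-6
# prints: 6, -5.3300, 201.2781, 0,

# Default nl numbering: right-aligned in a 6-character column.
printf '%s\n' "$line" | cut -d',' -f2-6 | nl -s', '
# prints:      1, 6, -5.3300, 201.2781, 0,

# -w1 removes the padding; -s', ' is the text placed after the number.
printf '%s\n' "$line" | cut -d',' -f2-6 | nl -w1 -s', '
# prints: 1, 6, -5.3300, 201.2781, 0,
```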