Text-Processing

多列日誌文件的後處理

  • December 3, 2020

我正在對多列日誌填充進行後處理,格式如下:

/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,6, -5.3300, 201.2781, 0,,  26,  8, 1, -0.2132
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,5, -5.2300, 230.0910, 0,,  26,  8, 1, -0.2092
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,4, -5.1500, 222.2095, 0,,  26,  8, 1, -0.2060
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,5, -5.0500, 201.1757, 0,,  26,  8, 1, -0.2020
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,2, -5.0200, 233.0833, 0,,  26,  8, 1, -0.2008
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,5, -4.9500, 203.5671, 0,,  26,  8, 1, -0.1980
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,4, -4.9500, 227.0462, 0,,  26,  8, 1, -0.1980
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,14, -4.7700, 231.9237, 0,,  26,  8, 1, -0.1908
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,5, -4.7200, 194.9009, 0,,  26,  8, 1, -0.1888
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,3, -4.6700, 217.3995, 0,,  26,  8, 1, -0.1868
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,1, -4.6400, 200.7227, 0,,  26,  8, 1, -0.1856
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -4.5900, 184.7898, 0,,  26,  8, 1, -0.1836
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,3, -4.5500, 215.7487, 0,,  26,  8, 1, -0.1820
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,3, -4.4500, 198.2857, 0,,  26,  8, 1, -0.1780
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_07_lig_cne_420,1, -4.4200, 204.6418, 0,,  26,  8, 1, -0.1768
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,6, -4.3700, 199.5359, 0,,  26,  8, 1, -0.1748
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,6, -4.3500, 232.3248, 0,,  26,  8, 1, -0.1740
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,3, -4.2700, 234.3468, 0,,  26,  8, 1, -0.1708
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,1, -4.2500, 195.9439, 0,,  26,  8, 1, -0.1700
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,7, -4.2400, 198.9363, 0,,  26,  8, 1, -0.1696
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_03_lig_cne_420,1, -4.1600, 208.6377, 0,,  26,  8, 1, -0.1664
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_01_lig_cne_420,3, -4.1500, 179.4341, 0,,  26,  8, 1, -0.1660
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_12_lig_cne_420,4, -4.1300, 233.9607, 0,,  26,  8, 1, -0.1652
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -4.1200, 189.5660, 0,,  26,  8, 1, -0.1648
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,1, -4.1100, 209.8679, 0,,  26,  8, 1, -0.1644
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,5, -4.1000, 213.5573, 0,,  26,  8, 1, -0.1640
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_08_lig_cne_420,1, -4.0700, 227.6124, 0,,  26,  8, 1, -0.1628
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,3, -4.0400, 209.6345, 0,,  26,  8, 1, -0.1616
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_06_lig_cne_420,4, -3.9700, 233.5914, 0,,  26,  8, 1, -0.1588
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,4, -3.9500, 223.9189, 0,,  26,  8, 1, -0.1580
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_05_lig_cne_420,1, -3.9000, 180.8133, 0,,  26,  8, 1, -0.1560
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_11_lig_cne_420,1, -3.9000, 224.1828, 0,,  26,  8, 1, -0.1560
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_02_lig_cne_420,1, -3.8800, 204.1735, 0,,  26,  8, 1, -0.1552
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_09_lig_cne_420,1, -3.8500, 195.5399, 0,,  26,  8, 1, -0.1540
/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/7000_cne_lig420.AllBoxes/7000_10_lig_cne_420,2, -3.8400, 227.9037, 0,,  26,  8, 1, -0.1536

請注意,第 1 列和第 2 列由逗號 (,) 分隔,而其餘列由逗號空格 (, ) 分隔。從這個日誌文件我需要:

  1. /Users/gleb/Desktop/scripts/...用相應的行號(僅行 N)替換第一列(長 unix 格式路徑)中的所有數據;
  2. 徹底刪除第 6-9 列(最後四列);

最終生成的日誌應該包含相同數量的行,但僅從第 1 列(有替換!)到第 5 列(最後一列有0,)。

我已經能夠完成的是使用 sed 在第一列中進行替換,但是它只是切斷了路徑,但沒有在那裡引入相應的行號:

sed -i '' -e 's|\/Users/gleb/Desktop/scripts/analys_clusters/sub_folders_to_analyse/*.*/||' log.txt
gawk -F'^[^,]*,|, ' '{ print NR, $2, $3, $4, $5; }' OFS=', ' infile

跳過前N行,添加NR> Nawk,因此將跳過前*N行;*跳過第一行,你會這樣做:

gawk -F'^[^,]*,|, ' 'NR> 1{ print NR, $2, $3, $4, $5; }' OFS=', ' infile

隨後您將需要修改NRNR-1,因此它將從1而不是2開始,或者只是將其替換為另一個臨時變數,例如:

gawk -F'^[^,]*,|, ' 'NR> 1{ print ++lineNumber, $2, $3, $4, $5; }' OFS=', ' infile

^[^,]*,匹配從行首到第一個逗號字元;

, 匹配逗號空格字元。

上面這些我們定義為欄位分隔符(用 分隔|),並在此基礎上列印相應的欄位;NRawk中表示目前行號。


另一種選擇是使用cutand nl

<infile cut -d',' -f2-6 |nl -w1 -s', '

cut命令剪切欄位 2~6 並nl用逗號分隔的行編號,-w將 1 個寬度列設置為數字。

引用自:https://unix.stackexchange.com/questions/622727