Text-Processing

Extracting columns from comma-separated text

  • July 18, 2015

I have a long comma-separated file with 20K lines. Here is a sample:

"","id","number1","number2","number3","number4","number5","number6","number7"
"1","MRTAT_1of3.RTS",17.1464602742708,17.1796255746079,17.1132949739337,0.996138996138996,-0.0055810322632996,1,1
"2","MRTAT_2of3.RTS",3.88270908946253,6.13558056235995,1.62983761656512,0.265637065637066,-1.91247162787182,0.718084341158075,1
"3","MRTAT_3of3.RTS",3.87323328936623,1.22711611247199,6.51935046626046,5.31274131274131,2.40945646701554,0.676814519398334,1

I want to print the columns id, number4, number5 and number6, tab-separated, keeping only the rows where number4 is greater than 4.0. Here is some sample output:

id         number4           number5           number6
MRTAT_3of3.RTS 5.31274131274131  2.40945646701554  0.676814519398334

# strip the double quotes, then print the header and the rows where number4 > 4
awk -F , -v OFS='\t' '{gsub(/"/, "")} NR == 1 || $6 > 4 {print $2, $6, $7, $8}' input.txt
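With `,` as the field separator, awk sees the empty first column as `$1`, so `id` is `$2` and `number4` through `number6` are `$6` through `$8`. A small sketch to check that mapping against the header from the question; the function name `list_fields` is made up for illustration:

```shell
# list_fields is a hypothetical helper: print each comma-separated field
# of the first input line together with its awk field number.
list_fields() {
    awk -F , 'NR == 1 { for (i = 1; i <= NF; i++) print i, $i; exit }' "$@"
}

header='"","id","number1","number2","number3","number4","number5","number6","number7"'
printf '%s\n' "$header" | list_fields
```

For the sample header this lists nine fields, including `2 "id"` and `6 "number4"`. Passing a file name instead (`list_fields input.txt`) should work the same way.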

I agree that awk is the best solution. You can also do this in bash with the help of a few other tools:

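# drop the double quotes, then keep columns 2, 6, 7 and 8
tr -d '"' < filename | cut -d , -f 2,6,7,8 | {
   # print the header with commas turned into tabs
   IFS= read -r header
   tr , $'\t' <<< "$header"
   while IFS=, read -r id num4 num5 num6; do
       # bash can only do integer arithmetic, so let bc compare the decimals
       if [[ $(bc <<< "$num4 > 4.0") = 1 ]]; then
          printf "%s\t%s\t%s\t%s\n" "$id" "$num4" "$num5" "$num6"
       fi
   done
}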
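Either approach can be checked against the sample rows from the question. The following is only a sketch under a few assumptions: the data goes into a temporary file created with `mktemp`, the wrapper name `filter_rows` is hypothetical, and the filter is the awk variant with the double quotes stripped and a strict `> 4` test.

```shell
# filter_rows is a hypothetical wrapper: strip the double quotes, then print
# id, number4, number5 and number6 for the header line and for every row
# whose number4 is greater than 4.
filter_rows() {
    awk -F , -v OFS='\t' '
        { gsub(/"/, "") }
        NR == 1 || $6 > 4 { print $2, $6, $7, $8 }
    ' "$@"
}

# Write the sample from the question to a temporary file and filter it.
sample=$(mktemp)
cat > "$sample" <<'EOF'
"","id","number1","number2","number3","number4","number5","number6","number7"
"1","MRTAT_1of3.RTS",17.1464602742708,17.1796255746079,17.1132949739337,0.996138996138996,-0.0055810322632996,1,1
"2","MRTAT_2of3.RTS",3.88270908946253,6.13558056235995,1.62983761656512,0.265637065637066,-1.91247162787182,0.718084341158075,1
"3","MRTAT_3of3.RTS",3.87323328936623,1.22711611247199,6.51935046626046,5.31274131274131,2.40945646701554,0.676814519398334,1
EOF
filter_rows "$sample"
rm -f "$sample"
```

Only the header and the `MRTAT_3of3.RTS` row should come out, since that is the only sample row with number4 above 4.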

Source: https://unix.stackexchange.com/questions/49027