How to use awk to extract rows with specific codes from a column?

November 14, 2020

I have a text file named final.txt that looks like this:

name_00000001   name_000001 -   u   q1:MSTRG.4|MSTRG.4.1|3|0.000000|0.000000|0.000000|3211
name_00000002   name_000001 -   u   q1:MSTRG.4|MSTRG.4.2|2|0.000000|0.000000|0.000000|894
name_00000003   name_000001 -   p   q1:MSTRG.4|MSTRG.4.3|2|0.000000|0.000000|0.000000|522
name_00000004   name_000002 -   p   q1:MSTRG.26|MSTRG.26.1|1|0.000000|0.000000|0.000000|336
name_00000005   name_000003 -   u   q1:MSTRG.27|MSTRG.27.1|5|0.000000|0.000000|0.000000|730
name_00000006   name_000003 -   k   q1:MSTRG.27|MSTRG.27.2|7|0.000000|0.000000|0.000000|3157
name_00000007   name_000003 -   k   q1:MSTRG.27|MSTRG.27.3|6|0.000000|0.000000|0.000000|3665
name_00000008   name_000003 -   u   q1:MSTRG.27|MSTRG.27.4|4|0.000000|0.000000|0.000000|7900
name_00000009   name_000003 -   u   q1:MSTRG.27|MSTRG.27.5|4|0.000000|0.000000|0.000000|4356
name_00000010   name_000003 -   k   q1:MSTRG.27|MSTRG.27.6|4|0.000000|0.000000|0.000000|1842
name_00000011   name_000003 -   u   q1:MSTRG.27|MSTRG.27.7|3|0.000000|0.000000|0.000000|2752
name_00000012   name_000003 -   p   q1:MSTRG.27|MSTRG.27.8|2|0.000000|0.000000|0.000000|300
name_00000013   name_000003 -   u   q1:MSTRG.27|MSTRG.27.9|2|0.000000|0.000000|0.000000|2895
name_00000014   name_000003 -   k   q1:MSTRG.27|MSTRG.27.10|2|0.000000|0.000000|0.000000|696
name_00000015   name_000003 -   u   q1:MSTRG.27|MSTRG.27.11|4|0.000000|0.000000|0.000000|9046
name_00000016   name_000003 -   u   q1:MSTRG.27|MSTRG.27.12|5|0.000000|0.000000|0.000000|9962
name_00000017   name_000003 -   u   q1:MSTRG.27|MSTRG.27.13|3|0.000000|0.000000|0.000000|17753
name_00000018   name_000003 -   l   q1:MSTRG.27|MSTRG.27.14|2|0.000000|0.000000|0.000000|6895
name_00000019   name_000003 -   l   q1:MSTRG.27|MSTRG.27.15|4|0.000000|0.000000|0.000000|1889
name_00000020   name_000003 -   l   q1:MSTRG.27|MSTRG.27.16|4|0.000000|0.000000|0.000000|4712
name_00000021   name_000003 -   u   q1:MSTRG.27|MSTRG.27.17|3|0.000000|0.000000|0.000000|1154
name_00000022   name_000003 -   u   q1:MSTRG.27|MSTRG.27.18|2|0.000000|0.000000|0.000000|511
name_00000023   name_000003 -   x   q1:MSTRG.27|MSTRG.27.19|3|0.000000|0.000000|0.000000|2984
name_00000024   name_000003 -   u   q1:MSTRG.27|MSTRG.27.20|2|0.000000|0.000000|0.000000|4944
name_00000025   name_000003 -   x   q1:MSTRG.32|MSTRG.32.1|1|0.000000|0.000000|0.000000|279
name_00000026   name_000003 -   x   q1:MSTRG.33|MSTRG.33.1|2|0.000000|0.000000|0.000000|543
name_00000027   name_000003 -   u   q1:MSTRG.34|MSTRG.34.1|2|0.000000|0.000000|0.000000|664
name_00000028   name_000003 -   u   q1:MSTRG.35|MSTRG.35.1|1|0.000000|0.000000|0.000000|3875
name_00000029   name_000003 -   o   q1:MSTRG.36|MSTRG.36.1|2|0.000000|0.000000|0.000000|969
name_00000030   name_000003 -   o   q1:MSTRG.27|MSTRG.27.21|2|0.000000|0.000000|0.000000|5750
name_00000031   name_000004 -   t   q1:MSTRG.27|MSTRG.27.22|3|0.000000|0.000000|0.000000|3425
name_00000032   name_000005 -   t   q1:MSTRG.27|MSTRG.27.24|3|0.000000|0.000000|0.000000|3403
name_00000033   name_000006 -   o   q1:MSTRG.27|MSTRG.27.23|3|0.000000|0.000000|0.000000|921
name_00000034   name_000007 -   u   q1:MSTRG.38|MSTRG.38.1|2|0.000000|0.000000|0.000000|222

The fourth column contains different codes, such as u, p, k, l, x, o and t. From this column I only want to extract the rows whose code is u, o, t, x or p.

I tried extracting all rows for a single code in the fourth column like this:

cat final.txt | awk '$4=="u"{print $0}' > new.txt

How can I also extract the rows for the other codes in the same command?

You can match the field against a regular expression:

awk '$4 ~ /^[uotxp]$/' final.txt > new.txt

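The anchors ^ and $ make the pattern match only when the whole fourth field is exactly one of the letters in the bracket expression [uotxp]. The default action prints the current record, so you don't need to write { print $0 }.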
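If the list of codes is long or changes often, a variant that is not part of the original answer is to pass the codes in as a variable and test membership in an awk array. The sketch below assumes a POSIX awk and runs against a hypothetical temporary sample file containing a few rows copied from final.txt in the question; the variable names codes, list, want and sample are illustrative only.

```shell
# Hypothetical sample: a few rows copied from final.txt in the question,
# written to a temporary file so no real file is overwritten.
sample=$(mktemp)
cat > "$sample" <<'EOF'
name_00000001   name_000001 -   u   q1:MSTRG.4|MSTRG.4.1|3|0.000000|0.000000|0.000000|3211
name_00000003   name_000001 -   p   q1:MSTRG.4|MSTRG.4.3|2|0.000000|0.000000|0.000000|522
name_00000006   name_000003 -   k   q1:MSTRG.27|MSTRG.27.2|7|0.000000|0.000000|0.000000|3157
name_00000018   name_000003 -   l   q1:MSTRG.27|MSTRG.27.14|2|0.000000|0.000000|0.000000|6895
name_00000023   name_000003 -   x   q1:MSTRG.27|MSTRG.27.19|3|0.000000|0.000000|0.000000|2984
name_00000029   name_000003 -   o   q1:MSTRG.36|MSTRG.36.1|2|0.000000|0.000000|0.000000|969
name_00000031   name_000004 -   t   q1:MSTRG.27|MSTRG.27.22|3|0.000000|0.000000|0.000000|3425
EOF

# The wanted codes arrive via -v, are split into an array in BEGIN,
# and become keys of the lookup table "want".
# "$4 in want" is true only when the 4th field is exactly one of those keys;
# a pattern with no action prints the whole record.
result=$(awk -v codes="u o t x p" '
  BEGIN { n = split(codes, list, " "); for (i = 1; i <= n; i++) want[list[i]] = 1 }
  $4 in want
' "$sample")

printf '%s\n' "$result"   # the u, p, x, o and t rows, in input order
rm -f "$sample"
```

Against the real data, replace "$sample" with final.txt and redirect the output to new.txt as in the question. Changing the selection then only means editing the string given to -v.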

Source: https://unix.stackexchange.com/questions/619577