Linux
How do I use awk to extract specific codes from a column?
I have a text file named final.txt that looks like this:
name_00000001 name_000001 - u q1:MSTRG.4|MSTRG.4.1|3|0.000000|0.000000|0.000000|3211
name_00000002 name_000001 - u q1:MSTRG.4|MSTRG.4.2|2|0.000000|0.000000|0.000000|894
name_00000003 name_000001 - p q1:MSTRG.4|MSTRG.4.3|2|0.000000|0.000000|0.000000|522
name_00000004 name_000002 - p q1:MSTRG.26|MSTRG.26.1|1|0.000000|0.000000|0.000000|336
name_00000005 name_000003 - u q1:MSTRG.27|MSTRG.27.1|5|0.000000|0.000000|0.000000|730
name_00000006 name_000003 - k q1:MSTRG.27|MSTRG.27.2|7|0.000000|0.000000|0.000000|3157
name_00000007 name_000003 - k q1:MSTRG.27|MSTRG.27.3|6|0.000000|0.000000|0.000000|3665
name_00000008 name_000003 - u q1:MSTRG.27|MSTRG.27.4|4|0.000000|0.000000|0.000000|7900
name_00000009 name_000003 - u q1:MSTRG.27|MSTRG.27.5|4|0.000000|0.000000|0.000000|4356
name_00000010 name_000003 - k q1:MSTRG.27|MSTRG.27.6|4|0.000000|0.000000|0.000000|1842
name_00000011 name_000003 - u q1:MSTRG.27|MSTRG.27.7|3|0.000000|0.000000|0.000000|2752
name_00000012 name_000003 - p q1:MSTRG.27|MSTRG.27.8|2|0.000000|0.000000|0.000000|300
name_00000013 name_000003 - u q1:MSTRG.27|MSTRG.27.9|2|0.000000|0.000000|0.000000|2895
name_00000014 name_000003 - k q1:MSTRG.27|MSTRG.27.10|2|0.000000|0.000000|0.000000|696
name_00000015 name_000003 - u q1:MSTRG.27|MSTRG.27.11|4|0.000000|0.000000|0.000000|9046
name_00000016 name_000003 - u q1:MSTRG.27|MSTRG.27.12|5|0.000000|0.000000|0.000000|9962
name_00000017 name_000003 - u q1:MSTRG.27|MSTRG.27.13|3|0.000000|0.000000|0.000000|17753
name_00000018 name_000003 - l q1:MSTRG.27|MSTRG.27.14|2|0.000000|0.000000|0.000000|6895
name_00000019 name_000003 - l q1:MSTRG.27|MSTRG.27.15|4|0.000000|0.000000|0.000000|1889
name_00000020 name_000003 - l q1:MSTRG.27|MSTRG.27.16|4|0.000000|0.000000|0.000000|4712
name_00000021 name_000003 - u q1:MSTRG.27|MSTRG.27.17|3|0.000000|0.000000|0.000000|1154
name_00000022 name_000003 - u q1:MSTRG.27|MSTRG.27.18|2|0.000000|0.000000|0.000000|511
name_00000023 name_000003 - x q1:MSTRG.27|MSTRG.27.19|3|0.000000|0.000000|0.000000|2984
name_00000024 name_000003 - u q1:MSTRG.27|MSTRG.27.20|2|0.000000|0.000000|0.000000|4944
name_00000025 name_000003 - x q1:MSTRG.32|MSTRG.32.1|1|0.000000|0.000000|0.000000|279
name_00000026 name_000003 - x q1:MSTRG.33|MSTRG.33.1|2|0.000000|0.000000|0.000000|543
name_00000027 name_000003 - u q1:MSTRG.34|MSTRG.34.1|2|0.000000|0.000000|0.000000|664
name_00000028 name_000003 - u q1:MSTRG.35|MSTRG.35.1|1|0.000000|0.000000|0.000000|3875
name_00000029 name_000003 - o q1:MSTRG.36|MSTRG.36.1|2|0.000000|0.000000|0.000000|969
name_00000030 name_000003 - o q1:MSTRG.27|MSTRG.27.21|2|0.000000|0.000000|0.000000|5750
name_00000031 name_000004 - t q1:MSTRG.27|MSTRG.27.22|3|0.000000|0.000000|0.000000|3425
name_00000032 name_000005 - t q1:MSTRG.27|MSTRG.27.24|3|0.000000|0.000000|0.000000|3403
name_00000033 name_000006 - o q1:MSTRG.27|MSTRG.27.23|3|0.000000|0.000000|0.000000|921
name_00000034 name_000007 - u q1:MSTRG.38|MSTRG.38.1|2|0.000000|0.000000|0.000000|222
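The fourth column contains different codes, such as u, p, k, l, x, o, t. From that column I want to extract only the rows whose code is one of u, o, t, x, p. I tried extracting all rows for a single code in the fourth column like this: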
cat final.txt | awk '$4=="u"{print $0}' > new.txt
How can I also extract the other codes in the same command?
You can match the field against a regular expression:
awk '$4 ~ /^[uotxp]$/' final.txt > new.txt
The default action prints the current record, so you don't need to write { print $0 }.
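If the codes ever grow beyond single characters, a bracket expression no longer fits. Two equivalent ways to write the same filter are sketched below: a lookup array and explicit comparisons. This is only an illustration, not part of the original answer. The file names sample.txt, kept.txt, and kept_or.txt and the array names codes and want are made up here; sample.txt is a five-line excerpt of the data shown in the question.

```shell
# Hypothetical sample: five records copied from the question's final.txt
cat > sample.txt <<'EOF'
name_00000001 name_000001 - u q1:MSTRG.4|MSTRG.4.1|3|0.000000|0.000000|0.000000|3211
name_00000003 name_000001 - p q1:MSTRG.4|MSTRG.4.3|2|0.000000|0.000000|0.000000|522
name_00000006 name_000003 - k q1:MSTRG.27|MSTRG.27.2|7|0.000000|0.000000|0.000000|3157
name_00000018 name_000003 - l q1:MSTRG.27|MSTRG.27.14|2|0.000000|0.000000|0.000000|6895
name_00000031 name_000004 - t q1:MSTRG.27|MSTRG.27.22|3|0.000000|0.000000|0.000000|3425
EOF

# Variant 1: build a lookup array of wanted codes once in BEGIN,
# then keep every record whose fourth field is a key of that array.
awk 'BEGIN { n = split("u o t x p", codes, " ")
             for (i = 1; i <= n; i++) want[codes[i]] = 1 }
     $4 in want' sample.txt > kept.txt

# Variant 2: the same filter as explicit string comparisons joined with ||.
awk '$4 == "u" || $4 == "o" || $4 == "t" || $4 == "x" || $4 == "p"' sample.txt > kept_or.txt
```

Both variants rely on the default print action, just like the regular-expression version, and both compare the whole field, so a value such as "up" would not be selected.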