Linux
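Multiple operations needed on a single text file using a shell or bash script
My source file: Test.txt
Note: the file is tab-delimited, and a few columns have no column name: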

    Chr  Start  End  Alt  Value
    Exo  0      10   .    1.50   .  20:-2    30:0.9  50:50  50
    Exo  1      20   .    1.50   .  20:-1    30:-1   50:50  50
    Exo  2      30   .    1.50   .  20:0.02  30:0.9  50:50  50
    Exo  3      40   .    1.50   .  20:-1    30:-2   50:50  50
    Nem  3      40   .    1.50   .  20:-1    30:-2   50:50  50

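I am trying to perform the following operations on the file above:
1) Columns 7 and 8 need to be split on **':'**, and the resulting columns need names such as "mod1", "mod2", "mod3", "mod4".
2) The split columns then need to be moved next to the "Value" column, with one more "comment" column placed next to "mod4" (the comment column should contain blank data).
3) The data needs to be filtered on column "mod2" so that all rows with a value greater than 0.01 are removed.
The final result needs to be stored in an output folder and should look like this: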

    Chr  Start  End  Alt  Value  mod1  mod2  mod3  mod4  comment
    Exo  0      10   -1   1.50   20    -2    30    0.9            -1  50:50  50
    Exo  1      20   -1   1.50   20    -1    30    -1             -1  50:50  50
    Exo  3      40   -1   1.50   20    -1    30    -2             -1  50:50  50

I tried the following, which achieves some of the operations but leaves the rest undone:

    #!/bin/bash
    cd /home/uxm/Desktop/Shell/

    # Replace the only dots (.) by -1
    awk -F'\t' '{for(i=1;i<=NF;i++){sub(/^\.$/,"-1",$i)}} 1' OFS="\t" Test.txt | tail >> Test1.txt

    # splitted 7th no column by delimitted ":"
    awk '{ split($7, a, ":"); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"a[1]"\t"a[2]"\t"$8"\t"$9"\t"$10"\t"$11 >> "testfile1.tmp"; }' Test1.txt;
    mv testfile1.tmp Test2.txt;

    # splitted 8th no column by delimitted ":"
    awk '{ split($9, a, ":"); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"a[1]"\t"a[2]"\t"$10"\t"$11 >> "testfile2.tmp"; }' Test2.txt;
    mv testfile2.tmp Test3.txt;

    # Give name to splitted columns
    awk -F'\t' -v OFS="\t" 'NR==1{$11="nCol\tMod1\tMod2\tMod3\tMod4"}1' Test3.txt >> Test4.txt

    # Filter data by "Exo" word
    awk -F'\t' 'NR==1;{ if($1 == "Exo") { print }}' Test4.txt | tail >> Test5.txt

Here is an `awk` script that performs the steps you listed. The benefit of doing everything in a single script is that `awk` does not have to be run several times, with intermediate results stored in files or variables.

    BEGIN { OFS = FS = "\t" }

    NR == 1 {
        # Add new column headers

        # First four "mod" headers
        for (i = 1; i <= 4; ++i)
            $(NF + 1) = "mod" i

        # Then a "comment" header
        $(NF + 1) = "comment"

        # Output and continue with next input line
        print
        next
    }

    # Ignore lines that don't have "Exo" in the first column
    $1 != "Exo" { next }

    {
        # Working our way "backwards" from column 13 down to 1

        # Shift the last two columns right by three steps
        $13 = $10
        $12 = $9

        # Set column 11 to column 6, or to -1 if it's a dot
        if ($6 == ".")
            $11 = -1
        else
            $11 = $6

        # Empty the comment column
        $10 = ""

        # Move column 8 into column 9
        $9 = $8

        # Split column 9 into columns 8 and 9
        split($9, a, ":")
        $9 = a[2]
        $8 = a[1]

        # Split column 7 into columns 6 and 7
        split($7, a, ":")
        $7 = a[2]
        $6 = a[1]

        # Column 5 remains unmodified

        # Put -1 in column 4 if it's a dot
        if ($4 == ".")
            $4 = -1

        # Columns 1, 2, 3 remain unmodified
    }

    # Output if we want this line
    $7 <= 0.01 { print }

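The two building blocks of the script, `split()` on ":" and the numeric test against 0.01, can also be tried in isolation. What follows is only a minimal sketch: the helper name `classify` is made up for illustration, and the two sample values are taken from column 7 of the question's data.

```shell
# classify: hypothetical helper. Reads "a:b" pairs on standard input,
# splits each pair on ":" and reports whether the second part is <= 0.01.
classify() {
    awk '{
        split($1, part, ":")
        verdict = (part[2] <= 0.01) ? "kept" : "dropped"
        print part[1], part[2], verdict
    }'
}

printf '20:-2\n20:0.02\n' | classify
```

This prints `20 -2 kept` followed by `20 0.02 dropped`, which matches the rows the full script keeps and drops.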
Running the script:

    $ awk -f script.awk Test.txt
    Chr  Start  End  Alt  Value  mod1  mod2  mod3  mod4  comment
    Exo  0      10   -1   1.50   20    -2    30    0.9            -1  50:50  50
    Exo  1      20   -1   1.50   20    -1    30    -1             -1  50:50  50
    Exo  3      40   -1   1.50   20    -1    30    -2             -1  50:50  50

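Because the comment column is empty, it is easy to overlook in the printed output. One way to confirm that it is present, sketched here on a single data row copied from the output above, is to let `awk` report the number of tab-separated fields together with the content of field 10. The helper name `count_fields` is an assumption made for this example.

```shell
# count_fields: hypothetical helper. For each tab-separated line on standard
# input, prints the number of fields and field 10 wrapped in brackets.
count_fields() {
    awk -F'\t' '{ print NF, "[" $10 "]" }'
}

# One data row from the output; note the two adjacent tabs after 0.9.
printf 'Exo\t0\t10\t-1\t1.50\t20\t-2\t30\t0.9\t\t-1\t50:50\t50\n' | count_fields
```

The data row reports `13 []`, that is, 13 fields with an empty field 10. The header line, by contrast, has only 10 fields, because the last three columns are unnamed.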
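I have assumed, based on your own code, that you are only interested in the `Exo` lines, so I made the script look only at those. I have also assumed that any dot in the `Alt` column (and in the first unnamed column of the original file) should be changed to `-1`, again based on your code.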
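If either assumption turns out to be wrong, both are easy to adjust. The sketch below is a compact variant rather than a replacement for the script above: it takes the row filter from a variable, isolates the dot handling in one function, and writes the result into an output directory, as the question asks. The names `keep`, `undot`, `sample.tsv` and `output/result.tsv` are hypothetical, and the three input rows are a subset of the question's data.

```shell
# Recreate a small tab-separated sample (hypothetical file name).
{
    printf 'Chr\tStart\tEnd\tAlt\tValue\n'
    printf 'Exo\t0\t10\t.\t1.50\t.\t20:-2\t30:0.9\t50:50\t50\n'
    printf 'Exo\t2\t30\t.\t1.50\t.\t20:0.02\t30:0.9\t50:50\t50\n'
    printf 'Nem\t3\t40\t.\t1.50\t.\t20:-1\t30:-2\t50:50\t50\n'
} > sample.tsv

# Hypothetical output directory.
mkdir -p output

awk -v keep="Exo" '
    BEGIN { FS = OFS = "\t" }

    # Assumption: a lone dot means "missing" and becomes -1.
    function undot(s) { return (s == "." ? -1 : s) }

    # Header: append the five new column names.
    NR == 1 { print $0, "mod1", "mod2", "mod3", "mod4", "comment"; next }

    # Skip rows whose first column differs from "keep";
    # an empty "keep" disables this filter.
    keep != "" && $1 != keep { next }

    {
        split($7, m, ":")
        split($8, n, ":")
        # Keep the row only if the mod2 value is at most 0.01.
        if (m[2] <= 0.01)
            print $1, $2, $3, undot($4), $5, m[1], m[2], n[1], n[2], "", undot($6), $9, $10
    }
' sample.tsv > output/result.tsv

cat output/result.tsv
```

With `keep="Exo"` only the first `Exo` row survives, since the second one has a mod2 value of 0.02. Passing `-v keep=""` instead would apply the 0.01 filter to every row regardless of the first column.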