Multiple operations on a single text file using a shell or bash script

November 16, 2018

My source file: Test.txt

Note: the file is tab-delimited, and a few columns have no column name:

Chr  Start  End   Alt   Value
Exo  0      10    .     1.50    .   20:-2     30:0.9    50:50   50
Exo  1      20    .     1.50    .   20:-1     30:-1     50:50   50
Exo  2      30    .     1.50    .   20:0.02   30:0.9    50:50   50
Exo  3      40    .     1.50    .   20:-1     30:-2     50:50   50
Nem  3      40    .     1.50    .   20:-1     30:-2     50:50   50

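I am trying to perform the following operations on the file above:

1) Columns 7 and 8 need to be split on **':'**, and the resulting columns need to be given names such as "mod1", "mod2", "mod3", "mod4".

2) The split columns should then be moved next to the "Value" column, and a "comment" column should be placed next to "mod4" (the comment column should contain blank data).

3) Filter on the "mod2" column: every row whose value is greater than 0.01 is removed.

The final result needs to be stored in an output folder, for example: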

Chr  Start  End   Alt  Value  mod1  mod2  mod3  mod4  comment 
Exo  0      10    -1   1.50   20    -2    30    0.9           -1  50:50  50
Exo  1      20    -1   1.50   20    -1    30    -1            -1  50:50  50
Exo  3      40    -1   1.50   20    -1    30    -2            -1  50:50  50

I tried the following, which achieves some of the operations but leaves the rest undone:

#!bin/bash

cd /home/uxm/Desktop/Shell/

# Replace fields that consist only of a dot (.) with -1

awk -F'\t' '{for(i=1;i<=NF;i++){sub(/^\.$/,"-1",$i)}} 1' OFS="\t" Test.txt | tail >> Test1.txt

# Split the 7th column on the ":" delimiter

awk '{ split($7, a, ":"); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"a[1]"\t"a[2]"\t"$8"\t"$9"\t"$10"\t"$11 >> "testfile1.tmp"; }' Test1.txt;
mv testfile1.tmp Test2.txt;

# Split the original 8th column (now the 9th) on the ":" delimiter

awk '{ split($9, a, ":"); print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"a[1]"\t"a[2]"\t"$10"\t"$11 >> "testfile2.tmp"; }' Test2.txt;
mv testfile2.tmp Test3.txt;

# Give names to the split columns

awk -F'\t' -v OFS="\t" 'NR==1{$11="nCol\tMod1\tMod2\tMod3\tMod4"}1' Test3.txt >> Test4.txt

# Filter data by the word "Exo"

awk -F'\t' 'NR==1;{ if($1 == "Exo") { print }}' Test4.txt | tail >> Test5.txt

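Here is an awk script that performs the steps you list. The benefit of doing everything in a single script is that awk does not have to be run several times, with intermediate results stored in files or variables.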

BEGIN { OFS = FS = "\t" }
NR == 1 {
   # Add new column headers

   # First four "mod" headers
   for (i = 1; i <= 4; ++i)
       $(NF + 1) = "mod" i

   # Then a "comment" header
   $(NF + 1) = "comment"

   # Output and continue with next input line
   print
   next
}

# Ignore lines that don't have "Exo" in the first column
$1 != "Exo" { next }

{
   # Working our way "backwards" from column 13 down to 1

   # Shift the last two columns right by three steps
   $13 = $10
   $12 = $9

   # Set column 11 to column 6, or to -1 if it's a dot
   if ($6 == ".")
       $11 = -1
   else
       $11 = $6 

   # Empty the comment column
   $10 = ""

   # Move column 8 into column 9
   $9 = $8

   # Split column 9 into columns 8 and 9
   split($9, a, ":")
   $9 = a[2]
   $8 = a[1]

   # Split column 7 into columns 6 and 7
   split($7, a, ":")
   $7 = a[2]
   $6 = a[1]

   # Column 5 remains unmodified

   # Put -1 in column 4 if it's a dot
   if ($4 == ".") $4 = -1

   # Columns 1, 2 and 3 remain unmodified
}

# Output if we want this line
$7 <= 0.01 { print }

Running it:

$ awk -f script.awk Test.txt
Chr     Start   End     Alt     Value   mod1    mod2    mod3    mod4    comment
Exo     0       10      -1      1.50    20      -2      30      0.9             -1      50:50   50
Exo     1       20      -1      1.50    20      -1      30      -1              -1      50:50   50
Exo     3       40      -1      1.50    20      -1      30      -2              -1      50:50   50
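
The question also asks for the result to be stored in an output folder, whereas the run above only prints to the terminal. A minimal sketch of that last step, assuming a hypothetical directory name `output` and file name `result.txt` (neither appears in the original), and using a small stand-in awk filter on inline sample data so the snippet runs on its own:

```shell
#!/bin/sh
# Hypothetical names: "output" and "result.txt" are assumptions for illustration.
outdir=output
mkdir -p "$outdir"

# Stand-in for `awk -f script.awk Test.txt`: keep the header and the "Exo" rows.
# Using > (not >>) overwrites the file, so rerunning does not append duplicates.
printf 'Chr\tStart\nExo\t0\nNem\t3\n' |
    awk 'BEGIN { FS = OFS = "\t" } NR == 1 || $1 == "Exo"' > "$outdir/result.txt"

cat "$outdir/result.txt"
```

With the real script, the same redirection would presumably read `awk -f script.awk Test.txt > output/result.txt`. The attempt in the question uses `>>`, which appends to the existing file every time the script is run again.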

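Judging from your own code, I assumed that you are only interested in the Exo lines, so I made the script look only at those. I also assumed, again from looking at your code, that any dot in the Alt column (and in the first unnamed column of the original) should be changed to -1.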
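
The two building blocks the script relies on, `split()` on the colon and the numeric test against 0.01, can be tried in isolation. This is only a sketch on made-up two-column sample rows, not the full script; the variable names `out`, `a` and `n` are chosen here for illustration:

```shell
#!/bin/sh
# Sample rows: a label plus one "x:y" field, tab-separated (illustrative data).
out=$(printf 'Exo\t20:-2\nExo\t20:0.02\nExo\t20:-1\n' |
    awk 'BEGIN { FS = OFS = "\t" }
    {
        # split() fills the array a and returns the number of pieces
        n = split($2, a, ":")
        # Keep only rows whose second piece is at most 0.01 (numeric comparison)
        if (a[2] <= 0.01)
            print $1, a[1], a[2], n
    }')
printf '%s\n' "$out"
```

The rows with -2 and -1 are printed and the row with 0.02 is dropped, which matches how the script removes the third Exo line of Test.txt.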

Source: https://unix.stackexchange.com/questions/443659
