Perl code to split a file (if 16S and 23S are present) and copy into a file

August 28, 2017

I have a file in which I want to search for the strings "16S" and "23S" and extract the sections containing them into two separate files.

Input file:

start
description Human 16S rRNA
**some text**
**some text**
//
start
description Mouse 18S rRNA
some text
some text
//
start
description Mouse 23S rRNA
some text
some text
//

Expected output: File1 for 16S:

start
description Human 16S rRNA
some text
some text
//

File2 for 23S:

start
description Mouse 23S rRNA
some text
some text
//

The code I am using:

#!/usr/bin/perl
# Split the input into 16S.txt and 23S.txt. A record runs from a 'start'
# line to the next '//' line; it is copied to 16S.txt or 23S.txt when its
# 'description' line mentions 16S or 23S, and dropped otherwise.
use strict;
use warnings;

my %out;       # output filehandles, keyed by "16S" / "23S"
my @record;    # lines of the record currently being read

while (my $line = <>) {
  @record = () if $line =~ /^start\b/;    # a new record begins
  push @record, $line;
  next unless $line =~ m{^//};            # keep reading until the record ends

  # Record complete: find its description line and check for 16S/23S.
  my ($desc) = grep { /^description\b/ } @record;
  if (defined $desc && $desc =~ /\b(16S|23S)\b/) {
    my $key = $1;
    unless ($out{$key}) {
      open($out{$key}, '>', "$key.txt") || die "couldn't open $key.txt: $!";
    }
    print { $out{$key} } @record;
  }
  @record = ();
}
close($_) for values %out;

If you can rely on <LF>//<LF> as the record separator, then with GNU awk this could simply be:
gawk -v 'RS=\n//\n' '
 {ORS=RT}; / 16S /{print > "file1"}; / 23S /{print > "file2"}' < file

Sorry, I would solve this with awk rather than Perl:
/^\/\// && file { file = file ".out";
                 print section ORS $0 >file;
                 file = "" }

/^description/ && match($0, p) && (file = substr($0,RSTART,RLENGTH)) {}

/^start/        { section = $0; next       }
               { section = section ORS $0 }
Running it on your data (use p='expression' to pick out the sections you want):
$ awk -f script.awk p='16S|23S' file.in
$ ls -l
total 16
-rw-r--r--  1 kk  wheel   64 Aug 28 12:10 16S.out
-rw-r--r--  1 kk  wheel   56 Aug 28 12:10 23S.out
-rw-r--r--  1 kk  wheel  176 Aug 28 11:51 file.in
-rw-r--r--  1 kk  wheel  276 Aug 28 12:09 script.awk
$ cat 16S.out
start
description Human 16S rRNA
**some text**
**some text**
//
$ cat 23S.out
start
description Mouse 23S rRNA
some text
some text
//
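The first block in the script is executed when we find an end marker (a line starting with //) and the output filename (file) is not empty. It appends .out to the current filename and writes the saved section, followed by the current input line, to that file. It then empties the file variable.
The second block is empty, but its pattern matches lines starting with description and then goes on to match the line against the regular expression given on the command line (p). If that matches, the matching part of the line is picked out and used as the filename.
The third block is executed when we find a line starting with the word start; it sets the saved section text to the current line, discarding any old text saved in it. It then skips back to the beginning of the script to consider the next input line.
The last block is executed for all other lines and appends the current line to the currently saved section.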
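Because the filename comes from whatever part of the description line matched p, widening the regular expression widens the split. A sketch, assuming the same four blocks are passed inline instead of from script.awk and using p='16S|18S|23S' as an illustrative pattern; it should also produce an 18S.out for the mouse 18S record.

```shell
#!/bin/sh
# Sample input from the question (file name "file.in" is an assumption).
cat > file.in <<'EOF'
start
description Human 16S rRNA
some text
some text
//
start
description Mouse 18S rRNA
some text
some text
//
start
description Mouse 23S rRNA
some text
some text
//
EOF

# Same blocks as script.awk: write on //, pick the filename from the
# description, reset on start, otherwise accumulate the section.
awk '
/^\/\// && file { file = file ".out";
                  print section ORS $0 > file;
                  file = "" }
/^description/ && match($0, p) && (file = substr($0, RSTART, RLENGTH)) {}
/^start/        { section = $0; next }
                { section = section ORS $0 }
' p='16S|18S|23S' file.in
```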

Source: https://unix.stackexchange.com/questions/388792
