Text-Processing
Compare the values of one column with all the values in another column
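I have 2 input files. Each line of `File1` should be compared with each line of `File2`. The logic is:
- If `Column1` of `File1` does not match `Column1` of `File2` (any of the values under it), print the entire line of `File1` to the output file. Likewise, compare each value of `Column1` in `File1` with each value under `Column1` in `File2`.
- If `Column1` matches in both files, print the entire line of `File1` only if its `Column2` value is greater than `N+10` or less than `N-10`, where `N` is the `Column2` value in `File2`. Compare all the lines this way.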
`File1`:

    Contig1 23
    Contig1 42
    Contig2 68
    Contig3 89
    Contig3 102
    Contig7 79

`File2`:

    Contig1 40
    Contig1 49
    Contig3 90
    Contig2 90
    Contig20 200
    Contig1 24

Expected output:

    Contig2 68
    Contig3 102
    Contig7 79
Any solution will do, even one that doesn't use `awk` or `sed`. I found a similar question, but I'm not sure what I have to do. This is the code:
    awk 'NR==FNR {
        lines[NR,"col1"] = $1
        lines[NR,"col2"] = $2
        lines[NR,"line"] = $0
        next
    }
    (lines[FNR,"col1"] != $1) {
        print lines[FNR,"line"]
        next
    }
    (lines[FNR,"col2"]+10 < $2 || lines[FNR,"col2"]-10 > $2) {
        print lines[FNR,"line"]
    }' file1 file2
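The script below does the following, which I think is what you want:
- If a contig from file1 is not present in file2, print all of that contig's lines.
- If it is present in file2, then for each of its values in file1, print the value only if it is more than 10 away from every value of that contig in file2 (that is, for each file2 value `N`, it is either less than `N-10` or greater than `N+10`).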
    #!/usr/bin/env perl
    use strict;
    use warnings;

    my (%file1, %file2);

    ## read file1, the 1st argument
    open(my $fh1, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
    while (<$fh1>) {
        chomp;
        ## Split the line on whitespace into the @F array.
        my @F = split(/\s+/);
        ## Save all lines in the %file1 hash.
        ## $F[0] is the contig name and $F[1] the value.
        ## The hash will store a list of all values
        ## associated with this contig.
        push @{$file1{$F[0]}}, $F[1];
    }
    close($fh1);

    ## read file2, the second argument
    open(my $fh2, '<', $ARGV[1]) or die "Cannot open $ARGV[1]: $!\n";
    while (<$fh2>) {
        ## remove newlines
        chomp;
        ## save the fields into array @F
        my @F = split(/\s+/);
        ## Again, save all values associated with each
        ## contig into the %file2 hash.
        push @{$file2{$F[0]}}, $F[1];
    }
    close($fh2);

    ## For each of the contigs in file1 (sorted by name, so the
    ## output order is reproducible; hash keys are otherwise unordered)
    foreach my $contig (sort keys %file1) {
        ## If this contig exists in file 2
        if (defined $file2{$contig}) {
            ## get the list of values for that contig
            ## in each of the two files
            my @f2_vals = @{$file2{$contig}};
            my @f1_vals = @{$file1{$contig}};
            ## For each of file1's values for this contig
            VAL1: foreach my $val1 (@f1_vals) {
                ## For each of file2's values for this contig
                foreach my $val2 (@f2_vals) {
                    ## Skip to the next value from file1 if this one
                    ## is within 10 of the current file2 value.
                    unless (($val1 < $val2 - 10) || ($val1 > $val2 + 10)) {
                        next VAL1;
                    }
                }
                ## We will only get here if the value was outside the
                ## +/-10 range of every file2 value. If so, we should
                ## print the value from file1.
                print "$contig $val1\n";
            }
        }
        ## If this contig is not in file2, print the
        ## lines from file1. This will print all lines
        ## from file1 whose contig was not in file2.
        else {
            print "$contig $_\n" for @{$file1{$contig}};
        }
    }
Save it in a text file (say `foo.pl`), make it executable (`chmod a+x foo.pl`) and run it like this:

    ./foo.pl file1 file2
On your example, it returns:

    $ foo.pl file1 file2
    Contig2 68
    Contig3 102
    Contig7 79
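Since the question mentions `awk`, the same logic can also be sketched there. This is only a minimal sketch under a few assumptions: both files hold two whitespace-separated columns, file2 is not empty (the `NR == FNR` test relies on that), and the wrapper name `filter_contigs` is made up for illustration. It reads file2 first to collect every value per contig, then streams file1, so the output keeps file1's line order.

```shell
# Hypothetical helper: print the lines of file1 that have no partner
# within +/-10 in file2 for the same contig.
# Usage: filter_contigs file1 file2
filter_contigs() {
    awk '
        # First input (file2): remember every value seen for each contig.
        NR == FNR {
            if (NF >= 2) {
                seen[$1] = 1
                vals[$1, ++count[$1]] = $2
            }
            next
        }
        # Second input (file1): decide line by line.
        NF >= 2 {
            # Contig absent from file2: print the line unconditionally.
            if (!($1 in seen)) { print; next }
            # Contig present: drop the line as soon as one file2 value
            # lies within 10 of this value.
            for (i = 1; i <= count[$1]; i++) {
                d = $2 - vals[$1, i]
                if (d >= -10 && d <= 10) next
            }
            # More than 10 away from every file2 value: print the line.
            print
        }
    ' "$2" "$1"
}
```

After defining the function, `filter_contigs file1 file2` would be the call for the example files. Note that file2 is passed to `awk` first, even though it is the second argument of the wrapper.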