Bash

Counting matching and non-matching groups

  • October 23, 2014

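Please help with the following shell script. I need to count the number of variables that are consistent across the samples (col2) within each lane (col1). For example, since the values (col4) of lane1 variable1 are the same in all samples, variable1 is counted as a consistent variable. Similarly, variables 2 and 3 of lane2 are both inconsistent.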

lane1  sample1 variable1 ab
lane1  sample2 variable1 ab
lane1  sample3 variable1 ab   


lane1  sample1 variable2 cd
lane1  sample2 variable2 cd
lane1  sample3 variable2 cd

lane1  sample1 variable3 gh
lane1  sample2 variable3 ab
lane1  sample3 variable3 gh

lane2  sample1 variable1 ac
lane2  sample2 variable1 ac
lane2  sample3 variable1 ac


lane2  sample1 variable2 gt
lane2  sample2 variable2 gt
lane2  sample3 variable2 ac

lane2  sample1 variable3 ga
lane2  sample2 variable3 ga
lane2  sample3 variable3 ac

Output

Number of consistent and inconsistent variables across all three samples

       #Consistent  #Inconsistent
lane1  2            1
lane2  1            2

Perl solution:

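#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

# %values maps lane -> variable -> value -> sample.
my %values;
while (<>) {
    next unless /\S/; # Skip empty and whitespace-only lines
    my ($lane, $sample, $var, $val) = split;
    die "Duplicate $lane $sample $var\n" if $values{$lane}{$var}{$val}{$sample};
    $values{$lane}{$var}{$val}{$sample} = 1;
}

# Header row, matching the layout requested in the question.
say join "\t", '', '#Consistent', '#Inconsistent';

# Sort the lanes so the output order is deterministic.
for my $lane (sort keys %values) {
    # Start both counters at 0 so a lane with no variables of one kind
    # prints 0 instead of triggering an "uninitialized value" warning.
    my %results = (consistent => 0, inconsistent => 0);
    for my $var (keys %{ $values{$lane} }) {
        # A variable is consistent when it has exactly one distinct value.
        my $count = keys %{ $values{$lane}{$var} };
        if (1 == $count) {
            ++$results{consistent};
        } else {
            ++$results{inconsistent};
        }
    }
    say join "\t", $lane, @results{qw{ consistent inconsistent }};
}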
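
Since the question asks for a shell script, the same idea as the Perl solution above can also be sketched with awk from a POSIX shell. This is only one possible approach under stated assumptions, not part of the original answer: the function name count_consistency is hypothetical, the input is assumed to be whitespace-separated with four columns (lane, sample, variable, value), and no header row is printed. A lane/variable pair counts as inconsistent as soon as a second distinct value shows up for it.

```shell
#!/bin/sh
# count_consistency is a hypothetical helper name used for illustration.
# Reads "lane sample variable value" lines from the given files (or stdin)
# and prints one tab-separated line per lane:
#   lane <consistent variables> <inconsistent variables>
count_consistency() {
    awk '
        NF < 4 { next }                      # skip blank or incomplete lines
        {
            key = $1 SUBSEP $3               # one key per lane/variable pair
            if (!(key in first)) {           # first value seen for this pair
                first[key] = $4
                lanes[$1] = 1
            } else if (first[key] != $4) {   # a differing value appeared
                mixed[key] = 1
            }
        }
        END {
            for (key in first) {
                split(key, part, SUBSEP)     # part[1] is the lane
                if (key in mixed) bad[part[1]]++
                else good[part[1]]++
            }
            # Unset counters print as 0 through the %d conversion.
            for (lane in lanes)
                printf "%s\t%d\t%d\n", lane, good[lane], bad[lane]
        }
    ' "$@" | sort                            # awk iteration order is unspecified
}
```

Assuming the sample data is saved in a file called data.txt (a made-up name), running count_consistency data.txt should report 2 consistent and 1 inconsistent variable for lane1, and 1 consistent and 2 inconsistent for lane2. Unlike the Perl version, this sketch does not check for duplicate lane/sample/variable rows.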

Source: https://unix.stackexchange.com/questions/162593