Bash
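Counting matching and non-matching groups
Please help with the following shell script. I need to count, for each lane (col1), the number of variables (col3) whose value (col4) is consistent across all samples (col2). For example, in lane1 the value of variable1 is "ab" in every sample, so variable1 counts as a consistent variable. Conversely, in lane2 both variable2 and variable3 are inconsistent, because one sample has a different value from the others.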
    lane1 sample1 variable1 ab
    lane1 sample2 variable1 ab
    lane1 sample3 variable1 ab
    lane1 sample1 variable2 cd
    lane1 sample2 variable2 cd
    lane1 sample3 variable2 cd
    lane1 sample1 variable3 gh
    lane1 sample2 variable3 ab
    lane1 sample3 variable3 gh
    lane2 sample1 variable1 ac
    lane2 sample2 variable1 ac
    lane2 sample3 variable1 ac
    lane2 sample1 variable2 gt
    lane2 sample2 variable2 gt
    lane2 sample3 variable2 ac
    lane2 sample1 variable3 ga
    lane2 sample2 variable3 ga
    lane2 sample3 variable3 ac
Output
The number of consistent and inconsistent variables across all three samples, per lane:
          #Consistent  #Inconsistent
    lane1 2            1
    lane2 1            2
Perl solution:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };

    # %values: lane => variable => value => sample => 1
    my %values;
    while (<>) {
        next unless /\S/;  # Skip empty lines
        my ($lane, $sample, $var, $val) = split;
        die "Duplicate $lane $sample $var\n"
            if $values{$lane}{$var}{$val}{$sample};
        $values{$lane}{$var}{$val}{$sample} = 1;
    }

    my %results;
    for my $lane (sort keys %values) {
        for my $var (keys %{ $values{$lane} }) {
            # A variable is consistent when all samples share a single value.
            my $count = keys %{ $values{$lane}{$var} };
            if (1 == $count) {
                ++$results{$lane}{consistent};
            } else {
                ++$results{$lane}{inconsistent};
            }
        }
        # Print 0 instead of an undefined value when a lane has no
        # variables of one kind.
        say join "\t", $lane,
            map { $_ // 0 } @{ $results{$lane} }{qw{ consistent inconsistent }};
    }
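Since the question asks for a shell script, the same idea can also be sketched with awk wrapped in a shell function. This is only a minimal sketch, not part of the original answer: the function name `count_consistent` is made up for illustration, and it assumes four whitespace-separated columns as in the sample data and a POSIX awk. It remembers the first value seen for each (lane, variable) pair and flags the pair as inconsistent as soon as a different value turns up.

```shell
#!/bin/sh
# count_consistent: hypothetical helper name, not from the original post.
# Reads "lane sample variable value" rows from the files given as arguments
# (or from stdin) and prints one "lane<TAB>consistent<TAB>inconsistent"
# row per lane, sorted by lane.
count_consistent() {
    awk '
        NF < 4 { next }                      # skip blank or short lines
        {
            key = $1 SUBSEP $3               # one group per (lane, variable)
            if (!(key in first)) {
                first[key] = $4              # remember the first value seen
                lane_of[key] = $1
            } else if ($4 != first[key]) {
                mixed[key] = 1               # a differing value: inconsistent
            }
        }
        END {
            for (key in first) {
                lane = lane_of[key]
                lanes[lane] = 1
                if (key in mixed) bad[lane]++
                else good[lane]++
            }
            for (lane in lanes)
                printf "%s\t%d\t%d\n", lane, good[lane], bad[lane]
        }
    ' "$@" | sort
}
```

Assuming the sample rows are saved in a file such as data.txt (a placeholder name), `count_consistent data.txt` prints `lane1 2 1` and `lane2 1 2`, tab-separated. Unlike the Perl version, this sketch does not check for duplicate rows.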