Hi all, I have a question about pattern matching. I have different sets of samples, such as CGPP5048286, WGA_PD4005a, 1710STDY5035576, PD4005a, PD4005b and PD4005c. I have a script that works and finds the identity between the samples. I want to pattern match the sample names to find names that refer to the same sample, so that, for example, PD4005b and PD4005c are treated as a match.
    for (my $j = 0; $j < scalar(@sam2com); $j++) {    # '<' rather than '<=', to stay inside the array
        my $s1          = $sam2com[$j];
        my $geno1       = $source_set->{$s1};
        my $top_percent = 0;
        my $top         = '';
        for (my $k = 0; $k < scalar(@sam2com); $k++) {
            my $match = 0;
            my $s2    = $sam2com[$k];
            my $geno2 = $source_set->{$s2};
            my $set   = Array::Each->new(\@$geno1, \@$geno2);
            while (my ($g1, $g2, $index2) = $set->each()) {
                #print "$s1|$s2|$g1|$g2|$index2\n";
                next if $g1 eq "" || $g2 eq "";
                next if $g1 =~ /^NN/i || $g2 =~ /^NN/i;
                if ($g1 eq $g2) {
                    $match++;
                    #print "$s1|$s2|$g1|$g2|$match|$index2\n";
                } # end of if $g1 eq $g2
            } # end of while loop $set->each
            my $percentage = sprintf "%.2f", ($match * 100) / (scalar @$geno1);
            print SUM $percentage, ",";
            next if ($percentage < 75);
            #print "$s1|$s2|$percentage\n";
            if (($percentage >= $top_percent) and ($top ne $s1)) {
                $top_percent = $percentage;
                $top         = $s2;
            } # end of if $percentage >= $top_percent and $top ne $s1
            push @{ $com_sam->{$s1}->{$percentage} },
                { sample => $s2, percent => $percentage, match => $match };
            my $ge1 = join "", @$geno1;
            my $ge2 = join "", @$geno2;
            if (($ge1 eq $ge2) and ($s1 ne $s2)) {
                print LOG "$s1|$s2|$ge1 | $ge2\n";
            }
        } # end of for $k sam2com
        print SUM "\n";
        # Sort by percentage in descending order. Get the samples that match
        # other samples, the percentage of match, and the top hit.
        foreach my $percent (sort { $b <=> $a } keys %{ $com_sam->{$s1} }) {
            my $match_samples = $com_sam->{$s1}->{$percent};
            foreach my $matSam (@{$match_samples}) {
                if (($s1 ne $matSam->{sample}) and ($matSam->{percent} >= $top_percent)) {
                    # check that sample1 matches a different sample with a higher percentage
                    print LOG "Sample $s1 matches with $matSam->{ sample } with $matSam->{ percent }\n";
                }
                #else{
                my $l = sprintf "%s, %s, %0.2f, %s, %0.2f ", $s1, $matSam->{sample}, $matSam->{percent}, $top, $top_percent;
                print OUT $l, "\n";
                #}
            } # end of foreach $match->sample
        } # end of percentage foreach loop
    } # end of for $j @sam2com
I don't want to include the samples PD4005a, PD4005b, PD4005c or WGA_PD4005a|b|c in the LOG file, since they are identical samples. Is there any way of doing this? I tried:
    my ($n, $m, $o)    = $s1 =~ /^(PD|WGA_PD)(\d+)(a|b|c)/;
    my ($n1, $m1, $o1) = $matSam->{ sample } =~ /^(PD|WGA_PD)(\d+)(a|b|c)/;

and comparing $m == $m1. I also tried /(\w+)(a|b|c)/ and comparing the $1 of one sample with that of the other. But surely I am making some stupid mistake, as these samples still show up in the LOG file.
Any suggestions, please? Thanks.
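
One possible way to approach this, offered as a minimal sketch rather than a definitive fix: reduce each sample name to a base identifier and skip the LOG line when both names share it. The helpers `base_id` and `same_individual` are hypothetical names, not part of the script above, and the pattern assumes the naming scheme in the post (an optional `WGA_` prefix, then `PD`, digits, and one trailing lowercase letter). Note that the posted loop contains two separate `print LOG` statements, so a guard would be needed before each of them.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a sample name to its base identifier.
# Assumption: names look like "PD4005a" or "WGA_PD4005a"; anything else
# (e.g. "CGPP5048286") has no base identifier and returns undef.
sub base_id {
    my ($name) = @_;
    return $name =~ /^(?:WGA_)?(PD\d+)[a-z]$/ ? $1 : undef;
}

# Hypothetical helper: true when both names reduce to the same base identifier.
sub same_individual {
    my ($s1, $s2) = @_;
    my $b1 = base_id($s1);
    my $b2 = base_id($s2);
    return defined $b1 && defined $b2 && $b1 eq $b2;
}

# Small demonstration using the sample names from the post.
for my $pair (
    [ 'PD4005a',     'PD4005b' ],
    [ 'WGA_PD4005a', 'PD4005c' ],
    [ 'PD4005a',     'CGPP5048286' ],
) {
    my ($s1, $s2) = @$pair;
    printf "%s vs %s: %s\n", $s1, $s2,
        same_individual($s1, $s2) ? 'skip in LOG' : 'keep in LOG';
}
```

Inside the loops, the check could be added as an extra condition on each `print LOG`, for example `and not same_individual($s1, $s2)` in the inner loop and `and not same_individual($s1, $matSam->{sample})` in the reporting loop. Comparing the extracted base as a string with `eq` avoids depending on `$1` after a match that may have failed.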

In reply to pattern match with different sets. by Anonymous Monk
