Re: Bioinformatics: Regex loop, no output


Re^2: Bioinformatics: Regex loop, no output
by AnomalousMonk (Archbishop) on Nov 16, 2015 at 00:39 UTC

As BrowserUk pointed out, there is no `!` regex metacharacter or operator (presumably intended to mean "not followed by the following pattern"). What is probably meant is the negative look-ahead `(?!pattern)` (see Look-Around Assertions in perlre; also see perlretut).
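To make the difference concrete, here is a minimal sketch (the test strings and the `matches` helper are made up for illustration, not taken from the original question) comparing a literal `!` with a negative look-ahead:

```perl
use strict;
use warnings;

# A '!' in a pattern is just a literal exclamation mark.
# Under /x the space is ignored, so this matches the three characters "K!P".
my $literal = qr{ K !P }xms;

# (?! P) is a zero-width assertion: the next character must not be 'P'.
my $lookahead = qr{ K (?! P) }xms;

# Hypothetical helper: 1 if the string matches the pattern, else 0.
sub matches {
    my ($re, $str) = @_;
    return $str =~ $re ? 1 : 0;
}

for my $str (qw(AKP AKA AK!P)) {
    printf "%-5s literal=%d lookahead=%d\n",
        $str, matches($literal, $str), matches($lookahead, $str);
}
```

Only 'AK!P' matches the literal pattern, while 'AKA' and 'AK!P' both satisfy the look-ahead version and 'AKP' satisfies neither.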

This may be what TamaDP was looking for (there are some other good solutions):

c:\@Work\Perl\monks>perl -wMstrict -le
"my @proteins = qw(
   DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
   ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
   AAAKAAA  AAAKAAA  XXXXXX
   );
 ;;
 my @new_peptides;
 for my $protein (@proteins) {
   if ($protein =~ s{ (?<= [KR]) (?! P) }{=}xmsg) {
     push @new_peptides, split ('=', $protein);
     }
   }
 ;;
 for my $peptide (@new_peptides) {
   print qq{Peptide is '$peptide'};
   }
"
Peptide is 'DAAAAATTLTTTAMTTTTTTCK'
Peptide is 'MMFRPPPPPGGGGGGGGGGGG'
Peptide is 'ALTAMCMNVWEITYHK'
Peptide is 'GSDVNR'
Peptide is 'R'
Peptide is 'ASFAQPPPQPPPPLLAIKPASDASD'
Peptide is 'AAAK'
Peptide is 'AAA'
Peptide is 'AAAK'
Peptide is 'AAA'

Input proteins that the s/// substitution leaves unchanged are not split and pushed into the output array.
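The loop above skips any protein with no cleavage site (XXXXXX in the sample data). If such proteins should instead be kept whole, which is an assumption about the requirements and not something stated in the thread, a sketch along these lines might do, since split returns the entire string when the pattern never matches. The name `digest_keep_all` is hypothetical:

```perl
use strict;
use warnings;

# Cleave after K or R, unless the next residue is P.
my $cleave = qr{ (?<= [KR]) (?! P) }xms;

# Hypothetical helper: split every protein at its cleavage points;
# a protein with no cleavage point comes back as a single piece.
sub digest_keep_all {
    my @proteins = @_;
    return map { split $cleave, $_ } @proteins;
}

my @peptides = digest_keep_all(qw(AAAKAAA XXXXXX MMFRPPPPP));
print "Peptide is '$_'\n" for @peptides;
```

Here 'XXXXXX' and 'MMFRPPPPP' (where the only R is followed by P) pass through uncut, while 'AAAKAAA' yields 'AAAK' and 'AAA'.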

Yet another solution might be:

c:\@Work\Perl\monks>perl -wMstrict -le
"my @proteins = qw(
   DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
   ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
   AAAKAAA  AAAKAAA  XXXXXX
   );
 ;;
 my $cleave = qr{ (?<= [KR]) (?! P) }xms;
 ;;
 my @peptides =
   map  split($cleave),
   grep m{ $cleave }xms,
   @proteins
   ;
 ;;
 print qq{Peptide is '$_'} for @peptides;
"
Peptide is 'DAAAAATTLTTTAMTTTTTTCK'
Peptide is 'MMFRPPPPPGGGGGGGGGGGG'
Peptide is 'ALTAMCMNVWEITYHK'
Peptide is 'GSDVNR'
Peptide is 'R'
Peptide is 'ASFAQPPPQPPPPLLAIKPASDASD'
Peptide is 'AAAK'
Peptide is 'AAA'
Peptide is 'AAAK'
Peptide is 'AAA'
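
The central statement of this solution might be documented as follows, with its steps read right-to-left, bottom-to-top: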

my @peptides =           # 4. and the pieces are peptides.
  map  split($cleave),   # 3. split at each cleavage point...
  grep m{ $cleave }xms,  # 2. that can be cleaved, ...
  @proteins              # 1. for each protein...
  ;
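For anyone who finds the chained form hard to follow, the numbered steps can also be spelled out with an intermediate array. This is only a sketch of the same idea; the `@cleavable` variable is introduced here for illustration and is not in the code above:

```perl
use strict;
use warnings;

my @proteins = qw(AAAKAAA XXXXXX);
my $cleave   = qr{ (?<= [KR]) (?! P) }xms;

# Steps 1 and 2: keep only the proteins that have a cleavage point.
my @cleavable = grep { m{ $cleave }xms } @proteins;

# Steps 3 and 4: split each survivor at its cleavage points and
# collect the pieces as peptides.
my @peptides = map { split $cleave, $_ } @cleavable;

print "Peptide is '$_'\n" for @peptides;
```

The chained map/grep version does the same work without naming the intermediate list.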

Update: After sober consideration, I changed the code example above to its present form. The previous code contained
my $cleave = qr{ [KR] (?! P) }xms;
...
map split(m{ (?<= $cleave) }xms),
grep m{ $cleave }xms,
...
which doesn't really respect the DRY (Don't Repeat Yourself) principle that I was aiming to exemplify.
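A rough side-by-side of the two forms, as a sketch only: the sub names `digest_old` and `digest_new` are made up for the comparison, and the sample proteins are a subset of those used above. Both are expected to find the same cleavage points; the difference is that the earlier form needs the look-behind bolted on at the split, so the cleavage rule is spread over two places.

```perl
use strict;
use warnings;

my @proteins = qw(
    ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
    AAAKAAA  XXXXXX
);

# Earlier form: $cleave consumes the K/R, so split has to wrap it
# in a look-behind to avoid eating that residue.
sub digest_old {
    my $cleave = qr{ [KR] (?! P) }xms;
    return map { split m{ (?<= $cleave) }xms, $_ }
           grep { m{ $cleave }xms } @_;
}

# Present form: $cleave is already a zero-width cleavage point,
# so the same pattern serves both grep and split unchanged.
sub digest_new {
    my $cleave = qr{ (?<= [KR]) (?! P) }xms;
    return map { split $cleave, $_ }
           grep { m{ $cleave }xms } @_;
}

my $old = join ',', digest_old(@proteins);
my $new = join ',', digest_new(@proteins);
print $old eq $new ? "same peptides\n" : "different peptides\n";
```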

Give a man a fish: <%-{-{-{-<
