nicholaspr has asked for the wisdom of the Perl Monks concerning the following question:

Hi. Suppose I have a string, for example str='ac-t-c-t-g', and I want to extract the substring that starts at the beginning and contains 4 of the characters a|c|t|g, without caring how many '-' characters are in between. In this case it would be 'ac-t-c'. For another example, str='a----c-tg-ggg', it would be 'a----c-tg'. Is it possible to do this with a regular expression rather than a for loop?

Re: Perl match
by toolic (Bishop) on May 30, 2012 at 17:52 UTC
    use warnings;
    use strict;

    my $str = 'ac-t-c-t-g';
    my $s2;
    my $i = 0;
    for (split //, $str) {
        $s2 .= $_;
        $i++ if $_ ne '-';
        last if $i == 4;
    }
    print "$s2\n";
Re: Perl match
by kcott (Archbishop) on May 30, 2012 at 17:46 UTC

    There are a few ways of doing this. Here's one:

    $ perl -Mstrict -Mwarnings -E '
        my $x = q{ac-t-c-t-g};
        $x =~ /[acgt]/g for (1..4);
        say substr($x, 0, pos($x));
    '
    ac-t-c

    $ perl -Mstrict -Mwarnings -E '
        my $x = q{a----c-tg-ggg};
        $x =~ /[acgt]/g for (1..4);
        say substr($x, 0, pos($x));
    '
    a----c-tg

    -- Ken
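
One thing to watch with the pos() approach above: if the string holds fewer than four letters, the failing //g match resets pos() to undef, and substr then receives an undefined length. The following is a minimal sketch of a guarded version, not part of the original reply; the sub name prefix_by_pos and the count parameter $n are hypothetical names introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the prefix of $str that
# ends at the $n-th letter from [acgt], or undef if there are fewer than $n.
sub prefix_by_pos {
    my ($str, $n) = @_;
    pos($str) = 0;    # start scanning from the beginning of the copy
    for (1 .. $n) {
        # A failed //g match in scalar context resets pos() to undef,
        # so bail out before calling substr with an undefined length.
        return undef unless $str =~ /[acgt]/g;
    }
    return substr($str, 0, pos($str));
}

print prefix_by_pos('ac-t-c-t-g', 4), "\n";       # ac-t-c
print prefix_by_pos('a----c-tg-ggg', 4), "\n";    # a----c-tg

my $short = prefix_by_pos('a-c', 4);
print defined($short) ? "$short\n" : "too few letters\n";    # too few letters
```

Returning undef for the short case is just one possible choice; returning the whole string would be equally defensible, depending on what the caller expects.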

Re: Perl match
by AnomalousMonk (Archbishop) on May 30, 2012 at 18:12 UTC
    >perl -wMstrict -le
    "my $class = qr{ [actg] }xms;
     ;;
     for my $str (qw(ac-t-c-t-g a----c-tg-ggg --a-c-t-g-a-a-a--)) {
        $str =~ m{ \A ((?: -* $class){4}) }xms;
        print qq{from '$str' -> ($1)};
        }
    "
    from 'ac-t-c-t-g' -> (ac-t-c)
    from 'a----c-tg-ggg' -> (a----c-tg)
    from '--a-c-t-g-a-a-a--' -> (--a-c-t-g)
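
The single-regex approach above can be wrapped in a small sub so the count is a parameter and a failed match is handled explicitly. This is only a sketch under assumptions: the sub name prefix_by_regex and the parameter $n are hypothetical and do not appear in the thread, and the input is assumed to contain only the letters a, c, g, t and '-'.

```perl
use strict;
use warnings;

# Hypothetical helper: anchored pattern with the letter count as a parameter.
# Returns the matched prefix, or undef when $str has fewer than $n letters
# (or contains a character other than a, c, g, t or '-' before the $n-th letter).
sub prefix_by_regex {
    my ($str, $n) = @_;
    # \A anchors at the start; each repetition consumes any run of dashes
    # followed by exactly one letter, and the group repeats $n times.
    return $str =~ m{ \A ( (?: -* [acgt] ){$n} ) }x ? $1 : undef;
}

for my $str (qw(ac-t-c-t-g a----c-tg-ggg --a-c-t-g-a-a-a--)) {
    my $prefix = prefix_by_regex($str, 4);
    print "from '$str' -> (", (defined $prefix ? $prefix : 'no match'), ")\n";
}
```

Checking the match result before using $1 matters here, because after a failed match $1 still holds whatever the last successful match captured.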