Galen has asked for the wisdom of the Perl Monks concerning the following question:

Dear Regularexpressiongurus,

How does one extract the middle of a string with a regular expression, given the characters before and after the substring desired? For example, let's say I need to isolate all characters between "abc" and "xyz"...

$foo = "abchelloxyz";
$bar = /regex on $foo/;   # $bar = "hello"
I can do this with index and substr, but regex has got to be more efficient. Thanks in advance :)

(wil) Re: regex: extract a substring
by wil (Priest) on May 23, 2002 at 20:38 UTC
    If you're sure that that is going to be your string every time, then something like this should work:
    $foo = "abchelloxyz"; $foo =~ s/abc([\w]+)xyz/$1/;

    However, if your input is going to be different, i.e. use something other than (in my example) any alphanumeric character, then your regex is going to be slightly different too.

    Posting some sample data would help in building up a regex for you.

    - wil
      Do keep in mind that wil's example sets $foo itself to the characters between abc and xyz, i.e. $foo eq "hello" afterwards. If you are trying to extract into another variable, you need something more like this:
      $foo = "abchelloxyz"; $foo =~ /abc([\w]+)xyz/; $bar = $1;

      Here the value of $foo is unchanged.
      ---Mogaka is good
        $foo =~ /abc([\w]+)xyz/; $bar = $1;

        or in one line:

        ($bar) = $foo =~ /abc([\w]+)xyz/;

        From perlop:

        If the /g option is not used, m// in a list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3...). (Note that here $1 etc. are also set, and that this differs from Perl 4's behavior.) When there are no parentheses in the pattern, the return value is the list (1) for success. With or without parentheses, an empty list is returned upon failure.

        -- Hofmator
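The perlop passage quoted above is easier to see with a small experiment. The following is only a sketch of the three cases it describes (list context, scalar context, and a failed match); the variable names `$matched` and `$none` are made up for illustration.

```perl
use strict;
use warnings;

my $foo = "abchelloxyz";

# List context: the match returns its captures, so ($bar) receives $1.
my ($bar) = $foo =~ /abc(\w+)xyz/;

# Scalar context: the match returns true or false, not the capture.
my $matched = $foo =~ /abc(\w+)xyz/;

# On failure the returned list is empty, so $none stays undefined.
my ($none) = "no delimiters here" =~ /abc(\w+)xyz/;

print "bar: $bar\n";
print "matched: ", ( $matched ? "yes" : "no" ), "\n";
print "none is ", ( defined $none ? "defined" : "undefined" ), "\n";
```

The parentheses around `$bar` are what force list context; without them the assignment stores the success flag instead of the captured text.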

        en passant:
        $foo='abchelloxyz'; ($bar=$foo)=~s/abc(\w+)xyz/$1/;
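For comparison, here is a sketch of the copy-then-substitute idiom next to the `/r` modifier, which does the same job in one expression but assumes Perl 5.14 or later. The names `$baz` and `$miss` are illustrative only.

```perl
use v5.14;
use warnings;

my $foo = 'abchelloxyz';

# Copy first, then substitute on the copy; $foo is left untouched.
( my $bar = $foo ) =~ s/abc(\w+)xyz/$1/;

# Perl 5.14+: /r returns the modified copy instead of editing in place.
my $baz = $foo =~ s/abc(\w+)xyz/$1/r;

# Caveat: when the pattern does not match, the copy is simply the
# original string, not an empty or undefined value.
( my $miss = 'nothing here' ) =~ s/abc(\w+)xyz/$1/;

print "$bar $baz $foo $miss\n";
```

Because a failed substitution leaves the whole original string in the copy, a capturing match is usually the safer choice when the delimiters might be absent.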
Re: regex: extract a substring
by derby (Abbot) on May 23, 2002 at 20:39 UTC
    More efficient? Probably not ... but definitely easier to read and more idiomatic. Look for "capturing" in perlre.

    #!/usr/bin/perl -w
    use strict;
    use Benchmark;

    # Time both approaches, 50000 runs each.
    timethese( 50000, {
        'regex'   => sub { regex(); },
        'ind_sub' => sub { ind_sub(); },
    } );

    sub regex {
        $_ = "abchelloxyz";
        m/abc(.*?)xyz/;
        my $bar = $1;
    }

    sub ind_sub {
        $_ = "abchelloxyz";
        my $x = index( $_, "abc" ) + 3;
        my $y = index( $_, "xyz" );
        my $bar = substr( $_, $x, $y - $x );
    }

    -derby
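The `ind_sub` version above assumes both delimiters are always present; `index` returns -1 when they are not, and the `substr` arithmetic then goes wrong quietly. A guarded variant might look like the sketch below. The helper name `extract_between` is hypothetical and not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between $start and the first
# $end that follows it, or undef if either delimiter is missing.
sub extract_between {
    my ( $str, $start, $end ) = @_;

    my $from = index( $str, $start );
    return if $from < 0;
    $from += length $start;

    # Look for the end marker only after the start marker.
    my $to = index( $str, $end, $from );
    return if $to < 0;

    return substr( $str, $from, $to - $from );
}

print extract_between( "abchelloxyz", "abc", "xyz" ), "\n";
```

Using `length $start` instead of a hard-coded 3 keeps the offset correct if the leading delimiter changes.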

Re: regex: extract a substring
by Galen (Beadle) on May 23, 2002 at 22:27 UTC
    Thanks for the help! I did need to maintain the value of $foo - my final code was
    $value =~ /$header:_:(.*?):_:/; $data = $1; push(@rows, $data);
    Works great :)
      Then why not just push(@rows, $value =~ /$header:_:(.*?):_:/);
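A sketch of why the one-line `push` is also a little safer: a failed match returns an empty list, so nothing is pushed, whereas reading `$1` after an unchecked match may pick up a capture left over from an earlier successful match. The sample records and the `\Q...\E` quoting of `$header` are assumptions added here, since the real data format is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $header = "name";
my @values = ( "name:_:Galen:_:", "junk without a header", "name:_:wil:_:" );

my @rows;
for my $value (@values) {
    # \Q...\E escapes any regex metacharacters in $header.
    # A failed match returns an empty list, so nothing is pushed.
    push @rows, $value =~ /\Q$header\E:_:(.*?):_:/;
}

print join( ", ", @rows ), "\n";
```

If `$header` is meant to contain regex syntax on purpose, the `\Q...\E` quoting should be dropped.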