in reply to Another regexp question

Another single regular expression, with the @array = $str =~ /regexp/ idiom -
    my $str = 'The rabbits is $10 and the dogs are $20. The phone number is 555-1212.';
    # updated: thanks to davido for pointing out the interpolation
    # of $10 and $20 in my double-quoted string. I have changed
    # the double quotes to single quotes.
    my @capture = $str =~ /(rabbits|dogs|\d+-\d+)/g;
    print "$_\n" for @capture;
To be more elaborate, I have constructed the following example to demonstrate how to capture into a hash and an array.

    use strict;
    use Data::Dumper;

    my $str = 'The rabbits is $10 and the dogs are $20. '
            . 'The phone number is 555-1212, mobile number 0404-120021';
    my $animal = "rabbit|dog";
    my %prices = $str =~ m/((?:$animal)s?)\s(?:is|are)\s(\$\d+)/g;
    my @phone = $str =~ m/(\d+-\d+)/g;
    print Dumper(\%prices);
    print Dumper(\@phone);
And the output is -
    $VAR1 = {
              'dogs' => '$20',
              'rabbits' => '$10'
            };
    $VAR1 = [
              '555-1212',
              '0404-120021'
            ];
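The hash assignment works because, in list context, a /g match with two capturing groups returns one flat list of all captures, and assigning that list to a hash consumes it as key/value pairs. A minimal sketch of that intermediate step, reusing the pattern above on a shortened string (the @pairs name is only for illustration):

```perl
use strict;
use warnings;

my $str    = 'The rabbits is $10 and the dogs are $20.';
my $animal = 'rabbit|dog';

# List context + /g: every match contributes its captures ($1, $2) in order.
my @pairs = $str =~ m/((?:$animal)s?)\s(?:is|are)\s(\$\d+)/g;
print join(', ', @pairs), "\n";    # rabbits, $10, dogs, $20

# Assigning the same flat list to a hash pairs the elements up as key => value.
my %prices = @pairs;
print "$_ => $prices{$_}\n" for sort keys %prices;
```

If a pattern had three capturing groups instead of two, the flat list would no longer pair up cleanly, which is one reason the non-capturing (?:...) groups matter here.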
In general, the complexity of the regular expression grows with the number of requirements and with the complexity of your sentence structure. You will have to pick the one best suited to your data.

And of course, if you want to parse natural language, automatically recognise what is an animal, and pick out the price from a complex sentence, it will be a mammoth task indeed.

Try picking out the prices from the following sentence :-)

I have a dog and two cats, I will charge you $10 for each cat, but I won't sell the dog to you for $10, I will have to charge you $20 extra.

Re: Re: Another regexp question
by wolis (Scribe) on Nov 20, 2003 at 04:06 UTC
    Hi There,

    In asking my question for clarification I believe I have answered myself, but anyway:

    Can I have some clarification on the wonderful line:

    my %prices = $str =~ m/((?:$animal)s?)\s(?:is|are)\s(\$\d+)/g;
    As I understand it, the brackets are used to return the values as $1, $2 .. $9

    It looks like (?: .. ) is something special and is not returning a numbered variable so all we get out are two variables $1 and $2 used respectively in the hash as the key and value?

    I have often wanted to check to see if a string contains one of multiple sub-strings.. I assume I could do it like this:

        use strict;
        my $something = 'This is a bang of a bing thing';
        if ($something =~ m/((?:bing|bong|bang))/i) {
            print "Found '$1' in '$something'\n";
        }
    Woo Hoo!
    Found 'bang' in 'This is a bang of a bing thing'
    It returned the first one found.. which is fair.. I wonder if this could return all matches found, in this case 'bang' and 'bing'?

    And could I check whether a string contains anything from a list?

        ...
        my @list = qw / bing bong bang /;
        if($something =~ m/(?:@list)/i)
        ...
    naturally does not work :-(

    thanks

    ___ /\__\ "What is the world coming to?" \/__/ www.wolispace.com
        It looks like (?: .. ) is something special and is not returning a numbered variable so all we get out are two variables $1 and $2 used respectively in the hash as the key and value?

      You bet. ;-) The ?: at the start of a bracket tells Perl to group the pattern inside without capturing it, so it does not get a numbered variable. You can find the documentation on (?:pattern) in the perlre documentation.
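To make the difference concrete, here is a small sketch (the variable names are illustrative, not from the thread) that runs the same match with and without ?: around the verb and counts the captures that come back:

```perl
use strict;
use warnings;

my $text = 'the dogs are $20';

# Capturing group around the verb: three numbered captures.
my @with    = $text =~ m/(dogs)\s(is|are)\s(\$\d+)/;
# Same pattern with (?:...) around the verb: only two captures.
my @without = $text =~ m/(dogs)\s(?:is|are)\s(\$\d+)/;

print scalar(@with), ": @with\n";          # 3: dogs are $20
print scalar(@without), ": @without\n";    # 2: dogs $20
```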

        And could I check whether a string contains anything from a list?

      Well, yes you can. The method I use is to construct the search pattern with a join, as the following example demonstrates -
          my $something = 'This is a bang of a bing thing';
          my @list = qw /bing bong bang/;    # want to search for these
          my $list = join '|', @list;        # construct my pattern
          if ($something =~ m/($list)/i) {
              print "Found '$1' in '$something'\n";
          }
      If you want to capture all occurrences of the patterns, you could use the @array = $str =~ m/pattern/g idiom.
      my @search = $something =~ m/($list)/ig; # <- added the g modifier
      or you could do this in a while loop -
          while ($something =~ m/($list)/ig) {
              print "Found '$1' in '$something'\n";
          }
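One caveat with the join approach: if the list items can contain regex metacharacters (., +, $ and so on), they are treated as pattern syntax rather than literal text. A hedged sketch, assuming a hypothetical list with such items, that escapes each item with quotemeta before joining and then collects all matches with /g:

```perl
use strict;
use warnings;

my $something = 'Is it a.b, a+b or axb?';
my @list      = ('a.b', 'a+b');    # hypothetical items with metacharacters

# Escape each item so '.' and '+' match literally, then join with '|'.
my $alternation = join '|', map { quotemeta($_) } @list;
my $re          = qr/($alternation)/i;    # compile once, reuse below

my @found = $something =~ /$re/g;         # all matches, in order of appearance
print "Found '$_'\n" for @found;
```

Without the quotemeta step, the item 'a.b' would also match 'axb', because an unescaped dot matches any character.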
      The problem with your code is that m/(@list)/i looks for the interpolated list items joined by spaces, that is, the literal pattern "bing bong bang", and of course that is not found in your string.

          use strict;
          my $something = 'This is a bang of a bing thing bing bong bang';
          my @list = qw / bing bong bang /;
          if ($something =~ m/(@list)/i) {
              print "Found '$1' in '$something'\n";
          }
      And the output is -
          Found 'bing bong bang' in 'This is a bang of a bing thing bing bong bang'
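The reason for that output is that an array interpolated into a pattern (or a double-quoted string) is joined with the list separator variable $", which defaults to a single space. As a sketch only, localizing $" to '|' makes m/(@list)/ behave like the join version, though the explicit join is usually easier to read:

```perl
use strict;
use warnings;

my $something = 'This is a bang of a bing thing';
my @list      = qw(bing bong bang);

print "Default interpolation: '@list'\n";    # 'bing bong bang'

my @found;
{
    local $" = '|';                          # list separator, only inside this block
    @found = $something =~ m/(@list)/ig;     # pattern is now (bing|bong|bang)
}
print "Found: @found\n";                     # Found: bang bing
```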
        Ah.. thank you.

        So what I am reading between the lines here is the OR ability in regex.. which I either never knew, or knew and forgot.

        So the next question (and you don't have to answer it cos I'm just pondering out loud) is: where does the OR finish and the next thing begin?

            $string = 'this is a bing';    # sample strings..
            $string = 'bing is my name';
            $string = 'cows go bonging';
            $string = 'cows go bang99';
            $string =~ m/^bing|bong|bang\d\d/;
        Would I need to put ^ in front of each OR case if I want them to match at the beginning of the line only?
        Similarly, if I want them all to match only if they end with \d\d, do I include it at the end or in each case?
        How does it know where the first OR case starts and where the last OR case ends?
        ___ /\__\ "What is the world coming to?" \/__/ www.wolispace.com
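On that closing question: alternation has the lowest precedence in a regex, so each | splits the whole pattern (or the enclosing group) into complete alternatives. That means /^bing|bong|bang\d\d/ reads as '^bing' OR 'bong' OR 'bang\d\d'. To apply the anchor and the digits to every case, wrap the alternatives in a group. A small sketch using the sample strings from the post, plus one hypothetical string ('bong42 first') added so the grouped pattern has something to match:

```perl
use strict;
use warnings;

my @samples = (
    'this is a bing',
    'bing is my name',
    'cows go bonging',
    'cows go bang99',
    'bong42 first',       # hypothetical extra sample, not from the post
);

my $loose = qr/^bing|bong|bang\d\d/;        # ^ binds to bing only, \d\d to bang only
my $tight = qr/^(?:bing|bong|bang)\d\d/;    # ^ and \d\d apply to every alternative

my (@loose_hits, @tight_hits);
for my $s (@samples) {
    push @loose_hits, $s if $s =~ $loose;
    push @tight_hits, $s if $s =~ $tight;
}
print "loose: $_\n" for @loose_hits;
print "tight: $_\n" for @tight_hits;
```

The loose pattern matches every sample except 'this is a bing', while the grouped pattern matches only the string that starts with one of the words followed by two digits.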