Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to write a regex that will match: AltaVista found #### results. I want to capture the number, keeping in mind that it may be a single 0 or any number of digits, with optional commas if the number is large enough.

This was my attempt.

=~ /AltaVista found (/d+[,?][/d+?]) results/;

Re: Regex to match string with numbers with possible comma
by Happy-the-monk (Canon) on Mar 17, 2004 at 15:38 UTC

    m/AltaVista found (\d+,?\d*) results/;

    should match it. See perldoc perlre for what ? and * do. Also have a look at perldoc perlretut.

    Sören

    Edit:
    There seems to be no node for perlretut here yet. Edited to point to perldoc.com. Anyway, perlre is the better match for this question.

Re: Regex to match string with numbers with possible comma
by Roy Johnson (Monsignor) on Mar 17, 2004 at 15:55 UTC
    Is there any reason to worry about what's between "found" and "results"? Unless it might come back with "found pig results" and you don't want to match on that, just:
    =~ /found (.*) results/;
    See also this node for how to match properly-formatted numbers with commas.

    The PerlMonk tr/// Advocate
      .* is greedy. Consider the string "found 10 results blah blah found 1,000 results": the capture would run from 10 all the way to 1,000. Generally .* is a subpattern to be suspicious of; I would implement your idea with something like /found (.{1,12}) results/ or /found (.*?) results/.
        We're talking about a fairly predictable response, where I wouldn't expect to see multiple "found" messages. You provide one easy fix with *?. Another is: /found (\S+) results/

        The PerlMonk tr/// Advocate
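
To make the greedy versus non-greedy point concrete, here is a minimal sketch that runs the three patterns from this subthread against the two-message string quoted above. The variable names are only for illustration.

```perl
use strict;
use warnings;

# The two-message string from the reply above.
my $text = 'found 10 results blah blah found 1,000 results';

# Greedy: .* runs on to the LAST " results" in the string.
my ($greedy) = $text =~ /found (.*) results/;

# Non-greedy: .*? stops at the FIRST " results".
my ($lazy) = $text =~ /found (.*?) results/;

# \S+ cannot cross whitespace, so it also stops at the first number.
my ($nonspace) = $text =~ /found (\S+) results/;

print "greedy:   $greedy\n";
print "lazy:     $lazy\n";
print "nonspace: $nonspace\n";
```

Only the greedy version swallows the text between the two messages; the other two both capture 10.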
Re: Regex to match string with numbers with possible comma
by pboin (Deacon) on Mar 17, 2004 at 15:41 UTC

    This should do it: =~ /Altavista found ([\d,]+) results/

      I tried your regex but it's not pulling back a result. It comes back as "". Any idea why? The url I'm trying is http://www.altavista.com/web/results?q=url:www.tek-tips.com&kl=XX&search=Search.
      #!/usr/bin/perl
      use LWP::Simple;
      use strict;
      $|=1;

      my $url = "www.tek-tips.com";
      my $altavista = "http://www.altavista.com/web/results?q=url:$url&kl=XX&search=Search";
      my $content = get("$altavista");
      my @lines = split /\n/, $content;
      my $results;

      foreach (@lines) {
          $results = $1 if $_ =~ m/Altavista found ([\d,]+) results/;
      }

      print "searched: $altavista\n";
      print "results: $results";

        The 'v' in AltaVista needs to be capitalized, and you should be good to go. (With the exception that the commas aren't stripped.) There's an example of how to do that in this thread by Anonymous Monk Chris.

        I don't know what you plan on doing, but I typically strip commas right away -- they're nothing but trouble.
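
A minimal sketch of the capture-then-strip approach, assuming the line has already been fetched. The sample text is taken from this thread rather than from a live page, and `$line` and `$results` are illustrative names.

```perl
use strict;
use warnings;

# Sample line using the wording quoted in this thread (not a live response).
my $line = 'AltaVista found 15,151,791 results';

my $results;
if ($line =~ /AltaVista found ([\d,]+) results/) {
    # Copy the capture, then delete the commas from the copy.
    ($results = $1) =~ tr/,//d;
}

print "results: $results\n" if defined $results;
```

Once the commas are gone, the value can be used in arithmetic or numeric comparisons without surprises.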

Re: Regex to match string with numbers with possible comma
by pboin (Deacon) on Mar 17, 2004 at 15:45 UTC

    Actually, one of your assumptions appears to be wrong...

    AltaVista does not return "AltaVista found 0 results". I just checked, and the closest thing to a return message is probably "We found 0 results." 'AltaVista' should not be part of your regex.

    So, you want to be sure to specifically test the zero case, a case of less than 1000, and a case of over 1000.

      And make sure you test values over 1,000,000 as well. Searching for 'porn' results in "AltaVista found 15,151,791 results".

      One might want to use Regexp::Common.

      Abigail
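
The test cases suggested above (zero, under 1,000, over 1,000, over 1,000,000) can be sketched as a small table-driven check. This is only an illustration: the sample strings reuse wording quoted in this thread, the leading 'AltaVista' is dropped from the patterns as suggested above, and the labels in `%pattern` are made up for the example.

```perl
use strict;
use warnings;

# Two capture styles proposed earlier in the thread.
my %pattern = (
    'one optional comma' => qr/found (\d+,?\d*) results/,
    'digits and commas'  => qr/found ([\d,]+) results/,
);

# One sample per size class; the wording is assumed from the thread.
my @samples = (
    'We found 0 results.',
    'AltaVista found 999 results',
    'AltaVista found 1,234 results',
    'AltaVista found 15,151,791 results',
);

my %captured;
for my $name (sort keys %pattern) {
    for my $sample (@samples) {
        # In list context a failed match returns an empty list, so $n is undef.
        my ($n) = $sample =~ $pattern{$name};
        $captured{$name}{$sample} = $n;
        printf "%-20s %-36s %s\n", $name, $sample,
            defined $n ? $n : '(no match)';
    }
}
```

The single-optional-comma pattern fails on the value with two commas, which is the case the over-1,000,000 test is meant to catch. The character-class pattern handles every size but would also accept malformed input such as a run of commas.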

Re: Regex to match string with numbers with possible comma
by matija (Priest) on Mar 17, 2004 at 15:39 UTC
    =~ /AltaVista found (\d+(,\d+)*) results/;
Re: Regex to match string with numbers with possible comma
by Anonymous Monk on Mar 17, 2004 at 15:49 UTC
    You could precede that statement with:

    =~ tr/,//d;

    This would remove any commas in the string before you attempt a match. Then your match could be:

    =~ /AltaVista found (\d+) results/;
    HTH,

    Chris
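
A minimal sketch of the strip-first approach described above. It works on a copy so that the original line keeps its commas; `$line`, `$plain` and `$count` are illustrative names and the sample text is assumed.

```perl
use strict;
use warnings;

# Sample line (assumed wording, not a live response).
my $line = 'AltaVista found 1,234 results';

# Copy the line, then delete every comma from the copy.
(my $plain = $line) =~ tr/,//d;

# With the commas gone, a plain \d+ is enough.
my ($count) = $plain =~ /AltaVista found (\d+) results/;

print "count: $count\n" if defined $count;
```

Note that tr/,//d removes every comma in the string it is applied to, not just those inside the number, which is one reason to apply it to a copy or to a single line rather than the whole page.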

Re: Regex to match string with numbers with possible comma
by Not_a_Number (Prior) on Mar 17, 2004 at 19:42 UTC

    A word of warning, probably too late, but possibly important.

    None of the above solutions works if AltaVista finds just one result:

    AltaVista found 1 result

    (No 's' on 'result'...)

    dave
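
One way to cover the singular case is to make the trailing 's' optional. A minimal sketch, with sample strings taken from the wording in this thread and `%count` as an illustrative name:

```perl
use strict;
use warnings;

my %count;
for my $line ('AltaVista found 1 result', 'AltaVista found 1,000 results') {
    # "results?" matches both "result" and "results".
    if ($line =~ /AltaVista found ([\d,]+) results?/) {
        $count{$line} = $1;
        print "$line => $1\n";
    }
}
```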