in reply to regex problem

What stevieb said. Handing someone a bunch of code and saying "This doesn't work. Please figure out how it should work and fix it" is not likely to be productive unless you also hand over a bunch of money.

That said, one note: In the regex
    my @urls=$response_body =~ m{(http://b.thumbs.redditmedia.com/ ... }gi;
the sub-pattern b.thumbs.redditmedia.com has embedded . (dot) metacharacters, each of which matches any single character (except a newline, unless the /s switch is asserted, which it isn't). Here's the effect:

c:\@Work\Perl\monks>perl -wMstrict -le
"for my $str (qw(aXbXc a.b.c)) {
    printf qq{for '$str' };
    print $str =~ m{ a.b.c }xms ? 'match' : 'NO match';
    }
"
for 'aXbXc' match
for 'a.b.c' match
Now try meta-quoting in some way, e.g.:
c:\@Work\Perl\monks>perl -wMstrict -le
"for my $str (qw(aXbXc a.b.c)) {
    printf qq{for '$str' };
    print $str =~ m{ \Qa.b.c\E }xms ? 'match' : 'NO match';
    }
"
for 'aXbXc' NO match
for 'a.b.c' match
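
Applied to the host name itself, the same idea might look like the sketch below. The second test string and the sub names are made up for illustration, and the rest of the original regex is left out because it was elided above.

```perl
use strict;
use warnings;

# The literal host name we actually want to match.
my $literal = 'b.thumbs.redditmedia.com';

# Unquoted: every dot is a metacharacter that matches any one character.
sub matches_unquoted {
    my ($str) = @_;
    return $str =~ m{\A b.thumbs.redditmedia.com \z}xms ? 1 : 0;
}

# Quoted: \Q...\E backslash-escapes the dots so they only match a real '.'.
sub matches_quoted {
    my ($str) = @_;
    return $str =~ m{\A \Q$literal\E \z}xms ? 1 : 0;
}

# Hypothetical inputs: the real host, and a look-alike with no dots at all.
for my $str ('b.thumbs.redditmedia.com', 'bXthumbsYredditmediaZcom') {
    printf "for '%s': unquoted %s, quoted %s\n",
        $str,
        matches_unquoted($str) ? 'match' : 'NO match',
        matches_quoted($str)   ? 'match' : 'NO match';
}
```

Hand-escaping each dot as \. would do the same job; \Q...\E (or quotemeta) just saves you from missing one.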


Give a man a fish:  <%-{-{-{-<

Re^2: regex problem
by grasshopper!!! (Beadle) on Nov 04, 2015 at 21:28 UTC

    I was just wondering how to parse the jpg links out of the reddit wallpapers page, following a commandline fu bash script, just for fun. The main problem is how to reject a match that has the right beginning and end but, because of .+?, falsely matches everything in the middle. I know you can reject individual characters with a [^ahi]-type expression, but I don't know how to reject a string in the middle of a large amount of data. The following shows the problem match.

    http://b.thumbs.redditmedia.com/HUX1reWBCHSIQunAgKXYkb8nXEXY6cw0cTizkTcEw4U.png
    random html etc alot
    http://b.thumbs.redditmedia.com/bqYiA dIiTp01k7ca6UIjpWSJqOjHGeTv7JPwko4WrEQ.jpg

    Rejecting the png while still matching random file names in a mess of data is what I'm struggling to do. Thanks for any help. This is just for fun, so no sweat.

      I have solved the problem, but I don't understand why one lookahead works and another does not.

      use strict;
      use warnings;
      use WWW::Curl::Easy;

      my $curl = WWW::Curl::Easy->new;

      $curl->setopt(CURLOPT_HEADER,1);
      $curl->setopt(CURLOPT_URL, 'http://www.reddit.com/r/wallpapers.rss');

      my $response_body;
      $curl->setopt(CURLOPT_WRITEDATA,\$response_body);

      # Starts the actual request
      my $retcode = $curl->perform;

      # Looking at the results...
      if ($retcode == 0){
          print("Transfer went ok\n\n");
          my $response_code = $curl->getinfo(CURLINFO_HTTP_CODE);
          my @urls=$response_body =~ m{(http://b.thumbs.redditmedia\.com/(?:(?!png).)*?\.jpg)}gi;
          $" ="\n\n";
          print "@urls\n";
      }
      else {
          # Error code, type of error, error message
          print("An error happened: $retcode ".$curl->strerror($retcode)." ".$curl->errbuf."\n");
      }

      Thank you all that helped.

        The regex
            (?!pattern).
        simply says that whatever . matches is not the start of whatever pattern matches (which can be any regex whatsoever). Wrapping this in a non-capturing group
            (?:(?!pattern).)
        allows you to quantify the grouped expression in the usual way, so
            (?:(?!pattern).)*?
        means "zero or more of anything as long as it doesn't begin a pattern sequence, with lazy instead of greedy matching". (And I'll go to my grave using "lazy" instead of "non-greedy" as the antonym of "greedy" matching.)
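
To see the difference this makes, here is a minimal sketch that runs a plain lazy .+? and the guarded version over the same input. The file names and the surrounding markup are invented sample data, not taken from the real feed, and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data: a .png link, unrelated markup, then a .jpg link.
my $data = join ' ',
    'http://b.thumbs.redditmedia.com/aaa111.png',
    '<p>random html</p>',
    'http://b.thumbs.redditmedia.com/bbb222.jpg';

# Escape the literal prefix so its dots and slashes match only themselves.
my $host = quotemeta 'http://b.thumbs.redditmedia.com/';

# Plain lazy dot: the match starts at the .png URL and keeps extending
# until it reaches the first ".jpg", swallowing everything in between.
my @plain = $data =~ m{ ($host .+? \.jpg) }xmsg;

# Guarded dot: each character consumed must not be the start of "png",
# so the attempt that begins at the .png URL fails and the regex engine
# moves on to the next place the prefix matches.
my @guarded = $data =~ m{ ($host (?: (?! png) .)*? \.jpg) }xmsg;

print "plain:   $_\n" for @plain;
print "guarded: $_\n" for @guarded;
```

One caveat with this sketch: (?!png) rejects the letters "png" anywhere after the prefix, so a .jpg whose file name happens to contain "png" would be skipped too. Using (?!\.png) would narrow the rejection to the extension.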

        Here's what YAPE::Regex::Explain sez, but I don't think it captures the essence of the concept as clearly:

        c:\@Work\Perl\monks>perl -wMstrict -le
        "use YAPE::Regex::Explain;
         ;;
         print YAPE::Regex::Explain->new('(?:(?!pattern).)*?')->explain;
        "
        The regular expression:

        (?-imsx:(?:(?!pattern).)*?)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          (?:                      group, but do not capture (0 or more times
                                   (matching the least amount possible)):
        ----------------------------------------------------------------------
            (?!                      look ahead to see if there is not:
        ----------------------------------------------------------------------
              pattern                  'pattern'
        ----------------------------------------------------------------------
            )                        end of look-ahead
        ----------------------------------------------------------------------
            .                        any character except \n
        ----------------------------------------------------------------------
          )*?                      end of grouping
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------


        Give a man a fish:  <%-{-{-{-<