Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need a regex to match
document.write('<'+'img '+'src='+'"http://im'+'ages.site.com/ima'+'ges/full/'+'16/'+'167/1676892130'+'.jpg"'+'><'+'BR>');
Here's the thing though. This site is a little tricky. It puts the '+' in random places on each reload, and the numbers should match any digits, not just the ones in this example. In the end, I want the image URL without all that extra '+' junk.

This is a little beyond my regex abilities, can anyone help?

Replies are listed 'Best First'.
Re: regex to match one nasty javascript line
by ikegami (Patriarch) on Nov 27, 2006 at 18:29 UTC

    Get rid of '+' first, not last.

    $js =~ s/'\+'//g;
    my ($url) = $js =~ m{(http://images\.site\.com/images/full/\d+/\d+/\d+\.jpg)};
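    For anyone who wants to try this, here is a minimal, self-contained sketch of the same two steps run against the sample line from the question. The variable names and the "no match" fallback are only illustrative, and the host and path in the pattern are taken from the example, so adjust them if the real page differs.

```perl
use strict;
use warnings;

# Sample line from the question (the '+' positions vary on each reload).
my $js = q{document.write('<'+'img '+'src='+'"http://im'+'ages.site.com/ima'+'ges/full/'+'16/'+'167/1676892130'+'.jpg"'+'><'+'BR>');};

# Remove every '+' concatenation so the fragments join into one string.
$js =~ s/'\+'//g;

# Capture the URL; \d+ accepts any run of digits in each path segment.
my ($url) = $js =~ m{(http://images\.site\.com/images/full/\d+/\d+/\d+\.jpg)};

print defined $url ? "$url\n" : "no match\n";
```

    Because the '+' pieces are removed before matching, it does not matter where the site chooses to split the string.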
Re: regex to match one nasty javascript line
by SheridanCat (Pilgrim) on Nov 27, 2006 at 18:30 UTC
    Are you just trying to extract the link?
    while( <DATA> ){
        s|\'\+\'||g;
        m/.*(http.*)\"/;
        print $1, "\n";
    }
    __DATA__
    document.write('<'+'img '+'src='+'"http://im'+'ages.site.com/ima'+'ges/full/'+'16/'+'167/1676892130'+'.jpg"'+'><'+'BR>');
      Just a minor point. Single- and double-quotes are not regular expression metacharacters so they don't need to be escaped.

      Cheers,

      JohnGG

Re: regex to match one nasty javascript line
by muba (Priest) on Nov 27, 2006 at 18:28 UTC
    Well... I'm not sure this one will always work; it's just a quick and dirty hack. As soon as there is another ) within the URL or something, it's bound to break. It also won't work if there is another document.write line above this one.

    Let's divide the whole thing into a couple of steps first.
    1. Get the document.write line
    2. Get rid of the '+' things
    3. Only get the actual url, not the rest of that stuff
    Well, here goes..

    my ($url) = $website =~ m/document\.write\((.+?)\)/;
    $url =~ s/'\+'//g;
    ($url) = $url =~ m/src="(.+?)">/;
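    The three steps above can also be sketched as a small subroutine that gives up cleanly when a step fails, instead of running a substitution on an undefined value. The name extract_url and the $page sample are hypothetical, and the caveats already mentioned (an extra ")" or an earlier document.write) still apply.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the image URL, or nothing if a step fails.
sub extract_url {
    my ($page) = @_;

    # Step 1: grab the argument of the first document.write(...) call.
    my ($arg) = $page =~ m/document\.write\((.+?)\)/ or return;

    # Step 2: get rid of the '+' concatenations.
    $arg =~ s/'\+'//g;

    # Step 3: keep only the URL inside src="...".
    my ($url) = $arg =~ m/src="(.+?)">/ or return;

    return $url;
}

my $page = q{document.write('<'+'img '+'src='+'"http://im'+'ages.site.com/ima'+'ges/full/'+'16/'+'167/1676892130'+'.jpg"'+'><'+'BR>');};
my $found = extract_url($page);
print defined $found ? "$found\n" : "nothing found\n";
```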
Re: regex to match one nasty javascript line
by Cody Pendant (Prior) on Nov 28, 2006 at 03:41 UTC
    Would you like to let us know which site you're downloading these images from? You're going to so much trouble, they must be very good images...


    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
      This is in Perl? Have you considered replacing the "+"s with "."s, and then eval-ing the string? Then you don't need to worry about parentheses.
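      One caution with eval: it would run whatever the remote page sends as Perl code. A minimal sketch of an alternative with the same effect, assuming every fragment is a single-quoted literal with no embedded single quotes (true of the sample line, not guaranteed in general), is to pull out the literals and join them. The names $arg, $joined and $src are made up for illustration.

```perl
use strict;
use warnings;

my $js = q{document.write('<'+'img '+'src='+'"http://im'+'ages.site.com/ima'+'ges/full/'+'16/'+'167/1676892130'+'.jpg"'+'><'+'BR>');};

# Take everything between the outermost parentheses of document.write(...).
my ($arg) = $js =~ /document\.write\((.*)\)/;
$arg = '' unless defined $arg;

# Collect the contents of each '...' literal and concatenate them,
# which is what evaluating the "+" (or ".") expression would produce.
my $joined = join '', $arg =~ /'([^']*)'/g;

# Pull the URL out of the reassembled tag.
my ($src) = $joined =~ /src="([^"]+)"/;

print defined $src ? "$src\n" : "no src attribute found\n";
```

      This also copes with whitespace around the "+" signs, since only the quoted pieces are kept.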
Re: regex to match one nasty javascript line
by Anonymous Monk on Nov 28, 2006 at 03:12 UTC
    s/^.*src=//g;
    s/'\+'//g;
    s/\.jpg.*//g;
    s/^"//g;
    print
    #publicdomain
    # HINT: use a filehandle to get the string into $_, it's devilishly tricky without it...
      shercat or w/e (sry, bad w/ names) did it better :-)
Re: regex to match one nasty javascript line
by Anonymous Monk on Nov 28, 2006 at 03:12 UTC
    s/^.*src=//g;
    s/'\+'//g;
    s/\.jpg.*//g;
    s/^"//g;
    print
    #publicdomain
    # HINT: use a filehandle to get the string into $_, it's devilishly tricky without it...
      You don't need to use a filehandle to alias $_ to something. for ($var) { ... } works nicely.
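      A minimal sketch of that aliasing, using the sample line from the question. This is a variant of the substitution chain rather than a copy of it: it strips the '+' pieces first, in case "src=" itself gets split, and it keeps the ".jpg" extension. The variable names are illustrative.

```perl
use strict;
use warnings;

my $line = q{document.write('<'+'img '+'src='+'"http://im'+'ages.site.com/ima'+'ges/full/'+'16/'+'167/1676892130'+'.jpg"'+'><'+'BR>');};

my $url = $line;    # work on a copy so $line stays untouched

# for aliases $_ to $url, so each bare s/// below edits $url in place.
for ($url) {
    s/'\+'//g;          # join the fragments
    s/^.*src="//;       # drop everything up to and including the opening quote
    s/(\.jpg).*/$1/;    # cut everything after the extension
}

print "$url\n";
```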