I'm trying to fetch a web page, follow a javascript redirect, and verify that a string is found at the landing site.

If you copy/paste the url below into a browser

http://xml.pangora.com/scripts/Redirect.php?fid=45&mid=1066&serviceName=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0&url=http%3A%2F%2Fwww.baur.de%2Fis-bin%2FINTERSHOP.enfinity%2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101
you see it redirects to a web page whose source should match the regex (e.g., the price "249,90"). However, WWW::Mechanize cannot follow this redirect, although Mech is usually pretty good about redirects. I think this is a JavaScript redirect.

I realize I could parse the HTML of the page I get back, which has some text to the effect of "if you were not redirected, try here." However, I have many groups of such pages whose redirects I would like to follow. Each group would need its own parser, and some might not have an "if you were not redirected, try this" link to parse at all. I'm wondering if there is a more general solution.
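A minimal sketch of what a more general check might look like, assuming the redirect is done either with a meta refresh tag or with a simple `location` assignment in JavaScript. The helper name `find_client_redirect` and the sample HTML are hypothetical. The patterns are only a heuristic and will miss any redirect whose target is built dynamically by script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): scan fetched HTML for
# two common client-side redirect idioms and return the target URL,
# or undef if neither is found.
sub find_client_redirect {
    my ($html) = @_;

    # <meta http-equiv="refresh" content="0; url=http://example.com/">
    if ( $html =~ m{<meta[^>]+http-equiv\s*=\s*["']?refresh["']?[^>]*content\s*=\s*["']\s*\d+\s*;\s*url\s*=\s*["']?([^"'>]+)}i ) {
        return $1;
    }

    # window.location = "...";  location.href = '...';  location.replace("...")
    if (   $html =~ m{\blocation(?:\.href)?\s*=\s*["']([^"']+)["']}i
        or $html =~ m{\blocation\.(?:replace|assign)\s*\(\s*["']([^"']+)["']}i )
    {
        return $1;
    }

    return;    # no recognizable redirect
}

# Made-up sample page, for illustration only.
my $page   = '<script>window.location = "http://example.com/landing";</script>';
my $target = find_client_redirect($page);
print "redirect target: $target\n" if defined $target;
```

The returned URL could then be passed to `$mech->get` by hand. A relative target would first need to be resolved against the URL of the page it came from.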

I tried WWW::Selenium, but was unable to get it to work. I suspect this is because Selenium is beta-ish, and maybe my environment is just Selenium-unfriendly. I'm on Linux (SUSE) with Firefox 2.

Can anyone get this to work, with Mech, Selenium, or something else?

Thanks in advance!

use strict;
use warnings;
use WWW::Mechanize;

my $url = 'http://xml.pangora.com/scripts/Redirect.php?fid=45&mid=1066&serviceName=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0&url=http%3A%2F%2Fwww.baur.de%2Fis-bin%2FINTERSHOP.enfinity%2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101';
my $price = '249,9';

my $mech     = WWW::Mechanize->new();
my $response = $mech->get( $url );
my $html     = $mech->content;

print "price: $price\n";
print "url: $url\n";
print "html: $html\n";
print "ok" if $html =~ $price;

UPDATE: changed "server side redirect" to "javascript redirect"

UPDATE 2: getting closer, but still can't do what I want:

use strict;
use warnings;
use WWW::Mechanize;
use Data::Dumper;

my $url = 'http://xml.pangora.com/scripts/Redirect.php?fid=45&mid=1066&serviceName=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0&url=http%3A%2F%2Fwww.baur.de%2Fis-bin%2FINTERSHOP.enfinity%2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101';
my $price = '249,9';
print "price: $price\n";

my $redirect_url = redirect_url($url);
my $redirect_url_expected = 'http://www.baur.de/is-bin/INTERSHOP.enfinity/WFS/Baur-BaurDe-Site/de_DE/-/EUR/BV_ExternalCall-Start?ArticleNo=515358&NUMSArt=4443504&NUMSArtPc=4488615&AffiliateID=pangora-bd&Name=pangora-produktdaten-baur&ActionID=preis-produkt-suche-baur&WKZ=79&IWL=101';
die "oops" unless $redirect_url eq $redirect_url_expected;

my $mech = WWW::Mechanize->new();
$mech->agent('Firefox');
$mech->get( $redirect_url );
my $html = $mech->content;

print "html from $redirect_url doesn't match $price\n" unless $html =~ /$price/;
print "but paste into browser and view source, and it does\n";
print "final url after firefox redirect (but not www::mech redirect) is something like "
    . 'http://www.baur.de/is-bin/INTERSHOP.enfinity/WFS/Baur-BaurDe-Site/de_DE/-/EUR/BV_DisplayProductInformation-ArticleNo;sid=7oVhaTsE5oZsaX6rnON4q25Uv6S6Ixu_PzIwW50ajEGxS04TwoV1a_bGFYiItw==?ArticleNo=515358&ls=0&firstPage=true&showGewinnspiel=true&showW3B=false'
    . "\n";

# uncomment this to print html, which is totally different from what you get from firefox, show source.
# print "html: $html";

# works ok
sub redirect_url {
    my $url = shift or die "no url";
    my $mech = WWW::Mechanize->new();
    $mech->get( $url );
    my $links;
    $links = $mech->links;
    $mech->get( $links->[1]->url );
    $links = $mech->links;
    my $redirect_url = $links->[0]->base->as_string;
}

In reply to WWW::Mechanize or WWW::Selenium with javascript redirect by tphyahoo
