PerlMonks  

Re: WWW::Mechanize doesn't respect <base>?

by Corion (Patriarch)
on Apr 26, 2021 at 09:09 UTC [id://11131724]


in reply to WWW::Mechanize doesn't respect <base>?

Somewhat related is this GitHub issue. It seems that WWW::Mechanize takes the base URL only from the HTTP headers, instead of also (and with priority) looking at the HTML <base> tag.
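As a rough illustration of the lookup order argued for above (the HTML <base> tag first, then the HTTP headers), here is a minimal sketch in plain Perl with no CPAN modules. The sub name effective_base, passing the headers as a hash ref with lower-cased names, and the choice of Content-Base and Content-Location as header fallbacks are all assumptions for illustration. This is not WWW::Mechanize's API, and the <base> match is a simple regex, not a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pick the base URL for resolving links in a page.
# Priority: <base href> in the HTML, then an HTTP header, then the URL
# that was requested. Assumption: header names in $headers are lower-cased.
sub effective_base {
    my ($html, $headers, $request_url) = @_;

    # 1. <base href="...">, <base href='...'> or <base href=...> in the document.
    if ($html =~ m{<\s*base\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>"']+))}i) {
        return defined $1 ? $1 : defined $2 ? $2 : $3;
    }

    # 2. Header fallbacks (illustrative choice of header names).
    for my $name ('content-base', 'content-location') {
        my $value = $headers->{$name};
        return $value if defined $value && length $value;
    }

    # 3. Nothing else names a base: use the URL that was requested.
    return $request_url;
}

my $html = '<html><head><base href="http://example.com/docs/"></head></html>';
print effective_base($html, { 'content-location' => 'http://example.com/x' },
                     'http://example.com/page'), "\n";
# prints http://example.com/docs/
```

A <base href> may itself be relative; in that case it would still have to be resolved against the request URL before use, which this sketch does not do.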

In vaguely related code, I've used the following to extract the value of the <base> tag:

# Check if we have a <base> tag which should replace the user-supplied URL.
# $_[0] holds the HTML; a matching <base> tag is removed from it in place.
if( $_[0] =~ s!<\s*\bbase\b[^>]+\bhref=([^>]+)>!!i ) {
    # Extract the HREF:
    my $href= $1;
    if( $href =~ m!^(['"])(.*?)\1! ) {
        # href="..." , with quotes
        $href = $2;
    } elsif( $href =~ m!^([^>"' ]+)! ) {
        # href=... , without quotes
        $href = $1;
    } else {
        die "Should not get here, weirdo href= tag: [$href]"
    };
    my $old_url = $url;
    # relative_url() is part of the surrounding code (not shown here);
    # presumably it resolves $href against $url.
    $url = relative_url( $url, $href );
    #warn "base: $old_url / $href => $url";
};
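To try the snippet outside its original sub, here is a self-contained sketch. It wraps the post's code, essentially unchanged apart from dropping the $old_url debugging lines, in a hypothetical sub apply_base_tag that takes the HTML and the page URL and returns the new base URL. Because @_ aliases the caller's variables, the s/// on $_[0] still removes the <base> tag from the caller's HTML. The post does not show relative_url, so the version here is a hypothetical, simplified stand-in: it handles absolute, scheme-relative (//host/...), root-relative (/path) and plain relative hrefs, but does not collapse . or .. segments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the post's relative_url() (its real code isn't shown).
# Simplified: handles absolute hrefs, "//host/...", "/path" and plain relative
# paths; it does NOT collapse "." or ".." segments as RFC 3986 requires.
sub relative_url {
    my ($base, $href) = @_;
    return $href if $href =~ m{^[a-zA-Z][a-zA-Z0-9+.-]*:};    # already absolute
    my ($scheme, $auth, $path) = $base =~ m{^([a-zA-Z][a-zA-Z0-9+.-]*:)//([^/?#]*)([^?#]*)}
        or return $href;                                       # unparseable base
    return "$scheme$href" if $href =~ m{^//};                  # scheme-relative
    return "$scheme//$auth$href" if $href =~ m{^/};            # root-relative
    (my $dir = $path) =~ s{[^/]*$}{};                          # drop last path segment
    $dir = '/' if $dir eq '';
    return "$scheme//$auth$dir$href";
}

# The post's approach: strip the <base> tag from the HTML in $_[0]
# (modified in place, since @_ aliases the caller's variable) and
# return the URL that relative links should be resolved against.
sub apply_base_tag {
    my $url = $_[1];
    if ( $_[0] =~ s!<\s*\bbase\b[^>]+\bhref=([^>]+)>!!i ) {
        my $href = $1;
        if ( $href =~ m!^(['"])(.*?)\1! ) {        # href="..." or href='...'
            $href = $2;
        } elsif ( $href =~ m!^([^>"' ]+)! ) {      # href=... without quotes
            $href = $1;
        } else {
            die "Should not get here, weirdo href= tag: [$href]";
        }
        $url = relative_url( $url, $href );
    }
    return $url;
}

my $html = '<head><base href="/docs/"></head><a href="intro.html">Intro</a>';
my $base = apply_base_tag( $html, 'http://example.com/page/index.html' );
print "$base\n";    # prints http://example.com/docs/
```

For real use, URI->new_abs($href, $url) from the URI module performs full RFC 3986 resolution and would be a safer replacement for the stand-in.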
