PerlMonks  

Re-write all internal links on a web page.

by ehdonhon (Curate)
on Jun 26, 2003 at 18:53 UTC (id://269365, CUFP)

The problem is to maintain persistent session information without using cookies. The solution is to encode the session as a GET parameter on every link that points to another internal page.
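To make "encode the session as a GET parameter" concrete for a single link, here is a minimal core-Perl sketch. The helper name `append_session` is hypothetical; it assumes the link is already known to be internal, and like the original it does not escape the session value.

```perl
use strict;
use warnings;

# Hypothetical helper: append session=<id> to one (already known internal) URL.
# Uses '?' if the URL has no query string yet and '&' otherwise, and keeps
# any #fragment at the very end. No escaping is done here.
sub append_session {
    my ( $url, $session ) = @_;
    my ( $rest, $fragment ) = split /#/, $url, 2;
    $rest = '' unless defined $rest;
    my $sep = $rest =~ /\?/ ? '&' : '?';
    my $new = "$rest${sep}session=$session";
    $new .= "#$fragment" if defined $fragment;
    return $new;
}

print append_session( '/about.html',       'abc123' ), "\n";  # /about.html?session=abc123
print append_session( '/search?q=perl',    'abc123' ), "\n";  # /search?q=perl&session=abc123
print append_session( '/faq.html#install', 'abc123' ), "\n";  # /faq.html?session=abc123#install
```

The fragment has to stay last: a browser never sends anything after `#` to the server, so a session appended after it would be lost.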

The second problem is that most of the web pages are maintained by someone who barely understands HTML. We don't want to have to teach them about sessions and stuff.

The solution is to take the HTML content and rewrite all of its links before the page is displayed. I'm using HTML::TreeBuilder for this; there are probably other ways.

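    use strict;
    use warnings;
    use HTML::TreeBuilder;

    # Hosts we treat as internal (anchored, so "notmysite.com" won't match).
    my $owned_sites = qr/(?:^|\.)mysite\.(?:com|net|org)(?::\d+)?$/i;

    sub add_sessions {
        my ( $content, $session ) = @_;
        my $root = HTML::TreeBuilder->new_from_content($content);

        foreach my $link ( $root->look_down( '_tag', 'a' ) ) {
            next unless my $url = $link->attr('href');

            # Any URL with a scheme: only rewrite http(s) links to our own hosts.
            # This skips mailto:, javascript:, ftp: and off-site links.
            if ( $url =~ m|^[^/?#]*:| ) {
                my ($host) = $url =~ m|^https?://([^/?#]*)|i or next;
                next unless $host =~ $owned_sites;
            }

            # Set any #fragment aside so the query string lands before it.
            my ( $rest, $fragment ) = split /#/, $url, 2;
            next unless length $rest;    # bare "#anchor" on the same page

            my ( $path, $query ) = split /\?/, $rest, 2;
            my %params = map { my ( $k, $v ) = split /=/, $_, 2; ( $k => $v // '' ) }
                         grep { length } split /&/, ( $query // '' );
            $params{session} ||= $session;

            $url = $path . '?' . join( '&', map { "$_=$params{$_}" } keys %params );
            $url .= "#$fragment" if defined $fragment;
            $link->attr( 'href', $url );
        }

        my $html = $root->as_HTML;
        $root->delete;
        return $html;
    }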

Now, I just know somebody is going to tell me that I should be using URI::URL and that my session info won't be escaped, etc. But let's just consider that an exercise for another day. The point here is mainly to provide an example where HTML::TreeBuilder saves the day.
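For the escaping issue, the URI distribution offers `uri_escape` (from URI::Escape) and the `query_form` method on URI objects. If you'd rather avoid the dependency, a minimal core-Perl sketch of percent-encoding each key and value against the RFC 3986 "unreserved" set could look like this. The helper names `escape_component` and `build_query` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every byte outside the RFC 3986
# unreserved set (A-Z a-z 0-9 - . _ ~). Assumes a byte string (for example
# already UTF-8 encoded), not a string containing wide characters.
sub escape_component {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build "k1=v1&k2=v2" with keys and values escaped.
# Keys are sorted only so that the output is deterministic.
sub build_query {
    my (%params) = @_;
    return join '&',
        map { escape_component($_) . '=' . escape_component( $params{$_} ) }
        sort keys %params;
}

print build_query( session => 'a b/c', q => 'fish & chips' ), "\n";
# prints q=fish%20%26%20chips&session=a%20b%2Fc
```

One caveat if you fold this into `add_sessions`: parameters parsed out of an existing href are usually already escaped, so escaping them again would double-encode them. Escaping only the session value avoids that.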
