PerlMonks  

Re: Link Parser, something to be desired?

by Your Mother (Archbishop)
on May 30, 2009 at 06:13 UTC ( [id://767012]=note )


in reply to Link Parser, something to be desired?

An XML::LibXML example: terse and robust. Don't reach for regexes to process HTML unless it's an instant one-off whose output you can verify by eye. A parser takes only a little more effort and is much more reliable.

    use strict;
    use warnings;
    use XML::LibXML;

    my $parser = XML::LibXML->new();
    $parser->keep_blanks(1);        # preserve whitespace-only text nodes
    $parser->recover_silently(1);   # tolerate broken HTML without printing warnings

    # There are other parse methods: string, fh.
    my $dom = $parser->parse_html_file(shift || die "give a file\n");

    # Point every link that has an href at "#".
    $_->setAttribute("href", "#") for $dom->findnodes('//a[@href]');

    print $dom->serialize(1);
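The comment in the code above mentions that there are other parse methods. As a rough sketch of the string variant, the same rewrite could look like the following. The sample HTML is made up for illustration, and `toStringHTML` is swapped in for `serialize` here only because it emits HTML-style output; treat both as assumptions rather than what the original post intended.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input; any HTML string would do.
my $html = <<'HTML';
<html><body>
<p><a href="http://example.com/one">One</a> and <a name="anchor">no href</a></p>
<p><a href="/two">Two</a></p>
</body></html>
HTML

my $parser = XML::LibXML->new();
$parser->recover_silently(1);

# parse_html_string is the string counterpart of parse_html_file.
my $dom = $parser->parse_html_string($html);

# Same XPath as the file-based version: only <a> elements that carry an href.
my @links = $dom->findnodes('//a[@href]');
$_->setAttribute("href", "#") for @links;

# toStringHTML serializes with libxml2's HTML output rules.
print $dom->toStringHTML();
```

The anchor without an `href` is left alone, because the XPath predicate `[@href]` never selects it. That kind of precision is hard to get reliably from a regex.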
