PerlMonks  

Re: Link Parser, something to be desired?

by zwon (Abbot)
on May 29, 2009 at 22:50 UTC ( [id://766979]=note )


in reply to Link Parser, something to be desired?

Hi!

It seems this doesn't work:

my $regex = "href\s*=\s*\"?.*\"";

I think it should be:

my $regex = qr{href\s*=\s*".*?"};
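Note the qr// and the position of the question mark. In a double-quoted string the backslashes in \s are dropped, so the pattern ends up looking for a literal "s", and a greedy .* runs on to the last quote on the line, whereas .*? stops at the first closing quote. So perhaps your code should look something like this: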
use strict;
use warnings; # this is also useful

my $regex = qr{href\s*=\s*".*?"};
my $sub = "href=\"#\"";
while(<DATA>){
    s/$regex/$sub/g;
    print;
}
__DATA__
afjalsdfj href="asfasdfa" afdsas
href="akjshfakjsd"
href = "ajsfhaklj"
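If it helps to see why both changes matter, here is a small sketch. The sample line and the variable names ($line, $from_string, $greedy, $lazy) are made up for illustration and are not from your code:

```perl
use strict;
use warnings;

# Hypothetical sample input with two links on one line.
my $line = 'a href="one" b href="two" c';

# In a double-quoted string "\s" is not a recognised escape, so Perl drops
# the backslash (and warns about it when warnings are enabled).
my $from_string = do { no warnings; "href\s*=\s*\"?.*\"" };
print "string pattern: $from_string\n";   # hrefs*=s*"?.*"

# Greedy .* runs to the LAST quote on the line and swallows both links.
(my $greedy = $line) =~ s/href\s*=\s*".*"/href="#"/g;

# Non-greedy .*? stops at the first closing quote, one link at a time.
(my $lazy = $line) =~ s/href\s*=\s*".*?"/href="#"/g;

print "greedy: $greedy\n";   # a href="#" c
print "lazy:   $lazy\n";     # a href="#" b href="#" c
```

With the string version the pattern no longer contains \s at all, which is why the original substitution did not behave as expected.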

Update: and here's another approach using HTML::Parser:

use strict;
use warnings;
use HTML::Parser;
use HTML::Entities;

my $html = join '', <DATA>;

my $p = HTML::Parser->new(
    default_h => [ sub { print shift }, 'text' ],
    comment_h => [""],
    start_h   => [ \&start, 'tag,attr,attrseq,text' ]
);

$p->parse($html);

sub start {
    my ( $tag, $attr, $attrseq, $text ) = @_;
    unless ( exists $attr->{href} ) {
        print $text;
    }
    else {
        $attr->{href} = "#";
        print "<$tag";
        print " $_=\"", encode_entities( $attr->{$_} ), '"' for (@$attrseq);
        print ">";
    }
}
__DATA__
<html><body>
<h1>title</h1>
<a href="foo" class="class">link</a>
<p>some <a href="link">text</a> and
<a href="another_link">more</a>
</body></html>
