PerlMonks  

Re: Link Parser, something to be desired?

by ig (Vicar)
on May 29, 2009 at 23:17 UTC ( [id://766982]=note )


in reply to Link Parser, something to be desired?

Escapes within double quotes can be tricky. I suspect what you want is one of the following:

$regex = "href\\s*=\\s*\".*?\"";
$regex = 'href\s*=\s*".*?"';

It might help you to print $regex and $match, to see what you are getting.
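As a minimal sketch of that check: the two assignments should produce the same string, and printing them makes any escaping slip obvious. The names $dq and $sq are only for illustration and do not come from the original code.

```perl
use strict;
use warnings;

# Double-quoted: backslashes and embedded quotes must themselves be escaped.
my $dq = "href\\s*=\\s*\".*?\"";

# Single-quoted: only \\ and \' are special, so \s reaches the regex engine intact.
my $sq = 'href\s*=\s*".*?"';

print "double-quoted: $dq\n";
print "single-quoted: $sq\n";
print "same pattern\n" if $dq eq $sq;
```

If the two printed lines differ, the double-quoted version has most likely lost a backslash.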

Update: You can use $match in the substitution (it worked fine in my tests), but it won't do what you expect if $regex isn't what you expect. Also, since $match holds one particular matched string, adding the /g modifier to a substitution that uses $match will replace only repeats of that same string, not the other href values.
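The difference can be seen in a small sketch. The sample line and the names $by_match and $by_regex are made up for illustration; \Q...\E is added here so that characters such as "." in the matched text are taken literally.

```perl
use strict;
use warnings;

my $regex = 'href\s*=\s*".*?"';
my $line  = '<a href="a.html">one</a> <a href="b.html">two</a> <a href="a.html">again</a>';

# Capture one particular match.
my ($match) = $line =~ /($regex)/;

# /g on the captured text replaces only repeats of that exact text.
(my $by_match = $line) =~ s/\Q$match\E/xxx/g;

# /g on the pattern itself replaces every href attribute.
(my $by_regex = $line) =~ s/$regex/xxx/g;

print "$by_match\n";
print "$by_regex\n";
```

In the first result the href pointing at b.html is left alone, because it is a different string from the one captured in $match.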

You have indicated that you have more to do than just substitute the href values in the anchors. Without knowing more about what you are doing it is hard to make an appropriate recommendation but something like the following might be closer to what you want:

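use strict;
use warnings qw( FATAL all );

my $regex = 'href\s*=\s*".*?"';
print "regex = $regex\n";

foreach my $line (<DATA>) {
    chomp($line);
    while ($line =~ /$regex/) {
        my $match = $&;
        # \Q...\E makes the matched text literal, so "." or "?" in a URL
        # cannot change what gets replaced.
        $line =~ s/\Q$match\E/xxx/;
    }
    print "$line\n";    # show the rewritten line
}

__DATA__
No anchors here
<a href="asdf">go there</a>
<a href="fdsa">go here</a>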

Re^2: Link Parser, something to be desired?
by ikegami (Patriarch) on May 29, 2009 at 23:32 UTC

    Escapes within double quotes can be tricky.

    qr// makes it easy.

    $regex = qr/href\s*=\s*".*?"/;
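    A short sketch of how a qr// pattern can be used, under the assumption that case-insensitive matching is also wanted; the /i variant and the sample string are additions for illustration only.

```perl
use strict;
use warnings;

# qr// compiles the pattern; no string-quoting escapes are needed.
my $regex = qr/href\s*=\s*".*?"/;

# Modifiers travel with the compiled pattern (the /i here is hypothetical).
my $regex_i = qr/href\s*=\s*".*?"/i;

my $html = '<A HREF="page.html">link</A>';
print "case-sensitive: ",   ($html =~ $regex   ? "match" : "no match"), "\n";
print "case-insensitive: ", ($html =~ $regex_i ? "match" : "no match"), "\n";
```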

    Also,
    while ($line =~ /$regex/) { my $match = $&; ... }
    can be better written as
    while (my ($match) = $line =~ /($regex)/) { ... }
    since it avoids globals and $&, which slows down matches throughout your program.
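    Put together, the capture-based loop might look like the sketch below. The @seen array and the sample line are assumptions for illustration, and \Q...\E is added so the matched text is replaced literally.

```perl
use strict;
use warnings;

my $regex = qr/href\s*=\s*".*?"/;
my $line  = '<a href="asdf">go there</a> <a href="fdsa">go here</a>';

my @seen;
# A list-context match returns the capture, so $& is never touched.
while (my ($match) = $line =~ /($regex)/) {
    push @seen, $match;
    # Replace exactly what was matched so the loop makes progress.
    $line =~ s/\Q$match\E/xxx/;
}

print "$_\n" for @seen;
print "$line\n";
```

    The loop ends once no href attribute is left, because a failed match returns an empty list and the list assignment then evaluates to false.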
