I am working on a parser to grab all the unsubscribe links from a big text file. The text file is a mix of plain text and HTML. I am able to use HTML::LinkExtor to grab most of the links; however, at this point it returns both 'a href' and 'img src' links. I'm only interested in the 'a href' links, and once I have these, I would like to narrow them down with a regex.
As of now it looks like this:

    #!/usr/bin/perl
    use HTML::LinkExtor;
    use URI::URL;

    $p = HTML::LinkExtor->new(\&cb, "http://www.x10.com");
    sub cb {
        my($tag, %links) = @_;
        print "$tag @{[%links]}\n";
    }
    $p->parse_file("rfl.txt");

    #@glob = $p;
    #for($i=0; $i<@glob; $i++){
    #    $_ = @glob[$i];
    #    if(/account.cgi/){
    #        $counter = 1 - $counter;
    #        print $_ ;
    #    }
    #}

I plan to uncomment the regex portion when I get better results.
I know there are a lot of errors, and I appreciate any guidance. Incidentally, I can't use strict, because I get these errors when I do:

    Global symbol "$p" requires explicit package name at link.pl line 9.
    Global symbol "$p" requires explicit package name at link.pl line 14.
    Execution of link.pl aborted due to compilation errors.

So my main objectives are to remove any 'img src' references and to make sure that all the URLs are stored properly in an array which I can parse further.
Here is the top portion of my current results. I also noticed that some of the URLs are not returned or are incomplete.

    a href http://www.x10.com/3D%22http://hop.clickbank.net/?aaso2/intelli%22
    a href http://www.x10.com/3D%22http://www.consumerinfo.com/home_pca.asp?sc=3D141 =
    a href http://www.x10.com/3D%22http://hop.clickbank.net/?aaso2/webpd%22
    a href http://www.x10.com/3D%22http://www.x10.com/xcam2_allspecial33.htm%22
    a href http://www.x10.com/3D%22http://www.teamnova.com/encore/combo.cfm?siteid=3 D=
    a href http://www.x10.com/3D%22http://hop.clickbank.net/?aaso2/intel=
    a href http://www.x10.com/3D%22http://hop.clickbank.net/?aaso2/intel=
    img src http://www.x10.com
    img src http://www.x10.com
    a href http://www.x10.com/jecn@allaboutspe=
    img src http://www.x10.com
    img src http://www.x10.com
    img src http://www.x10.com
    a href http://www.x10.com/3D%22http://www.consumerinfo.com/home_pca.asp?sc=3D14=
    img src http://www.x10.com/=
    img src http://www.x10.com

I appreciate any help you can give.
Bests,
amearse
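
For reference, here is a minimal sketch of the two objectives stated in the post: keep only the 'a href' links in an array, then narrow them with a regex. It is one possible approach, not a definitive fix. The sub name extract_hrefs, the inline sample text, and the example.com URLs are hypothetical stand-ins (the real input would be rfl.txt), and the /account\.cgi/ pattern is taken from the commented-out loop. Declaring the parser with my is what lets the script run under strict, and no base URL is passed, so the href values come back as the plain strings found in the markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the href value of every <a> tag found in the given text.
# (extract_hrefs is a hypothetical helper name, not part of HTML::LinkExtor.)
sub extract_hrefs {
    my ($html) = @_;
    my @hrefs;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        # Tag names arrive in lower case; skip img and every other tag.
        return unless $tag eq 'a' && defined $attr{href};
        push @hrefs, $attr{href};
    });
    $parser->parse($html);
    $parser->eof;
    return @hrefs;
}

# Hypothetical sample input standing in for rfl.txt.
my $sample = <<'END';
Plain text before the markup.
<a href="http://www.example.com/account.cgi?id=1">unsubscribe</a>
<img src="http://www.example.com/logo.gif">
<a href="http://www.example.com/offer.htm">offer</a>
END

# All 'a href' links, then only those matching the unsubscribe pattern.
my @links       = extract_hrefs($sample);
my @unsubscribe = grep { /account\.cgi/ } @links;
print "$_\n" for @unsubscribe;
```

To run this against the real file, the text could be read from rfl.txt and passed to extract_hrefs in place of $sample. The 3D%22 fragments in the results above look like quoted-printable encoding (=3D followed by a quote), so decoding the text first, for example with decode_qp from MIME::QuotedPrint, might recover the incomplete URLs; that part is a guess based only on the output shown.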
In reply to Jiggy w/ LinkExtor by amearse