Well, first, that huge post really needed a "readmore" wrapping it all. You stuck a "readmore" in the middle of the URLs, but never closed it. IMHO, both the code and the URL list should have been wrapped.
Second, are you sure it isn't working? I downloaded your list and got the following:
$ grep "mailto" pm.txt | wc
      2       2      65
$ grep "^#" pm.txt | wc
     50      50     367
$ wc out1.txt
    333     333   18647 out1.txt
$ wc pm.txt
    385     385   18779 pm.txt
where "pm.txt" is your list (with the "readmore" removed) and "out1.txt" is the result of running your code in the following form:
Listing: clean1.pl
#!/usr/bin/perl -w

$url="www.page.com/test/index.html";
$base="http://www.page.com/test/";

open(INFILE,"<pm.txt") or die;
@search = <INFILE>;
chomp for @search;

foreach(@search) {
    if ($_ !~ /^http:\/\//gi) {
        if ($_ !~ /^#/g) {
            if ($_ !~ /mailto:/gi) {
                my $force_url = "$base$_";
                push(@search_ready, "$force_url");
            }
        }
    }
    else {
        if ($_ =~ /^\#/g) {
            my $force_url = join("", $url, $_);
            #print "$force_url<br>";
            push(@search_ready, "$force_url");
        }
        else {
            #print "$_<br>";
            push(@search_ready, "$_");
        }
    }
}

print "$_\n" for @search_ready;
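The wc figures above can be cross-checked mechanically. Here is a minimal sketch of that arithmetic. It runs against a tiny made-up sample file (a hypothetical stand-in for pm.txt, not your real list), and it assumes no line is both a "#" line and a "mailto" line.

```shell
#!/bin/sh
# Hypothetical miniature stand-in for pm.txt so the sketch runs on its own.
sample=$(mktemp)
cat > "$sample" <<'EOF'
http://www.page.com/a.html
about.html
#top
mailto:someone@example.com
#bottom
EOF

total=$(grep -c "" "$sample")          # every line in the list
anchors=$(grep -c "^#" "$sample")      # lines starting with "#"
mailtos=$(grep -c "mailto" "$sample")  # lines containing "mailto"

# Lines that should survive: everything that is neither a "#" line
# nor a "mailto" line (assumes the two groups do not overlap).
expected=$((total - anchors - mailtos))
echo "$expected"   # prints 2 for this sample

rm -f "$sample"
```

Pointing the same three counts at the real pm.txt and comparing the result with the line count of out1.txt gives the same kind of check.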
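So by my count, it looks like it's doing the right thing: 385 lines in, minus the 50 lines starting with "#" and the 2 "mailto" lines, leaves 333, which is exactly what's in out1.txt. However, I've rewritten it here in a clearer form.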
Listing: clean2.pl
#!/usr/bin/perl -w

$url="www.page.com/test/index.html";
$base="http://www.page.com/test/";

open(INFILE,"<pm.txt") or die;
@search = <INFILE>;
chomp for @search;

foreach(@search) {
    next if /^#/;
    next if /^mailto/i;
    if ( $_ =~ m|^http://|i ) {
        push(@search_ready, "$_");
    }
    else {
        push(@search_ready, "$base$_");
    }
}

print "$_\n" for @search_ready;
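Note that I've removed the "/^\#/" logic entirely, as it could never fire: that test sat in the "else" branch, which is only reached for lines that start with "http://", and such a line can't also start with "#". A diff of the two outputs shows no differences, but the second version is much clearer as to what's happening.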
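To see why that branch is dead code, here is a small sketch using grep on a made-up sample (the file contents are hypothetical, not taken from your list). It selects the lines that would reach the "else" branch and counts how many of them start with "#".

```shell
#!/bin/sh
# Hypothetical sample lines; only the first two start with http://.
sample=$(mktemp)
cat > "$sample" <<'EOF'
http://www.page.com/a.html
HTTP://www.page.com/b.html
#top
about.html
EOF

# The else branch sees only lines matching ^http:// (case-insensitive).
# Of those, count the ones that also start with "#".
# grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
unreachable=$(grep -i '^http://' "$sample" | grep -c '^#' || true)
echo "$unreachable"   # prints 0

rm -f "$sample"
```

The count is 0 for any input, since a line cannot begin with both "http://" and "#".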
If you still think it isn't working for you, more details are needed.
-xdg
Code posted by xdg on PerlMonks is public domain. It has no warranties, express or implied. Posted code may not have been tested. Use at your own risk.
In reply to "Re: link parsing" by xdg, in thread "link parsing" by coldfingertips