Well, first, that huge post really needed a "readmore" wrapped around all of it. You stuck a "readmore" in the middle of the URLs but never closed it. IMHO, the code as well as the URL list should have been wrapped.

Second, are you sure it isn't working? I downloaded your list and got the following:

    $ grep "mailto" pm.txt | wc
          2       2      65
    $ grep "^#" pm.txt | wc
         50      50     367
    $ wc out1.txt
        333     333   18647 out1.txt
    $ wc pm.txt
        385     385   18779 pm.txt

where "pm.txt" is your list (with the "readmore" removed) and "out1.txt" is the result of running your code in the following form:

Listing: clean1.pl

    #!/usr/bin/perl -w

    $url="www.page.com/test/index.html";
    $base="http://www.page.com/test/";

    open(INFILE,"<pm.txt") or die;
    @search = <INFILE>;
    chomp for @search;

    foreach(@search) {
        if ($_ !~ /^http:\/\//gi) {
            if ($_ !~ /^#/g) {
                if ($_ !~ /mailto:/gi) {
                    my $force_url = "$base$_";
                    push(@search_ready, "$force_url");
                }
            }
        }
        else {
            if ($_ =~ /^\#/g) {
                my $force_url = join("", $url, $_);
                #print "$force_url<br>";
                push(@search_ready, "$force_url");
            }
            else {
                #print "$_<br>";
                push(@search_ready, "$_");
            }
        }
    }

    print "$_\n" for @search_ready;
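The wc numbers can be cross-checked by hand: 385 input lines, minus 50 lines starting with "#", minus 2 "mailto" lines, leaves 333, which matches out1.txt (assuming no line matches both greps). A rough sketch of the same bookkeeping in Perl follows; the sub name count_lines and the sample lines are made up for illustration and are not from pm.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Classify each line the way the nested ifs in clean1.pl do, and return
# how many lines the script would keep versus drop.
sub count_lines {
    my @lines = @_;
    my %count = ( kept => 0, dropped => 0 );
    for my $line (@lines) {
        if ( $line =~ m{^http://}i ) {
            $count{kept}++;       # absolute URL: passed through unchanged
        }
        elsif ( $line =~ /^#/ or $line =~ /mailto:/i ) {
            $count{dropped}++;    # anchor or mailto link: silently skipped
        }
        else {
            $count{kept}++;       # relative link: gets $base prepended
        }
    }
    return %count;
}

# Hypothetical sample input, not taken from pm.txt.
my @sample = (
    'http://www.example.com/a.html',
    'about.html',
    '#top',
    'mailto:someone@example.com',
);
my %count = count_lines(@sample);
print "kept $count{kept}, dropped $count{dropped}\n";
```

Fed the real list, the kept count should line up with the line count of out1.txt if the script is behaving.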

So by my count, it looks like it's doing the right thing. However, I've rewritten it here in a clearer form.

Listing: clean2.pl

    #!/usr/bin/perl -w

    $url="www.page.com/test/index.html";
    $base="http://www.page.com/test/";

    open(INFILE,"<pm.txt") or die;
    @search = <INFILE>;
    chomp for @search;

    foreach(@search) {
        next if /^#/;
        next if /^mailto/i;
        if ( $_ =~ m|^http://|i ) {
            push(@search_ready, "$_");
        }
        else {
            push(@search_ready, "$base$_");
        }
    }

    print "$_\n" for @search_ready;

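Note that I've removed the "/^\#/" logic entirely, as it was never being used: it sits in the else branch, which only runs for lines that start with "http://", so a leading "#" can never match there. A diff of the two outputs shows no differences, but the second version makes it much clearer what's happening.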
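If you want to poke at the logic without touching pm.txt, here is a minimal sketch of the same filtering wrapped in a sub. The name normalize_links and the sample lines are my own inventions for illustration; the rules are the ones from clean2.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same filtering as clean2.pl, wrapped in a sub so it can run on an
# in-memory list instead of reading pm.txt.
sub normalize_links {
    my ( $base, @lines ) = @_;
    my @ready;
    for my $line (@lines) {
        next if $line =~ /^#/;          # skip in-page anchors
        next if $line =~ /^mailto/i;    # skip mail links
        # Absolute URLs pass through; everything else gets the base prefix.
        push @ready, $line =~ m{^http://}i ? $line : "$base$line";
    }
    return @ready;
}

# Hypothetical input lines, not taken from pm.txt.
my @sample = (
    'http://www.example.com/a.html',
    'about.html',
    '#top',
    'mailto:someone@example.com',
);
print "$_\n" for normalize_links( 'http://www.page.com/test/', @sample );
```

With "use strict" on, a typo in a variable name becomes a compile error instead of a silently empty list, which helps when a script seems to "not work".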

If you still think it isn't working for you, more details are needed.

-xdg

Code posted by xdg on PerlMonks is public domain. It has no warranties, express or implied. Posted code may not have been tested. Use at your own risk.


In reply to Re: link parsing by xdg
in thread link parsing by coldfingertips
