First, please, instead of including your entire data set, whittle it down to an interesting subset.
Second, time for some Perl cleanup. Perhaps along the way we'll find your bug.
    foreach (@search) {
        # We'll put the cleaned-up URL here temporarily.
        # We could do a push directly, but this will make
        # debugging easier.
        #
        my $ready;

        # You said you wanted to throw out mailtos
        #
        /^mailto:/ and next;

        # "if ($_ !~ /^http:\/\//gi)"
        #
        # /g is meaningless here; the caret can only match once.
        # "$_ !~ /.../" is a long way of saying "!/.../".
        # Pick your quotes to avoid Leaning Toothpick Syndrome.
        # And are you sure you want to start your if-else with
        # a negative test?
        #
        if (!m[^http://]i) {
            # "if ($_ !~ /^#/g)"
            #
            # Same as above
            #
            if (!/^#/) {
                $ready = "$base$_";
            }
        }
        else # m[^http://]
        {
            # "if ($_ =~ /^\#/g)"
            #
            # But the first character can't be "#" because
            # at this point we know it's "h"!
            # Shorten; /g useless
            #
            if (/^#/) {
                $ready = "$url$_";
            }
            else {
                $ready = $_;
            }
        }

        if (not defined $ready) {
            print "$_ was lost!<br>\n";
            next;
        }
        print "$_ becomes $ready<br>\n";  # For debugging
        push @search_ready, $ready;
    }
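Looks like we found the bug: a link that starts with "#" fails the inner test in the first branch, so $ready is never set and the link is lost, while the "#" test in the else branch can never match. I'm not quite sure what you're going for with the #'s, so I haven't tried to fix it. I highly suggest using positive tests in your if-elses. It tends to be less confusing.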
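For what it's worth, here is a rough sketch of how the same decisions might read with positive tests only. It is a guess at the intent, not a fix: it assumes a link starting with "#" is a fragment that belongs on the end of $url, and that anything else that isn't absolute is relative to $base. The clean_link sub and the example.com values are made up for illustration; your real $base and $url come from wherever you set them.

```perl
use strict;
use warnings;

# Hypothetical values; the original post doesn't show how these are set.
my $base = 'http://example.com/';
my $url  = 'http://example.com/page.html';

# Hypothetical helper: classify one link using positive tests only.
# Returns the cleaned-up URL, or nothing (undef in scalar context)
# for links we throw out.
sub clean_link {
    my ($link) = @_;
    return              if $link =~ /^mailto:/;      # throw out mailtos
    return $link        if $link =~ m[^http://]i;    # already absolute
    return "$url$link"  if $link =~ /^#/;            # ASSUMPTION: fragment of the current page
    return "$base$link";                             # ASSUMPTION: relative to $base
}

my @search = ('http://perl.org/', '#top', 'faq.html', 'mailto:me@example.com');

# A bare return yields an empty list here, so discarded links simply vanish.
my @search_ready = map { clean_link($_) } @search;

print "$_<br>\n" for @search_ready;
```

Each test says what the link is rather than what it isn't, so every kind of link has exactly one place where it gets handled and nothing can fall through unnoticed.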
In reply to Re: link parsing by TilRMan, in thread link parsing by coldfingertips