in reply to Pretty cool link extractor.

A couple of points (there are many; I'll address a few). The regex that removes tags will fail on certain inputs. Parsers exist for a reason.

Most links are in tags which you are throwing away.
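To illustrate both problems, here is a minimal sketch; the naive s/<[^>]*>//g strip is assumed for illustration, not taken from the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Input with a link in an href and a literal '>' inside an attribute:
my $html = q{<a href="http://example.com/">a link</a>
<img src="logo.gif" alt="a > b">trailing text};

# Strip anything that looks like a tag:
(my $text = $html) =~ s/<[^>]*>//g;

print $text, "\n";
# The href (where the link actually lives) is thrown away with the tag,
# and the literal '>' inside the alt attribute ends the second match
# early, leaving 'b">' debris in the "cleaned" text.
```

A real parser (HTML::Parser and friends) handles both cases; the regex cannot.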

What if the line contains "http://yahoo.com stinks"?
"http://yahoo.com stinks" is not a URL, but your match pushes the whole line.

You could combine all three push statements into one.
push @line_array,$_ if (/^(http|ftp|mailto):/);
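A runnable sketch of that suggestion (the three separate scheme-specific pushes are assumed, not copied from the original script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input lines; in the real script these would come from a file:
my @lines = (
    'http://example.com/',
    'ftp://ftp.example.com/pub/',
    'mailto:someone@example.com',
    'just some text',
);

my @line_array;
for (@lines) {
    # One alternation replaces three scheme-specific push statements:
    push @line_array, $_ if /^(?:http|ftp|mailto):/;
}

print "$_\n" for @line_array;
```

Note this still pushes the entire line, so the "http://yahoo.com stinks" problem above remains.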

I personally don't see anything wrong with trying to reinvent wheels; you can learn a lot. But you should study the wheel and see what it does and what you can do better.

-Lee

"To be civilized is to deny one's nature."

Replies are listed 'Best First'.
Re: Re: Pretty cool link extractor.
by Util (Priest) on Mar 26, 2002 at 18:05 UTC

    >I personally don't see anything wrong with trying to
    >reinvent wheels; you can learn a lot. But you should study
    >the wheel and see what it does and what you can do better.

    Well said!
    In that spirit, I offer a different cool link extractor:
    perl -MHTML::LinkExtor -e 'print qq{@$_\n} foreach HTML::LinkExtor->new->parse_file($ARGV[0])->links'
    What's cool is not that it is a one-liner, but that it is usable as a fast "tool" in my editor. While viewing a page in my web browser (Opera), I hit a command key to view source (in UltraEdit), another command key to extract links, and I have all the links from that page in an unnamed buffer. I use this every day.
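    For readers who prefer a script to the one-liner, an equivalent sketch (parsing a string here instead of a file, so it is self-contained; each element returned by ->links is an array ref of [tag, attribute => URL, ...]):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Sample document; the one-liner would use parse_file($ARGV[0]) instead:
my $html = q{<a href="http://example.com/">a link</a>
<img src="logo.gif">};

my $p = HTML::LinkExtor->new;
$p->parse($html);
$p->eof;

# One line per link-bearing tag, e.g. "a href http://example.com/":
my @links = $p->links;
print "@$_\n" for @links;
```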

    Bruce Gray