sugarkannan has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I've just been initiated into the monkhood!
For the past couple of days, I've been learning Perl and LWP.
I've been working on a simple LWP problem in which I have to get the URLs from a given HTML page.

I've found a solution wherein I traverse the whole downloaded HTML page, character by character,
searching for a certain pattern. Is there a simpler way to get the URLs,
instead of searching for the pattern character by character?

while (1) {
    # pick up start position
    $position = index($content, 'something.htm?', $temp_position);
    print "\n\n";
    $temp_position = $position;

    # pick up till quote position
    while ($arr[$temp_position] ne '"') {
        $page .= $arr[$temp_position];
        $temp_position++;
    }
    print "\n\n";

    # Printing URL
    print "Page: $page";

    # Writing URL to a file
    print TXT "$page\n";

    # RE-Initialization
    $page = $home;
    print " \n\nPosition: $position \n";
    print "Temp_position: $temp_position \n";
    if ($reverse == $position) { last }
}

Regards.
AK

Edit: g0n - code tags

Re: LWP beginner
by marto (Cardinal) on Oct 25, 2005 at 14:02 UTC
    Hi sugarkannan,

    Firstly, welcome! Have you read the PerlMonks FAQ?
    Secondly, do you have to use LWP for this?
    The WWW::Mechanize module has a method called $mech->links which lists all the links on the page you're looking at.
    The module documentation has good examples of how to go about using it.

    If you have to use LWP, have a look at the HTML::LinkExtractor module.
    "HTML::LinkExtractor - Extract links from an HTML document"

    Hope this helps

    Martin
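
As a rough illustration of the $mech->links suggestion above, here is a minimal sketch. The sub name page_links and the command-line handling are assumptions made for this example only; the WWW::Mechanize calls used (new, get, links) and the url_abs method on the returned link objects come from that module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# page_links is a hypothetical helper name, not part of WWW::Mechanize.
# It fetches $url and returns every link on the page as an absolute URL string.
sub page_links {
    my ($url) = @_;

    # autocheck => 1 makes the object die on a failed request
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);

    # links() returns WWW::Mechanize::Link objects; url_abs() resolves each
    # one against the page's base URL and returns a URI object.
    return map { $_->url_abs->as_string } $mech->links;
}

# Only fetch something when a URL is given on the command line.
if (@ARGV) {
    print "$_\n" for page_links( $ARGV[0] );
}
```

Run it with the page address as the only argument. The list it prints can then be filtered with grep for whatever pattern is of interest.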
Re: LWP beginner
by blazar (Canon) on Oct 25, 2005 at 14:06 UTC
    I've found a solution wherein I traverse the whole downloaded HTML page, character by character, searching for a certain pattern. Is there any other way to get the URLs in a simple way, instead of searching for the pattern character by character?
    Yes! At the very least you could try a regex, but that wouldn't be a good solution either: parsing HTML with regexes is generally discouraged. You'll find that
    perldoc -q URLs
    lists some alternatives. In particular, I've used HTML::LinkExtor to good advantage in the past. And it was quite easy, even easier than I would have expected...
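
For readers who want to see what the HTML::LinkExtor route might look like together with LWP, here is a minimal sketch. The sub name extract_links and the restriction to <a href="..."> links are assumptions made for this example; the module calls used are HTML::LinkExtor's new (with a callback and a base URL), parse and eof, plus LWP::UserAgent's get.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;
use LWP::UserAgent;

# extract_links is a hypothetical helper name, not part of HTML::LinkExtor.
# It returns the absolute URLs of all <a href="..."> links found in $html;
# $base is the URL used to resolve relative links.
sub extract_links {
    my ( $html, $base ) = @_;
    my @urls;

    my $parser = HTML::LinkExtor->new(
        sub {
            # The callback receives the lowercased tag name followed by
            # attribute => URL pairs for that tag's link attributes.
            my ( $tag, %attr ) = @_;
            push @urls, "$attr{href}" if $tag eq 'a' && defined $attr{href};
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;    # flush anything still buffered by the parser

    return @urls;
}

# Only fetch something when a URL is given on the command line.
if (@ARGV) {
    my $url      = $ARGV[0];
    my $response = LWP::UserAgent->new->get($url);
    die "Cannot fetch $url: ", $response->status_line, "\n"
        unless $response->is_success;

    print "$_\n" for extract_links( $response->decoded_content, $response->base );
}
```

To keep only the URLs containing the pattern from the original question, the returned list could be filtered, for example with grep { index($_, 'something.htm?') >= 0 }.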
Re: LWP beginner
by puff (Beadle) on Oct 26, 2005 at 02:16 UTC
    There is a good deal of information online, and O'Reilly has some interesting examples from two books I found very helpful: "Spidering Hacks" and "Perl & LWP". They have saved me a great deal of time.