developer_p has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,

I need to parse a web file such as http://www.abc.com/place/abc.cgi and get data from it using PERL. Any help or pointers on this will be of great help. The link also requires a user name and password.

Update: I have a link on the web: http://wwwin-dcb.abc.com/Travel/dcq.cgi? It contains the following details in a table format:

    name    place    ID    Cost

I need to pick up name and ID using a PERL script.

Code used:

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use HTML::TreeBuilder;

    my $url  = 'http://www.cgidir.com/tutorials/Advanced/tutorials/CGIOUTPUT.html';
    my $page = get($url) or die $!;
    my $p    = HTML::TreeBuilder->new_from_content( $page );

    my @links = $p->look_down( _tag => 'tr' );
    my %acct;
    my $count = 0;

    for my $row (@links) {
        my @cells = $row->look_down( _tag => 'tr' );
        my @rows  = map { $_->as_trimmed_text( ) } @cells;
        $count = $count + 1;
        # print @rows;
        $acct{title} = $rows[0];
        $acct{1} = $rows[1];
        $acct{2} = $rows[2];
        $acct{3} = $rows[3];
        $acct{4} = $rows[4];
        print "\n";
        print $count;
        print "\n";
        # print "$acct{title}\t";
        print "$acct{1}\t";
        print "$acct{2}\t";
        print "$acct{3}\t";
        print "$acct{4}\t";
    }

    $p = $p->delete;    # don't need it anymore

This works fine for an .html file as given above, but not for the link I want, i.e. http://wwwin-dcb.abc.com/Travel/dcq.cgi?

Any help on this is most welcome.

Thanks in advance,
P

Replies are listed 'Best First'.
Re: Parsing an HTML page using a PERL script
by moritz (Cardinal) on Mar 03, 2009 at 12:23 UTC
    Look at CPAN, where you can find many modules that can make your life easier. In particular, see LWP::Simple, search for an HTML parser, or look at WWW::Mechanize.

    Btw the language is spelled "Perl", the compiler "perl". The only PERL I know of is provided by Inline::PERL.

Re: Parsing an HTML page using a PERL script
by repellent (Priest) on Mar 03, 2009 at 17:00 UTC
Re: Parsing an HTML page using a PERL script
by leocharre (Priest) on Mar 03, 2009 at 17:01 UTC

    What do you mean.. "parse" a "web file"? What's a web file?

    Is this data retrieved from a remote location, saved to a local storage system? Does the file contain html? If so- maybe you want to parse the html tree.

    Maybe you're just raking the woods for information. Maybe you just need to wget the stuff and grep out the content... ?

    For example.. retrieve the data and grep out lines with some links..

    $ wget 'http://perlmonks.org/?parent=747704;node_id=3333' -O - | grep 'a href='
      Can anyone throw some light on the updated code?
        That code relies on the nonexistent URL http://wwwin-dcb.abc.com/Travel/dcq.cgi? ... maybe you want HTML::SimpleLinkExtor
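
For what it's worth, here is a minimal sketch of how the name and ID columns might be pulled out of such a table. It rests on several assumptions that the thread does not confirm: that the page is protected by HTTP Basic authentication (a login form or cookies would need something like WWW::Mechanize instead), that data rows use td cells, and that the columns appear in the order name, place, ID, Cost. The sub names fetch_page and extract_name_id and the sample rows are made up for illustration. Note that it looks for 'td' inside each row; the posted code looks for 'tr' again at that point, which may be worth checking.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Fetch a page using HTTP Basic authentication.
# Assumption: the real site accepts Basic auth; this is not confirmed.
sub fetch_page {
    my ($url, $user, $pass) = @_;
    require LWP::UserAgent;
    require HTTP::Request;
    my $req = HTTP::Request->new(GET => $url);
    $req->authorization_basic($user, $pass);
    my $res = LWP::UserAgent->new->request($req);
    die "Cannot fetch $url: " . $res->status_line . "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Return one { name => ..., id => ... } hash per data row.
# Assumption: column order is name, place, ID, Cost.
sub extract_name_id {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @records;
    for my $row ($tree->look_down(_tag => 'tr')) {
        # Cells are 'td' elements; header rows built from 'th' are skipped.
        my @cells = map { $_->as_trimmed_text } $row->look_down(_tag => 'td');
        next unless @cells >= 3;
        push @records, { name => $cells[0], id => $cells[2] };
    }
    $tree->delete;    # free the parse tree
    return @records;
}

# Made-up sample data standing in for the real page.
my $sample = <<'HTML';
<table>
<tr><th>name</th><th>place</th><th>ID</th><th>Cost</th></tr>
<tr><td>Alice</td><td>Paris</td><td>101</td><td>250</td></tr>
<tr><td>Bob</td><td>Oslo</td><td>102</td><td>310</td></tr>
</table>
HTML

# With arguments (url user password) fetch the live page; otherwise use the sample.
my $html = @ARGV ? fetch_page(@ARGV[0 .. 2]) : $sample;
print "$_->{name}\t$_->{id}\n" for extract_name_id($html);
```

Run without arguments, the script parses the built-in sample; run as `script.pl URL USER PASSWORD`, it tries the live page. Reporting `status_line` on failure should also show whether the .cgi link is failing because of authentication rather than because of the parsing.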