
parsing html link

by paola82 (Sexton)
on May 25, 2009 at 09:56 UTC ( [id://765995]=perlquestion )

paola82 has asked for the wisdom of the Perl Monks concerning the following question:

Hi dear monks. I'm trying to improve my script, which automatically browses the web to look up my bio data :-) Here is my question: I have a web page and I only want to extract the links for my ligand. There are two in this case, and there may be more or fewer in other cases (other proteins). Here I only need to print the link of the "EPE" ligand, but it seems that my "if" block doesn't work. I'll paste my code below.

#!/usr/bin/perl
use warnings;
use strict;
use LWP::Simple;

print "ELENCO LIGANDI\n";

my $url3 = "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&template=ligands.html&l=1.1";
my $content =get ($url3);

use HTML::TreeBuilder;
my $p = HTML::TreeBuilder->new;
$p->parse_content($content);
my @a = $p->look_down(_tag => q{a});
for my $a (@a){
    my $txt = $a->as_text;
    if ($txt=~ /EPE\s/){
        print $txt, qq{\n};
        use Web::Scraper;
        use Data::Dumper;
        # Invoked for a <a> tag
        my $link = scraper {
            process '//a' => 'href' => '@href';
            process '//a' => 'description' => 'TEXT';
        };
        my $page = scraper {
            process '//a[@href]' => 'links[]' => $link;
            process '//meta[@http-equiv]' => 'meta[]' => '@content';
            process '//area[@href]' => 'areas[]' => '@href';
        };
        my $info = $page->scrape($content);
        print Dumper $info;
    }
}
$p->delete;

Thanks in advance, and I apologize for disturbing all of you.

Replies are listed 'Best First'.
Re: parsing html link
by wfsp (Abbot) on May 25, 2009 at 10:21 UTC
    You were nearly there (if I've understood your question correctly)
    #!/usr/bin/perl
    use warnings;
    use strict;
    use LWP::Simple;
    use HTML::TreeBuilder;

    print "ELENCO LIGANDI\n";

    my $url3 = "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&template=ligands.html&l=1.1";
    my $content =get ($url3);

    my $p = HTML::TreeBuilder->new;
    $p->parse_content($content);
    my @anchors = $p->look_down(_tag => q{a});
    for my $anchor (@anchors){
        my $txt = $anchor->as_text;
        if ($txt=~ /EPE\s/){
            print $txt, qq{\n};
            my $href = $anchor->attr(q{href});
            print $href, qq{\n};
        }
    }
    $p->delete;

    __DATA__
    output:
    ELENCO LIGANDI
    EPE 1148(C)
    /thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&template=ligands.html&l=4.1
    EPE 1148(D)
    /thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&template=ligands.html&l=4.2
    update: changed var names to anchor/s
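
The hrefs printed above are relative to the site root, so they cannot be passed straight back to get(). A minimal sketch of one way to turn them into absolute URLs, assuming the URI module is available (it is installed alongside LWP). The helper name ligand_links and the inline sample markup are made up for illustration so the snippet runs without network access, and anchoring the pattern at the start of the link text is an extra assumption, not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use URI;

# Hypothetical helper (not from the thread): returns [text, absolute URL]
# pairs for every <a> whose text starts with $name followed by whitespace.
sub ligand_links {
    my ($html, $base, $name) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @found;
    for my $anchor ($tree->look_down(_tag => 'a')) {
        my $text = $anchor->as_text;
        my $href = $anchor->attr('href');
        next unless defined $href && $text =~ /^\Q$name\E\s/;
        # Resolve the relative href against the URL the page was fetched from.
        push @found, [ $text, URI->new_abs($href, $base)->as_string ];
    }
    $tree->delete;
    return @found;
}

# Made-up sample markup standing in for the fetched page.
my $base = 'http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&template=ligands.html&l=1.1';
my $html = <<'HTML';
<html><body>
<a href="/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&amp;template=ligands.html&amp;l=4.1">EPE 1148(C)</a>
<a href="/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&amp;template=ligands.html&amp;l=4.2">EPE 1148(D)</a>
<a href="/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&amp;template=ligands.html&amp;l=5.1">EPE-EPE</a>
</body></html>
HTML

print "$_->[0]\n$_->[1]\n" for ligand_links($html, $base, 'EPE');
```

In the real script, $html would be the $content returned by get() and $base would be $url3.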

      Thanks, you solved my question. Is it enough if I mark it as solved and paste your code below? For beginners like me, I suggest reading the previous posts.

      #!/usr/bin/perl
      use warnings;
      use strict;
      use LWP::Simple;
      use HTML::TreeBuilder;

      print "ELENCO LIGANDI\n";

      my $url3 = "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&template=ligands.html&l=1.1";
      my $content =get ($url3);

      my $p = HTML::TreeBuilder->new;
      $p->parse_content($content);
      my @anchors = $p->look_down(_tag => q{a});
      for my $anchor (@anchors){
          my $txt = $anchor->as_text;
          if ($txt=~ /EPE\s/){
              print $txt, qq{\n};
              my $href = $anchor->attr(q{href});
              print $href, qq{\n};
          }
      }
Re: parsing html link
by poolpi (Hermit) on May 25, 2009 at 10:24 UTC

    You might try something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $start = "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=2j6p&template=ligands.html&l=1.1";
    my $m = WWW::Mechanize->new( autocheck => 1 );
    $m->get($start);

    my @links = $m->find_all_links();
    print "No link\n" unless @links;

    for my $link (@links) {
        next unless $link->text;
        if ( $link->text =~ /EPE/msx ) {
            print $link->text, "\n";
        }
    }

    # Output:
    # EPE 1148(C)
    # EPE 1148(D)
    # EPE-EPE


    hth,
    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
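
The two replies in this thread print different sets of links because their patterns differ: /EPE\s/ requires whitespace right after the ligand name, while /EPE/msx matches the name anywhere in the link text. A small sketch of the difference, using only the three link texts reported in the outputs above (the variable names are made up for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Link texts as reported in the replies' outputs.
my @texts = ('EPE 1148(C)', 'EPE 1148(D)', 'EPE-EPE');

# Requires whitespace after the name, so 'EPE-EPE' is skipped.
my @with_space = grep { /EPE\s/ } @texts;

# Matches the name anywhere, so 'EPE-EPE' is kept as well.
my @anywhere = grep { /EPE/msx } @texts;

print "with \\s:\n";
print "  $_\n" for @with_space;
print "without \\s:\n";
print "  $_\n" for @anywhere;
```

Which pattern is right depends on whether the EPE-EPE entry counts as one of the wanted links. With WWW::Mechanize, the same filter can probably be pushed into the call itself, since find_all_links accepts a text_regex criterion.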
