
Re: Extracting Text After <pre> tag in HTML

by GrandFather (Saint)
on Sep 22, 2006 at 01:28 UTC ( [id://574285]=note )


in reply to Extracting Text After <pre> tag in HTML

Use HTML::TreeBuilder:

use strict;
use warnings;
use HTML::TreeBuilder;

my $str = '<html><title>GAL7</title>
<body bgcolor=white>
<h2 align=center>GAL7</h2>
<hr>
<form method="post" action="/cgi-bin/SCPD/getgene2?GAL7" enctype="application/x-www-form-urlencoded">
<input type="submit" name="action" value="Get mapped sites" />
<input type="submit" name="action" value="Get putative sites" />
<input type="submit" name="action" value="Get intergenic region" /><br />
<input type="submit" name="action" value="Retrieve sequence" />Start<-ATG
<input type="text" name="start" value="-450" size="5" maxlength="5" />
ATG->End
<input type="text" name="end" value="50" size="5" maxlength="5" />
<div></div></form>
<hr>
<pre>
>YBR018C GAL7 275433 275933
TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA
GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG
GTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTC
TCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAG
ATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAA
AAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGT
GTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGT
TTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATT
ATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA
ATGACTGCTGAAGAATTTGATTTTTCTAGCCATTCCCATAGACGTTACAA
</pre>';

# Parse the document; eof() tells the parser that the input is complete.
my $tree = HTML::TreeBuilder->new;
$tree->parse($str);
$tree->eof;

# Print the text content of every <pre> element.
print $_->as_text() . "\n" for $tree->find('pre');

Prints:

>YBR018C GAL7 275433 275933
TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA
GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG
GTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTC
TCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAG
ATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAA
AAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGT
GTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGT
TTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATT
ATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA
ATGACTGCTGAAGAATTTGATTTTTCTAGCCATTCCCATAGACGTTACAA
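If the header line and the sequence are wanted separately, the text returned by as_text can be split with core Perl alone. The following is only a minimal sketch. The helper name split_record is hypothetical and not part of HTML::TreeBuilder. It assumes the first non-blank line is the '>' header and that every later line is sequence data. The sample uses just the first two sequence lines of the output above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split the text of a
# <pre> block into its header line and the concatenated sequence.
sub split_record {
    my ($text) = @_;

    # Trim each line and drop the blank ones left over from the HTML.
    my @lines;
    for my $line (split /\n/, $text) {
        $line =~ s/^\s+|\s+$//g;
        push @lines, $line if length $line;
    }

    # Assumption: the first remaining line is the '>' header.
    my $header = @lines ? shift @lines : '';
    $header =~ s/^>//;

    return ($header, join '', @lines);
}

# Shortened sample in the same layout as the output above.
my $pre_text = join "\n",
    '',
    '>YBR018C GAL7 275433 275933',
    'TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA',
    'GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG',
    '';

my ($header, $sequence) = split_record($pre_text);
print "Header: $header\n";
print 'Length: ', length($sequence), "\n";
```

In the script above, the same helper could be called on $_->as_text() inside the loop over $tree->find('pre') in place of the print.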

Update: Fixed link


DWIM is Perl's answer to Gödel
