Use HTML::TreeBuilder:
use strict;
use warnings;
use HTML::TreeBuilder;
my $str = '<html><title>GAL7</title>
<body bgcolor=white>
<h2 align=center>GAL7</h2>
<hr>
<form method="post" action="/cgi-bin/SCPD/getgene2?GAL7" enctype="application/x-www-form-urlencoded">
<input type="submit" name="action" value="Get mapped sites" />
<input type="submit" name="action" value="Get putative sites" />
<input type="submit" name="action" value="Get intergenic region" /><br />
<input type="submit" name="action" value="Retrieve sequence" />Start<-ATG
<input type="text" name="start" value="-450" size="5" maxlength="5" />ATG->End
<input type="text" name="end" value="50" size="5" maxlength="5" />
<div></div></form>
<hr>
<pre>
>YBR018C GAL7 275433 275933
TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA
GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG
GTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTC
TCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAG
ATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAA
AAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGT
GTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGT
TTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATT
ATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA
ATGACTGCTGAAGAATTTGATTTTTCTAGCCATTCCCATAGACGTTACAA
</pre>';
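my $tree = HTML::TreeBuilder->new;
$tree->parse ($str);
$tree->eof ();    # tell the parser the input is complete so the tree is finalized
# find () returns every <pre> element; as_text () gives its text content
print $_->as_text () . "\n" for $tree->find ('pre');
$tree->delete (); # release the tree once it is no longer needed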
Prints:
>YBR018C GAL7 275433 275933
TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA
GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG
GTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTC
TCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAG
ATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAA
AAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGT
GTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGT
TTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATT
ATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA
ATGACTGCTGAAGAATTTGATTTTTCTAGCCATTCCCATAGACGTTACAA
Update: Fixed link
DWIM is Perl's answer to Gödel