note
gellyfish
<p>
Just for the sake of completeness, here's how you might do it with [module://HTML::Parser]:
<code>
use strict;
use warnings;
use HTML::Parser;
my $VAR1 = '<html><title>GAL7</title>
<body bgcolor=white>
<h2 align=center>GAL7</h2><hr>
<form method="post" action="/cgi-bin/SCPD/getgene2?GAL7" enctype="application/x-www-form-urlencoded">
<input type="submit" name="action" value="Get mapped sites" /><input type="submit" name="action" value="Get putative sites" /><input type="submit" name="action" value="Get intergenic region" /><br /><input type="submit" name="action" value="Retrieve sequence" />Start<-ATG <input type="text" name="start" value="-450" size="5" maxlength="5" />ATG->End <input type="text" name="end" value="50" size="5" maxlength="5" /><div></div></form><hr>
<pre>
>YBR018C GAL7 275433 275933
TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA
GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG
GTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTC
TCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAG
ATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAA
AAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGT
GTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGT
TTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATT
ATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA
ATGACTGCTGAAGAATTTGATTTTTCTAGCCATTCCCATAGACGTTACAA
</pre>Some other stuff</body></html>';
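# Start-tag handler: once a <pre> is seen, install the text and end handlers.
sub default_start
{
    my ($self, $tagname) = @_;
    if ( $tagname eq 'pre' )
    {
        $self->handler(text => \&get_text, "self,dtext");
        $self->handler(end => \&end_text, "self,tagname");
    }
}

# Text handler: accumulate the decoded text in the parser object.
sub get_text
{
    my ($self, $text) = @_;
    if ( not exists $self->{_text} )
    {
        $self->{_text} = $text;
    }
    else
    {
        $self->{_text} .= $text;
    }
}

# End-tag handler: at </pre>, set every handler to '' so that the
# rest of the document is ignored (only the first <pre> is captured).
sub end_text
{
    my ( $self, $tagname) = @_;
    if ( $tagname eq 'pre' )
    {
        $self->handler(text => '');
        $self->handler(start => '');
        $self->handler(end => '');
    }
}

my $parser = HTML::Parser->new(start_h => [\&default_start,'self,tagname']);
$parser->parse($VAR1);
$parser->eof;    # signal end of document so any buffered text is flushed
print $parser->{_text};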
</code>
This may have an advantage over other parsers when you are dealing with large documents, as it doesn't build a parsed representation of the whole document before handing the events to you.
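To illustrate the point about large documents, here is a minimal sketch of feeding the parser in small chunks rather than as one string. The sample HTML, the 16-byte chunk size and the variable names ($html, $in_pre, $collected) are made up for the example; in real use the chunks would more likely come from a file or a socket.
<code>
use strict;
use warnings;
use HTML::Parser;

# Hypothetical stand-in for a large document.
my $html = "<html><body><p>skip me</p><pre>ACGT\nTTGA</pre><p>tail</p></body></html>";

my $in_pre    = 0;     # true while we are between <pre> and </pre>
my $collected = '';    # text gathered from inside <pre>

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub { $in_pre = 1 if $_[0] eq 'pre' }, 'tagname' ],
    end_h   => [ sub { $in_pre = 0 if $_[0] eq 'pre' }, 'tagname' ],
    text_h  => [ sub { $collected .= $_[0] if $in_pre }, 'dtext' ],
);

# Read 16 bytes at a time from an in-memory filehandle and hand each
# chunk to the parser; tags split across chunks are buffered by the parser.
open my $fh, '<', \$html or die "Cannot open in-memory handle: $!";
while ( read( $fh, my $chunk, 16 ) ) {
    $parser->parse($chunk);
}
close $fh;
$parser->eof;    # tell the parser there is no more input

print $collected;
</code>
Only the current chunk and whatever you choose to accumulate are held by your own code, so the document never has to be loaded as a whole. For a real file, $parser->parse_file does the chunked reading for you.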
</p>
<p>/J\</p>
574282
574282