Just for the sake of completeness, here's how you might do it with HTML::Parser:

 
use strict;
use warnings;
use HTML::Parser;
 
my $VAR1 = '<html><title>GAL7</title>
<body bgcolor=white>
<h2 align=center>GAL7</h2><hr>
<form method="post" action="/cgi-bin/SCPD/getgene2?GAL7" enctype="application/x-www-form-urlencoded">
<input type="submit" name="action" value="Get mapped sites" /><input type="submit" name="action" value="Get putative sites" /><input type="submit" name="action" value="Get intergenic region" /><br /><input type="submit" name="action" value="Retrieve sequence" />Start<-ATG <input type="text" name="start" value="-450" size="5" maxlength="5" />ATG->End <input type="text" name="end" value="50" size="5" maxlength="5" /><div></div></form><hr>
<pre>
>YBR018C  GAL7  275433  275933
TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA
GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG
GTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTC
TCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAG
ATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAA
AAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGT
GTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGT
TTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATT
ATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA
ATGACTGCTGAAGAATTTGATTTTTCTAGCCATTCCCATAGACGTTACAA
</pre>Some other stuff</body></html>';
 
# Start handler: once <pre> is seen, begin collecting text and
# watch for the matching end tag.
sub default_start
{
   my ($self, $tagname) = @_;
 
   if ( $tagname eq 'pre' )
   {
     $self->handler(text => \&get_text, "self,dtext");
     $self->handler(end  => \&end_text, "self,tagname");
   }
}
 
# Text handler: text may arrive in several pieces, so append each one.
# (.= on a not-yet-existing hash element simply creates it.)
sub get_text
{
   my ($self, $text) = @_;
   $self->{_text} .= $text;
}
 
# End handler: at </pre>, clear all three handlers so the rest of
# the document is skipped.
sub end_text
{
   my ( $self, $tagname) = @_;
 
   if ( $tagname eq 'pre' )
   {
     $self->handler(text => '');
     $self->handler(start => '');
     $self->handler(end => '');
   }
}
 
my $parser = HTML::Parser->new(start_h => [\&default_start,'self,tagname']);
 
$parser->parse($VAR1);
$parser->eof;   # signal end of document so any buffered text is flushed
print $parser->{_text};

This may have an advantage over other parsers when you are dealing with large documents, as it doesn't build a pre-parsed representation of the document before handing the events to you.
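To illustrate the point about large documents, here is a minimal sketch that feeds the HTML to the parser in small pieces rather than as one string. The helper name extract_pre_text, the $in_pre counter and the 16-character chunk size are my own choices for illustration and not part of the code above; in practice the chunks would more likely come from reading a file or socket in a loop.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the code above): collect the text found
# inside <pre> elements while the HTML arrives in arbitrary chunks.
sub extract_pre_text {
    my (@chunks) = @_;

    my $in_pre = 0;     # nesting depth of <pre> elements
    my $text   = '';    # text accumulated so far

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $in_pre++ if $_[0] eq 'pre' }, 'tagname' ],
        end_h   => [ sub { $in_pre-- if $_[0] eq 'pre' && $in_pre }, 'tagname' ],
        text_h  => [ sub { $text .= $_[0] if $in_pre }, 'dtext' ],
    );

    # Each call to parse() handles only what it has been given so far;
    # no tree of the whole document is ever built.
    $parser->parse($_) for @chunks;
    $parser->eof;       # flush anything still buffered

    return $text;
}

# Shortened sample data, split into 16-character pieces so that a tag
# ("<pre" / ">") straddles a chunk boundary.
my $html   = "<html><body><pre>\n>YBR018C  GAL7\nTTTGATATC\n</pre>Some other stuff</body></html>";
my @chunks = unpack '(a16)*', $html;

print extract_pre_text(@chunks);
```

Keeping the state in lexical variables captured by the handlers, instead of in the parser object's hash, is only a matter of taste; either way the handlers see the document one event at a time.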

/J\

In reply to Re: Extracting Text After <pre> tag in HTML by gellyfish
in thread Extracting Text After <pre> tag in HTML by monkfan
