in reply to Extracting Text After <pre> tag in HTML

Just for the sake of completeness here's how you might do it with HTML::Parser:

use HTML::Parser; my $VAR1 = '<html><title>GAL7</title> <body bgcolor=white> <h2 align=center>GAL7</h2><hr> <form method="post" action="/cgi-bin/SCPD/getgene2?GAL7" enctype="appl +ication/x-www-form-urlencoded"> <input type="submit" name="action" value="Get mapped sites" /><input t +ype="submit" name="action" value="Get putative sites" /><input type=" +submit" name="action" value="Get interg enic region" /><br /><input type="submit" name="action" value="Retriev +e sequence" />Start<-ATG <input type="text" name="start" value="-450" + size="5" maxlength="5" />ATG->End <inp ut type="text" name="end" value="50" size="5" maxlength="5" /><div></d +iv></form><hr> <pre> >YBR018C GAL7 275433 275933 TTTGATATCACTCACAACTATTGCGAAGCGCTTCAGTGAAAAAATCATAA GGAAAAGTTGTAAATATTATTGGTAGTATTCGTTTGGTAAAGTAGAGGGG GTAATTTTTCCCCTTTATTTTGTTCATACATTCTTAAATTGCTTTGCCTC TCCTTTTGGAAAGCTATACTTCGGAGCACTGTTGAGCGAAGGCTCATTAG ATATATTTTCTGTCATTTTCCTTAACCCAAAAATAAGGGAAAGGGTCCAA AAAGCGCTCGGACAACTGTTGACCGTGATCCGAAGGACTGGCTATACAGT GTTCACAAAATAGCCAAGCTGAAAATAATGTGTAGCTATGTTCAGTTAGT TTGGCTAGCAAAGATATAAAAGCAGGTCGGAAATATTTATGGGCATTATT ATGCAGAGCATCAACATGATAAAAAAAAACAGTTGAATATTCCCTCAAAA ATGACTGCTGAAGAATTTGATTTTTCTAGCCATTCCCATAGACGTTACAA </pre>Some other stuff</body></html>'; sub default_start { my ($self, $tagname) = @_; if ( $tagname eq 'pre' ) { $self->handler(text => \&get_text, "self,dtext"); $self->handler(end => \&end_text, "self,tagname"); } } sub get_text { my ($self, $text) = @_; if ( not exists $self->{_text} ) { $self->{_text} = $text; } else { $self->{_text} .= $text; } } sub end_text { my ( $self, $tagname) = @_; if ( $tagname eq 'pre' ) { $self->handler(text => ''); $self->handler(start => ''); $self->handler(end => ''); } } my $parser = HTML::Parser->new(start_h => [\&default_start,'self,tagna +me']); $parser->parse($VAR1); print $parser->{_text};
This might have the advantage over using other parsers if you are dealing with large documents as it doesn't build a preparsed representation of the documentation before handing the events to you.


Replies are listed 'Best First'.
Re^2: Extracting Text After <pre> tag in HTML
by Anonymous Monk on Mar 30, 2007 at 08:41 UTC
    how to work
     PRE tag within text area