mm&mm has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I am using the HTML::TreeBuilder module to parse an HTML file and write the output to a file:

    my $tree = HTML::TreeBuilder->new();
    $tree->parse_file($daactualfile);
    $tree->objectify_text();
    open OUT, "> $output_temp_file" or die "cannot open $output_temp_file $!\n";
    $tree->dump(*OUT);
    close OUT;

Then I open the file and read the contents into an array...

Is there a simpler way to do this? Something like dumping directly to an array?

Re: HTML::TreeBuilder. Redirect tree->dump to an array
by Corion (Patriarch) on May 05, 2008 at 10:04 UTC

    Maybe ->as_HTML does what you want? That method is mentioned immediately after ->dump in the HTML::Element documentation.

    As you haven't told us what you want to achieve, or what ->dump actually does, all I can do is guess...

      Yes,
      dump() is a method of the HTML::Element module, which is required by HTML::TreeBuilder.
      I quote from CPAN:

      $h->dump()
      $h->dump(*FH) ; # or *FH{IO} or $fh_obj
      Prints the element and all its children to STDOUT (or to a specified filehandle), in a format useful only for debugging. The structure of the document is shown by indentation (no end tags).

      This is a sample of what the output of $tree->dump looks like:

      <html> @0
        <head> @0.0
          <title> @0.0.0
            <~text text="WO trail BARCELON/GAR - GERONA/CAR - 30N - 1"> @0.0.0.0
          <meta content="text/html; charset=iso-8859-1" http-equiv="Content-Type"> @0.0.1
          <meta content="MSHTML 6.00.2800.1589" name="GENERATOR"> @0.0.2
          <meta content="Mozilla/4.6 [fr] (WinNT; I) [Netscape]" name="GENERATOR"> @0.0.3
        <body alink="#ff0000" bgcolor="#ffffff" link="#000080" text="#000000" vlink="#800080"> @0.1
          <a name="HEADER"> @0.1.0
          <center> @0.1.1
            <h1> @0.1.1.0
              <font size="+2"> @0.1.1.0.0
                <~text text="WORK ORDER"> @0.1.1.0.0.0
          <b> @0.1.6
            <font size="-2"> @0.1.6.0
              <~text text="trail"> @0.1.6.0.0
          <table bgcolor="#66ffff" border=0 cellspacing=0 cols=8 width="100%"> @0.1.7
            <tbody> @0.1.7.0
              <tr> @0.1.7.0.0
                <td> @0.1.7.0.0.0
                  <font size="-2"> @0.1.7.0.0.0.0
                    <~text text="Node A"> @0.1.7.0.0.0.0.0
                <td> @0.1.7.0.0.1
                  <font size="-2"> @0.1.7.0.0.1.0
                    <~text text="Node Z"> @0.1.7.0.0.1.0.0

      In my code, this is what's written to an output file, and what I would rather have assigned to an array instead.
Re: HTML::TreeBuilder. Redirect tree->dump to an array
by tachyon-II (Chaplain) on May 05, 2008 at 10:53 UTC

    It is unclear what you are trying to do, but making the assumption that the dump method will only write to a file handle and would like to avoid a temp file just use an IO::String object where you have *OUT. dump will then happily write into it; this is what IO::String was designed for.

    use IO::String;
    my $io = IO::String->new;
    $tree->dump($io);
    $io->setpos(0);
    my @lines = <$io>;    # dump output in an array

    About the only gotcha with IO::String objects is that after you write to one you are left at EOF (end of string), so in order to read the data back in you need to seek back to the beginning (pos, setpos and seek all work). If you forget to do this, it appears that there is no data.
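
    For comparison, here is a minimal sketch of the same idea using only core Perl: since 5.8, open can attach a filehandle to a scalar, so anything that prints to a filehandle can be captured without a temp file or an extra module. The helper name capture_lines and the stand-in writer are made up for illustration and are not part of HTML::TreeBuilder or IO::String. Because the helper reads the buffer directly after closing the handle, there is no seek step to forget.

```perl
use strict;
use warnings;

# Hypothetical helper: run a callback that prints to a filehandle and
# return what it printed, one array element per line.
sub capture_lines {
    my ($writer) = @_;

    my $buffer = '';
    # Core Perl (5.8+): a filehandle that writes into a scalar.
    open my $fh, '>', \$buffer
        or die "cannot open in-memory filehandle: $!\n";
    $writer->($fh);
    close $fh or die "cannot close in-memory filehandle: $!\n";

    # Split after each newline so every element keeps its "\n",
    # just as reading a handle in list context would.
    return split /(?<=\n)/, $buffer;
}

# Stand-in writer for demonstration only; it prints two lines shaped
# like dump output. Single quotes keep '@0' from being interpolated.
my @lines = capture_lines(sub {
    my ($fh) = @_;
    print {$fh} '<html> @0' . "\n";
    print {$fh} '  <head> @0.0' . "\n";
});

print scalar(@lines), " lines captured\n";
```

    With a real tree the call would presumably look like my @lines = capture_lines(sub { $tree->dump($_[0]) }); I have not run that against HTML::TreeBuilder here, so treat it as an assumption. IO::String remains the option if you have to support a perl older than 5.8.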

      Yes!

      This is exactly what I needed:

      "making the assumption that the dump method will only write to a file handle and would like to avoid a temp file just use an IO::String object where you have *OUT."

      I am sorry I didn't make myself clear from the beginning.

      Thanks!