in reply to Re: Extracting data-structure from HTML using Web::Scraper
in thread Extracting data-structure from HTML using Web::Scraper
Same with xsh
The output
$ xsh --html --quiet --non-interactive --load pm981742.xsh <?xml version="1.0" standalone="yes"?> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http:// +www.w3.org/TR/REC-html40/loose.dtd"> <html> <body> <h4 class="bla">July 12</h4> <p>Tim</p> <p>Jon</p> <h4 class="bla">July 13</h4> <p>James</p> <p>Eric</p> <p>Jerry</p> <p>Susie</p> <h4 class="date">July 14</h4> <p>Kami</p> <p>Darryl</p> </body> </html> { "July 12" => ["Tim", "Jon"], "July 13" => ["James", "Eric", "Jerry", "Susie"], "July 14" => ["Kami", "Darryl"], }
The xsh script (xml shell script)
open pm981742.xml; ls --indent /; for //body/* { $text = string(text()); if( name() = "h4" ){ $key = $text; } if( name() = "p" ){ perl { push @{ $hash{$key} }, $text; }; } } perl { use Data::Dump; dd \%hash; undef %hash; undef $key; };
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^3: Extracting data-structure from HTML using Web::Scraper
by Anonymous Monk on Jul 14, 2012 at 07:58 UTC |