Enlightened Ones and other Seekers of Widsom,

For my wife, who is a software translator, I am trying to achieve the following:

I have a glossary in HTML, implemented as a definition list. After translation, the glossary naturally needs to be re-sorted.
I already wrote a solution with Regular Expressions but with HTML being hard to parse, it is not very efficient so far... Thus I'd like to use HTML::TreeBuilder

It's quite easy when the glossary was a two column table (check my scratchpad if you are interested: svenXY's scratchpad), but with a definition list, the problem is that the <dt> and the <dd> tag are independent of each other. I can well sort the dt tag, but how do I at the same time sort the dd tag with it?


I have a solution here, but I don't really like it. I'm sure there are better ways to do it
#!/usr/bin/perl -w use strict; use HTML::TreeBuilder; use HTML::PrettyPrinter; use Data::Dumper; my $html_code = ' <html> <head> <title>Glossary</title> <h1>Glossary</h1> <dl> <dt><b>E Definition</b></dt> <dd>E - data</dd> <p></p> <dt><b>B Definition</b></dt> <dd>B - data</dd> <p></p> <dt><b>A_definition</b></dt> <dd>A data.</dd> <p></p> <dt><b>C definition</b></dt> <dd>C - data</dd> <p></p> </dl> </body> </html> '; my %glossar; my $tree = HTML::TreeBuilder->new; $tree->parse($html_code); my ($dl) = $tree->look_down('_tag', 'dl'); my %data; # looping trough the dt tags, # spawning a hash with the text of dt as key # and the HTML of dt and dd as values for my $dt ($dl->look_down("_tag", "dt")) { my $key = lc($dt->as_text); $data{$key}{'dt'} = $dt->as_HTML; my $dd = $dt->right; $data{$key}{'dd'} = $dd->as_HTML; } # create a string my $output; foreach (sort {lc($a) cmp lc($b)} keys %data) { $output .= $data{$_}{'dt'} . $data{$_}{'dd'} . "<p></p>"; } # feed the string to a new Parser Object my $new_dl = HTML::TreeBuilder->new; $new_dl->parse($output); my $nu_aber = (); # remove unneccesary tags $nu_aber = $new_dl->guts(); # replace old dl with new dl $dl->delete_content(); $dl->push_content($nu_aber); my $hpp = new HTML::PrettyPrinter ( 'linelength' => 130, 'quote_attr' => 1, 'allow_forced_nl' => 1, 'entities' => "&<>äöüßÄÖÜ"); $hpp->set_force_nl(1,qw(body head table tr td)); $hpp->nl_before(2,qw(tr td p)); my $linearray_ref = $hpp->format($tree); print @{$linearray_ref}; $tree = $tree->destroy;
My main problem is to properly dereference the tree and to replace the DL part of the tree with a sorted array of HTML::Element Objects without having to create and parse code first.
Any hints greatly appreciated,
svenXY

In reply to HTML::TreeBuilder: sort a Definition List (<dl>) by svenXY

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.