shahbazq has asked for the wisdom of the Perl Monks concerning the following question:

Howdy, I'm really new to Perl, so I hope you guys can give me some clues or tell me where to go to figure this out. I have an HTML file generated by pod2html, and it has roughly the following layout:
<html>
useless info
<body>
<!-- Index Begin -->
<ul>
  <li>
  <li>
  <ul>
    <li>
    <li>
  </ul>
</ul>
<!-- Index End -->
My goal is to sort the <li>s at each level without touching anything outside the <!-- Index --> comments, since there are other <li>s elsewhere in the document.

So far I've been able to skip past the beginning with a while loop until it sees the Index Begin comment, then store all of the list info in an array (for now; I don't really know what to do with it), and then do a last when a line matches Index End.
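The loop described above might look roughly like the sketch below. It uses only core Perl and assumes each index comment sits on its own line. The helper name index_lines and the sample input are made up for illustration; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return only the lines found between the two
# index comments (the marker lines themselves are excluded).
# Assumption: each marker sits on a line of its own.
sub index_lines {
    my @lines = @_;
    my @index;
    my $in_index = 0;
    for my $line (@lines) {
        if ( $line =~ /<!--\s*Index Begin\s*-->/i ) {
            $in_index = 1;    # start collecting after this line
            next;
        }
        last if $line =~ /<!--\s*Index End\s*-->/i;
        push @index, $line if $in_index;
    }
    return @index;
}

# Made-up sample input standing in for the real pod2html output.
my @sample = (
    "<html>\n",
    "<li>outside</li>\n",
    "<!-- Index Begin -->\n",
    "<ul>\n",
    "<li>b</li>\n",
    "<li>a</li>\n",
    "</ul>\n",
    "<!-- Index End -->\n",
    "<li>also outside</li>\n",
);

print index_lines(@sample);
```

With a real file, the lines would come from a filehandle instead of @sample. Any <li> before or after the markers is never collected.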

Someone recommended that I look into HTML::Parser, which I did, but I was terribly confused by modules and the like. If HTML::Parser or any other modules are necessary, could you please explain or link me to a newbie tutorial site?

Thanks, shahbazq

Re: Html list sorting problem
by tachyon (Chancellor) on Sep 29, 2001 at 23:22 UTC
Re: Html list sorting problem
by tachyon (Chancellor) on Sep 29, 2001 at 23:38 UTC

    Sometimes I just can't help myself. Here is a little bit of code to get you started. It will get all the text of all the <li> items into an array and then sort that array alphabetically.

    #!/usr/bin/perl -w
    use strict;
    use HTML::TokeParser;

    my @list;
    my $file = "c:/test.htm";
    my $p = HTML::TokeParser->new($file) || die "Can't open $file: $!";

    while ( my $token = $p->get_tag( qw(li) ) ) {
        my $text = $p->get_trimmed_text();
        push @list, $text;
    }

    @list = sort @list;
    print "Item $_: $list[$_]\n" for 0 .. $#list;

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Actually, I think I figured out what was messing it up (just so you don't feel stumped as to why your code wasn't working :Þ): the list items in the "missing" section were actually links to internal anchors in the HTML file.
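A likely reason links trip up the first snippet: called with no arguments, get_trimmed_text() returns only the text up to the next tag, so for an item that starts with an <a> tag it returns an empty string. Passing an end tag such as '/a' makes it read through to that tag instead. The sketch below compares the two calls on a made-up fragment. The helper name li_text and the sample HTML are illustrative assumptions, and it needs HTML::TokeParser (from the HTML-Parser distribution) to be installed.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up fragment resembling an index where every item is a link.
my $html = '<ul><li><a href="#name">NAME</a></li>'
         . '<li><a href="#synopsis">SYNOPSIS</a></li></ul>';

# Hypothetical helper: collect the text after each <li>. Any arguments
# are passed through to get_trimmed_text() as its optional end tags.
sub li_text {
    my @stop = @_;
    # A reference to a scalar is parsed as the literal document.
    my $p = HTML::TokeParser->new( \$html ) or die "Can't parse document";
    my @items;
    while ( $p->get_tag('li') ) {
        push @items, $p->get_trimmed_text(@stop);
    }
    return @items;
}

my @plain = li_text();        # stops at <a>, so each entry is empty
my @links = li_text('/a');    # reads up to </a>, so the link text survives

print "without end tag: [$_]\n" for @plain;
print "with '/a':       [$_]\n" for @links;
```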

      So long as I'm already typing: what I also wanted to ask is not how to put all the LI tags into one array, but how to put the nested lists into separate arrays, so that each can be sorted independently of the LIs in a UL above or below it (I've already been able to capture all the LIs in one array by pattern matching).

      Thanks, shahbazq

        Sorry, no explanation, but it should be easy to follow with a little work:

        #!/usr/bin/perl -w
        use strict;
        use HTML::TokeParser;

        my @list;
        my $level = -1;
        my $file  = "c:/test.htm";
        my $p = HTML::TokeParser->new($file) || die "Can't open $file: $!";

        LOOP: while ( my $token = $p->get_token ) {
            my $se  = $token->[0];    # 'S' for an opening tag, 'E' for a closing tag
            my $tag = $token->[1];
            next LOOP unless $tag eq 'ul' or $tag eq 'li';
            if ( $tag eq 'ul' ) {     # this will be either a <ul> or a </ul>
                if ( $se eq 'S' ) {
                    $level++;         # increase level in response to <ul>
                }
                else {
                    $level--;         # decrease level in response to </ul>
                }
                next LOOP;
            }
            next LOOP unless $se eq 'S';    # skip </li> so no empty entries are stored
            my $text = $p->get_trimmed_text();
            push @{ $list[$level] }, $text;
        }

        # data is now in a 2D data structure. you will need to read
        # up on these to understand the syntax
        # @{$list[0]} is level 1
        # @{$list[1]} contains level 2
        for my $i ( 0 .. $#list ) {
            my @array = @{ $list[$i] };
            @array = sort @array;
            print "Level $i\n";
            print " $_\n" for @array;
        }
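For anyone new to the 2D structure mentioned in the comments, here is a small standalone sketch of the same idea using core Perl only: an array whose elements are references to other arrays, one per nesting level. The item names are made up for illustration.

```perl
use strict;
use warnings;

my @list;    # will hold one array reference per nesting level

# push creates the inner arrays on first use (autovivification)
push @{ $list[0] }, 'NAME', 'DESCRIPTION';
push @{ $list[1] }, 'Options', 'Arguments';

for my $level ( 0 .. $#list ) {
    # @{ $list[$level] } dereferences the inner array for this level
    my @sorted = sort @{ $list[$level] };
    print "Level $level: @sorted\n";
}
# prints:
# Level 0: DESCRIPTION NAME
# Level 1: Arguments Options
```

Note that grouping by level puts every list at the same depth into one array. Keeping each <ul> separate would need a different key, for example a counter that increases with every opening <ul>.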

        cheers

        tachyon

        s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Hey, thanks a lot for both responses. I tried messing around with TokeParser and couldn't seem to figure it out.

      However, I've been encountering a problem with the starting code you sent in the second message: it only gets the LI tags after the <!-- Index End --> comment. (By the way, I am really grateful that you got me started. Even though it's been less than an hour of snooping around, I've learned more Perl stuff now that I know what to focus on.)

      I am still going to snoop around, but if you have any previous experience, what do you think could be making it not see the first large list I have?

      Thanks again, shahbazq