in reply to Html list sorting problem

Sometimes I just can't help myself. Here is a little bit of code to get you started. It will get all the text of all the <li> items into an array and then sort that array alphabetically.

#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

my @list;
my $file = "c:/test.htm";
my $p    = HTML::TokeParser->new($file) || die "Can't open $file: $!";

while ( my $token = $p->get_tag( qw(li) ) ) {
    my $text = $p->get_trimmed_text();
    push @list, $text;
}

@list = sort @list;
print "Item $_: $list[$_]\n" for 0 .. $#list;

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Re: Html list sorting problem
by shahbazq (Initiate) on Sep 30, 2001 at 00:42 UTC
    Actually, I think I figured out what was messing it up (just so you don't feel stumped as to why your code wasn't working :Þ): the items in the "missing" section of my list were actually links to internal anchors in the HTML file.

    So long as I'm already typing, I also wanted to ask not how to put all the LI tags into one array, but how to put the nested lists into separate arrays, so that each can be sorted independently of the other LI items in a UL above or below it (I've already been able to capture all the LI items in one array by pattern matching).

    Thanks, shahbazq
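
A minimal sketch of why linked items can go missing, and one possible way around it. With no arguments, get_trimmed_text() stops at the very next tag, so for an item like <li><a href="#x">Foo</a></li> it returns an empty string. Passing an end tag such as '/li' asks it to collect text up to that tag instead. The sample HTML below is made up for illustration, and this assumes a flat list whose items are closed with </li>; it is not a drop-in fix for nested lists.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document. Passing a reference to a scalar makes
# HTML::TokeParser parse the string itself instead of opening a file.
my $html = <<'HTML';
<ul>
  <li><a href="#pear">Pear</a></li>
  <li>Banana</li>
  <li><a href="#apple">Apple</a></li>
</ul>
HTML

my $p = HTML::TokeParser->new( \$html ) or die "Can't parse document";

my @list;
while ( $p->get_tag('li') ) {
    # Read up to the closing </li>, so text inside <a>...</a> is included.
    push @list, $p->get_trimmed_text('/li');
}

@list = sort @list;
print "Item $_: $list[$_]\n" for 0 .. $#list;
```

If the items are not closed with </li>, more end tags can be listed, for example get_trimmed_text('/li', 'li', '/ul').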

      Sorry, no explanation, but it should be easy to follow with a little work:

      #!/usr/bin/perl -w
      use strict;
      use HTML::TokeParser;

      my @list;
      my $level = -1;
      my $file  = "c:/test.htm";
      my $p     = HTML::TokeParser->new($file) || die "Can't open $file: $!";

      LOOP: while ( my $token = $p->get_token ) {
          my $se  = $token->[0];    # 'S' for an opening tag, 'E' for a closing tag
          my $tag = $token->[1];
          next LOOP unless $se eq 'S' or $se eq 'E';    # skip text, comments etc.
          next LOOP unless $tag eq 'ul' or $tag eq 'li';
          if ( $tag eq 'ul' ) {     # this will be either a <ul> or a </ul>
              if ( $se eq 'S' ) {
                  $level++;         # increase level in response to <ul>
              }
              else {
                  $level--;         # decrease level in response to </ul>
              }
              next LOOP;
          }
          next LOOP if $se eq 'E';  # a </li> does not start a new item
          my $text = $p->get_trimmed_text();
          push @{ $list[$level] }, $text;
      }

      # data is now in a 2D data structure. you will need to read
      # up on these to understand the syntax
      # @{$list[0]} is level 1
      # @{$list[1]} contains level 2
      for my $i ( 0 .. $#list ) {
          my @array = @{ $list[$i] };
          @array = sort @array;
          print "Level $i\n";
          print "  $_\n" for @array;
      }

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
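
The code above groups items by nesting depth, so two separate sub-lists at the same depth end up in the same array. If each <ul> should be sorted on its own, one possible variation is to keep a stack of the lists that are currently open and give every <ul> its own array. This is only a sketch: the sample HTML and the names @lists and @open are invented for illustration, and it does not record which parent item a nested list belongs to.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample document with two nested lists at the same depth.
my $html = <<'HTML';
<ul>
  <li>pears
    <ul>
      <li>williams</li>
      <li>conference</li>
    </ul>
  </li>
  <li>apples
    <ul>
      <li>gala</li>
      <li>braeburn</li>
    </ul>
  </li>
</ul>
HTML

my $p = HTML::TokeParser->new( \$html ) or die "Can't parse document";

my @lists;    # one array ref per <ul>, in document order
my @open;     # stack of the <ul> lists that are currently open

while ( my $token = $p->get_token ) {
    my ( $type, $tag ) = @$token[ 0, 1 ];
    next unless $type eq 'S' or $type eq 'E';    # only start and end tags

    if ( $tag eq 'ul' ) {
        if ( $type eq 'S' ) {
            my $new = [];
            push @lists, $new;    # remember every list we see
            push @open,  $new;    # and make it the current one
        }
        else {
            pop @open;            # </ul>: go back to the enclosing list
        }
    }
    elsif ( $tag eq 'li' and $type eq 'S' and @open ) {
        # Text up to the next tag goes into the innermost open list.
        push @{ $open[-1] }, $p->get_trimmed_text;
    }
}

for my $i ( 0 .. $#lists ) {
    print "List $i\n";
    print "  $_\n" for sort @{ $lists[$i] };
}
```

Each array in @lists can then be sorted and printed independently of the others, which is the behaviour asked for earlier in the thread.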

        Hey tachyon,

        I really want to say thanks for the help you've given me. The last example (though not exactly what I needed) led me in the right direction. I am really grateful for what you've done and how patient you've been with a newbie.
        Thanks a lot,
        shahbazq

Re: Re: Html list sorting problem
by shahbazq (Initiate) on Sep 30, 2001 at 00:20 UTC
    Hey, thanks a lot for both responses. I tried messing around with TokeParser and couldn't seem to figure it out.

    However, I've run into a problem: the starting code you sent in the second message only gets the LI tags after the !-- Index End --. (By the way, I am really grateful that you got me started. Even though it's been less than an hour of snooping around, I've been picking up more Perl stuff now that I know what to focus on.)

    I am still going to snoop around, but if you have any previous experience with this, what do you think could be making it not see the first large list I have?

    Thanks again, shahbazq