Editing Help Files...

Is a task I have to do from time to time. Lately, however, I've not had access to the source for the Windows .chm file I'm looking at. Topic checking is tedious enough as it is, without the extra frustration of having to hunt down the new topics. It occurred to me that if you've got a copy of HTML Help Studio, it's pretty painless to decompile the supplied file and, among other goodies, pluck out the associated .hhc file, which holds the table of contents. This is almost a standard .html file, but not quite. For reasons that escape me, Microsoft has settled on a few new tags, meaningful to them but not to their browser (don't ask why, this is Microsoft!). Still, it is parseable, and with HTML::TokeParser::Simple it can be re-formatted into a fully browser-friendly outline, sort of a site map if you will. At any rate, here is the code:
#!/perl/bin/perl
#
# SiteMap.pl -- create a .html sitemap from a .chm .hhc file.

use strict;
use warnings;
use diagnostics;

use HTML::TokeParser::Simple;

my $p = HTML::TokeParser::Simple->new($ARGV[0]);
my $indent = 0;

print "<!doctype html public \"-//W3C//DTD HTML 4.0 Transitional//EN\">\n";
print "<html>\n";
print "<head>\n";
print "<title>$ARGV[0]</title>\n";
print "</head>\n";
print "<body>\n";
print "<h2>Sitemap for $ARGV[0]</h2>\n";

while (my $token = $p->get_token) {
    if ($token->is_start_tag('ul')) {
        myprint($indent,"<ul>\n");
        $indent++;
    }
    elsif ($token->is_start_tag('li')) {
        myprint($indent,"<li>");
    }
    elsif ($token->is_end_tag('ul')) {
        $indent--;
        myprint($indent,"</ul>\n");
    }
    elsif ($token->is_start_tag('param')) {
        my $ref = $token->return_attr();
        if ($$ref{'name'} eq 'Name') {
            myprint(0,$$ref{'value'} . "</li>\n");
        }
    }
}

print "</body>\n";
print "</html>\n";

sub myprint {
    my $l = shift;
    my $s = shift;
    my $pad = ' ' x $l;
    print "$pad$s";
}
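For anyone who hasn't looked inside a .hhc file, here is a rough sketch of what the parser is walking over. The sample contents below are made up for illustration (the titles and file names are hypothetical), and topics_from_hhc is just a helper name I picked, not part of the script above. It also assumes a version of HTML::TokeParser::Simple that provides get_attr, and Perl 5.10 or later for the // operator. Each topic is an OBJECT holding several param tags, which is why only the param whose name is "Name" is treated as the topic title:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical sample of a decompiled .hhc table of contents.
my $sample_hhc = <<'END_HHC';
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<BODY>
<UL>
  <LI> <OBJECT type="text/sitemap">
    <param name="Name" value="Getting Started">
    <param name="Local" value="start.htm">
    </OBJECT>
  <UL>
    <LI> <OBJECT type="text/sitemap">
      <param name="Name" value="Installing">
      <param name="Local" value="install.htm">
      </OBJECT>
  </UL>
  <LI> <OBJECT type="text/sitemap">
    <param name="Name" value="Reference">
    <param name="Local" value="ref.htm">
    </OBJECT>
</UL>
</BODY>
</HTML>
END_HHC

# Return one [depth, title] pair per topic found in the .hhc text.
sub topics_from_hhc {
    my ($hhc) = @_;

    # A scalar reference makes the parser read from the string
    # instead of treating the argument as a file name.
    my $p = HTML::TokeParser::Simple->new(\$hhc);

    my $depth = 0;
    my @topics;
    while (my $token = $p->get_token) {
        if ($token->is_start_tag('ul')) {
            $depth++;
        }
        elsif ($token->is_end_tag('ul')) {
            $depth--;
        }
        elsif ($token->is_start_tag('param')) {
            # Attribute names come back lower-cased; values keep their case.
            my $name  = $token->get_attr('name')  // '';
            my $value = $token->get_attr('value') // '';
            push @topics, [$depth, $value] if $name eq 'Name';
        }
    }
    return @topics;
}

# Print a plain-text outline, two spaces per nesting level.
for my $topic (topics_from_hhc($sample_hhc)) {
    my ($depth, $title) = @$topic;
    print '  ' x ($depth - 1), $title, "\n";
}
```

Collecting the topics into a list first, rather than printing as you go, makes it easy to reuse the same walk for other output formats.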
Update: Exchanged <pre>s for <code>s...

--hsm

"Never try to teach a pig to sing...it wastes your time and it annoys the pig."