Uruk has asked for the wisdom of the Perl Monks concerning the following question:

I've been searching CPAN and looking around for a way to parse data in Microsoft help files (.HLP files used for online help in many applications) and I'm not having any luck at all. Is there a way, at a minimum, to extract the text from these files, and better yet, to model their structure as a data structure? At first I thought that they might just be bastardized .DOC files, or possibly even some twisted form of RTF, but as near as I can tell neither of these is the case. Does anybody have any pointers or tips on programmatically dealing with these things?

Re: Microsoft help files (.HLP)
by tachyon (Chancellor) on Nov 04, 2002 at 21:04 UTC

    MS .hlp files come in V 1.x (RTF based), which is being replaced by V 2.x (HTML based). These files are then compiled. Have a look at this decompiler. The rest of the site has lots of useful info.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Microsoft help files (.HLP)
by bart (Canon) on Nov 04, 2002 at 21:26 UTC
    One thing I do recall about .hlp files is that the text is compressed. Long ago, in a land far away... there was an article in DDJ about it.

    Got it... the date is Sept/Oct 1993, 9 years ago. The article itself, by Pete Davis, is in two parts in the series "Undocumented Corner". It's the first two results you get when you do a full text search for "Windows HLP file format". Available for purchase only. Sorry. And sorry, too, that I can't give you a more direct link; some people really don't seem to grasp the basics of a hyperlinked medium. And those are the people who are supposed to know better.

    You can always try your luck at Wotsit.