Re: HTML pages indexer

You said you're looking into the Storable module. Is this to make your Hash of Hashes persistent?

If so, you might want to check out MLDBM. Using MLDBM, you can tie your hash (of hashes) to a file on disk, and it will automagically become persistent. One caveat: with most of the underlying databases that MLDBM uses, there is an upper limit on the size of the data stored under each (top-level) hash key, so if your sub-hashes are very large, you might run into some mysterious failures.
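Here is a rough sketch of what tying a hash of hashes with MLDBM could look like. I'm assuming SDBM_File as the underlying database and Storable as the serializer, and the file name and keys are made up for illustration, so adjust to taste:

```perl
use strict;
use warnings;
use Fcntl;                          # for O_CREAT and O_RDWR
use MLDBM qw(SDBM_File Storable);   # assumed backend and serializer

my $file = 'index';                 # hypothetical file name
tie my %index, 'MLDBM', $file, O_CREAT | O_RDWR, 0640
    or die "Cannot tie $file: $!";

# Store a whole sub-hash under a top-level key.
$index{'page1.html'} = { title => 'Home', words => 120 };

# Nested elements cannot be changed in place through the tie:
# fetch a copy, modify it, then store the whole thing back.
my $entry = $index{'page1.html'};
$entry->{words} = 125;
$index{'page1.html'} = $entry;

untie %index;
```

SDBM_File is the backend with the tightest per-record size limit, so it is the one most likely to show the failures mentioned above; a different backend such as DB_File can be named in the `use MLDBM` line if it is installed.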

As for your code above... isn't it pretty slow? It looks like you're reading a whole file at a time into a scalar, and then running a bunch of regexes on it. You may find it more time- and space-efficient to use the HTML::Parser module to do things like strip out the tags, give you the contents of particular tags, and so on. The Perl Journal has an article on HTML::Parser in a recent issue. HTML::Parser will also give you more flexibility in the future when your requirements change.
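To give an idea of the HTML::Parser approach, here is a minimal sketch using the version 3 handler interface. The sub name `extract_text` and the sample HTML are my own inventions, not anything from your code; it just separates the title from the rest of the text:

```perl
use strict;
use warnings;
use HTML::Parser;

# Return the <title> contents and the remaining text of an HTML string.
sub extract_text {
    my ($html) = @_;
    my ($title, $text, $in_title) = ('', '', 0);

    my $p = HTML::Parser->new(
        api_version => 3,
        # Track whether we are currently inside <title>...</title>.
        start_h => [ sub { $in_title = 1 if $_[0] eq 'title' }, 'tagname' ],
        end_h   => [ sub { $in_title = 0 if $_[0] eq 'title' }, 'tagname' ],
        # 'dtext' hands us the text with entities already decoded.
        text_h  => [ sub {
            if ($in_title) { $title .= $_[0] } else { $text .= $_[0] }
        }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;
    return ($title, $text);
}

my ($title, $text) = extract_text(
    '<html><head><title>Home</title></head>'
  . '<body><p>Hello <b>world</b></p></body></html>'
);
print "title: $title\n";
print "text:  $text\n";
```

For files on disk, `$p->parse_file($path)` can be used in place of `parse`/`eof`; it feeds the parser in chunks, so the whole file never has to sit in one scalar.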

Update: Oh yeah, one more thing... if you want to extend your indexer to recursively analyze subdirectories and their file contents as well, you should check out the File::Find module instead of limiting yourself to opendir/readdir.
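Something along these lines would collect the HTML files under a directory tree with File::Find. The sub name `html_files` and the `.html`/`.htm` pattern are just assumptions about what you want to index:

```perl
use strict;
use warnings;
use File::Find;

# Return every .html/.htm file under the given directories, recursively.
sub html_files {
    my @dirs = @_;
    my @found;
    find(sub {
        # Inside the callback, $_ is the bare file name and
        # $File::Find::name is the path including the directory.
        push @found, $File::Find::name if -f && /\.html?$/i;
    }, @dirs);
    my @sorted = sort @found;
    return @sorted;
}

print "$_\n" for html_files('.');
```

Each path that comes back can then be handed to whatever routine indexes a single file.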

Alan

RE: Re: HTML pages indexer by larsen (Parson) on Aug 07, 2000 at 19:06 UTC
Thank you very much. Larsen