in reply to HTML pages indexer
If so, you might want to check out MLDBM. Using MLDBM, you can tie your hash (of hashes) to a file on disk, and it will automagically become persistent. One caveat: with most of the underlying databases that MLDBM uses, there is an upper limit on the size of the record stored under each (top-level) hash key, so if your sub-hashes are very large, you might run into some mysterious failures.
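Here is a minimal sketch of what that tie looks like, assuming the MLDBM defaults (SDBM_File for storage, Data::Dumper for serialization). The file name `index`, the key `page1.html`, and the fields `title` and `words` are made up for illustration; they are not from your indexer. Note that nested values cannot be changed in place through the tie: you fetch the sub-hash, change it, and store it back.

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_RDWR);
use File::Temp qw(tempdir);
use MLDBM;    # defaults: SDBM_File storage, Data::Dumper serialization

# Hypothetical location for the on-disk index (a scratch directory here).
my $dir     = tempdir(CLEANUP => 1);
my $db_file = "$dir/index";

# Tie the hash of hashes to the file; every store is written to disk.
my %index;
tie %index, 'MLDBM', $db_file, O_CREAT | O_RDWR, 0640
    or die "Cannot tie $db_file: $!";

$index{'page1.html'} = { title => 'Home', words => 120 };

# Nested updates need a fetch / modify / store round trip.
my $entry = $index{'page1.html'};
$entry->{words} = 125;
$index{'page1.html'} = $entry;

untie %index;

# Reopen the file to confirm the data survived the untie.
my %again;
tie %again, 'MLDBM', $db_file, O_RDWR, 0640
    or die "Cannot reopen $db_file: $!";
my $title = $again{'page1.html'}{title};
my $words = $again{'page1.html'}{words};
untie %again;
```

If your sub-hashes outgrow the SDBM record limit, swapping in a different backend such as `use MLDBM qw(DB_File Storable);` is one option, provided DB_File is installed on your system.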
As for your code above... isn't it pretty slow? It looks like you're reading a whole file at a time into a scalar, and then running a bunch of regexes on it. You may find it more time- and space-efficient to use the HTML::Parser module to do things like take out the tags, give you the contents of particular tags, and so on. The Perl Journal has an article on HTML::Parser in a recent issue. HTML::Parser will also allow you more flexibility in the future when your requirements change.
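To give a rough idea, here is a sketch using the HTML::Parser version 3 handler interface. The sub name `extract_text` and the choice to collect the title plus lowercased words are my own assumptions about what an indexer would want, not something taken from your code.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns the page title and a list of lowercased
# words from the text, skipping <script> and <style> contents.
sub extract_text {
    my ($html) = @_;
    my %inside;          # how deep we are inside each tag name
    my $title = '';
    my @words;

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub { $inside{ $_[0] }++ }, 'tagname' ],
        end_h   => [ sub { $inside{ $_[0] }-- if $inside{ $_[0] } }, 'tagname' ],
        text_h  => [
            sub {
                my ($text) = @_;
                return if $inside{script} || $inside{style};
                if ($inside{title}) { $title .= $text; return; }
                push @words, map { lc } $text =~ /(\w+)/g;
            },
            'dtext',     # text with HTML entities decoded
        ],
    );
    $p->unbroken_text(1);    # do not split a run of text across callbacks
    $p->parse($html);
    $p->eof;

    return ($title, \@words);
}
```

You would call it as `my ($title, $words) = extract_text($html);` and feed the words into your index. If memory is the concern, `$p->parse_file($path)` reads the file in chunks instead of needing the whole thing in a scalar.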
Update: Oh yeah, one more thing... if you want to extend your indexer to recursively analyze subdirectories and their file contents as well, you should check out the File::Find module instead of limiting yourself to opendir/readdir.
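Something along these lines would do the recursive walk. The sub name `html_files` and the `.html`/`.htm` filter are assumptions for the sake of the example.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: returns every .html/.htm file under $root,
# including those in subdirectories, sorted by path.
sub html_files {
    my ($root) = @_;
    my @found;
    find(
        sub {
            # $_ is the bare file name; $File::Find::name is the full path.
            push @found, $File::Find::name if -f $_ && /\.html?\z/i;
        },
        $root
    );
    my @sorted = sort @found;
    return @sorted;
}
```

Each path it returns can then go through the same per-file indexing you already have, so the opendir/readdir loop is the only part that changes.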
Alan
RE: Re: HTML pages indexer
by larsen (Parson) on Aug 07, 2000 at 19:06 UTC