Addendum, FYI: the history from the previous owner of the chatroom covers 48 months. Each month's HTML file averages about 80 KB, and the largest is about 220 KB.
So at present, I have 60 files to scan. My users are patient, so a bit of delay while the search runs is OK.
It certainly beats the current situation, which is browsing each month's history one by one and searching via "Find"!
From my quick test, wsfp's (very excellent) code using HTML::TokeParser::Simple takes around 7 seconds to run the same test that my rather crude regex solution completes in about 0.2 seconds.
As you are processing your own, controlled data, you can choose either with a fair degree of safety. If you were processing HTML from another source where you didn't control the layout, the parser route would be preferable.
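For illustration only, here is a minimal sketch of what a crude regex-based search might look like. It is not the code that was timed above. The names `search_html` and `search_files` and the `history/*.html` file pattern are my own assumptions, and it further assumes that no tag spans a line break, which only holds if you control the layout.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of every line that contains $term.
# Tags are stripped with a crude regex, so this is only reasonable for
# HTML whose layout you control (no tags split across lines, no '>' in
# attribute values).
sub search_html {
    my ($html, $term) = @_;
    my @hits;
    for my $line (split /\n/, $html) {
        (my $text = $line) =~ s/<[^>]*>//g;   # drop anything that looks like a tag
        $text =~ s/^\s+|\s+$//g;              # trim leading/trailing whitespace
        push @hits, $text if $text =~ /\Q$term\E/i;   # literal, case-insensitive match
    }
    return @hits;
}

# Hypothetical driver: scan each file and collect hits per file name.
sub search_files {
    my ($term, @files) = @_;
    my %hits;
    for my $file (@files) {
        my $fh;
        unless (open $fh, '<', $file) {
            warn "Cannot open $file: $!\n";
            next;
        }
        local $/;                 # slurp mode: the files are small enough
        my $html = <$fh>;
        close $fh;
        my @found = search_html($html, $term);
        $hits{$file} = \@found if @found;
    }
    return \%hits;
}

# Usage: perl search.pl TERM   (the glob pattern is an assumed naming scheme)
if (@ARGV) {
    my $hits = search_files($ARGV[0], glob('history/*.html'));
    for my $file (sort keys %$hits) {
        print "$file: $_\n" for @{ $hits->{$file} };
    }
}
```

With around 60 files of this size, slurping each one whole should be fine. If the layout ever stops being under your control, swapping `search_html` for a parser-based version is the safer route.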
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.