Thanks for all the links! Yes, I already looked a bit at HTML::TreeBuilder but as I understand zilch of it now I wanted to be sure that this is the right tool. The fact that the trees look powerful but with a huge overhead made me ask whether a regex would be faster. The web pages look consistent and a regex to give the number after the second occurence of "drill width:" should do fine.
Using a combination of grep|sed|tr|, each looped over each of the variables and all the files, my crapcode takes about 28hrs right now so everything that makes it faster is welcome.
In reply to Re^2: how to quickly parse 50000 html documents?
by brengo
in thread how to quickly parse 50000 html documents?
by brengo
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |