in reply to Re: Operator for "these expressions, in any order"
in thread Operator for "these expressions, in any order"

I expected to get one of these. I should have answered all of this in my initial posting, but it takes a while to explain and I was too lazy.

> Why are you matching HTML with regexes?

Because I'm only interested in very small parts of the file, and I've learned from hard experience that this is the fastest way.

Background information:

Well, I have to admit that my question was a Perl regexp question, but not a Perl question. The problem at hand is coding for Apache Jakarta JMeter, a Java load and performance testing application. There are several situations in which we need to analyze HTML, but never the whole document, only small bits of it: for example, obtaining the value of a particular hidden field to pass in a later request, or obtaining the URLs of embedded elements (images, stylesheets, etc.) so we can download them too.
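To make the hidden-field case concrete, here is a minimal sketch in Java. It is not JMeter's code: the class name, method name, and sample markup are made up for illustration, and it assumes double-quoted attributes with name appearing before value inside the tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of JMeter.
class HiddenFieldSketch {

    // Returns the value attribute of the first <input> whose name matches,
    // or null if none is found.
    // Assumptions: attributes are double-quoted and name precedes value.
    static String hiddenValue(String html, String fieldName) {
        Pattern p = Pattern.compile(
            "<input\\b[^>]*\\bname=\"" + Pattern.quote(fieldName) + "\""
            + "[^>]*\\bvalue=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String page = "<form><input type=\"hidden\" name=\"token\" value=\"abc123\"></form>";
        System.out.println(hiddenValue(page, "token"));
    }
}
```

The fixed attribute order is exactly the limitation that prompted the "any order" question: a page that writes value before name would slip past this pattern.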

Hope this answers your question on why we need to examine the output and not the generator.

As for not doing the wrong thing when an attribute occurs twice, I'm not too worried about that -- I currently live happily with (?:X|Y|Z){3}, even though it also accepts repeats such as X X X -- it's more of a "how would I do it?" question than a "how do I do it?" question.
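For comparison, here is a rough Java sketch of the (?:X|Y|Z){3} idiom next to one stricter alternative that uses lookaheads. The lookahead version is a different technique, not something from this thread's earlier replies. X, Y and Z are stand-ins chosen for illustration (type, name and value attributes of an input tag), and the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the attribute patterns stand in for X, Y and Z.
class AnyOrderSketch {

    // (?:X|Y|Z){3}: any three of the alternatives, in any order.
    // Repeats are accepted, so X X X also matches.
    static final Pattern LOOSE = Pattern.compile(
        "<input(?:\\s+type=\"hidden\"|\\s+name=\"[^\"]*\"|\\s+value=\"[^\"]*\"){3}\\s*/?>");

    // One lookahead per required attribute, then exactly three attributes.
    // Three attributes with all three kinds present means each occurs once.
    static final Pattern STRICT = Pattern.compile(
        "<input(?=[^>]*\\stype=\"hidden\")"
        + "(?=[^>]*\\sname=\"[^\"]*\")"
        + "(?=[^>]*\\svalue=\"[^\"]*\")"
        + "(?:\\s+\\w+=\"[^\"]*\"){3}\\s*/?>");

    static boolean loose(String tag)  { return LOOSE.matcher(tag).matches(); }
    static boolean strict(String tag) { return STRICT.matcher(tag).matches(); }

    public static void main(String[] args) {
        String shuffled = "<input value=\"42\" type=\"hidden\" name=\"id\">";
        String repeated = "<input name=\"a\" name=\"b\" name=\"c\">";
        System.out.println(loose(shuffled) + " " + strict(shuffled));
        System.out.println(loose(repeated) + " " + strict(repeated));
    }
}
```

Both patterns accept the shuffled tag; only the loose one accepts the tag with name repeated three times. The stricter form costs three extra scans of the tag, which may or may not matter for a load-testing tool.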

Still, JMeter is a test tool, and it should help you detect problems in your code (most relevantly performance problems and problems that only happen under load). Code review, as you suggest, is another way -- a complementary one, not an alternative one.

> Why not use something like, oh, HTML::Parser ... ?

We have implemented three alternative solutions: one based on HtmlParser, one on JTidy, and a crappy one I wrote using regexps. I am aware that the latter can never be formally correct, but it is currently the fastest of the three and, I'm proud to say, the most reliable in real-world situations so far.

Hope I've addressed all your relevant concerns.

Salut,

Jordi.
