in reply to Runtime Regexp Generation

I'm not known for saying "don't use regexes for that" or "use a database", but in this instance, what you are trying to devise is a query language.

You may be able to achieve your aims with regexes, but moving your data into a DB and using SQL will undoubtedly save you considerable time and effort.


Examine what is said, not who speaks.
1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
3) Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke.

I agree, but...
by tekkie (Beadle) on Apr 14, 2003 at 16:20 UTC
    the data is generated on the fly, it's a packet capture of traffic on a network segment at any given moment. The data isn't already there... I'm collecting, crunching, and producing output from the data all in one go.

      The major pain with trying to select records using regexes is that you have to match the whole record instead of just the fields that you are selecting on, hence your difficulties with specifying the logical select "anything except this". The second problem is that your regex may match against data in a part of the record other than the field you are interested in.
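The second problem can be seen in a few lines. This is only a sketch: the record format (whitespace-separated source IP, destination IP, protocol, source port) is an assumption made for illustration, not the actual capture format.

```perl
use strict;
use warnings;

# Hypothetical whitespace-separated record: src IP, dst IP, protocol, src port.
my $record = '10.0.0.80 10.0.0.2 tcp 8080';

# Intended as "source port is 80", but the pattern is free to match anywhere
# in the record, so it hits the source IP (and the port 8080) instead.
my $whole_record_hit = $record =~ /80/ ? 1 : 0;

# Restricting the test to the fourth field gives the intended answer.
my $src_port  = ( split ' ', $record )[3];
my $field_hit = $src_port =~ /^80$/ ? 1 : 0;

print "whole record: $whole_record_hit, field only: $field_hit\n";
```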

      By imposing some structure on your data--i.e. making the fields in the record fixed length--and matching or rejecting on a field-by-field basis rather than trying to match (or not) a whole record at a time, you greatly simplify the process. This is what you would get by moving your data into a flat file DB and using DBI to perform your queries.

      At the very least, you should consider fixing the length of the fields of your records. You could then use substr to pick out each field and apply a regex to just that field, which greatly simplifies your queries. E.g.

      if (    substr($record,  0, 10) =~ $src_ip_of_interest
          and substr($record, 10, 10) =~ $dst_ip_of_interest
          and substr($record, 20,  4) =~ $proto_of_interest
          and substr($record, 24,  6) !~ $src_port_of_disinterest
          # etc ...
      ) {
          # we found a record that matches the query
      }

      I think that you can see how much this simplifies the regexes involved. Generating conditionals using this form and using eval to execute them would be much simpler than trying to come up with a generic regex generator.
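A minimal sketch of generating such a conditional and compiling it with eval might look like the following. The `%layout` table reuses the offsets from the example above, and `build_filter` and its triple-based query format are hypothetical names invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical fixed-width layout, taken from the offsets in the example above.
my %layout = (
    src_ip   => [ 0,  10 ],
    dst_ip   => [ 10, 10 ],
    proto    => [ 20, 4 ],
    src_port => [ 24, 6 ],
);

# Build a predicate from a list of [ field, operator, pattern ] triples.
# The operator is either '=~' (select) or '!~' (reject).
sub build_filter {
    my @tests = @_;
    my ( @clauses, @patterns );
    for my $test (@tests) {
        my ( $field, $op, $pattern ) = @$test;
        my $pos = $layout{$field} or die "unknown field '$field'";
        die "bad operator '$op'" unless $op eq '=~' or $op eq '!~';
        push @patterns, qr/$pattern/;
        # Refer to the compiled pattern by index, so that no pattern text
        # is interpolated into the generated source.
        push @clauses, sprintf 'substr($_[0], %d, %d) %s $patterns[%d]',
            $pos->[0], $pos->[1], $op, $#patterns;
    }
    my $source = 'sub { ' . ( join( ' and ', @clauses ) || '1' ) . ' }';
    my $filter = eval $source or die "could not compile filter: $@";
    return $filter;
}

my $filter = build_filter(
    [ src_ip   => '=~', '^10\.0\.0\.1' ],
    [ proto    => '=~', '^tcp' ],
    [ src_port => '!~', '^80\s*$' ],
);

# Pad each field to its fixed width before testing.
my $record = sprintf '%-10s%-10s%-4s%-6s', '10.0.0.1', '10.0.0.2', 'tcp', '8080';
print "match\n" if $filter->($record);
```

Only the field offsets and the operator reach the generated source, and both are validated first, so the string eval never sees user-supplied pattern text. Records shorter than the layout would trigger substr warnings, so real input would need padding or a length check.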

      That said, using BerkeleyDB or similar in conjunction with DBI::* would be considerably easier to code and probably much faster.



        It appears to me that there is no need to make the fields fixed-length: the fields never seem to contain whitespace and are always separated by whitespace, so it is not that hard to build a regex that matches exactly as desired.

                        - tye
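
One way such a regex could be built, as a sketch only: the field order in `@fields` and the `build_record_regex` helper are assumptions for illustration. Each field is matched as a run of non-whitespace, and "anything except this" becomes a negative lookahead that is confined to the current field.

```perl
use strict;
use warnings;

# Hypothetical field order; the real capture format may differ.
my @fields = qw(src_ip dst_ip proto src_port);

# Build one anchored regex from field => [ operator, pattern ] pairs.
# Fields without a test match any run of non-whitespace.
sub build_record_regex {
    my %tests = @_;
    my @parts;
    for my $field (@fields) {
        my $test = $tests{$field};
        if ( !$test ) {
            push @parts, '\S+';
        }
        elsif ( $test->[0] eq '!~' ) {
            # "anything except this": reject the field only if the pattern
            # matches it completely, then consume the field as usual.
            push @parts, "(?!(?:$test->[1])(?!\\S))\\S+";
        }
        else {
            # The pattern must match the whole field, up to the next whitespace.
            push @parts, "(?:$test->[1])(?!\\S)";
        }
    }
    my $regex = '^\s*' . join( '\s+', @parts ) . '(?:\s|$)';
    return qr/$regex/;
}

my $re = build_record_regex(
    proto    => [ '=~', 'tcp' ],
    src_port => [ '!~', '80' ],
);
print "match\n" if '10.0.0.1 10.0.0.2 tcp 8080' =~ $re;
```

The per-field patterns should use \S rather than . so that they cannot run across a field boundary.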