The major pain with selecting records using regexes is that you have to match the whole record instead of just the fields you are selecting on, hence your difficulties with specifying the logical select "anything except this". The second problem is that your regex can match data in a part of the record other than the field you are interested in.
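To make the second problem concrete, here is a minimal sketch. The space-separated record layout and the helper `src_port` are made up for illustration and are not taken from your data:

```perl
use strict;
use warnings;

# Hypothetical records: src ip, dst ip, protocol, src port.
my @records = (
    '10.0.0.80 10.0.0.2 tcp 443',   # port 443, but "80" appears in the src ip
    '10.0.0.1 10.0.0.2 tcp 80',     # genuinely src port 80
);

# Hypothetical helper: pull out only the fourth field (the src port).
sub src_port {
    my ($rec) = @_;
    return (split ' ', $rec)[3];
}

# Whole-record regex: matches "80" wherever it happens to occur.
my @whole = grep { /80/ } @records;

# Field-level test: the regex only ever sees the port field.
my @by_field = grep { src_port($_) =~ /^80$/ } @records;

printf "whole-record matches: %d\n", scalar @whole;
printf "field-level matches:  %d\n", scalar @by_field;
```

The whole-record test also picks up the first record because of its IP address, and the negated form ("src port is not 80") goes wrong in the same way.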
By imposing some structure on your data (i.e. making the fields in the record fixed length) and matching or rejecting field by field rather than trying to match (or not) a whole record at a time, you greatly simplify the process. This is what you would get by moving your data into a flat-file DB and using DBI to perform your queries.
At the very least, you should consider fixing the length of the fields of your records. You could then use substr to extract each field and apply a regex to that field alone, which greatly simplifies your queries. E.g.
if (    substr($record,  0, 10) =~ $src_ip_of_interest
    and substr($record, 10, 10) =~ $dst_ip_of_interest
    and substr($record, 20,  4) =~ $proto_of_interest
    and substr($record, 24,  6) !~ $src_port_of_disinterest
    # etc ...
) {
    # we found a record that matches the query
}
I think that you can see how much this simplifies the regexes involved. Generating conditionals using this form and using eval to execute them would be much simpler than trying to come up with a generic regex generator.
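A rough sketch of that generate-and-eval idea follows. The field widths are taken from the offsets in the snippet above; the names `%field`, `make_record` and `build_query` are made up for illustration, and the sample addresses are chosen to fit in 10 characters (real dotted-quad addresses need up to 15, so widen the layout as required):

```perl
use strict;
use warnings;

# Assumed layout: field name => [offset, length].
my %field = (
    src_ip   => [  0, 10 ],
    dst_ip   => [ 10, 10 ],
    proto    => [ 20,  4 ],
    src_port => [ 24,  6 ],
);

# pack 'A' pads with spaces (or truncates) to the given width.
sub make_record { return pack 'A10 A10 A4 A6', @_ }

# Turn [field, operator, qr//] triples into the source of one conditional
# and compile it once with string eval. Only validated offsets and
# operators go into the generated source; the patterns stay in @pat.
sub build_query {
    my @clauses = @_;
    my (@pat, @tests);
    for my $c (@clauses) {
        my ($name, $op, $re) = @$c;
        die "unknown field '$name'" unless $field{$name};
        die "bad operator '$op'"    unless $op eq '=~' or $op eq '!~';
        my ($off, $len) = @{ $field{$name} };
        push @pat, $re;
        push @tests, sprintf 'substr($_[0], %d, %d) %s $pat[%d]',
            $off, $len, $op, $#pat;
    }
    my $src  = 'sub { ' . join(' and ', @tests) . ' }';
    my $code = eval $src or die "could not compile query: $@";
    return $code;
}

my @records = (
    make_record('10.0.0.1', '10.0.0.2', 'tcp', 80),
    make_record('10.0.0.1', '10.0.0.2', 'tcp', 443),
    make_record('10.0.0.9', '10.0.0.2', 'udp', 53),
);

my $query = build_query(
    [ src_ip   => '=~', qr/^10\.0\.0\.1\s*$/ ],
    [ proto    => '=~', qr/^tcp/ ],
    [ src_port => '!~', qr/^80\s*$/ ],   # "anything except port 80"
);

my @hits = grep { $query->($_) } @records;
print "$_\n" for @hits;
```

Because each pattern only sees its own field, the "anything except" case is just a `!~` on that field. Note the `\s*` before `$` in the patterns, which allows for the space padding that pack adds.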
That said, using BerkeleyDB or similar through DBI and a suitable DBD::* driver would be considerably easier to code and probably much faster.