in reply to Variable number of words/fields in a line/record

I think you should reconsider how you're modelling your data, as (at least to my eyes) there is not a variable number of fields there. Let's look at just the part after "conduit permit". On examining the data, we can break it down into five distinct pieces:
conduit permit tcp host 192.168.1.1 eq www any (hitcnt=57476)
               |1| |      2       |    |3| |4| |     5      |

conduit permit tcp host 192.168.1.1 eq 139 host 192.168.2.1 (hitcnt=2)
               |1| |      2       |    |3| |      4       | |   5    |
So you don't have a variable number of fields; every case can be represented as:
$protocol, $server, $port, $client, $hits
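As a minimal sketch (in Python, purely for illustration), the five-field model can be captured with a single regular expression, assuming every line has the same shape as the two samples above. The pattern, the names `CONDUIT_RE` and `parse_conduit`, and the field names are hypothetical; the point is that the client is either `any` or `host <ip>`, and both land in the same field.

```python
import re

# Hypothetical pattern covering only the two sample line shapes above.
# "host" is treated as filler: the client is either "host <ip>" or "any",
# and either way it ends up in the single "client" group.
CONDUIT_RE = re.compile(
    r"^conduit\s+permit\s+"
    r"(?P<protocol>\S+)\s+"
    r"host\s+(?P<server>\S+)\s+"
    r"eq\s+(?P<port>\S+)\s+"
    r"(?:host\s+)?(?P<client>\S+)\s+"
    r"\(hitcnt=(?P<hits>\d+)\)$"
)


def parse_conduit(line):
    """Return the five fields as a dict, or None if the line doesn't match."""
    m = CONDUIT_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    fields["hits"] = int(fields["hits"])  # hit count as an integer
    return fields


for sample in (
    "conduit permit tcp host 192.168.1.1 eq www any (hitcnt=57476)",
    "conduit permit tcp host 192.168.1.1 eq 139 host 192.168.2.1 (hitcnt=2)",
):
    print(parse_conduit(sample))
```

Both sample lines come back as dicts with exactly the same five keys.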

Note that 'www' and 139 are no different: 'www' is just a label for port 80. Likewise, 'any' is just a special case of host aaa.bbb.ccc.ddd, as it represents all valid IPs (i.e. host *).
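Normalising those labels could then be a small lookup step. This sketch assumes a hand-maintained table of port names (only 'www' -> 80 comes from the discussion above) and uses Python's standard `ipaddress` module to turn 'any' into the all-addresses network 0.0.0.0/0 and a bare IP into a single-host /32. `PORT_NAMES`, `normalize_port` and `normalize_host` are hypothetical names.

```python
import ipaddress

# Hypothetical lookup table; only "www" -> 80 comes from the discussion.
# Extend it with whatever other port labels your firewall prints.
PORT_NAMES = {"www": 80}


def normalize_port(port):
    """Turn a port label such as 'www' or '139' into an integer."""
    return PORT_NAMES[port] if port in PORT_NAMES else int(port)


def normalize_host(host):
    """'any' becomes 0.0.0.0/0; a bare IPv4 address becomes a /32 network."""
    if host == "any":
        return ipaddress.ip_network("0.0.0.0/0")
    return ipaddress.ip_network(host)


print(normalize_port("www"), normalize_port("139"))
print(normalize_host("any"), normalize_host("192.168.2.1"))
```

With both forms reduced to a network, 'any' and a single host differ only in their mask, which is exactly the "special case" described above.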

You might want to consider representing the IP as an IP/mask (decimal mask) in the database, so that the special case of 'any' can be represented without resorting to NULL. This will also help if your firewall allows designation by named IP groups and ranges in rulesets. If no field will ever be null (NOT NULL specified at table creation), many more indexing and relational options open up. You can then easily create lookup tables so that 'www' maps to '80', an IP maps to a named person (e.g. an admin or employee), or a whole IP range is named, if your firewall supports named groups as mentioned above. (If you want more info on db normalisation, the various relationship types and constraints, feel free to /msg me and I'll bore you to death about them.)
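A rough illustration of the NOT NULL idea, using SQLite through Python's `sqlite3` module: each address is stored as an integer plus a prefix length, so 'any' becomes address 0 with prefix 0 rather than NULL, and a separate lookup table maps port numbers back to labels such as 'www'. The table and column names are made up for the example, and the types in your own database will differ.

```python
import ipaddress
import sqlite3

# Hypothetical schema: every column is NOT NULL. An address is stored as a
# 32-bit integer plus a prefix length, so "any" is (0, 0) rather than NULL.
SCHEMA = """
CREATE TABLE port_names (
    port INTEGER PRIMARY KEY,
    name TEXT    NOT NULL
);
CREATE TABLE conduit (
    protocol      TEXT    NOT NULL,
    server_addr   INTEGER NOT NULL,
    server_prefix INTEGER NOT NULL,
    port          INTEGER NOT NULL,
    client_addr   INTEGER NOT NULL,
    client_prefix INTEGER NOT NULL,
    hits          INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Lookup table: port number -> label shown in reports.
conn.execute("INSERT INTO port_names (port, name) VALUES (?, ?)", (80, "www"))

server = ipaddress.ip_network("192.168.1.1")  # single host, /32
client = ipaddress.ip_network("0.0.0.0/0")    # "any"
conn.execute(
    "INSERT INTO conduit VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("tcp", int(server.network_address), server.prefixlen, 80,
     int(client.network_address), client.prefixlen, 57476),
)

# Join against the lookup table; fall back to the bare number if unlabeled.
rows = conn.execute(
    """
    SELECT c.protocol, COALESCE(p.name, c.port), c.client_prefix, c.hits
    FROM conduit AS c LEFT JOIN port_names AS p ON p.port = c.port
    """
).fetchall()
print(rows)
```

Because the stored value is always the number, the label lives in one place and can be changed without touching the rule data.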

Re: Re: Variable number of words/fields in a line/record
by Tuna (Friar) on Jun 16, 2001 at 15:00 UTC
    conduit permit tcp host 192.168.1.1 eq www any (hitcnt=57476)
                   |1| |2 | |    3    | |4||5| |6| |     7      |

    conduit permit tcp host 192.168.1.1 eq 139 host 192.168.2.1 (hitcnt=2)
                   |1| |2 | |    3    | |4||5| |6 | |    7    | |   8    |

      I don't think you're seeing what I mean. "host" is just a filler word; it has absolutely no bearing on the real data. "any" carries an implicit host with it: you could just as easily say "host any" or "host all". At its root it means "which host do I allow?" (based on the "permit" earlier).

      Think of it another way: say you had a theme park where different rides had different height requirements. We could express this as:

      lane permit waterslide "The Slayer" eq 5 person 280cm (pplcnt=30)
      lane permit waterslide "The Trickle" eq 2 anyone (pplcnt=532)

      So we permit use of "The Slayer" waterslide by a person over 280cm (who is also a little insane). In the second, we permit use of "The Trickle" by anyone who wants to. What I'm getting at is that what you're taking as two pieces of information is in fact only one. If you go back and think through what the firewall is actually telling you, you realise that it's just different grammar that makes one line longer.

      So, to go back to the primary problem:

      conduit permit tcp host 192.168.1.1 eq www any (hitcnt=57476)
              |  0 | |1| |      2       |    |3| |4| |     5      |

      conduit permit tcp host 192.168.1.1 eq 139 host 192.168.2.1 (hitcnt=2)
              |  0 | |1| |      2       |    |3| |      4       | |   5    |

      I've revised the diagram to highlight just the important data; all the rest is packing material.
      • $permit is a boolean value (bit): either you permit or you deny.
      • $protocol can be represented many ways. If you only deal in TCP and UDP (and are tight on space), you can represent it as a bit; a nicer datatype might be char(3) or char(4), depending on what other protocols you use, as it is easier for humans to read.
      • $server is an IP or possibly an IP range. If your database supports IP/mask as a native datatype, use that; if you plan to do a lot of indexing and/or matching/lookups on it, use an integer representation; and if all you really do is display it back, a varchar() would work as well.
      • $port has a range of [0..2^16), which fits in an unsigned 2-byte integer (an unsigned smallint in many DBs).
      • $client is again an IP/mask, as discussed under $server. Here, though, the mask comes more into play, as you can represent groups of computers (e.g. "any") quite easily in that notation.
      • $hitcnt would be some form of integer unless you have insane daily traffic :)
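      The field list above might be sketched as a small Python dataclass that enforces those ranges. The class name, field names and validation rules are assumptions drawn from the bullets, not from any firewall or database API; `as_db_row` shows the integer representation suggested for $server and $client.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ConduitRule:
    """Hypothetical record holding the six fields from the list above."""
    permit: bool                   # permit (True) or deny (False)
    protocol: str                  # e.g. "tcp" or "udp"
    server: ipaddress.IPv4Network  # a single host is a /32
    port: int                      # 0 <= port < 2**16
    client: ipaddress.IPv4Network  # "any" is 0.0.0.0/0
    hitcnt: int

    def __post_init__(self):
        # Range checks mirroring the datatype suggestions above.
        if not 0 <= self.port < 2 ** 16:
            raise ValueError(f"port out of range: {self.port}")
        if len(self.protocol) > 4:
            raise ValueError(f"protocol name too long: {self.protocol!r}")
        if self.hitcnt < 0:
            raise ValueError("hitcnt cannot be negative")

    def as_db_row(self):
        """Flatten to plain values; each network becomes (int address, prefix length)."""
        return (
            int(self.permit),
            self.protocol,
            int(self.server.network_address), self.server.prefixlen,
            self.port,
            int(self.client.network_address), self.client.prefixlen,
            self.hitcnt,
        )


rule = ConduitRule(
    permit=True,
    protocol="tcp",
    server=ipaddress.ip_network("192.168.1.1"),
    port=80,
    client=ipaddress.ip_network("0.0.0.0/0"),
    hitcnt=57476,
)
print(rule.as_db_row())
```

      Validating at construction time means a bad port or protocol is rejected before it ever reaches the database.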

      If I come across as heavy-handed, please don't take it that way. I just think you're setting yourself up for far too much work, and for less robust reporting than you could achieve with a good foundation (data structure).

        Go easy on him. He doesn't know that split can split on more than spaces. So he can't imagine how anyone could ignore the spaces in the data.
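        To illustrate the point about split: the separator can be a pattern rather than a single space, so the filler words can be swallowed along with the whitespace. In Perl that pattern would go straight into `split`; the sketch below uses Python's `re.split` with the same regex. The list of filler words (`host`, `eq`) is an assumption based only on the two sample lines, and `split_fields` is a hypothetical name.

```python
import re

# Whitespace, optionally followed by one of the filler words "host" or "eq",
# counts as a single separator. The filler list is an assumption.
FILLER_SEP = re.compile(r"\s+(?:(?:host|eq)\s+)?")


def split_fields(rest):
    """Split the text after 'conduit permit' into five fields.

    The last field is still the raw "(hitcnt=N)" token.
    """
    return FILLER_SEP.split(rest.strip())


print(split_fields("tcp host 192.168.1.1 eq www any (hitcnt=57476)"))
print(split_fields("tcp host 192.168.1.1 eq 139 host 192.168.2.1 (hitcnt=2)"))
```

        Both lines now yield five fields, in the same order as the $protocol, $server, $port, $client, $hits model.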