in reply to Matching an IP address

Is there a better way I can do the (\d+\.\d+\.\d+\.\d+)?

What you have is fine, though you could take more advantage of /x to clean up the regex:

    if ( $line =~ m{
            ^\s*
            (                       # begin client IP
                \d+\.\d+\.\d+\.\d+
            )
            :\d+                    # client port (ignored)
            \s*->\s*
            (                       # begin vips
                \d+\.\d+\.\d+\.\d+
            )
            :\d+                    # vips port (ignored)
            \s*->\s*
            (                       # begin frontend
                \d+\.\d+\.\d+\.\d+
            )
        }x )
    {
        $client_ip{$1}{$iteration}++;
        $vips{$2}{$iteration}++;
        $frontend{$3}{$iteration}++;
    }

If you know the number of spaces around "->", use it instead of \s* (e.g., "\s->\s" instead of "\s*->\s*").

If you've got a lot of data, you're probably not going to want to pull it all into @connections. That's gotta suck up RAM.
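
Here's a minimal sketch of the streaming alternative, assuming the connection lines come off a filehandle. The `count_connections` name and the in-memory sample are made up for illustration; your real input is presumably a file or a pipe. Each line is matched and counted as it's read, so nothing piles up in an array:

```perl
use strict;
use warnings;

# Hypothetical helper: counts per-address hits straight off a filehandle,
# one line at a time, instead of loading every line into @connections first.
sub count_connections {
    my ( $fh, $iteration ) = @_;
    my ( %client_ip, %vips, %frontend );

    my $ip = qr/\d+\.\d+\.\d+\.\d+/;

    while ( my $line = <$fh> ) {
        next unless $line =~ m{
            ^\s*
            ($ip) :\d+      # client IP, port ignored
            \s*->\s*
            ($ip) :\d+      # vips IP, port ignored
            \s*->\s*
            ($ip)           # frontend IP
        }x;

        $client_ip{$1}{$iteration}++;
        $vips{$2}{$iteration}++;
        $frontend{$3}{$iteration}++;
    }
    return ( \%client_ip, \%vips, \%frontend );
}

# Made-up sample input; a real run would open a file or a pipe instead.
my $sample = <<'END';
10.0.0.1:1234 -> 192.168.1.1:80 -> 172.16.0.5:8080
10.0.0.1:1235 -> 192.168.1.1:80 -> 172.16.0.6:8080
END

open my $fh, '<', \$sample or die "can't open in-memory sample: $!";
my ( $clients, $vips, $frontends ) = count_connections( $fh, 1 );
close $fh;

print "$_: $clients->{$_}{1}\n" for sort keys %$clients;   # prints "10.0.0.1: 2"
```

Memory use then depends on the number of distinct addresses and iterations, not on the number of input lines.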

Also, consider inverting the data structure you're collecting the counts in. If $iteration is relatively fixed (i.e., changes slowly, compared to the number of connections you're processing), you might save significant time by taking counts without considering $iteration, then sweep those counts into a larger data structure whenever $iteration changes. This is one to benchmark, since it could easily backfire depending on your data mix.
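
A rough sketch of that sweep, assuming the input has already been parsed into (iteration, address) pairs. `flush_counts`, `%current`, `%by_iteration`, and the sample events are all made up for illustration. The hot loop does a single-level hash increment, and the two-level structure is only touched when $iteration changes:

```perl
use strict;
use warnings;

my %by_iteration;    # $by_iteration{$iteration}{$ip} = count (inverted layout)
my %current;         # flat counts for the iteration in progress

# Hypothetical helper: merge the flat counts into the big structure, then reset.
sub flush_counts {
    my ($iteration) = @_;
    $by_iteration{$iteration}{$_} += $current{$_} for keys %current;
    %current = ();
}

# Made-up input: [ iteration, client IP ] pairs, already parsed.
my @events = (
    [ 1, '10.0.0.1' ], [ 1, '10.0.0.1' ], [ 1, '10.0.0.2' ],
    [ 2, '10.0.0.1' ], [ 2, '10.0.0.3' ],
);

my $iteration;
for my $event (@events) {
    my ( $this_iteration, $ip ) = @$event;

    # Sweep only when the iteration changes.
    if ( defined $iteration && $this_iteration != $iteration ) {
        flush_counts($iteration);
    }
    $iteration = $this_iteration;

    $current{$ip}++;    # cheap single-level increment in the hot loop
}
flush_counts($iteration) if defined $iteration;    # don't lose the last batch
```

If $iteration flips on nearly every line, the sweep runs constantly and this will likely be slower than the plain two-level increment.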

Re: Re: Regex redux
by ibanix (Hermit) on Nov 19, 2002 at 20:55 UTC
    The data from @connections is thankfully small. It's the unbounded growth of $iteration that will suck up RAM in the long run. I haven't figured out how to deal with that yet.

    I've posted the full script (and questions) at http://www.perlmonks.org/index.pl?node_id=214252

    Thanks!

    <-> In general, we find that those who disparage a given operating system, language, or philosophy have never had to use it in practice. <->