in reply to Define string on current line, then match other lines with string below the line

Maybe I'm being dense, but why are you reading sections of the file more than once anyway? You only need to read from where you are and below as I read your node. This is usually done by setting some sort of flag value and testing that flag. Your I/O system will thank you.

The following code produces very close to your expected output from your sample input. There may be an extraneous newline at the end if you care about that. I've heavily commented this to make it easier to follow. I also threw in some quite simple debugging for the data structure and made the regex a bit easier (for me) to read.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;    # added for debugging the hash

my $DEBUG = 0;       # enable debugging if true
my $file  = 'tmpfile';
my $numelements;
my %connection;      # This becomes the central data structure. Called it 'connection' because it represents a typical TCP connection

open my $info, '<', $file or die "Could not open $file: $!";    # favor three-argument open when you're not using open's magic

# Read all the info in a single pass by use of a start flag, put it in the data structure.
while ( my $line = <$info> ) {
    if ( $line =~ m/(src=(?:\d+\.){3}\d+ dst=(?:\d+\.){3}\d+ src_port=\d+ dst_port=\d+) reason=(.*)/ ) {    # capture the reason
        my $match = $1;
        if ( $2 eq 'AGE OUT' ) {    # test to see if the reason is what we're looking for at the start
            $connection{ $match }{ 'aged_out' } = 1;
        }
        if ( exists $connection{ $match } and $connection{ $match }{ 'aged_out' } ) {
            # If we've received an 'AGE OUT' reason, then $connection{ $match }{ 'aged_out' }
            # has been autovivified and we can start counting and pushing.
            # Keep track of this and following lines for this connection in this sub-hash.
            $connection{ $match }{ 'count' }++;
            push @{ $connection{ $match }{ 'line' } }, $line;    # The actual lines are in a HoHoA here.
        }
    }
}
close $info;

print Dumper \%connection if $DEBUG;    # pass a reference so the whole hash dumps as one structure

# Now there's a data structure from the above loop we can loop over without accessing the file any longer.
for my $con ( keys %connection ) {
    print "$con has " . $connection{ $con }{ 'count' } . " elements\n";
    if ( $connection{ $con }{ 'count' } > 1 ) {
        print @{ $connection{ $con }{ 'line' } };    # Doesn't need to be joined because the newlines were never stripped.
    }
    print "\n";
}

Re^2: Define string on current line, then match other lines with string below the line
by Mashed Potato (Initiate) on Oct 05, 2014 at 14:14 UTC

    First: thank you.

    Second: HoHoA? That's one Ho short of a Santa-A (Canadian?). Seriously though, I have yet to venture into hash-land, never mind hashes of arrays, and certainly not Santas who are not playing with a full deck of Ho's.

    Since it's obviously time for me to get into hashes, do you have any examples like this one where data from a file is pushed to the hash, as opposed to the user defining it? Unfortunately nobody at my work cares that I can create a hash with different fruits and vegetables from my mind.

    And third, thanks for saying TCP because it made me realize I don't need to look for UDP connections. I will work that into the regex.

    Great learning experience, thanks again to everyone who replied.

      Yeah, I was a bit concerned that a hash of hashes of arrays was a bit complex in this case. Sometimes I find it easier to think about the levels backward. There's an array of the lines kept in 'line', and a reference to each 'line' is kept in its own $match hash. A reference to each $match is kept in %connection to hold it all together. The 'count' is just another branch of that tree. Set $DEBUG to 1 and look at the data structure.
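
      To make those levels concrete, here is a minimal sketch that fills the same kind of hash from data rather than from a hand-typed list. The three sample lines and the reason strings other than 'AGE OUT' are made up for illustration, and the regex is loosened to \S+ for brevity, so treat it as an assumption about the log format rather than the real thing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample lines standing in for the real file contents.
my @lines = (
    "src=10.0.0.1 dst=10.0.0.2 src_port=1234 dst_port=80 reason=AGE OUT\n",
    "src=10.0.0.1 dst=10.0.0.2 src_port=1234 dst_port=80 reason=TCP FIN\n",
    "src=10.0.0.3 dst=10.0.0.4 src_port=5555 dst_port=22 reason=TCP RST\n",
);

my %connection;    # top level: one key per connection string

for my $line (@lines) {
    next unless $line =~ m/^(src=\S+ dst=\S+ src_port=\d+ dst_port=\d+) reason=(.*)/;
    my ( $key, $reason ) = ( $1, $2 );

    # Second level: a small hash per connection, created on first 'AGE OUT'.
    $connection{$key}{aged_out} = 1 if $reason eq 'AGE OUT';

    # 'exists' on the top-level key does not autovivify it, so connections
    # that never aged out stay out of the hash entirely.
    next unless exists $connection{$key};

    $connection{$key}{count}++;
    push @{ $connection{$key}{line} }, $line;    # third level: array of raw lines
}

$Data::Dumper::Sortkeys = 1;    # stable key order makes the dump easier to read
print Dumper( \%connection );
```

      With a real file, the array would be replaced by a while loop over the filehandle; the hash-building lines stay the same. Reading the dump from the innermost array outward matches the "think about the levels backward" approach.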

      I've found some quotes about data structures I'd like to share before I start giving a bibliography.

      • "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -- Linus Torvalds
      • "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious." -- Fred Brooks
      • "Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." -- Rob Pike
      • "It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." -- Alan J. Perlis

      If you don't know who those people are or why I've chosen them to quote, then I suggest a bit of research on them. Their writing will make you a better programmer. As will stuff by Al Aho and many others, for that matter.

      Besides the wonderful Modern Perl already mentioned in the thread, there are other resources, too.