Mashed Potato has asked for the wisdom of the Perl Monks concerning the following question:

Hello Sirs - please take it easy on me if this is an elementary question, I'm fairly new to Perl. I could not find any relevant topics.

I am trying to define a string from a subset of each line which matches my regex, and then read down in the file to find the other lines which match the string. If there is more than one match, print the number of matches and the lines.

My data:

src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=AGE OUT
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=Traffic Denied
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 reason=AGE OUT
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 reason=AGE OUT

My attempt seems to only print the first match. Is that because the inside file loop passes EOF to the outside? I believe I have used this before with arrays and it worked, but this file is too large to take into memory.

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'tmpfile';
my $match;
my $numelements;

open my $info, $file or die "Could not open $file: $!";

while( my $line = <$info>) {
    if ( $line =~ m/(src\=\d+\.\d+\.\d+\.\d+\sdst\=\d+\.\d+\.\d+\.\d+\ssrc\_port\=\d+\sdst\_port\=\d+).*AGE OUT/ ) {
        $match = "$1";
        push (my @dups, $line);
        while( my $twoline = <$info> ) {
            if ( $twoline =~ /$match/ ) {
                push @dups, $twoline;
            }
        }
        $numelements = @dups;
        print "$match has $numelements elements\n";
        if ( $numelements > 1 ) {
            print join("\n", @dups);
        }
    }
}
close $info;

Output I get:

$ ./tcprst.pl
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 has 2 elements
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=AGE OUT
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=Traffic Denied
$

Output I expect:

src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 has 2 elements
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=AGE OUT
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=Traffic Denied
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 has 2 elements
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 reason=AGE OUT
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 reason=AGE OUT

Thank you in advance

Replies are listed 'Best First'.
Re: Define string on current line, then match other lines with string below the line
by GotToBTru (Prior) on Oct 02, 2014 at 18:27 UTC

    Your interior while loop runs to the end of the file. You need to seek back to the point you found your last match (assuming they always come in groups) to find the next string to search for. It looks like you could use the number of elements in the first line to know how many lines to search.
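
To see why only the first group is printed, here is a minimal sketch of the effect. It assumes a made-up four-line input read through an in-memory filehandle in place of the real file. Both loops read from the same handle, so the inner loop consumes everything that is left and the outer loop has nothing more to read.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical four-line input, read through an in-memory filehandle
# so the sketch does not depend on any file on disk.
my $data = "line 1\nline 2\nline 3\nline 4\n";
open my $fh, '<', \$data or die "Could not open in-memory file: $!";

my $outer_iterations = 0;
while ( my $line = <$fh> ) {
    $outer_iterations++;

    # The inner loop shares $fh with the outer loop, so it reads
    # every remaining line and leaves the handle at end of file.
    while ( my $rest = <$fh> ) { }
}
close $fh;

print "outer loop ran $outer_iterations time(s)\n";
```

This prints "outer loop ran 1 time(s)" even though the input has four lines, which matches the behaviour described in the question.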

    1 Peter 4:10
Re: Define string on current line, then match other lines with string below the line
by QM (Parson) on Oct 03, 2014 at 10:36 UTC
    seek and tell may be useful. There's an example in seek that is almost what you need. I would try something like this (untested):
    while (my $line = <$info>) {
        my $next_line_start = tell($info); # save start position of next line
        if ($line =~ /regex/) {
            do_something_here;
        }
        # reposition at $next_line_start
        seek($info,$next_line_start,0);
    }

    Update: removed defined in the while test, as it is included (here, at least) in while magic.
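
For what it's worth, the tell/seek idea might be filled in along these lines. This is only a sketch under some assumptions: the sample data from the question is read through an in-memory filehandle instead of 'tmpfile', the regex is a simplified stand-in for the original one, and groups with a single line are skipped, as in the expected output. The names @report and $below are made up for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data from the question, read through an in-memory filehandle
# (an assumption for this sketch; the real script opens 'tmpfile').
my $data = <<'END';
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=AGE OUT
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=Traffic Denied
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 reason=AGE OUT
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 reason=AGE OUT
END
open my $info, '<', \$data or die "Could not open in-memory data: $!";

my @report;    # collected output lines (hypothetical name)
while ( my $line = <$info> ) {
    my $next_line_start = tell($info);    # start position of the next line

    # Simplified pattern: the connection part followed by an AGE OUT reason.
    if ( $line =~ /^(src=\S+ dst=\S+ src_port=\d+ dst_port=\d+) reason=AGE OUT/ ) {
        my $match = $1;
        my @dups  = ($line);

        # Scan everything below the current line for the same connection.
        # A plain prefix comparison avoids treating the dots as regex wildcards.
        while ( my $below = <$info> ) {
            push @dups, $below if index( $below, "$match " ) == 0;
        }

        push @report, "$match has " . scalar(@dups) . " elements\n", @dups
            if @dups > 1;
    }

    # Go back to where the outer loop left off.
    seek( $info, $next_line_start, 0 ) or die "seek failed: $!";
}
close $info;

print @report;
```

Note that this rereads the rest of the file once for every AGE OUT line, and a later AGE OUT line of a group that was already reported starts its own scan, so on a large file it may be slow and can report overlapping groups.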

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re: Define string on current line, then match other lines with string below the line
by mr_mischief (Monsignor) on Oct 03, 2014 at 17:14 UTC

    Maybe I'm being dense, but why are you reading sections of the file more than once anyway? You only need to read from where you are and below as I read your node. This is usually done by setting some sort of flag value and testing that flag. Your I/O system will thank you.

    The following code produces very close to your expected output from your sample input. There may be an extraneous newline at the end if you care about that. I've heavily commented this to make it easier to follow. I also threw in some quite simple debugging for the data structure and made the regex a bit easier (for me) to read.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper; # added for debugging the hash

    my $DEBUG = 0; # enable debugging if true
    my $file = 'tmpfile';
    my $numelements;
    my %connection; # This becomes the central data structure. Called it 'connection' because it represents a typical TCP connection

    open my $info, '<', $file or die "Could not open $file: $!"; # favor three-argument open when you're not using open's magic

    # Read all the info in a single pass by use of a start flag, put it in the data structure.
    while ( my $line = <$info> ) {
        if ( $line =~ m/(src=(?:\d+\.){3}\d+ dst=(?:\d+\.){3}\d+ src_port=\d+ dst_port=\d+) reason=(.*)/ ) { # capture the reason
            my $match = $1;
            if ( $2 eq 'AGE OUT' ) { # test to see if the reason is what we're looking for at the start
                $connection{ $match }{ 'aged_out' } = 1;
            }
            if ( exists $connection{ $match } and $connection{ $match }{ 'aged_out' } ) {
                # If we've received an 'AGE OUT' reason, then $connection{ $match }{ 'aged_out' }
                # has been autovivified and we can start counting and pushing.
                # Keep track of this and following lines for this connection in this sub-hash.
                $connection{ $match }{ 'count' }++;
                push @{ $connection{ $match }{ 'line' } }, $line; # The actual lines are in a HoHoA here.
            }
        }
    }
    close $info;

    print Dumper \%connection if $DEBUG; # pass a reference so the hash is dumped as one structure

    # Now there's a data structure from the above loop we can loop over without accessing the file any longer.
    for my $con ( keys %connection ) {
        print "$con has " . $connection{ $con }{ 'count' } . " elements\n";
        if ( $connection{ $con }{ 'count' } > 1 ) {
            print @{ $connection{ $con }{ 'line' } }; # Doesn't need to be joined because the newlines were never stripped.
        }
        print "\n";
    }

      First: thank you.

      Second: HoHoA? That's one Ho short of a Santa-A (Canadian?). Seriously though, I have yet to venture into hash-land, never mind hashes of arrays, and certainly not Santas who are not playing with a full deck of Ho's.

      Since it's obviously time for me to get into hashes, do you have any examples like this one where data from a file is pushed to the hash, as opposed to the user defining it? Unfortunately nobody at my work cares that I can create a hash with different fruits and vegetables from my mind.

      And third, thanks for saying TCP because it made me realize I don't need to look for UDP connections. I will work that into the regex.

      Great learning experience, thanks again to everyone who replied.

        Yeah, I was a bit concerned that a hash of hashes of arrays was a bit complex in this case. Sometimes I find it easier to think about the levels backward. There's an array of the lines kept in 'line', and a reference to each 'line' is kept in its own $match hash. A reference to each $match is kept in %connection to hold it all together. The 'count' is just another branch of that tree. Set $DEBUG to 1 and look at the data structure.
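
As a smaller illustration of those levels, here is a sketch that fills a hash from file data and then walks it from the outside in. It assumes the sample lines from the question, read through an in-memory filehandle rather than 'tmpfile', and it keeps every matching line (there is no 'aged_out' flag here), so it is a simplification of the script above and not a replacement for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample lines from the question, read through an in-memory filehandle
# (an assumption for this sketch; a real script would open a file).
my $data = <<'END';
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=AGE OUT
src=2.2.2.2 dst=1.1.1.1 src_port=50232 dst_port=514 reason=Traffic Denied
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 reason=AGE OUT
src=3.3.3.3 dst=4.4.4.4 src_port=50235 dst_port=123 reason=AGE OUT
END
open my $fh, '<', \$data or die "Could not open in-memory data: $!";

my %connection;
while ( my $line = <$fh> ) {
    # The hash key comes from the data itself, not from the programmer.
    my ($key) = $line =~ /^(src=\S+ dst=\S+ src_port=\d+ dst_port=\d+) reason=/
        or next;

    $connection{$key}{count}++;                  # inner hash, scalar value
    push @{ $connection{$key}{line} }, $line;    # inner hash, array of lines
}
close $fh;

# Walk the structure one level at a time.
for my $key ( sort keys %connection ) {
    my $entry = $connection{$key};    # reference to the inner hash
    my $lines = $entry->{line};       # reference to the array of lines
    print "$key: $entry->{count} line(s)\n";
    print "    $_" for @$lines;       # the newlines were never stripped
}
```

Neither the inner hash nor the array is declared anywhere. Perl creates both the first time they are used (autovivification), which is why reading data straight into a nested structure takes so little code.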

        I've found some quotes about data structures I'd like to share before I start giving bibliography.

        • "Bad programmers worry about the code. Good programmers worry about data structures and their relationships." -- Linus Torvalds
        • "Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won't usually need your flowcharts; they'll be obvious." -- Fred Brooks
        • "Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." -- Rob Pike
        • "It is better to have 100 functions operate on one data structure than to have 10 functions operate on 10 data structures." -- Alan J. Perlis
        If you don't know who those people are or why I've chosen them to quote, then I suggest a bit of research on them. Their writing will make you a better programmer. As will stuff by Rob Pike, Al Aho, and many others for that matter.

        Besides the wonderful Modern Perl already mentioned in the thread, there are other resources, too.