firepro20 has asked for the wisdom of the Perl Monks concerning the following question:

I have the following entry in my logfile.

2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)

I want to write a script that splits this one-line string into pieces, so that I can write some of those pieces to a .csv file for machine learning. So far my script only searches for a hardcoded pattern and, if it is found, writes that same pattern out. This is not what I want. This is the script I have right now.

    #!/usr/bin/perl -w

    $path1 = "/home/tsec/testwatch/attackerresult.log";
    $attacker = ">>/home/tsec/testwatch/attacker.csv";
    #$path2 =
    #$path3 =
    #$path4 =

    #function definition
    #Pattern for attackerlog only
    sub extractor(){
        open(LOG, $path1) or die "Cant't open '$path1': $!";
        open(FILE, $attacker) or die "Can't open '$attacker': $!";
        $target = "tcp";
        while(<LOG>){
            if(/$target/){
                print FILE $target . "\n";
            }
        }
    }
    close(LOG);
    close(FILE);

I want the output in the CSV file to be something like this:

I can do the csv titles manually

(Titles)Protocol, Source IP Address, Source Port, File Size

(String result from script)tcp, 127.0.0.1, 8080, 312

The above is just an example. Any idea?

Replies are listed 'Best First'.
Re: Split string variable of log input and output pieces in text file
by stevieb (Canon) on Apr 17, 2016 at 14:50 UTC

    If all lines will always have the same number of fields, this will work.

    use warnings;
    use strict;

    open my $wfh, '>', 'out.csv' or die $!;

    my $cols = "Protocol, Source IP Address, Source Port, Data Size\n";
    print $wfh $cols;

    while (<DATA>){
        if (/
            (?:.*?\s){3}  # get rid of the time
            (.*?)         # capture the proto ($1)
            \s+           # skip the next whitespace
            (.*?):(\d+)   # separate IP and port, capture both ($2, $3)
            .*?\(         # skip everything until an opening parens
            (\d+)         # capture bytes ($4)
            /x
        ){
            print $wfh "$1, $2, $3, $4\n";
        }
    }

    __DATA__
    2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)

    output file:

    # cat out.csv
    Protocol, Source IP Address, Source Port, Data Size
    tcp, 115.239.248.245, 1751, 312

    Update: here is the original regex, before I expanded it and added the explanation, and before I stopped capturing the word 'bytes':

    /(?:.*?\s){3}(.*?)\s+(.*?):(\d+).*?\((.*?)\)$/
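
    To make the difference between the two versions concrete, here is a minimal sketch that runs both regexes against the sample line from the question and prints the fourth capture from each. The helper name size_field is only an illustration, and the expanded regex is written without /x here so the two patterns can be compared side by side.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = '2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)';

# Hypothetical helper: return the fourth capture of $re, or undef on no match.
sub size_field {
    my ($re, $text) = @_;
    my @caps = $text =~ $re;    # list context returns the captures
    return @caps ? $caps[3] : undef;
}

# The compact regex captures everything between the parentheses.
my $compact  = qr/(?:.*?\s){3}(.*?)\s+(.*?):(\d+).*?\((.*?)\)$/;
# The expanded regex captures only the digits after the opening paren.
my $expanded = qr/(?:.*?\s){3}(.*?)\s+(.*?):(\d+).*?\((\d+)/;

my $compact_size  = size_field($compact,  $line);
my $expanded_size = size_field($expanded, $line);

print "compact:  $compact_size\n";     # 312 bytes
print "expanded: $expanded_size\n";    # 312
```

    For a numeric CSV column, the digits-only capture is probably the one you want.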
Re: Split string variable of log input and output pieces in text file
by Marshall (Canon) on Apr 17, 2016 at 16:59 UTC
    Sometimes for these fixed formats (and that doesn't mean "all the time"), it is easier to use split and then a list slice instead of one big regex. You only have to select what you need. The first arg to split is still a regex, but a simple one. Here is a demo of that. Notice the index of -2. That is completely fine in Perl and means 2nd from the end. This isn't the best example of it, but I put the vars on the left side of the assignment in the order that I need them later, and adjust the indices in the slice to match.

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $line = '2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)';

    my ($protocol, $ip, $port, $size) = (split /[\s:()]+/, $line)[6,7,8,-2];

    print join ",", ($protocol, $ip, $port, $size);
    print "\n";
    __END__
    Prints:
    tcp,115.239.248.245,1751,312

    To test easily to find the index numbers:
    my @x = split /[\s:()]+/, $line;
    print join "\n", @x;
    Use the line numbers from your text editor to see the indices without counting
    (editor lines start at 1, so index = line number - 1).
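
    As a variation on the hint above, this sketch prints each field next to its index, so the numbers for the slice can be read off directly without an editor. It uses the same split pattern and the same sample line. The variable name @fields is my own choice.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = '2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)';

# Split on runs of whitespace, colons and parentheses.
my @fields = split /[\s:()]+/, $line;

# Print "index: value" so slice indices can be picked by eye.
for my $i (0 .. $#fields) {
    printf "%2d: %s\n", $i, $fields[$i];
}
```

    On the sample line this lists 16 fields, with tcp at index 6 and 312 second from the end, which matches the slice [6,7,8,-2] used above.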

      Thank you very much for this. If you can explain how the regex works with // pattern matching, that would be great, especially since I have other log files with different formats.

        Ok, for split /[\s:()]+/,$line please read http://perldoc.perl.org/functions/split.html.

        Split takes a line as input and makes an array according to the split regex. The split regex defines what constitutes a new array element boundary. During the split process the "separators" are "consumed", meaning deleted.

        The regex above says: "if I see a run of one or more whitespace characters, colons, left parens or right parens", delete those and make whatever was to the left of them an array element. This part: [6,7,8,-2] says OK, I've got lots of stuff, but I only want the 7th, 8th and 9th things and the 2nd one from the end. Perl arrays are indexed from zero, so the first element is index 0. Run my "hint" code and see what happens if you delete () from the regex. Experimentation is key. Run some examples and report back.

        This is not a perfect analogy, but if you had an old style typewriter and hit "carriage return" every time you saw the matching regex, you would wind up with my "hint" code.
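
        The experiment suggested above can be sketched like this: split the same sample line twice, once with the parentheses in the character class and once without, and compare the second-to-last field. The array names @with and @without are only illustrative.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = '2016-04-17 10:12:27:682011 GMT tcp 115.239.248.245:1751 -> 192.168.0.17:8080 52976f9f34d5c286ecf70cac6fba4506 04159c6111bca4f83d7d606a617acc5d6a58328d3a631adf3795f66a5d6265f4d1ec99977a5ae8cb2f3133c9503e5086a5f2ac92be196bb0c9a9f653f9669495 (312 bytes)';

# Parentheses are separators here, so they are consumed by the split.
my @with    = split /[\s:()]+/, $line;
# Parentheses are ordinary characters here, so they stay in the fields.
my @without = split /[\s:]+/,   $line;

print "with parens in the class:    $with[-2]\n";       # 312
print "without parens in the class: $without[-2]\n";    # (312
```

        Without the parentheses in the class, the size field keeps its opening paren and the last field becomes "bytes)", so the separator set has to match whatever punctuation surrounds the values you want.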

Re: Split string variable of log input and output pieces in text file
by stevieb (Canon) on Apr 17, 2016 at 15:19 UTC

    I just noticed you cross-posted this over at Stack Overflow.

    It is considered polite to say so when you cross-post, so that people who don't visit both sites don't waste effort duplicating answers.

      You are right, stevieb. Apologies for not advising. I was desperate, to be honest. I am glad that someone like you took the time to answer back, and I thank all of you.