PhiThors has asked for the wisdom of the Perl Monks concerning the following question:

Hi Everyone,

I am student in college and recently joined a research project. I am in the process of editing a script. The purpose of the original script is to take information from a email server log text file and copy+create another text file to post it in a more readable fashion:

 sender, receiver1, receiver2, etc....(even though i would like to put it: sender, receiver1 <br>sender, receiver2 <br>

We also have phone server log files and would like to do almost the same with the internal phone communication. The difference will be that we would like to only have one sender and receiver on each line.

The email log data looks like this(copy of the part that we focus on):
.................. 16:32:59 256 Distribute message from: olgal (olgal) 16:32:59 256 Begin distribution to 1 users 16:32:59 256 Distributed: TulaM 16:32:59 408 Notifying client at: 192.168.1.103 UDP port 65534 16:32:59 408 Notifying client at: 172.16.201.27 UDP port 1109 16:32:59 408 Notifying client at: 172.16.201.27 UDP port 1109 16:33:03 152 getQuickMessagesResponse is too large: [EA04] 16:33:03 176 Notifying client at: 10.0.2.159 UDP port 61774 16:33:06 256 Processing update: item record (christinam) 16:33:06 176 Notifying client at: 10.0.2.159 UDP port 61774 16:33:06 256 Purge Execution Record #308860 (christinam) 16:33:06 176 Notifying client at: 10.0.2.159 UDP port 61774 16:33:06 176 Distribute message from: ElizabethY (ElizabethY) 16:33:06 256 Notifying client at: 10.0.200.61 UDP port 2257 16:33:06 176 Begin distribution to 2 users 16:33:06 176 Distributed: KaseyT 16:33:06 176 Distributed: KirkL 16:33:06 256 Notifying client at: 10.0.200.30 UDP port 49857 16:33:06 256 Notifying client at: 10.0.200.83 UDP port 62849 16:33:06 256 Notifying client at: 10.0.200.61 UDP port 2257 16:33:11 256 Notifying client at: 10.0.13.7 UDP port 56097 16:33:11 256 Notifying client at: 10.0.13.7 UDP port 64665 16:33:11 176 Distribute message from: marcc (marcc) 16:33:11 176 Begin distribution to 1 users 16:33:11 176 Distributed: JasonE .................
Here is the current script:
#!/bin/perl use File::Path; # Constants my $from="Distribute message from:"; my $to="Distributed:"; if ( $#ARGV != 0 ) { usage(); exit 1; } # # Parse the command line argument # $dir = shift @ARGV; if ( ! -e $dir ) { print "'$dir' does not exists. Exiting the script.\n"; exit 1; } elsif (! -d $dir ) { print "'$dir' is not a valid directory. Exiting the script.\n"; exit 1; } my $outdir = "$dir/filtered"; my $logdir = "$dir/logdir"; if ( -e "$outdir" ) { if ( -f "$outdir" ) { print "'$outdir' is a file. Rename the file.\n"; exit 0; } else { rmtree("$outdir", 0) || die "Could not delete '$outdir' $!\n"; } } mkpath("$outdir") || die "Could not create '$outdir' $!\n"; if ( -e "$logdir" ) { if ( -f "$logdir" ) { print "'$logdir' is a file. Rename the file.\n"; exit 0; } else { rmtree("$logdir", 0) || die "Could not delete '$logdir' $!\n"; } } mkpath("$logdir") || die "Could not create '$logdir' $!\n"; opendir(DIR, "$dir") || die "Can't open directory '$dir' $!\n"; @files = readdir(DIR); closedir(DIR); foreach $file (@files) { print "Processing $dir/$file\n"; if ( -f "$dir/$file") { open(FIN, "<$dir/$file"); open(FOUT, ">$outdir/$file"); open(LOG, ">$logdir/$file"); my %map = (); my %lmap = (); while ($line = <FIN>) { chomp($line); doLog("Processing", $line); if ( $line =~ /^[0-9]/m) { my ($time, $data) = removetimestamp($line); my ($key, $value) = keyValue($data); if (defined $key) { if ( $value =~ /^$from/ ) { my $sender = getSenderName($value); if (defined $map{$key}) { $val = $map{$key}; doLog("End of sender", $val); my $length = $val; $val = substr($val, 0, ($length - 1)); print FOUT "$val\n"; $map{$key} = "$sender,"; $lmap{$key} = $line; doLog("Replacing", $sender); } else { doLog("New Entry", "$data"); $map{$key} = "$sender,"; $lmap{$key} = $line; } } elsif ( $value =~ /^$to/ ) { my $recipient = getRecipientName($value); if (defined $map{$key} ) { $val = $map{$key}; doLog("Adding recipeint:", "$recipient to $val"); $val .= "$recipient,"; $map{$key} = $val; } else { doLog("Ignoring", $line); } } else { doLog("Ignoring", $line); } } } else { doLog("Ignoring", $line); } } for $mkey ( keys %lmap ) { doLog("Incomplete", $lmap{$mkey}); } close FIN; close FOUT; close LOG; } print "Processed $dir/$file\n"; } sub removetimestamp() { my ($line) = @_; my $ind = index($line, " "); if ( $ind != "-1" ) { $time = substr($line, 0, $ind); $line = substr($line, $ind + 1); return ($time, $line); } } sub keyValue() { my ($line) = @_; my $ind = index($line, " "); if ($ind != -1) { my $key = substr($line, 0, $ind); my $value = substr($line, $ind + 1); return ($key, $value); } } sub getSenderName() { my ($from_line) = @_; my $ind = index($from_line, ":"); if ( $ind != "-1" ) { $sender = substr($from_line, $ind + 1); $sender =~ s/^\s+//; #remove leading spaces $sender =~ s/\s+$//; #remove trailing spaces return $sender; } } sub getRecipientName() { my ($to_line) = @_; my $ind = index($to_line, ":"); if ( $ind != "-1" ) { $recipient = substr($to_line, $ind + 1); $recipient =~ s/^\s+//; #remove leading spaces $recipient =~ s/\s+$//; #remove trailing spaces return $recipient; } } sub doLog() { my ($msg, $line) = @_; print LOG "$msg: $line\n"; } sub usage() { print "Usage:\n"; print " cmd: perl <path to>dmining.pl <directory-name>\n"; print " where:\n"; print " directory-name: the absolute or relative path to raw data +\n"; }
And here is the phone log format, where the first partyID is the caller and second partyID is the receiver:
............................ 08:17:12.245 ( 1528: 4112) [TMS] AddMediaStreamFromCDS : Media Stream +Event from Remote Server is logged hr=0x00000000 08:17:12.245 ( 1528: 4132) [TMS] CNccCdrLog::ProcessMediaStreamRequest +: 0 08:17:17.843 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - valida +ting CDR privacy for Call="002000199994e4ea9030010491ae13d", partyID= +"3855", CtrlPartyID="" 08:17:17.843 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - valida +ting CDR privacy for Call="002000199994e4ea9030010491ae13d", partyID= +"OutOfArea", CtrlPartyID="" 08:17:17.844 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - EndOfL +ist: after 3 entries, hr=0xC1170A2E 08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - valida +ting CDR privacy for Call="00b0001288b4da335dc0010491ad632", partyID= +"2863", CtrlPartyID="" 08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - valida +ting CDR privacy for Call="00b0001288b4da335dc0010491ad632", partyID= +"2717", CtrlPartyID="" 08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - EndOfL +ist: after 3 entries, hr=0xC1170A2E 08:17:21.640 ( 1528: 4112) [TMS] NccDtcpMsgLegState : Processing Le +g State Event ...........................

Any help would be greatly appreciated. i might have taken on a little too much here it seems.

Thank you for any help!

Replies are listed 'Best First'.
Re: Redoing a script
by jwkrahn (Abbot) on Feb 15, 2012 at 08:26 UTC
    #!/bin/perl use File::Path;

    Your program should start with the warnings and strict pragmas:

    #!/bin/perl use warnings; use strict; use File::Path;


    sub removetimestamp() { my ($line) = @_; ... sub keyValue() { my ($line) = @_; ... sub getSenderName() { my ($from_line) = @_; ... sub getRecipientName() { my ($to_line) = @_; ... sub doLog() { my ($msg, $line) = @_;

    You use a prototype that says the subroutines will accept NO arguments but every subroutine DOES accept some arguments.    You should not use prototypes.

    The removetimestamp subroutine is exactly the same as the keyValue subroutine, and the getSenderName subroutine is exactly the same as the getRecipientName subroutine.    You shouldn't duplicate code like that.



    sub removetimestamp() { my ($line) = @_; my $ind = index($line, " "); if ( $ind != "-1" ) { $time = substr($line, 0, $ind); $line = substr($line, $ind + 1); return ($time, $line); } }

    That could be simplified to:

    sub removetimestamp { my ( $line ) = @_; ( my $time, $line ) = split / /, $line, 2; }

    But then you don't really need a subroutine so:

    my ($time, $data) = removetimestamp($line); my ($key, $value) = keyValue($data);

    Would become:

    my ($time, $data) = split / /, $line, 2; my ($key, $value) = split / /, $data, 2;
Re: Redoing a script
by GrandFather (Saint) on Feb 15, 2012 at 08:52 UTC

    There is a lot of fairly dodgy code in there. Nothing really terrible, but lots of sloppy coding like several subs where the only difference is in variable names, or the use of prototype subs that would cause the code to fail except that the subs are defined after their first use so the prototypes are ignored (it's not important that you understand what that means btw). To be fair there is some good code there too, but it would be hard for a newbie to pick what is good and what is bad. With a good tidy up I'd expect that code to reduce to about 2/3 of its current size and to become easier to understand too.

    If you have the option I'd suggest you start from scratch for the new project and come back to PerlMonks to ask questions when you get stuck. You aren't going to learn a lot that is good working from the code you have shown us. Unless someone provides a complete solution for you, it'll probably be quicker to dump this code and start over for the new task. If you do, please come back to have it critiqued so we can help you learn appropriate Perl coding habits.

    True laziness is hard work
      I think that you are being too kind to the original code. I figure that it is more than "fairly dodgy". The idea that the code could be reduced to only 2/3 of original size is a vast understatement!

      This is a case where the general implementation approach of the code is just so far wrong, that telling the OP to study more won't help. The OP is trying to modify very poorly written code from another student.

      "studying crap" is just going to result in "more crap".

      I think this is case where showing some Monk level code is appropriate and I tried to do that. The OP has gotten some enormous hints about what to do.

      I hope that the OP uses the code given and comes back with better questions.

Re: Redoing a script
by Marshall (Canon) on Feb 15, 2012 at 08:42 UTC
    The current script wasn't helpful to me for solving your second phone log parsing problem.

    Here is a simple parser for your new problem.

    #!/usr/bin/perl -w use strict; my $current_call=""; my $current_line=""; while (<DATA>) { my ($call, $partyID) = /Call="(\w+)".*partyID="(\w+)"/; next unless defined $partyID; if ( $call ne $current_call) # new call record { print "$current_line\n" if $current_line ne ""; $current_call = $call; $current_line = $partyID; } else # the receiver partyID { $current_line .= " $partyID"; } } print "$current_line\n"; =prints 3855 OutOfArea 2863 2717 =cut __DATA__ 08:17:12.245 ( 1528: 4112) [TMS] AddMediaStreamFromCDS : Media Stream + Event from Remote Server is logged hr=0x00000000 08:17:12.245 ( 1528: 4132) [TMS] CNccCdrLog::ProcessMediaStreamReques +t: 0 08:17:17.843 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - valid +ating CDR privacy for Call="002000199994e4ea9030010491ae13d", partyID +="3855", CtrlPartyID="" 08:17:17.843 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - valid +ating CDR privacy for Call="002000199994e4ea9030010491ae13d", partyID +="OutOfArea", CtrlPartyID="" 08:17:17.844 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - EndOf +List: after 3 entries, hr=0xC1170A2E 08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - valid +ating CDR privacy for Call="00b0001288b4da335dc0010491ad632", partyID +="2863", CtrlPartyID="" 08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - valid +ating CDR privacy for Call="00b0001288b4da335dc0010491ad632", partyID +="2717", CtrlPartyID="" 08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - EndOf +List: after 3 entries, hr=0xC1170A2E 08:17:21.640 ( 1528: 4112) [TMS] NccDtcpMsgLegState : Processing L +eg State Event
    UPDATE:
    I looked again the the original script. That thing is so wordy that it is hard to figure out what it actually does! However, I think that I did? Anyway here is another parser built upon the same techniques as my first one.

    I think there is an obvious difference in clarity.

    #!/usr/bin/perl -w use strict; my $current_line=""; while (<DATA>) { if ( /Distribute message from:\s*(.*)\s*$/ ) { print "$current_line\n" if $current_line ne ""; $current_line = $1; } elsif ( /Distributed:\s*(\w+)\s*$/) { $current_line .= ",$1"; #add a receipient } } print "$current_line\n"; =prints olgal (olgal),TulaM ElizabethY (ElizabethY),KaseyT,KirkL marcc (marcc),JasonE =cut __DATA__ 16:32:59 256 Distribute message from: olgal (olgal) 16:32:59 256 Begin distribution to 1 users 16:32:59 256 Distributed: TulaM 16:32:59 408 Notifying client at: 192.168.1.103 UDP port 65534 16:32:59 408 Notifying client at: 172.16.201.27 UDP port 1109 16:32:59 408 Notifying client at: 172.16.201.27 UDP port 1109 16:33:03 152 getQuickMessagesResponse is too large: [EA04] 16:33:03 176 Notifying client at: 10.0.2.159 UDP port 61774 16:33:06 256 Processing update: item record (christinam) 16:33:06 176 Notifying client at: 10.0.2.159 UDP port 61774 16:33:06 256 Purge Execution Record #308860 (christinam) 16:33:06 176 Notifying client at: 10.0.2.159 UDP port 61774 16:33:06 176 Distribute message from: ElizabethY (ElizabethY) 16:33:06 256 Notifying client at: 10.0.200.61 UDP port 2257 16:33:06 176 Begin distribution to 2 users 16:33:06 176 Distributed: KaseyT 16:33:06 176 Distributed: KirkL 16:33:06 256 Notifying client at: 10.0.200.30 UDP port 49857 16:33:06 256 Notifying client at: 10.0.200.83 UDP port 62849 16:33:06 256 Notifying client at: 10.0.200.61 UDP port 2257 16:33:11 256 Notifying client at: 10.0.13.7 UDP port 56097 16:33:11 256 Notifying client at: 10.0.13.7 UDP port 64665 16:33:11 176 Distribute message from: marcc (marcc) 16:33:11 176 Begin distribution to 1 users 16:33:11 176 Distributed: JasonE
    Another update: After looking back yet again at the requirements, you want the printout to be:
    sender, receiver1
    sender, receiver2
    I'll leave that slight modification to you as an exercise (instead of adding a recipient to the line, just print a whole new line) - the above just implements what the original script did! Let us know how you make out with this.
Re: Redoing a script
by Corion (Patriarch) on Feb 15, 2012 at 07:58 UTC

    So, where does the script you posted have problems? What have you tried to address these problems?