Hi Everyone,

I am a college student and recently joined a research project. I am in the process of editing a script. The original script takes information from an email server log text file and writes it to a new text file in a more readable format:

 sender, receiver1, receiver2, etc.

I would actually prefer to have one pair per line:

 sender, receiver1
 sender, receiver2

We also have phone server log files and would like to do almost the same with the internal phone communication. The difference will be that we would like to only have one sender and receiver on each line.

The email log data looks like this (a copy of the part that we focus on):
..................
16:32:59 256 Distribute message from: olgal (olgal)
16:32:59 256 Begin distribution to 1 users
16:32:59 256 Distributed: TulaM
16:32:59 408 Notifying client at: 192.168.1.103 UDP port 65534
16:32:59 408 Notifying client at: 172.16.201.27 UDP port 1109
16:32:59 408 Notifying client at: 172.16.201.27 UDP port 1109
16:33:03 152 getQuickMessagesResponse is too large: [EA04]
16:33:03 176 Notifying client at: 10.0.2.159 UDP port 61774
16:33:06 256 Processing update: item record (christinam)
16:33:06 176 Notifying client at: 10.0.2.159 UDP port 61774
16:33:06 256 Purge Execution Record #308860 (christinam)
16:33:06 176 Notifying client at: 10.0.2.159 UDP port 61774
16:33:06 176 Distribute message from: ElizabethY (ElizabethY)
16:33:06 256 Notifying client at: 10.0.200.61 UDP port 2257
16:33:06 176 Begin distribution to 2 users
16:33:06 176 Distributed: KaseyT
16:33:06 176 Distributed: KirkL
16:33:06 256 Notifying client at: 10.0.200.30 UDP port 49857
16:33:06 256 Notifying client at: 10.0.200.83 UDP port 62849
16:33:06 256 Notifying client at: 10.0.200.61 UDP port 2257
16:33:11 256 Notifying client at: 10.0.13.7 UDP port 56097
16:33:11 256 Notifying client at: 10.0.13.7 UDP port 64665
16:33:11 176 Distribute message from: marcc (marcc)
16:33:11 176 Begin distribution to 1 users
16:33:11 176 Distributed: JasonE
.................
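To make the "one sender and one receiver per line" idea concrete for the email log, here is a rough sketch of what I am after. It is only an illustration, not working project code: the helper name email_pairs is made up, I am assuming the second column is a key that ties a "Distribute message from:" line to the "Distributed:" lines that follow it, and I keep only the first word after the colon as the name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the existing script): returns one
# "sender,recipient" string per delivered message.
# Assumption: the second column ties a sender line to its recipient lines.
sub email_pairs {
    my @lines = @_;
    my %sender_for;    # key (second column) => most recent sender
    my @pairs;
    for my $line (@lines) {
        if ( $line =~ /^\S+\s+(\d+)\s+Distribute message from:\s*(\S+)/ ) {
            my ( $key, $sender ) = ( $1, $2 );
            $sender_for{$key} = $sender;
        }
        elsif ( $line =~ /^\S+\s+(\d+)\s+Distributed:\s*(\S+)/ ) {
            my ( $key, $recipient ) = ( $1, $2 );
            # Skip recipients for which no sender line was seen.
            next unless defined $sender_for{$key};
            push @pairs, "$sender_for{$key},$recipient";
        }
    }
    return @pairs;
}

# A few lines copied from the sample log above.
my @email_sample = (
    '16:32:59 256 Distribute message from: olgal (olgal)',
    '16:32:59 256 Begin distribution to 1 users',
    '16:32:59 256 Distributed: TulaM',
    '16:32:59 408 Notifying client at: 192.168.1.103 UDP port 65534',
    '16:33:06 176 Distribute message from: ElizabethY (ElizabethY)',
    '16:33:06 256 Notifying client at: 10.0.200.61 UDP port 2257',
    '16:33:06 176 Begin distribution to 2 users',
    '16:33:06 176 Distributed: KaseyT',
    '16:33:06 176 Distributed: KirkL',
);

print "$_\n" for email_pairs(@email_sample);
```

For these sample lines I would expect three output lines: olgal,TulaM then ElizabethY,KaseyT then ElizabethY,KirkL.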
Here is the current script:
#!/bin/perl
use File::Path;

# Constants
my $from = "Distribute message from:";
my $to   = "Distributed:";

if ( $#ARGV != 0 ) {
    usage();
    exit 1;
}

#
# Parse the command line argument
#
$dir = shift @ARGV;
if ( ! -e $dir ) {
    print "'$dir' does not exist. Exiting the script.\n";
    exit 1;
} elsif ( ! -d $dir ) {
    print "'$dir' is not a valid directory. Exiting the script.\n";
    exit 1;
}

my $outdir = "$dir/filtered";
my $logdir = "$dir/logdir";

# Recreate the output directory from scratch
if ( -e "$outdir" ) {
    if ( -f "$outdir" ) {
        print "'$outdir' is a file. Rename the file.\n";
        exit 0;
    } else {
        rmtree("$outdir", 0) || die "Could not delete '$outdir' $!\n";
    }
}
mkpath("$outdir") || die "Could not create '$outdir' $!\n";

# Recreate the log directory from scratch
if ( -e "$logdir" ) {
    if ( -f "$logdir" ) {
        print "'$logdir' is a file. Rename the file.\n";
        exit 0;
    } else {
        rmtree("$logdir", 0) || die "Could not delete '$logdir' $!\n";
    }
}
mkpath("$logdir") || die "Could not create '$logdir' $!\n";

opendir(DIR, "$dir") || die "Can't open directory '$dir' $!\n";
@files = readdir(DIR);
closedir(DIR);

foreach $file (@files) {
    print "Processing $dir/$file\n";
    if ( -f "$dir/$file" ) {
        open(FIN,  "<$dir/$file")    || die "Can't read '$dir/$file' $!\n";
        open(FOUT, ">$outdir/$file") || die "Can't write '$outdir/$file' $!\n";
        open(LOG,  ">$logdir/$file") || die "Can't write '$logdir/$file' $!\n";
        my %map  = ();   # key => "sender,recipient1,recipient2,"
        my %lmap = ();   # key => raw line that opened the current entry
        while ($line = <FIN>) {
            chomp($line);
            doLog("Processing", $line);
            if ( $line =~ /^[0-9]/m ) {
                my ($time, $data) = removetimestamp($line);
                my ($key, $value) = keyValue($data);
                if (defined $key) {
                    if ( $value =~ /^$from/ ) {
                        my $sender = getSenderName($value);
                        if (defined $map{$key}) {
                            # A new sender on this key ends the previous entry
                            $val = $map{$key};
                            doLog("End of sender", $val);
                            my $length = length($val);
                            $val = substr($val, 0, ($length - 1));   # drop trailing comma
                            print FOUT "$val\n";
                            $map{$key}  = "$sender,";
                            $lmap{$key} = $line;
                            doLog("Replacing", $sender);
                        } else {
                            doLog("New Entry", "$data");
                            $map{$key}  = "$sender,";
                            $lmap{$key} = $line;
                        }
                    } elsif ( $value =~ /^$to/ ) {
                        my $recipient = getRecipientName($value);
                        if (defined $map{$key}) {
                            $val = $map{$key};
                            doLog("Adding recipient:", "$recipient to $val");
                            $val .= "$recipient,";
                            $map{$key} = $val;
                        } else {
                            doLog("Ignoring", $line);
                        }
                    } else {
                        doLog("Ignoring", $line);
                    }
                }
            } else {
                doLog("Ignoring", $line);
            }
        }
        # Entries still open at end of file are only logged, not written out
        for $mkey ( keys %lmap ) {
            doLog("Incomplete", $lmap{$mkey});
        }
        close FIN;
        close FOUT;
        close LOG;
    }
    print "Processed $dir/$file\n";
}

# Split "HH:MM:SS rest" into (timestamp, rest)
sub removetimestamp {
    my ($line) = @_;
    my $ind = index($line, " ");
    if ( $ind != -1 ) {
        my $time = substr($line, 0, $ind);
        $line = substr($line, $ind + 1);
        return ($time, $line);
    }
}

# Split "key rest" into (key, rest)
sub keyValue {
    my ($line) = @_;
    my $ind = index($line, " ");
    if ( $ind != -1 ) {
        my $key   = substr($line, 0, $ind);
        my $value = substr($line, $ind + 1);
        return ($key, $value);
    }
}

sub getSenderName {
    my ($from_line) = @_;
    my $ind = index($from_line, ":");
    if ( $ind != -1 ) {
        my $sender = substr($from_line, $ind + 1);
        $sender =~ s/^\s+//;   # remove leading spaces
        $sender =~ s/\s+$//;   # remove trailing spaces
        return $sender;
    }
}

sub getRecipientName {
    my ($to_line) = @_;
    my $ind = index($to_line, ":");
    if ( $ind != -1 ) {
        my $recipient = substr($to_line, $ind + 1);
        $recipient =~ s/^\s+//;   # remove leading spaces
        $recipient =~ s/\s+$//;   # remove trailing spaces
        return $recipient;
    }
}

sub doLog {
    my ($msg, $line) = @_;
    print LOG "$msg: $line\n";
}

sub usage {
    print "Usage:\n";
    print " cmd: perl <path to>dmining.pl <directory-name>\n";
    print " where:\n";
    print " directory-name: the absolute or relative path to raw data\n";
}
And here is the phone log format, where the first partyID for a call is the caller and the second partyID is the receiver:
............................
08:17:12.245 ( 1528: 4112) [TMS] AddMediaStreamFromCDS : Media Stream Event from Remote Server is logged hr=0x00000000
08:17:12.245 ( 1528: 4132) [TMS] CNccCdrLog::ProcessMediaStreamRequest: 0
08:17:17.843 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - validating CDR privacy for Call="002000199994e4ea9030010491ae13d", partyID="3855", CtrlPartyID=""
08:17:17.843 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - validating CDR privacy for Call="002000199994e4ea9030010491ae13d", partyID="OutOfArea", CtrlPartyID=""
08:17:17.844 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - EndOfList: after 3 entries, hr=0xC1170A2E
08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - validating CDR privacy for Call="00b0001288b4da335dc0010491ad632", partyID="2863", CtrlPartyID=""
08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - validating CDR privacy for Call="00b0001288b4da335dc0010491ad632", partyID="2717", CtrlPartyID=""
08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - EndOfList: after 3 entries, hr=0xC1170A2E
08:17:21.640 ( 1528: 4112) [TMS] NccDtcpMsgLegState : Processing Leg State Event
...........................
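For the phone log, the pairing I have in mind could be sketched roughly like this. It is a guess at an approach, not tested against the full log files: the helper name call_pairs is made up, I am assuming each log entry sits on one physical line in the real file, and I am assuming every call produces exactly two partyID lines sharing the same Call value (caller first, receiver second).

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "caller,receiver" string per call.
# Assumption: the first partyID seen for a Call value is the caller and
# the second one is the receiver.
sub call_pairs {
    my @lines = @_;
    my %caller_for;    # Call value => partyID of the caller
    my @pairs;
    for my $line (@lines) {
        # Only the "validating CDR privacy" lines carry Call and partyID.
        next unless $line =~ /Call="([^"]+)",\s*partyID=\s*"([^"]*)"/;
        my ( $call, $party ) = ( $1, $2 );
        if ( exists $caller_for{$call} ) {
            my $caller = delete $caller_for{$call};
            push @pairs, "$caller,$party";
        }
        else {
            $caller_for{$call} = $party;
        }
    }
    return @pairs;
}

# Lines taken from the sample phone log above.
my @phone_sample = (
    '08:17:12.245 ( 1528: 4132) [TMS] CNccCdrLog::ProcessMediaStreamRequest: 0',
    '08:17:17.843 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - validating CDR privacy for Call="002000199994e4ea9030010491ae13d", partyID="3855", CtrlPartyID=""',
    '08:17:17.843 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - validating CDR privacy for Call="002000199994e4ea9030010491ae13d", partyID="OutOfArea", CtrlPartyID=""',
    '08:17:17.844 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - EndOfList: after 3 entries, hr=0xC1170A2E',
    '08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - validating CDR privacy for Call="00b0001288b4da335dc0010491ad632", partyID="2863", CtrlPartyID=""',
    '08:17:21.479 ( 1528: 4132) [TMS] CCallEntryMsg::WriteCallData - validating CDR privacy for Call="00b0001288b4da335dc0010491ad632", partyID="2717", CtrlPartyID=""',
);

print "$_\n" for call_pairs(@phone_sample);
```

With these sample lines I would expect 3855,OutOfArea and 2863,2717. A call that logs only one partyID line would produce no output in this sketch, which may or may not be what we want.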

Any help would be greatly appreciated. It seems I might have taken on a little too much here.

Thank you for any help!


In reply to Redoing a script by PhiThors
