
I wrote a script that converts one ASCII file format to another.

I have coded it so badly that it produces only about 10 KB/min of output :-(

I tried profiling the code with Devel::DProf, but it seems that unless the Perl program terminates normally, the DProf output is not of much use.
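One possible workaround is to turn Ctrl-C into a normal `exit`, so that END blocks still run before the program stops. This sketch assumes the profiler in use finalises its output from an END block at normal exit. Devel::NYTProf documents that behaviour, and its `sigexit=1` option handles the signals for you. The helper name `install_clean_exit_handlers` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: convert SIGINT (Ctrl-C) and SIGTERM into a normal exit, so END
# blocks (including any a profiler registers) still run. By default these
# signals kill perl without running END blocks.
# Assumption: the profiler writes its final data from an END block.
sub install_clean_exit_handlers {
    for my $sig (qw(INT TERM)) {
        $SIG{$sig} = sub {
            print STDERR "Caught SIG$sig, exiting cleanly\n";
            exit 1;    # exit() runs END blocks before the process ends
        };
    }
    return;
}

install_clean_exit_handlers();

# ... the conversion loop would go here ...
```

Run the script under the profiler as usual (for example `perl -d:DProf script.pl`) and press Ctrl-C once enough data has been collected. With Devel::NYTProf, setting `NYTPROF=sigexit=1` in the environment asks the profiler to do this itself.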

I use Tie::File both to read the source file and to create the output file.

Here is the code:

#!/usr/bin/perl

# top stats
#
# CPU TTY     PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
#  3 pts/12 13833 me   241 20 30444K 18760K run      0:13 52.90 39.85 perl
#  3 pts/12 13833 me   241 20 45036K 33356K run      0:27 62.02 56.93 perl
#  0 pts/12 13833 me   241 20 48748K 37116K run      1:34 78.13 77.99 perl
#  3 pts/12 13833 me   241 20 53996K 42364K run      5:40 71.00 70.88 perl
#  3 pts/12 13833 me   241 20 72172K 60460K run     44:38 72.95 72.83 perl
#
# Some file stats
#
# -rw-r--r--   1 me     mine        2100352 May 12 11:47 mineOut
# -rw-r--r--   1 me     mine         221005 May 12 12:56 mineoutput.converted.to.other
#
# -rw-r--r--   1 me     mine        2100352 May 12 11:47 mineOut
# -rw-r--r--   1 me     mine         239670 May 12 12:57 mineoutput.converted.to.other
#
# -rw-r--r--   1 me     mine        2100352 May 12 11:47 mineOut
# -rw-r--r--   1 me     mine         261315 May 12 12:58 mineoutput.converted.to.other
#
# -rw-r--r--   1 me     mine        2100352 May 12 11:47 mineOut
# -rw-r--r--   1 me     mine         989435 May 12 13:59 mineoutput.converted.to.other
#
# Throughput is around 18665 to 21645 bytes per minute -> ~20 KB/min
# At this rate, 2100352 bytes of output will take 113 minutes!
# Reality check: 728120 bytes in 1 hour (from 12:58 to 13:59): 728120 / 60 = 12135 bytes/min -> ~10 KB/min

use strict;
use warnings;

use Tie::File;

use Data::Dumper;

# open an existing file in read-only mode
use Fcntl 'O_RDONLY';

# Unfortunately, it seems the mine and other field names are different. Hence, we create a map between the two and replace the mine field name with the other one wherever available.
# This is how you do the mapping:
# mine                          <->   other
# If your mine and other field names are the same, keep this mapping empty.

our %fieldNameMapping = ();
#    qw(
#     MSC_CDR_TYPE              RECORD_TYPE
#     MSC_CDR_SEQ_NUM           callIdentificationNumber
#     MSC_CDR_REFER_NUM         networkCallRef
#     MSC_CALL_START_TIME       start_date_time_format
#     MSC_CALL_DURATION         charge_duration_secs
#     MSC_PARTIAL_TYPE          msc_partial_type
#     AX_FIRST_CALLED_LOC_INFO  firstCalledLocInformation
#    );
# Put the remaining fields

our @array;
tie @array, 'Tie::File', 'inp', memory => 50_000_000, mode => O_RDONLY, recsep => "\n" or die $!;

our @arrayOfother = ();
tie @arrayOfother, 'Tie::File', 'mineoutput.converted.to.other' or die $!;

our $dx = 0;

our $recordID = 0;

our $recordHeader = undef;
our %recordBodyToWriteOut = ();
our $recordTrailer = undef;

for($dx = 0; $dx < @array; ++$dx)
{
    #if($array[$dx++] =~ /Level \(([0-9]+)\) "([^"]+)"/)
    if($array[1 + $dx] =~ /Level \(1\) "([^"]+)"$/)
    {
        if($array[2 + $dx] =~ /Level \(2\) "([^"]+)"$/)
        {
            if($array[3 + $dx] =~ /Record \(([0-9]+)\) "([^"]+)"$/)
            {
                $recordID = $1;
                print STDERR "[*]Got record type $2, number $recordID\n";

                # Write out the record in other format until we get end of record

                $dx += 3;

                #print "RECORD\n";
                $recordHeader = "RECORD\n"; # First value in the header

                $recordHeader .= "#addkey\n#filename FF\n#input_id 001\n";

                %recordBodyToWriteOut = (); # Reset the record body

                do
                {
                    if($array[$dx++] =~ /"([^"]+)" = "([^"]+)"$/)
                    {
                        if($1 eq 'MSC_CDR_TYPE')
                        {
                            $recordHeader .= "#input_type $2\n#output_id\n#output_type $2\n#source_id SRC\n";
                        }

                        if(exists($fieldNameMapping { $1 }))
                        {
                            #print "F " . $fieldNameMapping { $1 } . " $2\n";
                            $recordBodyToWriteOut { $fieldNameMapping { $1 } } = $2;
                        }
                        else
                        {
                            #print "F $1 $2\n";
                            $recordBodyToWriteOut { $1 } = $2;
                        }
                    }
                }
                until(
                    ($array[1 + $dx] =~ /End of Record \(${recordID}\)$/)
                    &&
                    ($array[2 + $dx] =~ /End of Level \(2\)$/)
                    &&
                    ($array[3 + $dx] =~ /End of Level \(1\)$/)
                );

                $recordTrailer = ".\n";  # First value in the trailer

                $dx += 2;

                # Now write out the header, fields and trailer
                #print $recordHeader;
                push @arrayOfother, $recordHeader;

                # We want the fields to come out in sorted order
                foreach my $key (sort keys %recordBodyToWriteOut)
                {
                    #print "F $key " . $recordBodyToWriteOut { $key } . "\n";
                    push @arrayOfother, "F $key " . $recordBodyToWriteOut { $key } . "\n";
                }

                #print $recordTrailer;
                push @arrayOfother, $recordTrailer;
            }
        }
    }
}
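The commented-out `qw()` list can be enabled by assigning it straight to the hash, because a flat list of name pairs becomes key/value pairs. Here is a minimal sketch that uses the names from that list. `other_name` is a hypothetical helper that does the same `exists` lookup as the loop.

```perl
use strict;
use warnings;

# Sketch: fill %fieldNameMapping from "mine other" pairs (names copied from
# the commented-out qw() list), then look a name up, falling back to the
# original name when no mapping exists.
our %fieldNameMapping = qw(
    MSC_CDR_TYPE              RECORD_TYPE
    MSC_CDR_SEQ_NUM           callIdentificationNumber
    MSC_CDR_REFER_NUM         networkCallRef
    MSC_CALL_START_TIME       start_date_time_format
    MSC_CALL_DURATION         charge_duration_secs
    MSC_PARTIAL_TYPE          msc_partial_type
    AX_FIRST_CALLED_LOC_INFO  firstCalledLocInformation
);

# Hypothetical helper: the "other" name if one is mapped, else the name unchanged.
sub other_name {
    my ($mine) = @_;
    return exists $fieldNameMapping{$mine} ? $fieldNameMapping{$mine} : $mine;
}
```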

Here is some input data:

Start of Data
**********************************************************************
Level (1) "COMMONRec"
   Level (2) "MSCCDR"
      Record (1) "MSCGSMRec"
         "MSC_CDR_TYPE" = "MOC"
         "MSC_CALL_START_TIME" = "20090122105929"
         "MSC_CALL_END_TIME" = "20090122105944"
         "MSC_CALL_DURATION" = "15"
         "MSC_PARTIAL_INDICATOR" = "S"
         Sub Record (1) "AXECallDataRecord"
            "AX_DISCONNECT_PARTY" = "1"
            "AX_CHARGED_PARTY" = "0"
            "AX_TRANSLATED_TON" = "1"
         End of Sub Record (1)
      End of Record (1)
   End of Level (2)
End of Level (1)
Level (1) "COMMONRec"
   Level (2) "MSCCDR"
      Record (2) "MSCGSMRec"
         "MSC_CDR_TYPE" = "MTC"
         "MSC_PARTIAL_TYPE" = "0"
         "MSC_CALL_START_TIME" = "20090122105927"
         "MSC_CALL_END_TIME" = "20090122105945"
         "MSC_CALL_DURATION" = "18"
         "MSC_PARTIAL_INDICATOR" = "S"
         Sub Record (1) "AXECallDataRecord"
            "AX_DISCONNECT_PARTY" = "1"
            "AX_CHARGED_PARTY" = "0"
            "AX_SWITCH_IDENTITY" = "0001"
            "AX_RELATED_NUMBER" = "7F4595"
            "AX_FIRST_CALLED_LOC_INFO" = "25F010233203BE"
         End of Sub Record (1)
      End of Record (2)
   End of Level (2)
End of Level (1)
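Each record is delimited by `Record (N)` and `End of Record (N)` lines, so the input could be read in a single pass through an ordinary filehandle rather than by indexing into a tied array. The following is a minimal sketch that assumes those two markers are enough to find record boundaries. Unlike the script, it does not check the surrounding `Level` lines. `parse_records` and its callback are hypothetical names.

```perl
use strict;
use warnings;

# Sketch: single-pass parser for the input format shown above. It reads one
# line at a time from $in and calls $on_record->($record_id, \%fields) at each
# matching "End of Record (N)". Every "NAME" = "VALUE" line inside the record
# is kept, including those in "Sub Record" blocks, as the original script does.
# Assumption: Record / End of Record lines alone mark record boundaries.
sub parse_records {
    my ($in, $on_record) = @_;
    my ($record_id, %fields);

    while (my $line = <$in>) {
        chomp $line;
        if (!defined $record_id) {
            # Outside a record: skip lines until the next "Record (N)" line.
            if ($line =~ /^\s*Record \((\d+)\) "[^"]+"$/) {
                $record_id = $1;
                %fields    = ();
            }
        }
        elsif ($line =~ /^\s*End of Record \((\d+)\)$/ && $1 == $record_id) {
            $on_record->($record_id, +{%fields});   # hand over a copy
            undef $record_id;
        }
        elsif ($line =~ /^\s*"([^"]+)" = "([^"]+)"$/) {
            $fields{$1} = $2;
        }
    }
    return;
}
```

To use it in place of the tied input array, open the input with `open my $in, '<', 'inp' or die $!;` and pass `$in`.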

Here is some output:

RECORD
#addkey
#filename FF
#input_id 001
#input_type MOC
#output_id
#output_type MOC
#source_id SRC
F AX_CHARGED_PARTY 0
F AX_DISCONNECT_PARTY 1
F AX_TRANSLATED_TON 1
F MSC_CALL_DURATION 15
F MSC_CALL_END_TIME 20090122105944
F MSC_CALL_START_TIME 20090122105929
F MSC_CDR_TYPE MOC
F MSC_PARTIAL_INDICATOR S
.
RECORD
#addkey
#filename FF
#input_id 001
#input_type MTC
#output_id
#output_type MTC
#source_id SRC
F AX_CHARGED_PARTY 0
F AX_DISCONNECT_PARTY 1
F AX_FIRST_CALLED_LOC_INFO 25F010233203BE
F AX_RELATED_NUMBER 7F4595
F AX_SWITCH_IDENTITY 0001
F MSC_CALL_DURATION 18
F MSC_CALL_END_TIME 20090122105945
F MSC_CALL_START_TIME 20090122105927
F MSC_CDR_TYPE MTC
F MSC_PARTIAL_INDICATOR S
F MSC_PARTIAL_TYPE 0
.
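Every output record has the same shape: a fixed header, one `F NAME VALUE` line per field in sorted order, and a closing `.`. That means a whole record can be built as one string and written with a single `print`. The sketch below reproduces the first sample record. `format_record` is a hypothetical name, the header values are copied from the sample output, and `$fields` is assumed to be keyed by the input field names, with no renaming applied.

```perl
use strict;
use warnings;

# Sketch: build one output record as a single string in the layout of the
# sample output: fixed header lines, the type lines only when MSC_CDR_TYPE is
# present (as in the script), sorted "F NAME VALUE" lines, then ".".
sub format_record {
    my ($fields) = @_;    # hashref: input field name => value
    my $out = "RECORD\n#addkey\n#filename FF\n#input_id 001\n";
    if (defined(my $type = $fields->{MSC_CDR_TYPE})) {
        $out .= "#input_type $type\n#output_id\n#output_type $type\n#source_id SRC\n";
    }
    $out .= "F $_ $fields->{$_}\n" for sort keys %$fields;
    return $out . ".\n";
}
```

Writing each record to an ordinary filehandle with one `print {$out} format_record(\%fields);` would replace the separate `push` calls the script makes for every record.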

Is this completely hopeless? Should I not use a hash to store the field-value pairs? Should I not use Tie::File and store the file contents in the array?
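The Tie::File question can be answered empirically by timing Tie::File against plain `print` on your own machine. Tie::File's documentation notes that, unless deferred writing is turned on, every modification of the tied array is written to the file immediately. Below is a sketch using the core Benchmark module. It assumes that 1,000 short lines per run is a representative workload. `with_tie_file` and `with_print` are hypothetical helpers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);
use Tie::File;

# Write $n lines by pushing onto a Tie::File-tied array, as the script does.
sub with_tie_file {
    my ($path, $n) = @_;
    tie my @out, 'Tie::File', $path or die "tie $path: $!";
    @out = ();                                   # truncate the file
    push @out, "F FIELD_$_ value" for 1 .. $n;   # Tie::File appends the "\n" separator
    untie @out;
    return;
}

# Write the same $n lines with print to an ordinary buffered filehandle.
sub with_print {
    my ($path, $n) = @_;
    open my $fh, '>', $path or die "open $path: $!";
    print {$fh} "F FIELD_$_ value\n" for 1 .. $n;
    close $fh or die "close $path: $!";
    return;
}

my (undef, $path) = tempfile(UNLINK => 1);

# Run each variant for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    tie_file => sub { with_tie_file($path, 1_000) },
    print    => sub { with_print($path, 1_000) },
});
```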

Are there any other optimizations you can suggest? The warning "Use of uninitialized value within @array in pattern match (m//) .. at line 75" can be dealt with last.
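Warnings of this kind come from the look-ahead reads such as `$array[1 + $dx]`. In the final iterations of the loop these run past the last line, and the element read there is undef. One way to avoid this is to bound the loop so that all three look-ahead lines exist. In the sketch below, `find_record_starts` is a hypothetical helper that mirrors only the script's Level / Level / Record test.

```perl
use strict;
use warnings;

# Sketch: only look ahead at $lines->[$dx + 1 .. $dx + 3] while all three
# exist, so the pattern matches never see undef. Returns the indexes of
# "Record (N)" lines that follow a Level (1) line and a Level (2) line.
sub find_record_starts {
    my ($lines) = @_;    # array ref of input lines
    my @starts;
    for (my $dx = 0; $dx + 3 < @$lines; ++$dx) {
        if (   $lines->[$dx + 1] =~ /Level \(1\) "([^"]+)"$/
            && $lines->[$dx + 2] =~ /Level \(2\) "([^"]+)"$/
            && $lines->[$dx + 3] =~ /Record \(([0-9]+)\) "([^"]+)"$/)
        {
            push @starts, $dx + 3;
        }
    }
    return @starts;
}
```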

In reply to Improving dismal performance - Part 1 by PoorLuzer
