I have this script I wrote to convert from one ASCII file format to another.

I have coded it so badly that it produces only about 10 KB of output per minute :-(

I tried to profile the code with Devel::DProf, but it seems that unless the Perl script terminates normally, the DProf output is not of much use.
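One workaround I could try is to cap the number of records so that the run ends normally on its own and the profiler gets a chance to write complete output. The sketch below only illustrates the idea: `$max_records`, `process_record` and the fake `@records` list are hypothetical stand-ins, not part of my script.

```perl
use strict;
use warnings;

# Hypothetical cap: stop after this many records so the script exits
# normally instead of being killed halfway through a long run.
my $max_records = 3;

# Hypothetical stand-in for the real per-record conversion work.
sub process_record {
    my ($record) = @_;
    return uc $record;
}

my @records = map { "record $_" } 1 .. 10;    # fake input for the sketch
my @processed;

for my $record (@records) {
    last if @processed >= $max_records;       # leave the loop cleanly
    push @processed, process_record($record);
}

print scalar(@processed), " records processed\n";
```

With a cap like this in place, a run such as `perl -d:DProf script.pl` followed by `dprofpp` should finish quickly and leave a complete profile (`script.pl` is a placeholder name).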

I use Tie::File both to open the source file and to create the output file.

Here is the code:

    #!/usr/bin/perl

    # top stats
    #
    # CPU TTY      PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
    #   3 pts/12 13833 me       241 20 30444K 18760K run      0:13 52.90 39.85 perl
    #   3 pts/12 13833 me       241 20 45036K 33356K run      0:27 62.02 56.93 perl
    #   0 pts/12 13833 me       241 20 48748K 37116K run      1:34 78.13 77.99 perl
    #   3 pts/12 13833 me       241 20 53996K 42364K run      5:40 71.00 70.88 perl
    #   3 pts/12 13833 me       241 20 72172K 60460K run     44:38 72.95 72.83 perl
    #
    # Some file stats
    #
    # -rw-r--r-- 1 me mine 2100352 May 12 11:47 mineOut
    # -rw-r--r-- 1 me mine  221005 May 12 12:56 mineoutput.converted.to.other
    #
    # -rw-r--r-- 1 me mine 2100352 May 12 11:47 mineOut
    # -rw-r--r-- 1 me mine  239670 May 12 12:57 mineoutput.converted.to.other
    #
    # -rw-r--r-- 1 me mine 2100352 May 12 11:47 mineOut
    # -rw-r--r-- 1 me mine  261315 May 12 12:58 mineoutput.converted.to.other
    #
    # -rw-r--r-- 1 me mine 2100352 May 12 11:47 mineOut
    # -rw-r--r-- 1 me mine  989435 May 12 13:59 mineoutput.converted.to.other
    #
    # Throughput is around 18665 bytes to 21645 bytes per min -> ~20kb/min
    # At this rate 2100352 bytes output will take 113 mins !
    # Reality check : 728120 bytes in 1 hour (from 12:58 to 13:59) : 728120 / 60 = 12135 bytes / min -> ~10kb/min

    use strict;
    use warnings;

    use Tie::File;
    use Data::Dumper;

    # open an existing file in read-only mode
    use Fcntl 'O_RDONLY';

    # Unfortunately it seems, mine and other field names are different.
    # Hence, we create a map between the two and replace the mine field name
    # with the other one wherever available
    # This is how you do the mapping
    # mine <-> other
    # If your mine and other field names are same, keep this mapping empty
    our %fieldNameMapping = ();
    # qw(
    #     MSC_CDR_TYPE             RECORD_TYPE
    #     MSC_CDR_SEQ_NUM          callIdentificationNumber
    #     MSC_CDR_REFER_NUM        networkCallRef
    #     MSC_CALL_START_TIME      start_date_time_format
    #     MSC_CALL_DURATION        charge_duration_secs
    #     MSC_PARTIAL_TYPE         msc_partial_type
    #     AX_FIRST_CALLED_LOC_INFO firstCalledLocInformation
    # );
    # Put the remaining fields

    our @array;
    tie @array, 'Tie::File', 'inp', memory => 50_000_000, mode => O_RDONLY, recsep => "\n" or die $!;

    our @arrayOfother = ();
    tie @arrayOfother, 'Tie::File', 'mineoutput.converted.to.other' or die $!;

    our $dx = 0;
    our $recordID = 0;
    our $recordHeader = undef;
    our %recordBodyToWriteOut = ();
    our $recordTrailer = undef;

    for($dx = 0; $dx < @array; ++$dx) {
        #if($array[$dx++] =~ /Level \(([0-9]+)\) "([^"]+)"/)
        if($array[1 + $dx] =~ /Level \(1\) "([^"]+)"$/) {
            if($array[2 + $dx] =~ /Level \(2\) "([^"]+)"$/) {
                if($array[3 + $dx] =~ /Record \(([0-9]+)\) "([^"]+)"$/) {
                    $recordID = $1;
                    print STDERR "[*]Got record type $2, number $recordID\n";

                    # Write out the record in other format until we get end of record
                    $dx += 3;
                    #print "RECORD\n";
                    $recordHeader = "RECORD\n"; # First value in the header
                    $recordHeader .= "#addkey\n#filename FF\n#input_id 001\n";
                    %recordBodyToWriteOut = (); # Reset the record body

                    do {
                        if($array[$dx++] =~ /"([^"]+)" = "([^"]+)"$/) {
                            if($1 eq 'MSC_CDR_TYPE') {
                                $recordHeader .= "#input_type $2\n#output_id\n#output_type $2\n#source_id SRC\n";
                            }

                            if(exists($fieldNameMapping { $1 })) {
                                #print "F " . $fieldNameMapping { $1 } . " $2\n";
                                $recordBodyToWriteOut { $fieldNameMapping { $1 } } = $2;
                            }
                            else {
                                #print "F $1 $2\n";
                                $recordBodyToWriteOut { $1 } = $2;
                            }
                        }
                    } until( ($array[1 + $dx] =~ /End of Record \(${recordID}\)$/) &&
                             ($array[2 + $dx] =~ /End of Level \(2\)$/) &&
                             ($array[3 + $dx] =~ /End of Level \(1\)$/) );

                    $recordTrailer = ".\n"; # First value in the Trailer
                    $dx += 2;

                    # Now write out the header, fields and trailer
                    #print $recordHeader;
                    push @arrayOfother, $recordHeader;

                    # We want the fields to come out in sorted order
                    foreach my $key (sort keys %recordBodyToWriteOut) {
                        #print "F $key " . $recordBodyToWriteOut { $key } . "\n";
                        push @arrayOfother, "F $key " . $recordBodyToWriteOut { $key } . "\n";
                    }

                    #print $recordTrailer;
                    push @arrayOfother, $recordTrailer;
                }
            }
        }
    }

Here is some input data:

    Start of Data
    **********************************************************************
    Level (1) "COMMONRec"
    Level (2) "MSCCDR"
    Record (1) "MSCGSMRec"
    "MSC_CDR_TYPE" = "MOC"
    "MSC_CALL_START_TIME" = "20090122105929"
    "MSC_CALL_END_TIME" = "20090122105944"
    "MSC_CALL_DURATION" = "15"
    "MSC_PARTIAL_INDICATOR" = "S"
    Sub Record (1) "AXECallDataRecord"
    "AX_DISCONNECT_PARTY" = "1"
    "AX_CHARGED_PARTY" = "0"
    "AX_TRANSLATED_TON" = "1"
    End of Sub Record (1)
    End of Record (1)
    End of Level (2)
    End of Level (1)
    Level (1) "COMMONRec"
    Level (2) "MSCCDR"
    Record (2) "MSCGSMRec"
    "MSC_CDR_TYPE" = "MTC"
    "MSC_PARTIAL_TYPE" = "0"
    "MSC_CALL_START_TIME" = "20090122105927"
    "MSC_CALL_END_TIME" = "20090122105945"
    "MSC_CALL_DURATION" = "18"
    "MSC_PARTIAL_INDICATOR" = "S"
    Sub Record (1) "AXECallDataRecord"
    "AX_DISCONNECT_PARTY" = "1"
    "AX_CHARGED_PARTY" = "0"
    "AX_SWITCH_IDENTITY" = "0001"
    "AX_RELATED_NUMBER" = "7F4595"
    "AX_FIRST_CALLED_LOC_INFO" = "25F010233203BE"
    End of Sub Record (1)
    End of Record (2)
    End of Level (2)
    End of Level (1)

Here is some output:

    RECORD
    #addkey
    #filename FF
    #input_id 001
    #input_type MOC
    #output_id
    #output_type MOC
    #source_id SRC
    F AX_CHARGED_PARTY 0
    F AX_DISCONNECT_PARTY 1
    F AX_TRANSLATED_TON 1
    F MSC_CALL_DURATION 15
    F MSC_CALL_END_TIME 20090122105944
    F MSC_CALL_START_TIME 20090122105929
    F MSC_CDR_TYPE MOC
    F MSC_PARTIAL_INDICATOR S
    .
    RECORD
    #addkey
    #filename FF
    #input_id 001
    #input_type MTC
    #output_id
    #output_type MTC
    #source_id SRC
    F AX_CHARGED_PARTY 0
    F AX_DISCONNECT_PARTY 1
    F AX_FIRST_CALLED_LOC_INFO 25F010233203BE
    F AX_RELATED_NUMBER 7F4595
    F AX_SWITCH_IDENTITY 0001
    F MSC_CALL_DURATION 18
    F MSC_CALL_END_TIME 20090122105945
    F MSC_CALL_START_TIME 20090122105927
    F MSC_CDR_TYPE MTC
    F MSC_PARTIAL_INDICATOR S
    F MSC_PARTIAL_TYPE 0
    .

Is this completely hopeless? Should I not use a hash to store the field-value pairs? Should I not use Tie::File and store the file contents in the array?
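To make the Tie::File question concrete, here is a minimal sketch of what I imagine the alternative would look like: read the input sequentially through a plain filehandle, keep the hash for the fields, and print each record as soon as its end marker is seen. This is an untimed assumption about a possible rewrite, not something I have measured. `convert` is a hypothetical name, the sketch ignores the Level lines, it skips the %fieldNameMapping lookup, and it uses in-memory filehandles only so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read records from $in, write the other format to $out.
# Returns the number of records written.
sub convert {
    my ($in, $out) = @_;
    my $record_id;    # undef while we are between records
    my %fields;
    my $count = 0;

    while (my $line = <$in>) {
        chomp $line;
        if (!defined $record_id) {
            # Between records: wait for the start of the next one.
            if ($line =~ /^\s*Record \(([0-9]+)\) "[^"]+"$/) {
                $record_id = $1;
                %fields    = ();
            }
        }
        elsif ($line =~ /^\s*End of Record \(\Q$record_id\E\)$/) {
            # End of record: emit header, sorted fields and trailer.
            print {$out} "RECORD\n#addkey\n#filename FF\n#input_id 001\n";
            if (exists $fields{MSC_CDR_TYPE}) {
                my $type = $fields{MSC_CDR_TYPE};
                print {$out} "#input_type $type\n#output_id\n",
                             "#output_type $type\n#source_id SRC\n";
            }
            print {$out} "F $_ $fields{$_}\n" for sort keys %fields;
            print {$out} ".\n";
            undef $record_id;
            $count++;
        }
        elsif ($line =~ /"([^"]+)" = "([^"]+)"$/) {
            # Field line, including those inside sub records.
            $fields{$1} = $2;
        }
    }
    return $count;
}

# Shortened sample record, taken from the input data in this post.
my $sample = <<'END_SAMPLE';
Level (1) "COMMONRec"
Level (2) "MSCCDR"
Record (1) "MSCGSMRec"
"MSC_CDR_TYPE" = "MOC"
"MSC_CALL_DURATION" = "15"
Sub Record (1) "AXECallDataRecord"
"AX_CHARGED_PARTY" = "0"
End of Sub Record (1)
End of Record (1)
End of Level (2)
End of Level (1)
END_SAMPLE

my $result = '';
open my $in,  '<', \$sample or die "cannot read sample: $!";
open my $out, '>', \$result or die "cannot open output buffer: $!";
my $count = convert($in, $out);
close $out;
close $in;

print $result;
```

In a real run the two in-memory handles would be replaced by ordinary handles on the input and output files, so nothing beyond the current record is held in memory.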

Any other optimizations you can suggest? The warning "Use of uninitialized value within @array in pattern match (m//) .. at line 75" can be dealt with last.


In reply to Improving dismal performance - Part 1 by PoorLuzer
