in reply to Improving dismal performance - Part 1

There are several reasons why this script is slow, and Tie::Array is only one of them.

By keeping a bit of state and storing field values in a hash as you find them, you can completely eliminate the need for the array, along with most of the internal if/else statements and loops. Here is a much simplified version of your parser:

use strict;
use warnings;

sub printRecord;

#--------------------------------------------------
# Parsing loop
#--------------------------------------------------
my $fhOut  = \*STDOUT;
my $iLevel = 0;
my %hFields;

while (my $sLine = <DATA>) {

    # if line defines the level, set level
    if ($sLine =~ /^\s*(?:Level|Record|Sub Record)\s+\(\d+\)/) {
        $iLevel++;
    }
    elsif ($sLine =~ /^\s*End of/) {
        $iLevel--;
    }
    else {
        my ($k, $v) = $sLine =~ /\s+\"(\w+)\"\s+=\s+\"([^"]*)\"/;
        $hFields{$k} = $v;
    }

    # if level back to 0, dump record
    if ($iLevel == 0) {
        printRecord($fhOut, \%hFields);
        %hFields = ();
    }
}

#--------------------------------------------------
# SUBROUTINE DEFINITIONS
#--------------------------------------------------
sub printRecord {
    my ($fhOut, $hFields) = @_;
    my $sIOType = $hFields->{MSC_CDR_TYPE};

    print $fhOut "RECORD\n";
    print $fhOut "#addkey\n";
    print $fhOut "#filename FF\n";
    print $fhOut "#input_id 001\n";
    print $fhOut "#input_type $sIOType\n";
    print $fhOut "#output_id\n";
    print $fhOut "#output_type $sIOType\n";
    print $fhOut "#source_id SRC\n";

    foreach my $k (sort keys %$hFields) {
        my $v = $hFields->{$k};
        print $fhOut "F $k $v\n";
    }
    print $fhOut ".\n";
}

# cut and paste sample data from above
__DATA__

Re^2: Improving dismal performance - Part 1
by PoorLuzer (Beadle) on May 12, 2009 at 23:03 UTC
    Hi!

    Excellent analysis, thanks!

    Well, I will try out all your suggestions and the complete code later today, but one MAJOR change I already made was to switch the output file from Tie::File to normal file I/O (using open)

    ... and the performance improved by roughly 7 to 10 times.

    This of course becomes apparent once you look at the profiler output I pasted above.
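    For anyone following along, the change amounts to something like this (a minimal sketch; the file name out.txt and the sample lines are made up for illustration):

        use strict;
        use warnings;
        use Tie::File;

        # Before: every append goes through Tie::File's tied-array machinery
        # (STORE/PUSH, the record cache, offset bookkeeping).
        tie my @aOut, 'Tie::File', 'out.txt' or die "Cannot tie out.txt: $!";
        push @aOut, 'RECORD', '#addkey', '.';   # each element is one line of the file
        untie @aOut;

        # After: plain buffered file I/O with open/print.
        open my $fhOut, '>', 'out.txt' or die "Cannot open out.txt: $!";
        print $fhOut "RECORD\n", "#addkey\n", ".\n";
        close $fhOut;

    Tie::File is handy when you need random access to lines, but for a write-once, append-only output stream it only adds per-record overhead, which is consistent with the speed-up you saw.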

    I ran the profiler on a file containing some 14k "records" (839754 lines); here are the tmon.out results:

    Total Elapsed Time = 456.5656 Seconds
      User+System Time = 206.2156 Seconds
    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c Name
     20.2   41.73 41.732 167950   0.0000 0.0000 Tie::File::_read_record
     14.0   28.96 154.93 157920   0.0000 0.0001 Tie::File::_fetch
     12.4   25.71 180.65 157920   0.0000 0.0001 Tie::File::FETCH
     10.2   21.16 35.610 157920   0.0000 0.0000 Tie::File::Cache::lookup
     7.16   14.77 53.769 839753   0.0000 0.0001 Tie::File::Cache::insert
     5.64   11.62 12.893 839755   0.0000 0.0000 Tie::File::_seek
     5.60   11.54 38.998 839753   0.0000 0.0000 Tie::File::Heap::insert
     5.33   10.99 10.991 839753   0.0000 0.0000 Tie::File::Cache::_heap_move
     5.13   10.57 23.874 839753   0.0000 0.0000 Tie::File::Heap::_insert_new
     3.30   6.796  9.688 739447   0.0000 0.0000 Tie::File::Heap::promote
     3.25   6.701 24.732      1   6.7008 24.731 Tie::File::_fill_offsets
     3.14   6.474  6.474 157920   0.0000 0.0000 Tie::File::Heap::_nseq
     2.31   4.756 14.444 739447   0.0000 0.0000 Tie::File::Heap::lookup
     1.12   2.311  2.311 839753   0.0000 0.0000 Tie::File::Heap::_nelts_inc
     0.62   1.271  1.271 839760   0.0000 0.0000 Fcntl::__ANON__
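    Nearly all of that time is inside Tie::File's read path (FETCH, _read_record, and the record cache), i.e. walking the input through the tied array. Reading the input sequentially, as the parser above does with <DATA>, bypasses all of that. A minimal sketch, with a hypothetical input file name:

        use strict;
        use warnings;

        my $sInFile = 'cdr_dump.txt';   # hypothetical name, for illustration only

        # Instead of  tie my @aLines, 'Tie::File', $sInFile  and indexing @aLines
        # (each access is a FETCH plus cache bookkeeping), read the file line by
        # line with the ordinary input operator:
        open my $fhIn, '<', $sInFile or die "Cannot open $sInFile: $!";
        while (my $sLine = <$fhIn>) {
            # parse $sLine exactly as in the loop above
        }
        close $fhIn;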