Hi Monks. I have a script that generates customized patterns and counts the number of occurrences of a selected pattern.

I am running it on a 15000-page file. The problem is that it runs for a very long time and never completes.

Could you please check my code and help me fix it, so that it runs quickly and completes?

#!/usr/bin/perl
###############################################
sub get_first_elements_of_string {
    my @a = (split (' ' ,"$_[0]"));
    return $a[0];
}
###############################################
sub array_search {
    my ($elem, @arr) = @_;
    my $flag = -2;
    $mn = 0;
    foreach $n (@arr) {
        if ($n eq $elem) {
            $flag = $mn;
            last;
        }
        $mn++;
    }
    return $flag;
}
###############################################
sub get_data_path_report {
    my @spep = (m/ (\S+).*Endpoint: (\S+).*/msg);
    my $s_p = $spep[0];
    my $e_p = $spep[1];
    my @data_path = (m/ .*?Endpoint: .*?$s_p(.*?)$e_p.*?data arrival time/msg);
    my $data_path_length = @data_path;
    my @A = ();
    if($data_path_length == 1) {
        print "\npath:\t $s_p -----to------$e_p-----";
        print "size of data_path: $data_path_length\n";
        foreach $ele (@data_path) {
            print "ele------- $ele\n";
            my @data_path_elements = (split ('\n',$ele));
            my $l = @data_path_elements;
            print "----- Size of data path elements : $l\n";
            shift(@data_path_elements);
            foreach $x (@data_path_elements) {
                #@arr = (split (' ',$x));
                my $c = &get_first_elements_of_string($x);
                print "split--ele: $x\n";
                print "------\t$c\n";
                push (@A,$c);
            }
        }
        print "DATA-PATH-Elements: @A";
        #shift(@A);
        print "#### $#A\n";
        return @A;
    }
}
###############################################
sub get_patterns {
    my $sp = $_[0];
    my $ep = $_[0];
    foreach my $k (2 .. $#_) {
        if($_[$k] == -2) {
            next;
        }
        if($_[$k] == $ep +1) {
            $ep = $_[$k];
        }
        else {
            $sp = $_[$k];
            $ep = $_[$k];
        }
    }
}
###############################################
#open(fh, "timing_report_1.txt");
open(fh, "tim_icc_dec12b");
$/ = "Startpoint:";
my @result = ();
while (<fh>) {
    my @a = ();
    @a = &get_data_path_report ($_);
    my %seen = ();
    push (@result, @a);
    @result = grep { !$seen{$_}++ } @result;
}
shift(@result);
my $U_L = @result;
print "\n\nUNIQUE CELLS: @result ===== $U_L\n";
###############################################
my $k =0;
foreach $h (@result) {
    print "====== $k -----> $h\n";
    $k++;
}
close(fh);
###############################################
my $u_l = @result;
my $i = 0;
my $max_score = -1;
my @score_board_matrix = ();
while($i < $u_l) {
    my $score_board_column = "";
    my $j = 0;
    while($j<$u_l) {
        $score = 0;
        open(fh, "tim_icc_dec12b");
        $/ = "Startpoint:";
        while(<fh>) {
            my @d = &get_data_path_report ($_);
            if($#d >=1) {
                my $ss = &array_search("$result[$i]", @d);
                my $ee = &array_search("$result[$j]", @d);
                print " !!!! ### $ss ----- $result[$i] <----> $ee ----- $result[$j] #### !!!!!\n";
                if($ee == $ss+1) {
                    $score++;
                }
            }
        }
        close(fh);
        $score_board_column = $score_board_column." ".$score;
        if($score > $max_score) {
            $max_score = $score;
        }
        print "\n---------------------------$result[$i] $result[$j]----\t$score------> $score_board_column\n";
        $j++;
    }
    $i++;
    push (@score_board_matrix, $score_board_column);
    print "\n";
}
###############################################
print "######################\n@score_board_matrix\nMAX_SCORE: $max_score\n";
my $row = 0;
my @array_indexes = ();
foreach $column (@score_board_matrix) {
    my @re = split(" ",$column);
    my $index = &array_search($max_score,@re);
    print "\n----- $column------> $#re ------->$row,$index----> $result[$row]<------>$result[$index]\n";
    push (@array_indexes, "$index");
    $row++;
}
print "%%%%%%%%%%%%%%%%%%\n";
print "@array_indexes";
my $sp = $array_indexes[0];
my $ep = $array_indexes[0];
print "## $sp ---- $ep\n";
shift(@array_indexes);
print @array_indexes;
foreach $k (@array_indexes) {
    if($k == -2) {
        next;
    }
    if($k == $ep +1) {
        $ep = $k;
        next;
    }
    else {
        print "PATTERN: $sp ---- $ep";
        $e = $sp;
        while ($e <= $ep) {
            print "$result[$e]-->";
            $e++;
        }
        $sp = $k;
        $ep = $k;
    }
}

Let me add some detail to make the issue clearer. The code gets stuck in the following section:

my $u_l = @result;
my $i = 0;
my $max_score = -1;
my @score_board_matrix = ();
while($i < $u_l) {
    my $score_board_column = "";
    my $j = 0;
    while($j<$u_l) {
        $score = 0;
        open(fh, "tim_icc_dec12b");
        $/ = "Startpoint:";
        while(<fh>) {
            my @d = &get_data_path_report ($_);
            if($#d >=1) {
                my $ss = &array_search("$result[$i]", @d);
                my $ee = &array_search("$result[$j]", @d);
                print " !!!! ### $ss ----- $result[$i] <----> $ee ----- $result[$j] #### !!!!!\n";
                if($ee == $ss+1) {
                    $score++;
                }
            }
        }
        close(fh);
        $score_board_column = $score_board_column." ".$score;
        if($score > $max_score) {
            $max_score = $score;
        }
        print "\n---------------------------$result[$i] $result[$j]----\t$score------> $score_board_column\n";
        $j++;
    }
    $i++;
    push (@score_board_matrix, $score_board_column);
    print "\n";
}
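
The nested loop above reopens and reparses the whole file once for every pair of unique cells, so with N unique cells the file is read N*N times. A minimal sketch of a single-pass alternative follows, assuming each path has already been reduced to a list of cell names. The sub name count_adjacent_pairs and the sample paths are made up for illustration, and unlike array_search (which only looks at the first occurrence of a cell in a path) this counts every adjacent pair, so the scores may differ from the original script.

```perl
use strict;
use warnings;

# Count how often one cell is directly followed by another across all paths.
# Takes a list of array refs (one per path) and returns a hash ref whose
# keys are "first second" and whose values are the number of occurrences.
sub count_adjacent_pairs {
    my @paths = @_;
    my %count;
    for my $path (@paths) {
        for my $i (0 .. $#$path - 1) {
            $count{"$path->[$i] $path->[$i + 1]"}++;
        }
    }
    return \%count;
}

# Hypothetical sample data: two short paths of cell names.
my @paths = (
    [qw(bufA/I bufA/Z mux1/S)],
    [qw(bufA/I bufA/Z mux2/S)],
);

my $count = count_adjacent_pairs(@paths);

# Print the pairs, most frequent first.
for my $pair (sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count) {
    print "$pair => $count->{$pair}\n";
}
```

With the counts held in one hash, the score for any pair is a single lookup, and the file only has to be parsed once to build the list of paths.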

One sample of the target data file is as follows:

Startpoint: sdram_clk (clock source 'SDRAM_CLK')
Endpoint: sd_DQ_out[6] (output port clocked by SD_DDR_CLK)
Path Group: COMBO
Path Type: max

Point                                             Fanout       Cap     Trans       Incr        Path
---------------------------------------------------------------------------------------------------
clock SDRAM_CLK (fall edge)                                                    3.750000    3.750000
sdram_clk (in)                                                      0.184922   0.065438 &  3.815438 f
sdram_clk (net)                                       17  0.124019             0.000000    3.815438 f
I_SDRAM_TOP/sdram_clk (SDRAM_TOP)                                              0.000000    3.815438 f
I_SDRAM_TOP/sdram_clk (net)                               0.124019             0.000000    3.815438 f
I_SDRAM_TOP/I_SDRAM_IF/sdram_clk (SDRAM_IF)                                    0.000000    3.815438 f
I_SDRAM_TOP/I_SDRAM_IF/sdram_clk (net)                    0.124019             0.000000    3.815438 f
I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16/I (bufbd7)                    0.187810   0.013919 &  3.829357 f
I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16/Z (bufbd7)                    0.233113   0.210904 &  4.040261 f
I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16 (net)        45  0.175550             0.000000    4.040261 f
I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6/S (mx02d4)                   0.234098   0.003310 &  4.043571 f
I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6/Z (mx02d4)                   0.999121   0.776377    4.819948 f
I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out[6] (net)              1  0.475020             0.000000    4.819948 f
I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out[6] (SDRAM_IF)                                 0.000000    4.819948 f
I_SDRAM_TOP/sd_DQ_out[6] (net)                            0.475020             0.000000    4.819948 f
I_SDRAM_TOP/sd_DQ_out[6] (SDRAM_TOP)                                           0.000000    4.819948 f
sd_DQ_out[6] (net)                                        0.475020             0.000000    4.819948 f
sd_DQ_out[6] (out)                                                  0.999121   0.010237 &  4.830185 f
data arrival time                                                                          4.830185
clock SD_DDR_CLK (rise edge)                                                   7.500000    7.500000
clock network delay (ideal)                                                    1.598546    9.098545
clock uncertainty                                                             -0.100000    8.998545
output external delay                                                         -2.000000    6.998545
data required time                                                                         6.998545
---------------------------------------------------------------------------------------------------
data required time                                                                         6.998545
data arrival time                                                                         -4.830185
---------------------------------------------------------------------------------------------------
slack (MET)                                                                                2.168359

What I am trying to do is scan each path from "Endpoint" to "data arrival time", take the first column of every line in that section as one pattern, and expand this pattern in a +/- 2 range. I then scan for each pattern across the whole file, which has 1000 such paths like the one in the example, to find out how many times each pattern is repeated.
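
The scan described above can be sketched as follows. This is only an illustration under stated assumptions: the sub names first_column_of_path and read_paths are hypothetical, the sample report is shortened, and it assumes every path line looks like "name (type) numbers...". It takes the first column of each such line between "Endpoint:" and "data arrival time", reading each record exactly once. The +/- 2 expansion is not attempted here.

```perl
use strict;
use warnings;

# Return the first column of every "name (type)" line that lies between the
# "Endpoint:" line and the "data arrival time" line of one record.
sub first_column_of_path {
    my ($record) = @_;
    my @cells;
    my $in_path = 0;
    for my $line (split /\n/, $record) {
        if ($line =~ /^\s*Endpoint:/) { $in_path = 1; next; }
        last if $line =~ /^\s*data arrival time/;
        next unless $in_path;
        # Assumption: path lines look like "name (type) numbers...".
        push @cells, $1 if $line =~ /^\s*(\S+)\s+\(/;
    }
    return @cells;
}

# Read every "Startpoint:"-separated record of a filehandle exactly once.
sub read_paths {
    my ($fh) = @_;
    local $/ = "Startpoint:";
    my @paths;
    while (my $record = <$fh>) {
        my @cells = first_column_of_path($record);
        push @paths, \@cells if @cells;
    }
    return @paths;
}

# Hypothetical, shortened sample in the same layout as the report.
my $sample = <<'END';
Startpoint: sdram_clk (clock source 'SDRAM_CLK')
Endpoint: sd_DQ_out[6] (output port clocked by SD_DDR_CLK)
Path Group: COMBO
Point                        Incr       Path
clock SDRAM_CLK (fall edge)  3.750000   3.750000
sdram_clk (in)               0.065438   3.815438 f
sdram_clk (net)              0.000000   3.815438 f
sd_DQ_out[6] (out)           0.010237   4.830185 f
data arrival time                       4.830185
END

# An in-memory filehandle stands in for the real report file.
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @paths = read_paths($fh);
close $fh;

print join(' -> ', @$_), "\n" for @paths;
```

Keeping the parsed paths in memory means the counting step can work on arrays instead of rereading the report.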

I have tried the suggested workaround, but it is not working. Can you please help me with a correct set of code to resolve this issue?


In reply to How to manage a pattern matching & counting with big data file by taj_ritesh
