
Hi Monks. I have a script that generates customized patterns and counts the number of occurrences of a selected pattern.

I am running it on a 15000-page file. The problem is that it runs for a very long time and never completes.

Can you please check my code and help me fix it, so that it runs quickly and completes?

#!/usr/bin/perl
###############################################
sub get_first_elements_of_string
{
    my @a = (split (' ' ,"$_[0]"));
    return $a[0];
}
###############################################
sub array_search {
    my ($elem, @arr) = @_;
    my $flag = -2;
    $mn = 0;
        foreach  $n (@arr)
    {
        if ($n eq $elem)
        {
                    $flag = $mn;
        last;
                }
        $mn++;
         }
    return $flag;
 }
###############################################

sub get_data_path_report
{
    my @spep = (m/ (\S+).*Endpoint: (\S+).*/msg);
    my $s_p = $spep[0];
    my $e_p = $spep[1];
    
    my @data_path = (m/ .*?Endpoint: .*?$s_p(.*?)$e_p.*?data arrival time/msg);
    my $data_path_length = @data_path;

    my @A = ();
    if($data_path_length == 1)
    {
        print "\npath:\t $s_p -----to------$e_p-----";
        print "size of data_path: $data_path_length\n";
        foreach $ele (@data_path)
        {
            print "ele------- $ele\n";
            my @data_path_elements = (split ('\n',$ele));
            my $l = @data_path_elements;
            print "----- Size of data path elements : $l\n";
            shift(@data_path_elements);
            foreach $x (@data_path_elements)
            {
                #@arr = (split (' ',$x));
                my $c = &get_first_elements_of_string($x);
            print "split--ele: $x\n";
                print "------\t$c\n";
                push (@A,$c);
            }
        }
        print "DATA-PATH-Elements: @A";
    #shift(@A);
    print "#### $#A\n";
    return @A;
    }
}
###############################################
sub get_patterns 
{
    my $sp = $_[0];
    my $ep = $_[0];
    foreach my $k (2 .. $#_)
    {
        if($_[$k] == -2)
        {
            next;
        }
        if($_[$k] == $ep +1)
        {
            $ep  = $_[$k];
        }
        else
        {
            $sp = $_[$k];
            $ep = $_[$k];
        }
    }
    
}
###############################################
#open(fh, "timing_report_1.txt");

open(fh, "tim_icc_dec12b");

$/ = "Startpoint:";
my @result = ();
while (<fh>)
{
    my @a = ();
    @a = &get_data_path_report ($_);
    my %seen = ();
    push (@result, @a);
    @result = grep { !$seen{$_}++ } @result;
    
}
shift(@result);
my $U_L = @result;
print "\n\nUNIQUE CELLS: @result ===== $U_L\n";
###############################################
my $k =0;
foreach $h (@result)
{
    print "====== $k -----> $h\n";
    $k++;

}


close(fh);
###############################################

my $u_l = @result;
my $i = 0;
my $max_score = -1;
my @score_board_matrix = ();
while($i < $u_l)
{
    my $score_board_column = "";
    my $j = 0;
    while($j<$u_l)
    {
        $score = 0;
        open(fh, "tim_icc_dec12b");

        $/ = "Startpoint:";
        while(<fh>)
        {
            my @d = &get_data_path_report ($_);
            if($#d >=1)
            {
                my $ss = &array_search("$result[$i]", @d);
                my $ee = &array_search("$result[$j]", @d);
                print " !!!! ### $ss ----- $result[$i] <----> $ee ----- $result[$j] #### !!!!!\n";
                if($ee == $ss+1)
                {
                    $score++;
                }
            }
        }
        close(fh);
        $score_board_column = $score_board_column." ".$score;
        if($score > $max_score)
        {
            $max_score = $score;
        } 
        print "\n---------------------------$result[$i] $result[$j]----\t$score------> $score_board_column\n";
        $j++;
    }
    $i++;
    push (@score_board_matrix, $score_board_column);
    print "\n";
}
###############################################
print "######################\n@score_board_matrix\nMAX_SCORE: $max_score\n";
my $row = 0;
my @array_indexes = ();
foreach $column (@score_board_matrix)
{
my @re = split(" ",$column);
my $index = &array_search($max_score,@re);
print "\n----- $column------> $#re ------->$row,$index----> $result[$row]<------>$result[$index]\n";
push (@array_indexes, "$index");
$row++;
}
print "%%%%%%%%%%%%%%%%%%\n";
print "@array_indexes";
my $sp = $array_indexes[0];
my $ep = $array_indexes[0];
print "## $sp ---- $ep\n";
shift(@array_indexes);
print @array_indexes;
    foreach $k (@array_indexes)
    {
        if($k == -2)
        {
            next;
        }
        if($k == $ep +1)
        {
            $ep  = $k;
            next;
        }
        else
        {
        print "PATTERN: $sp ---- $ep";
        $e = $sp;
        while ($e <= $ep)
        {
        print "$result[$e]-->";
        $e++;
        }
            $sp = $k;
            $ep = $k;
        }
    
    }

Let me add some detail to make the issue clearer. The code gets stuck in the following section:



my $u_l = @result;
my $i = 0;
my $max_score = -1;
my @score_board_matrix = ();
while($i < $u_l)
{
    my $score_board_column = "";
    my $j = 0;
    while($j<$u_l)
    {
        $score = 0;
        open(fh, "tim_icc_dec12b");

        $/ = "Startpoint:";
        while(<fh>)
        {
            my @d = &get_data_path_report ($_);
            if($#d >=1)
            {
                my $ss = &array_search("$result[$i]", @d);
                my $ee = &array_search("$result[$j]", @d);
                print " !!!! ### $ss ----- $result[$i] <----> $ee ----- $result[$j] #### !!!!!\n";
                if($ee == $ss+1)
                {
                    $score++;
                }
            }
        }
        close(fh);
        $score_board_column = $score_board_column." ".$score;
        if($score > $max_score)
        {
            $max_score = $score;
        } 
        print "\n---------------------------$result[$i] $result[$j]----\t$score------> $score_board_column\n";
        $j++;
    }
    $i++;
    push (@score_board_matrix, $score_board_column);
    print "\n";
}
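
A note on why this section stalls: for every pair of unique cells ($i, $j) the loops reopen the file and run get_data_path_report on every record again, so the file is parsed roughly N x N times for N unique cells, and the debug print in the innermost loop adds to that. Below is a minimal sketch of one possible alternative, assuming each path has already been parsed once into a list of cell names. The @paths data and the %pair_count hash are hypothetical names used for illustration and are not part of the script above. A pair is counted only when both cells sit at their first occurrence in the path, which mirrors how array_search returns the first match.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: each inner array plays the role of the list
# that get_data_path_report returns for one path.
my @paths = (
    [qw(a b c d)],
    [qw(a b d)],
    [qw(b c)],
);

# $pair_count{X}{Y} = number of paths in which Y directly follows X.
my %pair_count;
for my $path (@paths) {
    # Index of the first occurrence of each name in this path.
    my %first;
    for my $i (0 .. $#$path) {
        $first{ $path->[$i] } //= $i;
    }
    # Walk adjacent positions once; count the pair only if both names
    # are at their first occurrence (same rule as array_search + 1 test).
    for my $i (0 .. $#$path - 1) {
        my ($x, $y) = @{$path}[ $i, $i + 1 ];
        next unless $first{$x} == $i && $first{$y} == $i + 1;
        $pair_count{$x}{$y}++;
    }
}

# Report every observed pair and track the highest score.
my $max_score = 0;
for my $from (sort keys %pair_count) {
    for my $to (sort keys %{ $pair_count{$from} }) {
        my $n = $pair_count{$from}{$to};
        $max_score = $n if $n > $max_score;
        print "$from -> $to : $n\n";
    }
}
print "MAX_SCORE: $max_score\n";
```

With this layout the file only needs to be read once, and the score for any pair is a hash lookup; pairs that never occur are simply absent from %pair_count instead of being stored as 0.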

One sample from the target data file is as follows:


  Startpoint: sdram_clk (clock source 'SDRAM_CLK')
  Endpoint: sd_DQ_out[6]
            (output port clocked by SD_DDR_CLK)
  Path Group: COMBO
  Path Type: max

  Point                                       Fanout       Cap     Trans      Incr       Path
  ----------------------------------------------------------------------------------------------
  clock SDRAM_CLK (fall edge)                                             3.750000   3.750000
  sdram_clk (in)                                                0.184922  0.065438 & 3.815438 f
  sdram_clk (net)                              17     0.124019            0.000000   3.815438 f
  I_SDRAM_TOP/sdram_clk (SDRAM_TOP)                                       0.000000   3.815438 f
  I_SDRAM_TOP/sdram_clk (net)                         0.124019            0.000000   3.815438 f
  I_SDRAM_TOP/I_SDRAM_IF/sdram_clk (SDRAM_IF)                             0.000000   3.815438 f
  I_SDRAM_TOP/I_SDRAM_IF/sdram_clk (net)              0.124019            0.000000   3.815438 f
  I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16/I (bufbd7)              0.187810  0.013919 & 3.829357 f
  I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16/Z (bufbd7)              0.233113  0.210904 & 4.040261 f
  I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16 (net)    45 0.175550           0.000000   4.040261 f
  I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6/S (mx02d4)             0.234098  0.003310 & 4.043571 f
  I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6/Z (mx02d4)             0.999121  0.776377   4.819948 f
  I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out[6] (net)     1     0.475020            0.000000   4.819948 f
  I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out[6] (SDRAM_IF)                          0.000000   4.819948 f
  I_SDRAM_TOP/sd_DQ_out[6] (net)                      0.475020            0.000000   4.819948 f
  I_SDRAM_TOP/sd_DQ_out[6] (SDRAM_TOP)                                    0.000000   4.819948 f
  sd_DQ_out[6] (net)                                  0.475020            0.000000   4.819948 f
  sd_DQ_out[6] (out)                                            0.999121  0.010237 & 4.830185 f
  data arrival time                                                                  4.830185

  clock SD_DDR_CLK (rise edge)                                            7.500000   7.500000
  clock network delay (ideal)                                             1.598546   9.098545
  clock uncertainty                                                       -0.100000  8.998545
  output external delay                                                   -2.000000  6.998545
  data required time                                                                 6.998545
  ----------------------------------------------------------------------------------------------
  data required time                                                                 6.998545
  data arrival time                                                                  -4.830185
  ----------------------------------------------------------------------------------------------
  slack (MET)                                                                        2.168359

What I am trying to do is scan each path from "Endpoint" to "data arrival time", take the first column of each line in that section as one pattern, expand this pattern in a +/- 2 range, and then scan the whole file (which has 1000 such paths, like the example shown) for each pattern to find out how many times each pattern is repeated.
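
The extraction step described above can be sketched roughly as follows. This is only an illustration under assumptions: the shortened $report text is made up in the layout of the sample, and the in-memory filehandle stands in for the real file. It keeps every row between the dashed rule and "data arrival time" (including the leading "clock" row), which is not exactly the span that the regex in get_data_path_report captures. The +/- 2 expansion is not attempted here, since its exact meaning is not spelled out.

```perl
use strict;
use warnings;

# Shortened, hypothetical report text in the same layout as the sample.
my $report = <<'END';
  Startpoint: sdram_clk (clock source 'SDRAM_CLK')
  Endpoint: sd_DQ_out[6]
  Point                        Incr       Path
  ---------------------------------------------
  clock SDRAM_CLK (fall edge)  3.750000   3.750000
  sdram_clk (in)               0.065438   3.815438 f
  U1/I (bufbd7)                0.013919   3.829357 f
  U1/Z (bufbd7)                0.210904   4.040261 f
  sd_DQ_out[6] (out)           0.010237   4.830185 f
  data arrival time                       4.830185
  Startpoint: other_clk (clock source 'OTHER')
  Endpoint: q
  Point                        Incr       Path
  ---------------------------------------------
  other_clk (in)               0.1        0.1 f
  U1/I (bufbd7)                0.1        0.2 f
  U1/Z (bufbd7)                0.1        0.3 f
  q (out)                      0.1        0.4 f
  data arrival time                       0.4
END

# Read from the string as if it were the report file.
open my $fh, '<', \$report or die "cannot open in-memory report: $!";

my @paths;    # one array ref of first-column names per path
{
    local $/ = "Startpoint:";    # one record per path, as in the script
    while (my $record = <$fh>) {
        # Keep only the rows between the dashed rule and "data arrival time".
        next unless $record =~ /^\s*-{5,}\s*\n(.*?)^\s*data arrival time/ms;
        my @names = map { (split ' ', $_)[0] } grep { /\S/ } split /\n/, $1;
        push @paths, \@names;
    }
    close $fh;
}

print scalar(@paths), " paths\n";
print join(' ', @$_), "\n" for @paths;
```

Parsing every record once into @paths like this means later counting steps can work on the in-memory lists instead of rereading the file.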

I have tried the suggested workaround, but it is not working. Can you please help me with a correct set of code to resolve this issue?

In reply to How to manage a pattern matching & counting with big data file by taj_ritesh
