in reply to How to manage a pattern matching & counting with big data file

Please add this to your question:
1. A few lines of an original file, with the essential data changed, so we won't steal anything.
2. What exactly you want to extract from it.
3. How exactly you want to store the results.
Thank you.
UPDATE
So you start with a file containing blocks like this:
  Startpoint: sdram_clk (clock source 'SDRAM_CLK')
  Endpoint: sd_DQ_out[6] (output port clocked by SD_DDR_CLK)
  Path Group: COMBO
  Path Type: max

  Point                                             Fanout       Cap     Trans      Incr       Path
  ----------------------------------------------------------------------------------------------
  clock SDRAM_CLK (fall edge)                                                   3.750000   3.750000
  sdram_clk (in)                                                      0.184922  0.065438 & 3.815438 f
  sdram_clk (net)                                       17  0.124019            0.000000   3.815438 f
  I_SDRAM_TOP/sdram_clk (SDRAM_TOP)                                             0.000000   3.815438 f
  I_SDRAM_TOP/sdram_clk (net)                               0.124019            0.000000   3.815438 f
  I_SDRAM_TOP/I_SDRAM_IF/sdram_clk (SDRAM_IF)                                   0.000000   3.815438 f
  I_SDRAM_TOP/I_SDRAM_IF/sdram_clk (net)                    0.124019            0.000000   3.815438 f
  I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16/I (bufbd7)                    0.187810  0.013919 & 3.829357 f
  I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16/Z (bufbd7)                    0.233113  0.210904 & 4.040261 f
  I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16 (net)        45  0.175550            0.000000   4.040261 f
  I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6/S (mx02d4)                   0.234098  0.003310 & 4.043571 f
  I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6/Z (mx02d4)                   0.999121  0.776377   4.819948 f
  I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out[6] (net)              1  0.475020            0.000000   4.819948 f
  I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out[6] (SDRAM_IF)                                0.000000   4.819948 f
  I_SDRAM_TOP/sd_DQ_out[6] (net)                            0.475020            0.000000   4.819948 f
  I_SDRAM_TOP/sd_DQ_out[6] (SDRAM_TOP)                                          0.000000   4.819948 f
  sd_DQ_out[6] (net)                                        0.475020            0.000000   4.819948 f
  sd_DQ_out[6] (out)                                                  0.999121  0.010237 & 4.830185 f
  data arrival time                                                                        4.830185

  clock SD_DDR_CLK (rise edge)                                                  7.500000   7.500000
  clock network delay (ideal)                                                   1.598546   9.098545
  clock uncertainty                                                            -0.100000   8.998545
  output external delay                                                        -2.000000   6.998545
  data required time                                                                       6.998545
  ----------------------------------------------------------------------------------------------
  data required time                                                                       6.998545
  data arrival time                                                                       -4.830185
  ----------------------------------------------------------------------------------------------
  slack (MET)                                                                              2.168359
from which you want to extract this
clock SDRAM_CLK
sdram_clk
sdram_clk
I_SDRAM_TOP/sdram_clk
I_SDRAM_TOP/sdram_clk
I_SDRAM_TOP/I_SDRAM_IF/sdram_clk
I_SDRAM_TOP/I_SDRAM_IF/sdram_clk
I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16/I
I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16/Z
I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16
I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6/S
I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6/Z
I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out[6]
I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out[6]
I_SDRAM_TOP/sd_DQ_out[6]
I_SDRAM_TOP/sd_DQ_out[6]
sd_DQ_out[6]
sd_DQ_out[6]
which is a sequence that you want to find every occurrence of in the entire file and count? Did I get that right?
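If that is the extraction you mean, here is a minimal sketch of it. The helper name extract_points, the rule that the table starts at a line of ten or more dashes, and the shortened sample block are my assumptions, not something from your file, so adjust them to the real report.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the point names found between the dashed
# separator that follows the "Endpoint" line and "data arrival time".
sub extract_points {
    my @lines = @_;
    my ($seen_endpoint, $in_table, @points) = (0, 0);
    for my $line (@lines) {
        $seen_endpoint = 1 if $line =~ /Endpoint/;
        # Assumption: the table starts at a line of ten or more dashes.
        if ($seen_endpoint and $line =~ /^\s*-{10,}/) { $in_table = 1; next; }
        next unless $in_table;
        last if $line =~ /data arrival time/;
        # The name is everything before the last " (" on the line.
        push @points, $1 if $line =~ /^\s*(.+)\s\(/;
    }
    return @points;
}

# Shortened sample block, for illustration only.
my @sample = (
    "  Startpoint: sdram_clk (clock source 'SDRAM_CLK')",
    "  Endpoint: sd_DQ_out[6] (output port clocked by SD_DDR_CLK)",
    "  Point    Fanout    Cap    Trans    Incr    Path",
    "  --------------------------------------------------",
    "  clock SDRAM_CLK (fall edge)      3.750000   3.750000",
    "  sdram_clk (in)     0.184922  0.065438 & 3.815438 f",
    "  sdram_clk (net)    17  0.124019  0.000000   3.815438 f",
    "  data arrival time                           4.830185",
);
print "$_\n" for extract_points(@sample);
```

The greedy `.+` is deliberate: it keeps "clock SDRAM_CLK" together and stops at the last opening parenthesis on the line.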

Replies are listed 'Best First'.
Re^2: How to manage a big file
by taj_ritesh (Initiate) on Apr 23, 2014 at 09:42 UTC

    Now each line is a potential pattern, and we need to make four more patterns by adding the next two and the previous two lines. For example:

    Pattern (1):- I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16 (consider this the seed pattern)

    Pattern (2):- I_SDRAM_TOP/I_SDRAM_IF/sdram_clk I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16 I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16 ( Seed +2)

    Pattern (3):- I_SDRAM_TOP/I_SDRAM_IF/bufbdf_G1B1I16 I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16 (seed +1)

    Pattern (4):- I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16 I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6 (seed -1)

    Pattern (5):- I_SDRAM_TOP/I_SDRAM_IF/sdram_clk_G1B1I16 I_SDRAM_TOP/I_SDRAM_IF/sd_mux_dq_out_6 I_SDRAM_TOP/I_SDRAM_IF/sd_DQ_out6 (seed -2)

    Now the master file needs to be scanned for these five patterns, and the occurrence count printed for each pattern.
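One way to read the five patterns is that each is a contiguous window of one to three names that either ends or starts at the seed. Under that assumption, a single pass can count every window of length 1 to 3, and the five counts for any seed become hash lookups. This is only a sketch: the helper names count_windows and patterns_for_seed and the toy path A..E are made up for illustration, and on a really big file the hash may need to live on disk.

```perl
use strict;
use warnings;

# Count every contiguous window of 1..3 names in one path.
# $counts is a hash reference shared across all paths of the file.
sub count_windows {
    my ($counts, @path) = @_;
    for my $start (0 .. $#path) {
        for my $len (1 .. 3) {
            my $end = $start + $len - 1;
            last if $end > $#path;
            $counts->{ join("\n", @path[$start .. $end]) }++;
        }
    }
}

# Hypothetical helper: the patterns around the seed at index $i, in the
# order seed, previous two + seed, previous one + seed, seed + next one,
# seed + next two. Windows that fall off either end are skipped.
sub patterns_for_seed {
    my ($i, @path) = @_;
    my @ranges = ([$i, $i], [$i - 2, $i], [$i - 1, $i],
                  [$i, $i + 1], [$i, $i + 2]);
    return map  { join("\n", @path[$_->[0] .. $_->[1]]) }
           grep { $_->[0] >= 0 and $_->[1] <= $#path } @ranges;
}

my @path = qw(A B C D E);                    # toy path, not real data
my %counts;
count_windows(\%counts, @path) for 1 .. 2;   # pretend the path occurs twice
for my $pattern (patterns_for_seed(2, @path)) {
    (my $shown = $pattern) =~ s/\n/ /g;
    print "$shown: $counts{$pattern}\n";
}
```

The names inside a window are joined with a newline because a newline cannot occur inside a name that came from a single line of the report.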

      while thinking about your last addition,
      consider this
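      open IN,'<1';
      while (<IN>) {
          $a = 1 if /Endpoint/;          # start of a path block
          $b = 1 if /-/ and $a;          # dashed line under the column header
          if ($a and $b) {
              if (/^\s+(.+)\s\(/) {
                  # give each new point name the next free number
                  $h{$1}{'x'} = ++$x if ! exists $h{$1};
                  $r .= $h{$1}{'x'}.".";
              }
          }
          # end of the path: count its number string and reset
          $a = 0, $b = 0, ++$r{$r}, $r = '' if /data arrival time/ and $b;
      }
      close IN;
      foreach (sort keys %r) {
          print "$_: $r{$_}\n";
      }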
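      algorithm:
      1. find /Endpoint/
      2. find the next line containing "-" (the dashed line under the column header)
      3. assign every distinct line pattern a unique code, counting up from 1
      4. append each line's code to a string, so the whole path becomes one string of numbers
      5. at /data arrival time/, count that string as one occurrence and start over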
      which will result in something like this:
      1.2.2.3.3.4.4.5.6.7.8.9.10.10.11.11.12.12.: 8
      The count is 8 because I copied your example block eight times into my input file, which is named '1'.
      update
      regarding your patterns:
      They are quite confusing.
      Why did you pick the middle line as the seed for your patterns?
      Please consider the snippet above; maybe it will suit your needs as well.
      update 2
      Have you tried this solution? Did it work? Please let me know.
        FWIW, $a and $b are special (sort uses them), so it's better to use meaningful variable names that are not already reserved.
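For what it's worth, here is a sketch of the same counting idea with lexical, descriptive names and under strict. The sub name count_paths, the ten-dash rule for spotting the separator, and the tiny in-memory sample are my assumptions rather than anything from the original snippet, so treat it as a starting point.

```perl
use strict;
use warnings;

# Read timing-report blocks from a filehandle and count how often each
# sequence of point names occurs. Names are replaced by small integer
# ids so the signature strings stay short.
sub count_paths {
    my ($fh) = @_;
    my (%id_for, %count_for);
    my ($seen_endpoint, $in_table, $next_id, $signature) = (0, 0, 0, '');
    while (my $line = <$fh>) {
        $seen_endpoint = 1 if $line =~ /Endpoint/;
        # Assumption: the table starts at a line of ten or more dashes.
        $in_table = 1 if $seen_endpoint and $line =~ /^\s*-{10,}/;
        if ($in_table and $line =~ /data arrival time/) {
            $count_for{$signature}++;          # one complete path seen
            ($seen_endpoint, $in_table, $signature) = (0, 0, '');
            next;
        }
        if ($in_table and $line =~ /^\s+(.+)\s\(/) {
            $id_for{$1} = ++$next_id unless exists $id_for{$1};
            $signature .= "$id_for{$1}.";
        }
    }
    return \%count_for;
}

# Tiny made-up block, repeated twice, read through an in-memory filehandle.
my $block = join "\n",
    "  Endpoint: sd_DQ_out[6] (output port clocked by SD_DDR_CLK)",
    "  " . "-" x 20,
    "  sdram_clk (in)      0.184922  0.065438 & 3.815438 f",
    "  sdram_clk (net)     17  0.124019  0.000000   3.815438 f",
    "  sd_DQ_out[6] (out)  0.999121  0.010237 & 4.830185 f",
    "  data arrival time   4.830185",
    "";
my $report = $block x 2;
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my $counts = count_paths($fh);
close $fh;
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

For a real report, open the file with the three-argument form of open and pass that handle to count_paths; the loop reads one line at a time, so memory use depends on the number of distinct names and paths, not on the file size.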