Foodeywo:

You don't show the script in one chunk, so I can't tell if you've got a logic error or not. But I hacked a quickie together, and threw it at a large file (32GB), and it run in a little over 3 minutes:

#!/usr/bin/env perl # # search a large file for lines containing a regex # use strict; use warnings; use Data::Dump 'pp'; my @rexlist; my $cnt=0; while (<DATA>) { next if /^\s*($|#)/; s/\s+$//; my ($name, $rex) = split /:/, $_; my $regex = qr/$rex/; ++$cnt; open my $FH, '>', "FILESRCH.$cnt" or die $!; push @rexlist, [ $regex, $name, $FH ]; } open my $IFH, '<', "a_big_file" or die "$!"; $cnt =0; my %cnts; my $lines=0; my $start = time; while (my $line = <$IFH>) { ++$cnt; ++$lines; if ($lines % 100000 == 0) { my $secs = time - $start; print "$lines: $secs s\n"; } #last if $cnt>50; #print "$.: $line"; my $matches = 0; for my $r (@rexlist) { my ($rex, $name, $OFH) = @$r; if ($line =~ $rex) { print $OFH $line; #print "match $matches ($name)\n"; ++$cnts{$name}; } ++$matches; } #print "\n"; } print pp(\%cnts); __DATA__ aNumber:'\d+' CorporateRecord:'CORPORATE' null:NULL oldRec:'200[0-3]-\d\d-\d\d newRec:'20?[4-9]-\d\d-\d\d newRec2: '201\d-\d\d-\d\d

I can only imagine that you have a logic error, or some particularly slow regexes to make your program run that slowly. The output from mine:

$ time perl large_file_regex_search.pl 100000: 1 s 200000: 2 s 300000: 3 s 400000: 4 s 500000: 5 s 600000: 5 s 700000: 6 s 800000: 7 s 900000: 8 s 1000000: 10 s 1100000: 15 s 1200000: 18 s 1300000: 20 s 1400000: 23 s 1500000: 25 s 1600000: 29 s 1700000: 35 s 1800000: 42 s 1900000: 47 s 2000000: 53 s 2100000: 60 s 2200000: 66 s 2300000: 71 s 2400000: 75 s 2500000: 81 s 2600000: 87 s 2700000: 92 s 2800000: 98 s 2900000: 103 s 3000000: 107 s 3100000: 113 s 3200000: 119 s 3300000: 124 s 3400000: 129 s 3500000: 135 s 3600000: 142 s 3700000: 151 s 3800000: 158 s 3900000: 166 s 4000000: 173 s 4100000: 181 s { aNumber => 4140847, CorporateRecord => 149943, newRec2 => 783275, null => 4140847, oldRec => 987898, } real 3m5.660s user 1m6.390s sys 0m16.875s $ $ ls -al FI* -rw-r--r-- 1 Roboticus None 1261 May 30 12:12 FILES.ddl.sql -rw-r--r-- 1 Roboticus None 3248770142 Jul 21 08:47 FILESRCH.1 -rw-r--r-- 1 Roboticus None 116430098 Jul 21 08:47 FILESRCH.2 -rw-r--r-- 1 Roboticus None 3248770142 Jul 21 08:47 FILESRCH.3 -rw-r--r-- 1 Roboticus None 769188466 Jul 21 08:47 FILESRCH.4 -rw-r--r-- 1 Roboticus None 0 Jul 21 08:44 FILESRCH.5 -rw-r--r-- 1 Roboticus None 613214364 Jul 21 08:47 FILESRCH.6

At first, I thought that perhaps your ranges were too large and you were doing a lot of disk writing (which may be true), but two of the expressions in my list are on every input line, so FILESRCH.1 and FILESRCH.3 are exact copies of the input file. Post your entire script and some sample regexes so we can see where the difficulty lies.

...roboticus

When your only tool is a hammer, all problems look like your thumb.


In reply to Re: Write to multiple files according to multiple regex by roboticus
in thread Write to multiple files according to multiple regex by Foodeywo

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.