
Thanks!

The code is huge, and it is a little hard for me to understand every line. What I cannot figure out in your code is how $OFH can write to different files. It's not defined anywhere, is it?

My code changed a bit after the many suggestions here and now looks like this:

#!perl
use strict;
use warnings;
use FindBin;

my (@regex, $regex, $file, $outfile, $dir, $dh, @inputs, $inputs, @filehandles, $fh, $ofh);

$dir = "$FindBin::Bin/../rxo";
opendir($dh, $dir) || die "can't opendir $dir: $!";
@inputs = readdir($dh);
closedir $dh;
splice @inputs, 0, 2;

foreach (@inputs) {
    #localize the file glob, so FILE is unique to
    # the inner loop.
    local *FILE;
    local *OUTFILE;
    $file    = "$FindBin::Bin/../rxo/$_";
    $outfile = "$FindBin::Bin/../blocks/$_";
    open(*FILE, "$file") || die;
    open(*OUTFILE, "> $outfile") || die;
    #push the typeglobe to the end of the array
    $fh  = \*FILE;
    $ofh = \*OUTFILE;
    $regex = <$fh>;
    push(@regex, $regex);
    push(@filehandles, $ofh);
}

$/ = '^END$';
while (my $line = <>) {
    for my $i (0..$#inputs) {
        print {$filehandles[$i]} $line if $line =~ /$regex[$i]/;
    }
}

My regexes look like this:

(?^:^UT A19(?:7(?:0G990800007|6CQ89200006)|8(?:0JW32900007|2PN88100001)|90DD63700001))

Basically the data is arranged in blocks like:

UT xxxxxx (some number), let's call this the entry
some data about the entry
some more data about the entry
END
UT xxxxx2 (next entry)
...

So I want to 1) extract all blocks of interest and 2) split these blocks into n files, since the blocks relate to n different regexes.

Re^3: Write to multiple files according to multiple regex
by roboticus (Chancellor) on Jul 21, 2015 at 20:29 UTC

    Foodeywo:

    Regarding your question about how $OFH writes to different files: I do it by building an array whose entries each contain (1) the regular expression, (2) the name of the regular expression, and (3) the output file handle, using this code:

    while (<DATA>) {
        . . . create $name, $rex and $FH . . .
        push @rexlist, [ $rex, $name, $FH ];
    }

    Then as we process the input file, we scan through our regular expressions, and for each one, we pull the regex, name and output file handle out of the array:

    while (my $line = <$IFH>) {
        . . .
        # For each regular expression
        for my $r (@rexlist) {
            # Pull the regular expression, name and file handle out of our array
            my ($rex, $name, $OFH) = @$r;

            # If the line matches the regex, write it to the file
            if ($line =~ $rex) {
                print $OFH $line;
            }
        }
        . . .
    }
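For anyone who wants to try the idea in isolation, here is a minimal, self-contained sketch of the same pattern. It is not the code from the earlier post: the names `alpha` and `beta`, the two patterns, and the sample lines are made up for illustration, and in-memory scalars stand in for real output files so that nothing is written to disk.

```perl
use strict;
use warnings;

# Hypothetical names and patterns; in-memory buffers stand in for real files.
my %out = ( alpha => '', beta => '' );

my @rexlist;
for my $spec ( [ alpha => qr/^UT A1/ ], [ beta => qr/^UT B2/ ] ) {
    my ( $name, $rex ) = @$spec;

    # Each pattern gets its own lexical handle, stored next to it in the list.
    open my $FH, '>', \$out{$name} or die "can't open buffer for $name: $!";
    push @rexlist, [ $rex, $name, $FH ];
}

# Made-up input lines.
my @lines = ( "UT A1001\n", "some data\n", "UT B2002\n" );

for my $line (@lines) {
    for my $r (@rexlist) {
        # $OFH is a different handle on each pass through this inner loop.
        my ( $rex, $name, $OFH ) = @$r;
        print $OFH $line if $line =~ $rex;
    }
}

close $_->[2] for @rexlist;
```

The point is that `$OFH` is just a loop-local copy of whichever handle was stored alongside the current regex, so the same `print $OFH $line` statement ends up writing to a different destination for each entry.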

    Feel free to ask again if you need a bit more clarification.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re^3: Write to multiple files according to multiple regex
by BillKSmith (Monsignor) on Jul 21, 2015 at 16:05 UTC

    I can suggest several improvements to the code you have posted.

    Declare all variables in the smallest possible scope. Your declaration of all variables at the start of the file largely defeats your use of strict.

    Lexical file handles are much easier to manage than globs.

    The three argument form of open would make the intention clearer.

    Storing your file data in an array of hashes rather than in parallel arrays probably would not make any difference in speed, but it would help your readers by keeping related data together.

    Store your regexes as regexes (use qr//) rather than strings. It is probably faster, and it certainly makes the intention clearer.

    Note: The $INPUT_RECORD_SEPARATOR is a string not a regex.

    UNTESTED

    #!perl
    use strict;
    use warnings;
    use FindBin;

    my $dir = "$FindBin::Bin/../rxo";
    opendir( my $dh, $dir ) || die "can't opendir $dir: $!";
    my @inputs = readdir($dh);
    closedir $dh;
    splice @inputs, 0, 2;

    my @dispatch;
    foreach (@inputs) {
        my $outfile = "$FindBin::Bin/../blocks/$_";
        # 'or' rather than '||' so that die applies to open, not to the filename
        open my $ofh, '>', $outfile or die "can't open $outfile: $!";
        my $file = "$FindBin::Bin/../rxo/$_";
        open my $fh, '<', $file or die "can't open $file: $!";
        my $regex = <$fh>;
        close $fh;
        push @dispatch, { file => $ofh, regex => qr/$regex/ };
    }

    while ( my $line = do { local $/ = 'END'; <> } ) {
        foreach (@dispatch) {
            print { $_->{file} } $line if $line =~ $_->{regex};
        }
    }
    Bill
      Thank you very much! This runs and is much faster. However, I have problems with $/. It stops after the first match is found, so I get 1 entry in 1 file, and the rest of the file remains empty.
        It stops after the first match is found, so I get 1 entry in 1 file, and the rest of the file remains empty.

        Are all the output files created, but empty? Are you sure that the one entry is correct? If your input is not broken into blocks correctly, the value of $/ is not correct. The scheme will not work if every end-of-block is not exactly the same. (Remember: $/ is a string) Please post a few (three to ten) blocks of realistic data. Use code tags so we can download it exactly. For security, you can use made-up data, but the format must be exact.
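One thing that may be worth checking, assuming the patterns are anchored with `^UT` as in the example regex earlier in the thread: because `$/` is a literal string, the choice between `'END'` and `"END\n"` changes where each record starts. The sketch below uses made-up data in the block layout described above and a hypothetical helper, `read_records`, to compare the two separators. It is only an illustration of how `$/` behaves, not a diagnosis of the real data.

```perl
use strict;
use warnings;

# Made-up sample data in the layout described in the thread.
my $data = <<'EOT';
UT A1001
some data
END
UT B2002
more data
END
UT A1003
other data
END
EOT

# Hypothetical helper: read all records from a string using a given separator.
sub read_records {
    my ( $text, $sep ) = @_;
    open my $in, '<', \$text or die "can't open in-memory input: $!";
    local $/ = $sep;    # a literal string, never a pattern
    my @records = <$in>;
    close $in;
    return @records;
}

my @loose = read_records( $data, 'END' );      # newline after END is left behind
my @tight = read_records( $data, "END\n" );    # newline is part of the separator

# Without /m, ^ matches only at the very start of each record.
my $loose_hits = grep { /^UT/ } @loose;
my $tight_hits = grep { /^UT/ } @tight;
```

With `'END'`, every record after the first begins with the leftover newline, so a pattern anchored with `^` matches only the first block. With `"END\n"`, each record begins directly with `UT`. Note that `"END\n"` would also split on any data line that merely ends in `END`, so real data should be checked for that.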

        Bill