
yes they are all created and empty.

a match is only found when the first block matches. otherwise every output file stays empty.

the data looks like this:
UT 123456789 1234 9876 1234 some additional string information THE_END UT 987654321 1234 2345 some additional string information THE_END UT 1928374756 4321 2567 1234 THE_END some additional string information UT 5647382910 1234 2435 5678 some additional string information THE_END

notice I changed END to THE_END to make it more unique, since other lines may accidentally contain the string "END" and I can't use the regex "^END"

the current code is:

#!perl
use strict;
use warnings;
use FindBin;

my $dir = "$FindBin::Bin/../rxo";
opendir(my $dh, $dir) || die "can't opendir $dir: $!";
my @inputs = readdir($dh);
closedir $dh;
splice @inputs, 0, 2;

my @dispatch;
foreach (@inputs) {
    my $outfile = "$FindBin::Bin/../blocks/$_";
    open my $ofh, '>', $outfile || die;
    my $file = "$FindBin::Bin/../rxo/$_";
    open my $fh, '<', $file || die;;
    my $regex = <$fh>;
    close $fh;
    push @dispatch, { file => $ofh, regex => qr/$regex/ };
}

while (my $line = do { local $/ = 'THE_END'; <> }) {
    foreach (@dispatch) {
        print { $_->{file} } $line if $line =~ $_->{regex};
    }
}

Re^7: Write to multiple files according to multiple regex
by BillKSmith (Monsignor) on Jul 21, 2015 at 23:19 UTC
    Use $/ = "THE_END\n"; Your regexes are failing because every block but the first has a newline at the beginning. I should have thought of that sooner.
    Bill
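
A minimal sketch of the point above, using made-up sample data rather than the real file: with $/ = 'THE_END' the newline that follows each separator stays at the front of the next record, so only the first record begins with "UT". The helper name read_blocks and the sample string are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample: three blocks, each closed by a THE_END line.
my $data = "UT 111\n1234\nTHE_END\nUT 222\n5678\nTHE_END\nUT 333\n9012\nTHE_END\n";

# Split $text into records, using $sep as the input record separator.
sub read_blocks {
    my ($text, $sep) = @_;
    local $/ = $sep;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @blocks = <$fh>;
    close $fh;
    return @blocks;
}

for my $sep ("THE_END", "THE_END\n") {
    my @blocks = read_blocks($data, $sep);
    my $hits   = grep { /^UT/ } @blocks;        # ^ without /m: start of string only
    (my $shown = $sep) =~ s/\n/\\n/g;           # make the newline visible in the report
    printf "separator %-10s: %d records, %d start with UT\n",
        $shown, scalar @blocks, $hits;
}
```

With the bare 'THE_END' separator the sample yields four records (the last one is just a newline) and a single hit. With "THE_END\n" it yields three records and three hits.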

      i tried. it still gives me no matches. i also tried to add another line "THE_END" at the top of my file and I tried "THE_END\n", "\nTHE_END", "\rTHE_END", "THE_END\r".

      i wonder if the regexes - in the way they are written - are suitable for this $/ approach. they look like this

      (?^:^UT A19(?:7(?:0G990800007|6CQ89200006)|8(?:0JW32900007|2PN88100001)|90DD63700001))

      it only addresses "UT" and the "numbercode", nothing afterwards. so if everything from UT xxxxx to THE_END is treated as one line, maybe they don't match?
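
A small sketch that may help answer this question, with a shortened hypothetical pattern and block modelled on the ones quoted in the thread. A pattern that only describes the start of a block still matches a multi-line record, because a regex does not have to consume the whole string. What does matter is what comes before "UT": without the /m flag, ^ matches only at the very start of the string.

```perl
use strict;
use warnings;

# Hypothetical pattern and block, shortened from the ones quoted above.
my $re    = qr/^UT A19(?:70G990800007|90DD63700001)/;
my $block = "UT A1970G990800007\n1234\n9876\nsome additional string information\nTHE_END\n";

# Return 1 if $pattern matches $text, 0 otherwise.
sub hits {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? 1 : 0;
}

# Text after the code is simply ignored by the match.
print "clean block:           ", hits($block, $re), "\n";

# A leading newline pushes "UT" away from the start of the string.
print "leading newline:       ", hits("\n$block", $re), "\n";

# With /m, ^ also matches right after any newline inside the record.
print "leading newline, /m:   ", hits("\n$block", qr/^UT A1970G990800007/m), "\n";
```

This prints 1, 0 and 1. Note that wrapping an existing qr// object in a new qr/$re/m would not help, since a compiled pattern keeps its own flags when interpolated. The /m has to be applied when the pattern is first compiled.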

        You probably have two problems which interact. First, you must parse the input file into blocks. Second, you must process each block with the regular expressions from the other files.

        Let's address just the parsing problem. My code (with $/ = "THE_END\n") correctly parses your sample data. If that sample is accurate, the code will parse real data. If there is a problem, as you have guessed, it almost certainly has to do with whitespace at the end of the block.

        Use a text editor on your copy of the sample data file. Verify that between the last data digit in one block and the first data digit of the next block we find "\nTHE_END\n" (and absolutely nothing else!). Do the same for live data. Check several pairs of blocks just to be sure. Let me know what happened.
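
If a text editor hides the difference between "\n" and "\r\n", the same check can be sketched in Perl. This is only an illustration: the helper names visible and separators, the context width, and the sample string are all assumptions, and the real file would have to be slurped into $sample instead.

```perl
use strict;
use warnings;

# Render the usual invisible characters as escape sequences.
sub visible {
    my ($text) = @_;
    $text =~ s/\r/\\r/g;
    $text =~ s/\n/\\n/g;
    $text =~ s/\t/\\t/g;
    return $text;
}

# For every THE_END, return up to $width characters of context on each side.
sub separators {
    my ($text, $width) = @_;
    my @found;
    # The lookahead captures the trailing context without consuming it,
    # so separators that sit close together are not skipped.
    while ($text =~ /(.{0,$width})THE_END(?=(.{0,$width}))/sg) {
        push @found, visible($1) . '[THE_END]' . visible($2);
    }
    return @found;
}

# Hypothetical sample: the second block uses Windows line endings.
my $sample = "UT 1\n1234\nTHE_END\nUT 2\n5678\r\nTHE_END\r\n";
print "$_\n" for separators($sample, 2);
```

For this sample the first separator shows up as 4\n[THE_END]\nU and the second as \r\n[THE_END]\r\n, which makes the stray carriage returns easy to spot.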

        If we passed the previous test, you are almost certainly parsing correctly, but have a problem with the processing. Again, the problem very likely has to do with whitespace. I really cannot offer any more help without having a real regex and a data block that you expect to match it.

        Bill
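
One whitespace source on the processing side that can be ruled out without the real files: the posted code reads each pattern with my $regex = <$fh>; and never chomps it, so any trailing newline in the pattern file becomes part of the compiled regex. Whether that matters depends on the pattern files and on the data's line endings, neither of which is shown in the thread. The sketch below uses a hypothetical pattern line and hypothetical blocks.

```perl
use strict;
use warnings;

# Hypothetical pattern line, as readline would return it from a pattern file.
my $line = "^UT 123456789\n";

my $raw = qr/$line/;            # the trailing newline is part of the pattern
chomp(my $clean = $line);
my $chomped = qr/$clean/;       # the pattern without the newline

# Hypothetical blocks with Unix and Windows line endings.
my $unix_block = "UT 123456789\n1234\nTHE_END\n";
my $dos_block  = "UT 123456789\r\n1234\r\nTHE_END\r\n";

# Return 1 if $pattern matches $text, 0 otherwise.
sub hits {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? 1 : 0;
}

printf "raw     vs unix: %d\n", hits($unix_block, $raw);
printf "raw     vs dos:  %d\n", hits($dos_block,  $raw);
printf "chomped vs unix: %d\n", hits($unix_block, $chomped);
printf "chomped vs dos:  %d\n", hits($dos_block,  $chomped);
```

Only the raw pattern against the "\r\n" block fails here, because the pattern demands a newline directly after the code while the data has a carriage return there.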