in reply to Using the second word in a line to split a file into multiple files

There are unfortunately several issues with your code:

If I fix these issues (keeping $filecount), I get:

    use warnings;
    use strict;

    my $infn = 'testdat.dat';
    open(my $infh, '<', $infn) or die "$infn: $!";
    my $outfh;
    my $filecount = 0;
    while ( my $line = <$infh> ) {
        if ( $line =~ /^zone\s+(\w+)\W+\w+\s*$/ ) {
            close $outfh if $outfh;
            my $outfn = sprintf '%s-%d.txt', $1, ++$filecount;
            open($outfh, '>', $outfn) or die "$outfn: $!";
        }
        if ($outfh) {
            print {$outfh} $line or die "print: $!";
        }
    }
    close($outfh) if $outfh;
    close($infh);

However, this produces output files that are 7 lines or longer, because they include the endoffile marker and any lines following it. If you want the endoffile to be excluded from the output, the fix is fairly simple:

    if ( $line =~ /^zone\s+(\w+)\W+\w+\s*$/ ) {
        ...
    }
    elsif ( $line =~ /^endoffile$/ ) {
        close $outfh;
        $outfh = undef;
    }

If you want something more complex than this, then it'd be better to switch to a state machine type approach, which I showed some templates for at the top of this node.
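For readers without those templates at hand, here is a rough sketch of what a state-machine version of this task could look like. It is not the template code referred to above. The helper name split_zones, the state names, and the sample input are all made up for illustration, and the sections are collected in memory (where a real script would open and close output files) so that the example runs on its own. It also assumes that a "zone" line inside an unfinished zone is an error, which is stricter than the loop shown earlier.

```perl
use warnings;
use strict;

# Hypothetical helper: splits a list of lines into sections keyed by the
# second word of each "zone" line. Sections are kept in memory here; a real
# script would open and close output files at the marked points instead.
sub split_zones {
    my @lines = @_;
    my %sections;             # section name => array ref of its lines
    my $state = 'outside';    # either 'outside' or 'in_zone'
    my $current;              # name of the section being collected

    for my $line (@lines) {
        if ( $state eq 'outside' ) {
            if ( $line =~ /^zone\s+(\S+)/ ) {
                $current = $1;                  # would open the output file here
                $sections{$current} = [$line];
                $state = 'in_zone';
            }
            # any other line outside a zone is ignored
        }
        elsif ( $state eq 'in_zone' ) {
            if ( $line =~ /^endoffile\s*$/ ) {
                $current = undef;               # would close the output file here
                $state = 'outside';
            }
            elsif ( $line =~ /^zone\s/ ) {
                # assumption: nested or unterminated zones are malformed input
                die "unexpected 'zone' line inside a zone: $line";
            }
            else {
                push @{ $sections{$current} }, $line;
            }
        }
        else {
            die "internal error: unknown state '$state'";
        }
    }
    die "input ended inside zone '$current'" if $state eq 'in_zone';
    return \%sections;
}

# Made-up sample input, not the original poster's data.
my @sample = (
    "zone file1 1ss\n", "record1a\n", "record1b\n", "endoffile\n",
    "stray line\n",
    "zone file2 1ss\n", "record2a\n", "endoffile\n",
);
my $sections = split_zones(@sample);
print "$_: ", scalar @{ $sections->{$_} }, " lines\n" for sort keys %$sections;
```

The point of naming the states explicitly is that each kind of line is only handled where it is expected, so malformed input (a missing endoffile, for example) is reported instead of silently ending up in the wrong output.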

Re^2: Using the second word in a line to split a file into multiple files
by az1962 (Initiate) on Aug 26, 2019 at 14:35 UTC
    Hello, thank you so much for your help and guidance. I am very new to this forum. I was able to get this working, and you are correct: I had filecount in there from the code I was using to start out, and focused mainly on the regex problems. I was able to get it to do exactly what was needed on the test data I presented, then realized that the reason it wasn't working on the actual data file I need to parse is because I have .s in the second word, and that second word always ends with a dot. This works for the original test data:
        use warnings;
        use strict;

        my $infn = '/Users/azeller/Documents/Rogers_import/20190822_RR_export-nrcmd.txt';
        open(my $infh, '<', $infn) or die "$infn: $!";
        my $outfh;
        my $filecount = 0;
        while ( my $line = <$infh> ) {
            if ( $line =~ /^zone\s+(\w+)\W+\w+\s*$/ ) {
                close $outfh if $outfh;
                my $outfn = sprintf '%sdb', $1;
                open($outfh, '>', $outfn) or die "$outfn: $!";
            }
            if ($outfh) {
                print {$outfh} $line or die "print: $!";
            }
        }
        close($outfh) if $outfh;
        close($infh);
    But it doesn't work for the actual data I am parsing. My data file format is actually more like the following:
    one 1file1.nest. 1ss record1a record1b record1c record 1d 2 record empty endoffile zone 2file2.egg. 1ss record1a record1b record1c record 1d 2 record empty endoffile

      Please use <code> tags to format your code and sample input and output.

      I have .s in the second word, and that second word always ends with a dot.

      Now might be a good time to look at perlretut, as jcb suggested, or perhaps perlrequick. \w+ only matches word characters (normally [a-zA-Z0-9_] plus Unicode "Word" characters), which do not include the dot. Perhaps you want to say "word characters plus dot", i.e. [\w.]+, or simply "any non-whitespace characters", i.e. \S+.
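      To make the difference concrete, here is a small sketch that tries the three character classes on one header line. The line is made up to match the description (second word containing dots and ending in one), and second_word is a hypothetical helper used only for this comparison; the rest of each pattern is the regex from the script above.

```perl
use warnings;
use strict;

# Made-up header line in the shape described in the post:
# the second word contains dots and ends with one.
my $line = "zone 1file1.nest. 1ss\n";

# Hypothetical helper: returns the second word captured by $re, or undef
# if the pattern does not match at all.
sub second_word {
    my ( $re, $text ) = @_;
    return $text =~ $re ? $1 : undef;
}

# Same pattern as in the script, varying only the capture group.
my @patterns = (
    [ '\w+',    qr/^zone\s+(\w+)\W+\w+\s*$/ ],
    [ '[\w.]+', qr/^zone\s+([\w.]+)\W+\w+\s*$/ ],
    [ '\S+',    qr/^zone\s+(\S+)\W+\w+\s*$/ ],
);

for my $pair (@patterns) {
    my ( $name, $re ) = @$pair;
    my $word = second_word( $re, $line );
    printf "%-7s %s\n", $name, defined $word ? "captured '$word'" : 'no match';
}
```

      On this line the \w+ version reports no match, while both [\w.]+ and \S+ capture '1file1.nest.' including the trailing dot. Which of the two is the better fit depends on which other characters can appear in the real second words.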

      Update: Edited first sentence that was accidentally cut off.