Luken8r has asked for the wisdom of the Perl Monks concerning the following question:

I have a large file that I am looking to parse data out of. The file is a compilation of recorded data split up into 25 lines per block, and there are thousands of these blocks in the file. I'm looking for a good way to split up this file into equal-sized arrays. The data looks like this:
START DATA 3 0.0 0 3 5 6 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 3 4 5 4 4 5 6 7 7 7 3 4 3 4 5 6 4 5 6 .... END DATA
The lines inside the blocks are not always equal in length, nor are the values always integers or positive numbers, but there is always the same number of lines between START/END.
Ideally, what I would like is an array of each block, with each line of the block in its own array, so I can parse out the elements I need for a CSV for a scatter plot graph. So really, what I want is something like this:
@arrayblock = [ @line1[], @line2[], @line3[], etc... ]
But my Perl knowledge isn't that great, so I'm stuck for ideas on how to get to this point. Any help would be appreciated.

Re: Looking for a good way to split a file into equal sized arrays
by johngg (Canon) on Nov 26, 2007 at 14:30 UTC
    You can read each block of lines (regardless of how many lines there are) by setting the input record separator to "END DATA\n" so that each read operation reads multiple lines. Then each block can be split on newline.

    use strict;
    use warnings;
    use Data::Dumper;

open my $inFH, q{<}, \ <<END_OF_FILE or die qq{open: $!\n};
START DATA
1 2 3 4 5 6 7
a bb ccc ddd
9 8 7 6 5
END DATA
START DATA
4.9 5.3 9.1
biff baff boff
END DATA
END_OF_FILE

my @block = ();
local $/ = qq{END DATA\n};

while ( <$inFH> ) {
    my @lines = split m{\n};
    push @block, [ @lines[ 1 .. $#lines - 1 ] ];
}

close $inFH or die qq{close: $!\n};

print Data::Dumper->Dumpxs( [ \@block ], [ q{*block} ] );

    The output is:-

    @block = (
               [
                 '1 2 3 4 5 6 7',
                 'a bb ccc ddd',
                 '9 8 7 6 5'
               ],
               [
                 '4.9 5.3 9.1',
                 'biff baff boff'
               ]
             );

    I hope this is of use.

    Cheers,

    JohnGG

Re: Looking for a good way to split a file into equal sized arrays
by wfsp (Abbot) on Nov 26, 2007 at 14:42 UTC
      That flip-flop solution looks like it may work out pretty well for me.
      my $infile = <STDIN>;
      chomp $infile;
      print "Your input file is $infile\n\n";
      my $outfile = "out.log";
      open( INFILE, '<', $infile )  or die "no such file: $infile: $!";
      open( LOG,    '>', $outfile ) or die "cannot write $outfile: $!";
      my ( @data, $count );
      while (<INFILE>) {
          if ( /^Begin Targets/ .. /^End Targets/ ) {
              next unless /^\d/;
              push @data, [ split ' ' ];
          }
          $count++;
      }
      I'm back on my way, thanks!
Re: Looking for a good way to split a file into equal sized arrays
by educated_foo (Vicar) on Nov 26, 2007 at 13:55 UTC
    First try to get one block parsed. split, [...] for creating array references, and push will probably be useful. After that, try to parse them all. Basically your program will alternate between two states, "inside a block" and "outside a block."
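    A minimal sketch of that two-state idea (the START/END markers match the OP's format; the sample data here is made up for illustration):

    ```perl
    use strict;
    use warnings;

    my @blocks;     # array of blocks; each block is an arrayref of line arrayrefs
    my $inside = 0; # state flag: are we between START DATA and END DATA?

    while ( my $line = <DATA> ) {
        chomp $line;
        if ( $line =~ /^START DATA/ ) {
            $inside = 1;
            push @blocks, [];    # begin a new, empty block
        }
        elsif ( $line =~ /^END DATA/ ) {
            $inside = 0;
        }
        elsif ($inside) {
            # each data line becomes its own arrayref of elements
            push @{ $blocks[-1] }, [ split ' ', $line ];
        }
    }

    print scalar @blocks, " blocks read\n";
    print "first line of first block: @{ $blocks[0][0] }\n";

    __DATA__
    START DATA
    1 2 3
    4 5 6
    END DATA
    START DATA
    7 8 9
    END DATA
    ```

    This gives exactly the nested structure the OP asked for: `$blocks[$b][$l][$e]` is element `$e` of line `$l` of block `$b`.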
Re: Looking for a good way to split a file into equal sized arrays
by TheForeigner (Initiate) on Nov 26, 2007 at 15:27 UTC
    I agree that you need a flag to determine whether or not you are in a block, but beyond that, you need a data structure to hold all your data. I think you want an array of blocks, each of which is an array of lines, each of which is an array of elements. This ought to do the trick:
    use strict;
    use warnings;
    use Data::Dumper;

    my @allblocks;

    sub readFile {
        # 3-arg open with "or die": the original "|| die" bound to the
        # filename string, so a failed open could never be reported
        open my $fh, '<', 'resource/datafile.txt' or die "Cannot open: $!";
        my @lines   = <$fh>;
        my $inblock = 0;
        my @block;    # "my @block = []; shift @block" was an empty list the hard way
        while (@lines) {
            my $single = shift @lines;
            print $single;
            if ( $single =~ /END DATA/ ) {
                $inblock = 0;
                push @allblocks, [@block];
                @block = ();
            }
            push @block, [ split ' ', $single ] if $inblock;
            $inblock = 1 if $single =~ /START DATA/;
        }
    }

    readFile();
    print Dumper( \@allblocks );
    When a block starts the flag is set, and when a block ends the block is pushed onto @allblocks. I don't consider myself a monk yet, but I tried the code and it works.
      Thanks for all the suggestions, folks. While that flip-flop did work out as anticipated, it turns out that's not exactly what I thought I needed.
      What I ended up doing was reading all the data in the file into an array and then stepping through its elements in a for loop. Inside that loop, I set a few triggers via a switch. Since the first value on each line tells me what sort of data it contains, I then split each line depending on what it was:
      use Switch;    # provides the switch/case construct used below

      my $infile = <STDIN>;
      chomp $infile;
      print "Your input file is $infile\n\n";
      open( INFILE, '<', $infile ) or die "no such file: $infile: $!";
      my @data = <INFILE>;
      close INFILE;
      my @line;
      for ( my $x = 0; $x <= $#data; $x++ ) {
          chomp $data[$x];
          switch ( $data[$x] ) {
              # If start of section, increment counter by 3 to skip header
              case /^Begin Targets/ { $x += 3; }
              case /^0 / {
                  # do not want
                  last;
              }
              case /^1 / {
                  @line = split / /, $data[$x];    # Get data out
                  last;
              }
              case /^2 / {
                  @line = split / /, $data[$x];    # Get data out
                  last;
              }
              case /^3 / {
                  @line = split / /, $data[$x];    # Get data out
                  last;
              }
          }    # end switch
      }    # end for
      In each case I can take the line and assign its fields to variables, depending on which place in the line I want:
      $DistX[$x] = $line[2];
      $DistY[$x] = $line[3];
      ...etc
      Well I'm halfway there. Now I just need to output this data to a CSV spreadsheet to plot it...but that's for tomorrow :)
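      For the CSV step, a minimal sketch of the plain-join approach (the @DistX/@DistY values and the scatter.csv filename are made up for illustration; Text::CSV is the safer choice if fields can themselves contain commas):

      ```perl
      use strict;
      use warnings;

      # Hypothetical sample values standing in for the parsed columns
      my @DistX = ( 1.5, 2.25, 3.75 );
      my @DistY = ( 0.4, 0.9,  1.1 );

      open my $csvFH, '>', 'scatter.csv' or die "open: $!\n";
      print {$csvFH} "DistX,DistY\n";    # header row for the scatter plot
      for my $i ( 0 .. $#DistX ) {
          print {$csvFH} join( ',', $DistX[$i], $DistY[$i] ), "\n";
      }
      close $csvFH or die "close: $!\n";
      ```

      Most spreadsheet and plotting tools will take this file directly as X/Y columns for a scatter plot.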