
Update: Increased chunk size to 400.

Below is a parallel version, with chunking enabled, of the solution provided by monk Laurent_R. I ran it against an input file containing 500k records.

Serial: 2.574 seconds. Parallel: 0.895 seconds, which includes the time to fork and reap the child processes in a Unix environment. Afterwards, the output file contains 500k lines.

The test machine has a 2.6 GHz Haswell Core i7 with RAM at 1600 MHz.

Optionally, the script accepts the input and output file names as arguments (they default to input.txt and output.txt).

use strict;
use warnings;

use MCE::Loop;
use MCE::Candy;

my $input_file  = shift || 'input.txt';
my $output_file = shift || 'output.txt';

open my $ofh, ">", $output_file
   or die "cannot open '$output_file' for writing: $!\n";

MCE::Loop::init {
   use_slurpio => 1, chunk_size => 400, max_workers => 4,
   gather => MCE::Candy::out_iter_fh($ofh),
   RS => "\nINTERPOLATED HYDROGRAPH",
};

## Each worker receives many records determined by chunk_size.
## Output order is preserved via MCE::Candy::out_iter_fh

mce_loop_f {
   my ( $mce, $chunk_ref, $chunk_id ) = @_;

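   ## With use_slurpio, $chunk_ref is a reference to the raw chunk
   ## text; read it line by line through an in-memory filehandle.
   open my $ifh, "<", $chunk_ref
      or die "cannot open in-memory handle for chunk $chunk_id: $!\n";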
   my $output = "";

   while ( my $line = <$ifh> ) {
      chomp $line; # remove newline character from end of line
      if ( $line =~ /INTERPOLATED HYDROGRAPH AT (\w+)$/ ) {
         $output .= $1;
         $line = <$ifh> for 1..6; # skip 5 lines, keep the 6th
         my $val2 = (split / /, $line)[1]; # get the second column
         $output .= " $val2";
         $line = <$ifh> for 1..2; # skip 1 line, keep the next
         chomp $line;
         my $val3 = (split / /, $line)[-1]; # get the last column
         $output .= " $val3\r\n";
      }
   }

   close $ifh;

   MCE->gather( $chunk_id, $output );

} $input_file;

MCE::Loop::finish;  # shut down the worker processes

close $ofh;
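To make the chunking concrete, here is a simplified, pure-Perl illustration of what RS => "\nINTERPOLATED HYDROGRAPH" combined with chunk_size amounts to. It is not MCE's internal code: the helper names split_records and make_chunks are hypothetical, and the sample input below is made up. Records are cut so that each one begins at the marker line, and consecutive records are grouped so that each worker receives up to chunk_size records at a time.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the idea behind RS + chunk_size;
# this is not how MCE implements it internally.
my $MARKER = "INTERPOLATED HYDROGRAPH";

# Split raw text into records, each starting at a line that begins
# with $MARKER (zero-width lookahead, so the marker stays in the record).
sub split_records {
    my ($text) = @_;
    return grep { length } split /(?=^\Q$MARKER\E)/m, $text;
}

# Group records into chunks of at most $chunk_size records each.
sub make_chunks {
    my ($chunk_size, @records) = @_;
    my @chunks;
    while (@records) {
        push @chunks, join '', splice @records, 0, $chunk_size;
    }
    return @chunks;
}

# Made-up sample input with three records.
my $text = join '',
    "INTERPOLATED HYDROGRAPH AT A1\n", "data a\n",
    "INTERPOLATED HYDROGRAPH AT B2\n", "data b\n",
    "INTERPOLATED HYDROGRAPH AT C3\n", "data c\n";

my @records = split_records($text);
my @chunks  = make_chunks(2, @records);
printf "%d records, %d chunks\n", scalar @records, scalar @chunks;
# prints "3 records, 2 chunks"
```

Larger chunks mean fewer round trips between the manager process and the workers, which is the trade-off behind raising chunk_size to 400.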
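Workers finish their chunks in no particular order, so the gather callback must put results back in sequence. The sketch below shows the general technique behind an ordered-output iterator such as MCE::Candy::out_iter_fh; it is a simplified illustration, not that module's source, and make_ordered_writer is a hypothetical name. Results tagged with chunk_id (starting at 1) are buffered in a hash until the next expected id arrives, then every consecutive ready chunk is written.

```perl
use strict;
use warnings;

# Hypothetical ordered writer: buffers out-of-order results by chunk_id
# and prints them to $fh strictly in order 1, 2, 3, ...
sub make_ordered_writer {
    my ($fh) = @_;
    my %pending;       # chunk_id => output text not yet written
    my $next_id = 1;   # chunk ids are assumed to start at 1

    return sub {
        my ($chunk_id, $output) = @_;
        $pending{$chunk_id} = $output;
        # Flush every consecutive chunk that is now ready.
        while (exists $pending{$next_id}) {
            print {$fh} delete $pending{$next_id};
            $next_id++;
        }
    };
}

# Usage: feed results out of order and collect them in a string.
my $result = '';
open my $out, '>', \$result or die "cannot open in-memory handle: $!\n";
my $gather = make_ordered_writer($out);
$gather->(2, "B\n");
$gather->(3, "C\n");
$gather->(1, "A\n");
close $out;
print $result;   # prints A, B and C on separate lines
```

Only chunks that arrive early are held in memory, so the buffer stays small as long as workers progress at similar speeds.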

Kind regards, Mario.

In reply to Re: Perl solution for current batch file to extract specific column text by marioroy
in thread Perl solution for current batch file to extract specific column text by oryan
