I really appreciate the help on this issue; it has been wonderful. I have been intrigued by Perl and what it can do, and it is quite different from anything I have used before. I did have to dig into the solutions a little, because they did not work for me right out of the gate. The cause turned out to be leading spaces in front of "Interpolated Hydrograph" in the input, which for some reason I left out of my original post.
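Like `$/`, the `RS` value in the final code is presumably treated as a literal string, so the exact leading spaces have to be copied into `$match_string`. For line-by-line parsing, a regex that allows optional indentation avoids that fragility. A minimal sketch, assuming the header text is exactly `INTERPOLATED HYDROGRAPH AT` followed by the station ID; `station_id` is a hypothetical helper, not part of the code in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original code): return the station ID
# from an "INTERPOLATED HYDROGRAPH AT" header line, or undef for any other line.
# The leading \s* accepts any amount of indentation, including none.
sub station_id {
    my ($line) = @_;
    return $line =~ /^\s*INTERPOLATED HYDROGRAPH AT\s+(\S+)/ ? $1 : undef;
}
```

For example, `station_id("      INTERPOLATED HYDROGRAPH AT CAC40\n")` returns `CAC40`, whereas a pattern anchored as `/^INTERPOLATED/` would not match that indented line.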

I modified the code to account for that, added a header row, and removed the \r from the string passed to MCE->gather so the output has no blank lines. The difference in run time is night and day:
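The post does not say which platform the script runs on. Assuming Windows, one likely explanation for the blank lines is Perl's text-mode `:crlf` output layer, which turns every `\n` into `\r\n`; a string that already ends in `\r\n` is then written as `\r\r\n`, which some viewers show as an extra empty line. The sketch below reproduces that with an in-memory handle; `render_csv` is a hypothetical helper, not part of the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: write the header row plus data rows through a ':crlf'
# layer (the default text-mode translation on Windows) and return the raw bytes.
sub render_csv {
    my ( $eol, @rows ) = @_;
    my $buf = "";
    open my $fh, ">:crlf", \$buf or die "cannot open in-memory handle: $!";
    print {$fh} "HEC1_ID,Q100,V100$eol";
    print {$fh} "$_$eol" for @rows;
    close $fh;    # flush the layer so $buf holds everything
    return $buf;
}
```

With `"\n"` as the line ending each row comes out as `...\r\n`; with `"\r\n"` it comes out as `...\r\r\n`, which is consistent with dropping the `\r` being the fix.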

Batch file (~275k lines): ~3 hours

Perl: under 1 second

Here is the final code I am using, which works like a charm:

```perl
use strict;
use warnings;

use MCE::Loop;
use MCE::Candy;

my $input_file   = shift || 'input.txt';
my $output_file  = shift || 'output.txt';

## The leading spaces here must match the input exactly.
my $match_string = " INTERPOLATED HYDROGRAPH AT ";

open my $ofh, ">", $output_file
    or die "cannot open '$output_file' for writing: $!\n";

print $ofh "HEC1_ID,Q100,V100\n";

MCE::Loop::init {
    use_slurpio => 1,
    chunk_size  => 1,
    max_workers => 4,
    gather      => MCE::Candy::out_iter_fh($ofh),
    RS          => "\n${match_string}",
};

## Below, each worker receives one record at a time
## Output order is preserved via MCE::Candy::out_iter_fh

## line  1  CAC40  #  INTERPOLATED HYDROGRAPH AT CAC40
## line  2         #  blank line here
## line  3         #  PEAK FLOW  TIME  MAXIMUM AVERAGE FLOW
## line  4         #  6-HR  24-HR  72-HR  166.58-HR
## line  5         #  +  (CFS)  (HR)
## line  6         #  (CFS)
## line  7  1223.  #  +  1223.  12.67  890.  588.  245.  106.
## line  8         #  (INCHES)  .154  .408  .509  .509
## line  9  1456.  #  (AC-FT)  441.  1166.  1456.  1456.
## line 10         #  CUMULATIVE AREA = 53.67 SQ MI

mce_loop_f {
    my ( $mce, $chunk_ref, $chunk_id ) = @_;

    ## Skip initial record containing header lines including *** ***
    if ( $chunk_id == 1 && $$chunk_ref !~ /^${match_string}/ ) {
        ## Gathering here is necessary when preserving output order,
        ## to let the manager process know chunk_id 1 has completed.
        MCE->gather( $chunk_id, "" );
        MCE->next;
    }

    ## Each record begins with INTERPOLATED HYDROGRAPH.
    my ( $k1, $k2, $k3 ) = ( "", "", "" );

    open my $ifh, "<", $chunk_ref;

    while ( <$ifh> ) {
        $k1 = $1 and next if $. == 1 && /(\S+)\s*$/;
        $k2 = $1 and next if $. == 7 && /^\S+\s+(\S+)/;
        $k3 = $1 and last if $. == 9 && /(\S+)\s*$/;
    }

    close $ifh;

    ## Gather values.
    MCE->gather( $chunk_id, "$k1,$k2,$k3\n" );

} $input_file;
```
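To see what the MCE options are doing, here is a single-process sketch of the same extraction, assuming the input follows the layout in the comments above (station ID on record line 1, peak flow on line 7, volume on line 9). Setting `$/` to `"\n$match_string"` splits the file at each header roughly as `RS` does for the workers. It strips the header prefix rather than keeping it at the start of each record, but the line numbers inside a record stay the same because only the beginning of line 1 is removed. `extract` and `parse_record` are hypothetical names, and this version gives up the parallelism entirely:

```perl
use strict;
use warnings;

# Hypothetical helper: pull the three fields out of one record, using the
# same line numbers and regexes as the worker code (lines 1, 7 and 9).
sub parse_record {
    my ($record) = @_;
    local $/ = "\n";    # read line by line, whatever the caller's $/ is
    my ( $k1, $k2, $k3 ) = ( "", "", "" );
    open my $ifh, "<", \$record or die "cannot open in-memory record: $!";
    while ( my $line = <$ifh> ) {
        if    ( $. == 1 && $line =~ /(\S+)\s*$/ )    { $k1 = $1 }
        elsif ( $. == 7 && $line =~ /^\S+\s+(\S+)/ ) { $k2 = $1 }
        elsif ( $. == 9 && $line =~ /(\S+)\s*$/ )    { $k3 = $1; last }
    }
    close $ifh;
    return "$k1,$k2,$k3\n";
}

# Hypothetical sequential driver: $/ splits the input at every header line.
sub extract {
    my ( $in, $out, $match_string ) = @_;
    local $/ = "\n$match_string";
    print {$out} "HEC1_ID,Q100,V100\n";
    my $first = 1;
    while ( defined( my $record = <$in> ) ) {
        chomp $record;    # strips the next record's header prefix, if present
        if ($first) {
            $first = 0;
            # Like chunk 1 in the MCE code: it is a real record only if the
            # file starts with the header; otherwise it is preamble to skip.
            next unless $record =~ s/^\Q$match_string\E//;
        }
        print {$out} parse_record($record);
    }
}
```

Note the `local $/ = "\n"` inside `parse_record`: without it, the record-sized `$/` from `extract` would still be in effect and each record would be read back as a single "line".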

Thanks again. I hope to learn more Perl in the future.

Re^4: Perl solution for current batch file to extract specific column text
by marioroy (Prior) on Aug 07, 2015 at 16:49 UTC

    Thank you, oryan, for sharing the before-and-after results. That is really amazing. Solutions based on the initial post are sometimes not spot on, but we tried nonetheless.

    Kind regards, Mario