I do appreciate the help on this issue; it has been wonderful. I have been intrigued by Perl and what it can do. It is definitely quite different from anything that I have used before. I did have to dig into the solutions that were provided a little, because they were not working right out of the gate for me. It came down to leading spaces in front of "Interpolated Hydrograph" on that line of the input. For some reason I left those off in the original post.
I modified that, added a header row, and removed the \r on the MCE->gather so there are no blank lines. The time difference is night and day:
Batch (~275k lines) = ~3 hours
Perl <1 second
The final code I am using, which works like a charm:
use strict;
use warnings;

use MCE::Loop;
use MCE::Candy;

my $input_file   = shift || 'input.txt';
my $output_file  = shift || 'output.txt';
my $match_string = " INTERPOLATED HYDROGRAPH AT ";

open my $ofh, ">", $output_file
    or die "cannot open '$output_file' for writing: $!\n";

print $ofh "HEC1_ID,Q100,V100\n";

MCE::Loop::init {
    use_slurpio => 1, chunk_size => 1, max_workers => 4,
    gather => MCE::Candy::out_iter_fh($ofh),
    RS => "\n${match_string}",
};

## Below, each worker receives one record at a time
## Output order is preserved via MCE::Candy::out_iter_fh

## line  1  CAC40  # INTERPOLATED HYDROGRAPH AT CAC40
## line  2         # blank line here
## line  3         # PEAK FLOW TIME MAXIMUM AVERAGE FLOW
## line  4         # 6-HR 24-HR 72-HR 166.58-HR
## line  5         # + (CFS) (HR)
## line  6         # (CFS)
## line  7  1223.  # + 1223. 12.67 890. 588. 245. 106.
## line  8         # (INCHES) .154 .408 .509 .509
## line  9  1456.  # (AC-FT) 441. 1166. 1456. 1456.
## line 10         # CUMULATIVE AREA = 53.67 SQ MI

mce_loop_f {
    my ( $mce, $chunk_ref, $chunk_id ) = @_;

    ## Skip initial record containing header lines including *** ***
    if ( $chunk_id == 1 && $$chunk_ref !~ /^${match_string}/ ) {
        ## Gathering here is necessary when preserving output order,
        ## to let the manager process know chunk_id 1 has completed.
        MCE->gather( $chunk_id, "" );
        MCE->next;
    }

    ## Each record begins with INTERPOLATED HYDROGRAPH.
    my ( $k1, $k2, $k3 ) = ( "", "", "" );

    open my $ifh, "<", $chunk_ref;
    while ( <$ifh> ) {
        $k1 = $1 and next if $. == 1 && /(\S+)\s*$/;
        $k2 = $1 and next if $. == 7 && /^\S+\s+(\S+)/;
        $k3 = $1 and last if $. == 9 && /(\S+)\s*$/;
    }
    close $ifh;

    ## Gather values.
    MCE->gather( $chunk_id, "$k1,$k2,$k3\n" );

} $input_file;
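For anyone who wants to check the three regexes without MCE, here is a minimal single-process sketch of the per-record extraction. The sample record is hypothetical: the values come from the comments in the script above, but the column spacing is an assumption, and `extract_fields` is a helper name made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sample record. Values are taken from the comments in the
# script above; the exact column spacing is an assumption.
my $record = join "\n",
    " INTERPOLATED HYDROGRAPH AT CAC40",
    "",
    " PEAK FLOW TIME MAXIMUM AVERAGE FLOW",
    " 6-HR 24-HR 72-HR 166.58-HR",
    "+ (CFS) (HR)",
    " (CFS)",
    "+ 1223. 12.67 890. 588. 245. 106.",
    " (INCHES) .154 .408 .509 .509",
    " (AC-FT) 441. 1166. 1456. 1456.",
    " CUMULATIVE AREA = 53.67 SQ MI",
    "";

# Illustrative helper (not part of MCE): pull the station ID from line 1,
# the peak flow from line 7, and the last volume from line 9 of one record.
sub extract_fields {
    my ($text) = @_;
    my ( $id, $q, $v ) = ( "", "", "" );

    # Read the record line by line through an in-memory filehandle,
    # so that $. gives the line number within the record.
    open my $fh, "<", \$text or die "cannot open in-memory record: $!\n";
    while ( my $line = <$fh> ) {
        if    ( $. == 1 && $line =~ /(\S+)\s*$/ )    { $id = $1 }
        elsif ( $. == 7 && $line =~ /^\S+\s+(\S+)/ ) { $q  = $1 }
        elsif ( $. == 9 && $line =~ /(\S+)\s*$/ )    { $v  = $1; last }
    }
    close $fh;

    return ( $id, $q, $v );
}

print join( ",", extract_fields($record) ), "\n";
```

With this sample it prints `CAC40,1223.,1456.`, which is the same `HEC1_ID,Q100,V100` row shape the MCE version gathers for each record.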
Thanks again. I hope to be learning more of this in the future.
In reply to Re^3: Perl solution for current batch file to extract specific column text
by oryan
in thread Perl solution for current batch file to extract specific column text
by oryan