in reply to Extract string to file

The serial solution by aitap seems fast enough, but I was curious whether it could run faster against a 200 MB input file. The serial version completes in 4.566 seconds and the parallel version in 1.440 seconds, which is a little over 3x faster.

The regular expression work is CPU-bound, which is why running it in parallel is faster. IO in MCE is sequential, not parallel or random, to minimize unnecessary delays such as seek time.
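To illustrate the sequential IO point, here is a rough sketch in core Perl of reading input front to back in chunks that always end on a line boundary, so each chunk can be handed to a worker whole. This is not MCE's actual code; make_chunk_reader and its parameters are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MCE). Returns an iterator that reads
# $source sequentially and yields chunks of roughly $size bytes, each
# extended to the end of the current line so no line is split.
# $source is a file name, or a reference to a string for in-memory input.
sub make_chunk_reader {
    my ( $source, $size ) = @_;
    open my $fh, '<', $source or die "cannot open '$source': $!\n";

    return sub {
        my $n = read( $fh, my $chunk, $size );
        die "read failed: $!\n" unless defined $n;
        return if $n == 0;    # end of input

        # Finish the partial line, if the chunk stopped mid-line
        if ( substr( $chunk, -1 ) ne "\n" ) {
            my $rest = <$fh>;
            $chunk .= $rest if defined $rest;
        }
        return $chunk;
    };
}

# Small demonstration on made-up in-memory data
my $sample = join '', map { "+ line $_\n" } 1 .. 10;
my $next_chunk = make_chunk_reader( \$sample, 32 );
while ( defined( my $chunk = $next_chunk->() ) ) {
    printf "chunk of %d bytes\n", length $chunk;
}
```

Only the reading is serialized here; the per-chunk regex work is what would be spread across workers.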

Perl is fast at these things :)

use warnings;
use strict;

use MCE::Loop;
use MCE::Candy;

## Select input and output files

my $input_file  = 'InputFile.OH1';
my $output_file = 'OutputFile.csv';

open my $ofh, ">", $output_file
    or die "cannot open '$output_file' for writing: $!\n";

## Creates a header at the beginning of the file

my $header = "HEC1_ID,Q100_Base,TTP,Area\n";
print $ofh $header;

## Extracts data from input in parallel.

MCE::Loop::init {
    use_slurpio => 1, chunk_size => '20k', max_workers => 4,
    gather => MCE::Candy::out_iter_fh($ofh),
};

mce_loop_f {
    my ( $output, $mce, $chunk_ref, $chunk_id ) = ( '', @_ );
    open my $ifh, "<", $chunk_ref;

    while (<$ifh>) {
        s{
            ^               # at the beginning of the line
            \+              # followed by literal plus
            (?:\s+(\S+))    # column one
            (?:\s+(\S+))    # column two
            (?:\s+(\S+))    # column three
            (?:\s+\S+){3}   # skip three columns
            \s+(\S+)$       # catch the last one, too
        }{$1,$2,$3,$4}x and do {
            $output .= $_
        };
    }

    close $ifh;
    MCE->gather( $chunk_id, $output );

} $input_file;

close $ofh;
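For anyone wanting to check the substitution without MCE or the real input, here is a small core-Perl sketch that applies the same pattern to one line at a time. The helper name to_csv_line and the sample lines are made up for illustration; they are not taken from an actual .OH1 file.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same substitution as the loop above
# to a single line. Returns the CSV form, or nothing if no match.
sub to_csv_line {
    my ($line) = @_;
    return unless $line =~ s{
        ^ \+              # literal plus at the start of the line
        \s+ (\S+)         # column one
        \s+ (\S+)         # column two
        \s+ (\S+)         # column three
        (?: \s+ \S+ ){3}  # skip three columns
        \s+ (\S+) $       # last column
    }{$1,$2,$3,$4}x;
    return $line;
}

# Made-up sample lines (assumed layout: plus sign, then seven columns)
my @sample = (
    "+   BASIN1   123.4   5.25   0   0   0   17.8\n",
    "some other line\n",
);

# Prints only the lines that matched, now in CSV form
print grep { defined } map { scalar to_csv_line($_) } @sample;
```

The trailing newline is left in place because `$` matches just before it, which is why the loop can append `$_` to the output as is.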

Kind regards, Mario.