My first post and experience here with Perl prompted me to look for better solutions to what I do with Perl. First post: http://perlmonks.org/index.pl?node_id=1137275.

I am trying to modify this solution for another project of mine. I can get all the individual parts to work, but they won't come together. The idea is that I have a large file (again) and am trying to extract text (again), this time from a single line that starts with a certain delimiter (the previous solution pulled text from different lines). I can get each column to extract to my CSV individually, but I can't get them all to extract at the same time. Here is what I have, modified from the previous solution:

use strict;
use warnings;

use MCE::Loop;
use MCE::Candy;

## Input and output files defined here, as well as what we are
## searching for in the input file
my $input_file   = shift || 'InputFile.OH1';
my $output_file  = shift || 'OutputFile.csv';
my $match_string = "+ ";

open my $ofh, ">", $output_file
   or die "cannot open '$output_file' for writing: $!\n";

## This writes a header row that GIS sees as the column heading
print $ofh "HEC1_ID,Q100_Base,TTP,Area\n";

MCE::Loop::init {
   use_slurpio => 1, chunk_size => 1, max_workers => 4,
   gather => MCE::Candy::out_iter_fh($ofh),
   RS => "\n${match_string}",
};

## Below, each worker receives one record at a time
## Output order is preserved via MCE::Candy::out_iter_fh

## EXAMPLE INPUT FILE
## Line 1##+ BPI30 1319. 13.50 477. 147. 49. 4.64
## Line 2##
## Line 3## ROUTED TO
## Line 4##+ RPI30 1220. 13.75 475. 147. 49. 4.64
## Line 5##
## Line 6## HYDROGRAPH AT
## Line 7##+ BPI31 765. 12.42 102. 26. 9. .73
## Line 8##
## Line 9## 2 COMBINED AT
## Line 10##+ CPI31 1242. 13.75 571. 172. 58. 5.37

mce_loop_f {
   my ( $mce, $chunk_ref, $chunk_id ) = @_;

   ## Skip initial record containing header lines including *** ***
   if ( $chunk_id == 1 && $$chunk_ref !~ /^${match_string}/ ) {
      ## Gathering here is necessary when preserving output order,
      ## to let the manager process know chunk_id 1 has completed.
      MCE->gather( $chunk_id, "" );
      MCE->next;
   }

   ## Each record begins with "+ "
   my ( $k1, $k2, $k3, $k4 ) = ( "", "", "", "" );

   open my $ifh, "<", $chunk_ref;
   while ( <$ifh> ) {
      $k1 = $1 and next if $. == 1 && /^\S\s+(\S+)/;
      $k2 = $1 and next if $. == 1 && /^\S\s+\S+\s+(\S+)/;
      $k3 = $1 and next if $. == 1 && /^\S\s+\S+\s+\S+\s+(\S+)/;
      $k4 = $1 and last if $. == 1 && /(\S+)\s*$/;
   }
   close $ifh;

   ## Gather values.
   ## This outputs everything to the output file in the format below.
   MCE->gather( $chunk_id, "$k1,$k2,$k3,$k4\n" );

} $input_file;

As an example, from Line 1 I am trying to extract BPI30, 1319, 13.50, and 4.64; then RPI30, 1220, 13.75, and 4.64 from Line 4; and so on. The lines always begin with "+ " followed by spaces. Each of $k1-$k4 extracts the correct data into the right column on its own, but they won't come together into one row. Any help is appreciated, and I hope it is a silly oversight on my part. Thanks.
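A likely culprit is the `and next` after each of the first three matches: on line 1 of the record, the $k1 regex matches, `next` jumps to the next input line, and the $k2-$k4 regexes are never tried against line 1. One way around it is a single combined regex that captures all four columns at once. A minimal sketch (the sample line below is hypothetical, spaced like the example input above; untested against the real HEC-1 output):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical first line of a record, laid out like the example input.
my $line = "+     BPI30   1319.   13.50   477.   147.   49.   4.64";

# One regex captures the first three columns and the last column in a
# single pass: (?:\s+\S+)* soaks up the middle columns, and backtracking
# leaves the final \S+ for the last capture.
my ( $k1, $k2, $k3, $k4 ) =
    $line =~ /^\S\s+(\S+)\s+(\S+)\s+(\S+)(?:\s+\S+)*\s+(\S+)\s*$/;

print "$k1,$k2,$k3,$k4\n";   # BPI30,1319.,13.50,4.64
```

Alternatively, dropping the `next` from the first three statements in the original loop would let all four regexes run against line 1 before `last` fires.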


In reply to Extract string to file by oryan
