in reply to Re: Removing Delimiters
in thread Removing Delimiters

Even though you seem to have trouble expressing yourself, I think I get the context: you start with some set of data (one or more files?) that is structured in a weird, ugly, unfriendly fixed-width format; the data set contains some errors, and people could correct those errors if the data were reformatted into something more readable. But then, after the errors have been fixed, you need to put the corrected information back into that ugly, unfriendly fixed-width format.

Well, that would be fine, so long as one condition is met (as was pointed out in one of the other replies in this thread):

Reverting back to the original format is easy, so long as each of the field values still has exactly the same number of characters as before.

If that condition is met, then you don't need anything more than a regex substitution (or a transliteration) -- either of the following will do (let's assume that $line contains the comma-delimited, human-readable/spreadsheet-portable form):

$line =~ s/,//g;
# or, another way to get the same result:
$line =~ tr/,//d;
Now, if the corrected data happens to end up with values that are wider than in the original data, then your script has to either reject the data with a warning or die with a message saying the data cannot be converted back because of a too-wide field value. The message should be specific: which input file, which line, which field, what its value is, and how wide it is supposed to be.
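
For example, a check along those lines might look something like this (just a sketch; $file, $i, $val, and $width stand for the current file name, field index, field value, and expected width, which your real loop would have to supply):

if ( length($val) > $width ) {
    die "cannot convert back: file '$file', line $., field $i: ",
        "value '$val' is ", length($val),
        " characters, but the field is only $width wide\n";
}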

If a corrected value ends up with fewer characters than the original, you'd have to figure out whether you can pad it and, if so, what the proper padding is (leading zeros? leading spaces? something else?). Or maybe you should just reject those cases as well.
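
If padding does turn out to be acceptable, sprintf makes either choice a one-liner (again just a sketch; $width and $val are assumed to hold the expected width and the too-short value):

my $zero_padded  = sprintf "%0${width}d", $val;   # leading zeros (numeric fields only)
my $space_padded = sprintf "%${width}s",  $val;   # leading spaces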

This is all based on a guess about your task, but depending on what happens to the data in its "parsed" form, this is something you need to be very clear and careful about.

A good way to incorporate field-width checks is to have one array that drives both the initial parsing of the original file(s) and the width checking when "re-joining" the parsed data back into those ugly strings; e.g., based on your code snippet:

@widths = (3,6,9,2,3,2,3,3,1,2,6,39);

# how to break up the original fixed-width data record:
while (<OLD>) {
    my @out = ();
    for my $w (@widths) {
        push @out, substr( $_, 0, $w );   # pull off a field
        $_ = substr( $_, $w );            # remove it from $_
    }
    print NEW join( ",", @out ), "\n";
}
close OLD;
close NEW;

# in a different script now, but with the same values for @widths:
# how to check field widths if/when the NEW file has been "corrected"
open( NEW,   ... );     # read the formatted data
open( FIXED, ">..." );  # re-write the old format with new content
while (<NEW>) {
    chomp;
    my $out = "";
    my @val = split /,/;
    die "bad field count at line $.\n" unless ( @val == @widths );
    for my $i ( 0 .. $#widths ) {
        die "bad data at line $., column $i: $val[$i] (not $widths[$i] characters)\n"
            unless ( length( $val[$i] ) == $widths[$i] );
        $out .= $val[$i];
    }
    print FIXED "$out\n";
}
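
For what it's worth, the same @widths array can also drive Perl's unpack/pack, which handles the slicing and re-joining from a single template string. Here's a sketch of that approach (using lower-case "a" so that spaces inside a field survive the round trip); it assumes the same OLD/NEW filehandles as above:

my @widths   = (3,6,9,2,3,2,3,3,1,2,6,39);
my $template = join "", map { "a$_" } @widths;   # "a3a6a9..." -- one slice per field

while (<OLD>) {
    chomp;
    my @fields = unpack $template, $_;           # fixed-width record -> list of fields
    print NEW join( ",", @fields ), "\n";        # same output as the substr loop above
}

# and to rebuild one record from corrected, comma-free field values:
# my $record = pack $template, @fields;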