in reply to Removing Delimiters

My input:
060,071905,001206656,80,199,23,612,129,3,99,005168,
060,071905,001366155,80,199,41,611,750,3,99,100471,
060,071905,003326935,80,199,11,612,113,3,23,018042,
I used this to define where the delimiters were needed:
elsif ($type eq "060"){
   $parsDat[0] = substr($input, 0, 3); #Record type
   $parsDat[1] = substr($input, 3, 6); #Dist Id
   $parsDat[2] = substr($input, 9, 9); #id
   $parsDat[3] = substr($input, 18, 2); #fname
   $parsDat[4] = substr($input, 20, 3); #mname
   $parsDat[5] = substr($input, 23, 2); #lname
   $parsDat[6] = substr($input, 25, 3); #code
   $parsDat[7] = substr($input, 29, 3); #code
   $parsDat[8] = substr($input, 32, 1); #code
   $parsDat[9] = substr($input, 33, 2); #code
   $parsDat[10] = substr($input, 35, 6); #code
   $parsDat[11] = substr($input, 41, 39); #code
I used this to place the delimiters:
$output = join(",", @parsDat);
print $handle "$output\n";
As far as the input is concerned, I have about 16 different files to append into one.
Thanks

edited by ybiC: balanced <code> tags, s/tab/spacespacespace/

Re: Re: Removing Delimiters
by graff (Chancellor) on Aug 15, 2003 at 04:27 UTC
    Even though you seem to have trouble expressing yourself, I think I get the context: you start with some set of data (one or more files?) that is structured in a weird, ugly, unfriendly fixed-width format; the data set contains some errors, and people could correct these errors if the data were reformatted to be more readable in a meaningful way. But then, after the errors have been fixed, you need to put the corrected information back into the ugly, unfriendly fixed-width format.

    Well, that would be fine, so long as two conditions are met (one of which was mentioned in one of the other main replies on this thread):

    • The data fields themselves do not contain commas (this apparently is not a problem)
    • When corrections were made to the human-readable data, the field values did not become narrower or, even more important, wider.

    In other words, reverting back to the original format is easy, so long as each of the field values still has exactly the same number of characters as before.

    If that condition is met, then you don't need anything more than a regex substitution or a tr/// deletion -- either of the following will do (let's assume that $line contains the comma-delimited, human-readable/spreadsheet-portable form):

    $line =~ s/,//g;
    # or, another way to get the same result:
    $line =~ tr/,//d;

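    As a quick check, here is a minimal sketch that applies both forms to the first record from the sample input in the question. The variable names $via_subst and $via_tr are made up for illustration; both should end up holding the same delimiter-free string.

```perl
use strict;
use warnings;

# First record from the sample input in the question (trailing comma included).
my $line = "060,071905,001206656,80,199,23,612,129,3,99,005168,";

# Substitution: delete every comma.
(my $via_subst = $line) =~ s/,//g;

# Transliteration with /d: delete every comma without using a regex.
(my $via_tr = $line) =~ tr/,//d;

print "$via_subst\n";
print "both forms agree\n" if $via_subst eq $via_tr;
```

    For a single literal character like this, tr/// is usually the lighter-weight choice, but either is fine at this scale.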
    Now, if the corrected data happens to end up with wider values relative to the original data, then your script has to either reject the data with a warning, or die with a message, saying the data cannot be converted back because of a too-wide field value (and it should be specific: which input file, which line, which field, what its value is, how wide it's supposed to be).

    If the data has fewer characters than the original, you'd have to figure out whether you can pad these values, and if so, what is the proper way to do the padding (leading zeros? leading spaces? something else?) Or maybe you should just reject these as well.
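    If padding does turn out to be acceptable, it could look something like the sketch below. Note the assumptions: pad_field is a hypothetical helper, and the rule it applies (digits-only values get leading zeros, everything else gets trailing spaces) is a guess, not something the original data confirms. Check it against whatever defines the fixed-width format before relying on it.

```perl
use strict;
use warnings;

# Hypothetical helper: pad $value out to $width, or die if it is too wide.
# ASSUMPTION: digits-only values take leading zeros; anything else takes
# trailing spaces. The real format may require something different.
sub pad_field {
    my ($value, $width) = @_;
    my $missing = $width - length($value);
    die "value '$value' is wider than $width characters\n" if $missing < 0;

    return $value =~ /\A[0-9]+\z/
        ? ("0" x $missing) . $value     # numeric: pad on the left
        : $value . (" " x $missing);    # other: pad on the right
}

print pad_field("5168", 6), "\n";
print "[", pad_field("AB", 4), "]\n";
```

    The padding is done with string repetition rather than a numeric sprintf format, so long digit strings are never converted to numbers along the way.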

    This is all based on a guess about your task, but depending on what happens to the data in its "parsed" form, this is something you need to be very clear and careful about.

    A good way to incorporate field-width checks is to have an array that can drive both the initial parsing from the original file(s), and the width checking when "re-joining" the parsed data back into those ugly strings; e.g., based on your code snippet:

    @widths = (3,6,9,2,3,2,3,3,1,2,6,39);

    # how to break up the original fixed-width data record:
    while (<OLD>) {
        my @out = ();
        for my $w (@widths) {
            push @out, substr( $_, 0, $w );  # pull off a field
            $_ = substr( $_, $w );           # remove it from $_
        }
        print NEW join( ",", @out ), "\n";
    }
    close OLD;
    close NEW;

    # in a different script now, but with the same values for @widths:
    # how to check field widths if/when NEW file has been "corrected"
    open( NEW, ... );       # read the formatted data
    open( FIXED, ">..." );  # re-write the old format with new content
    while (<NEW>) {
        chomp;
        my $out = "";
        my @val = split /,/;
        die "bad field count at line $.\n" unless (@val == @widths);
        for my $i (0..$#widths) {
            die "bad data at line $., column $i: $val[$i] (not $widths[$i] characters)\n"
                unless ( length( $val[$i] ) == $widths[$i] );
            $out .= $val[$i];
        }
        print FIXED "$out\n";
    }

Re: Re: Removing Delimiters
by esh (Pilgrim) on Aug 14, 2003 at 22:58 UTC

    It seems to me that you have provided a sample of your input for the second process and a segment of code which was used to generate that from the first process.

    Please provide a sample of your desired output from the second process.

    Note: For the sample code you provided, instead of a bunch of substr calls, you could probably use a single call to unpack like:

    @parsDat = unpack("a3a6a9a2a3a2a3a3a1a2a6a39", $input);
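
    A minimal sketch of that unpack call, run against a raw record rebuilt from the first sample line in the question. The 39 trailing spaces are made-up filler, since the thread never shows what the last field holds. Also note that the substr offsets in the question skip offset 28 (one field covers 25 to 27, the next starts at 29); if that skipped byte is real, the template would need an x1 at that point, which this sketch does not assume.

```perl
use strict;
use warnings;

# Raw fixed-width record rebuilt from the first sample line in the question.
# ASSUMPTION: the final 39-character field is blank filler.
my $input = "0600719050012066568019923612129399005168" . (" " x 39);

# One template entry per field: aN takes N bytes and keeps any padding.
my @parsDat = unpack("a3a6a9a2a3a2a3a3a1a2a6a39", $input);

print join(",", @parsDat), "\n";
```

    Using A instead of a in the template would strip trailing spaces from each field, which may or may not be wanted if the data has to be written back at its original widths.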

    -- Eric Hammond