brkstr has asked for the wisdom of the Perl Monks concerning the following question:

Hello all

I will try to make this a better post than my last.

I figured out how to parse an ASCII file into separate files with all my fields delimited where I need them by using the 'join' function (with a lot of help, I'll add). Then I came across the next issue: I need to append all the files together again and remove the delimiters.

My initial reaction was to use the same script, but with something like the 'splice' or 'pop' function instead of 'join'. But that only takes care of the delimiters, right? Not the appending.

Does this theory have merit or am I on the wrong road?

thank you
Scott

Replies are listed 'Best First'.
Re: Removing Delimiters
by revdiablo (Prior) on Aug 14, 2003 at 23:44 UTC

    The opposite of a join is generally a split. So if you say $output = join ',', @parsDat;, then the opposite would be @parsDat = split ',', $output; (of course, any commas in the values of the original @parsDat will get clobbered and become delimiters themselves, though this won't be a problem with the sample data you posted)
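One detail of the join/split round trip matters for the records in this thread, which end in an empty field (hence the trailing comma in the sample output). By default, split drops trailing empty fields; passing a limit of -1 keeps them. The sketch below is only an illustration: the helper names split_default and split_keep are hypothetical, not part of anyone's script.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparing the two ways of calling split.
sub split_default { my ($line) = @_; return split /,/, $line; }
sub split_keep    { my ($line) = @_; return split /,/, $line, -1; }

my $output  = join ',', '060', '11', '';   # "060,11," (empty last field)
my @default = split_default($output);      # ('060', '11')
my @kept    = split_keep($output);         # ('060', '11', '')

print scalar(@default), " vs ", scalar(@kept), " fields\n";   # 2 vs 3 fields
```

If the field count has to match after the round trip, the -1 form is probably the safer choice.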

    In this case, I must agree with thor's post, however. Since your original script already does all the work of slicing and dicing the data into a useable structure, why not take advantage of that, and build the new requirements directly into that?

Re: Removing Delimiters
by esh (Pilgrim) on Aug 14, 2003 at 22:15 UTC

    Please provide specific examples of the input file contents and desired output file content.

    BTW, you can append multiple files on *nix using cat(1). For example,

    cat file1 file2 file3 > file.out

    If you just want to remove all instances of a specific delimiter character from the file (assuming no escapes or quoting), you could use something like the following to do it in place. This example assumes that your delimiter is a comma (,). Replace that with the character you want to remove.

    perl -pi.bak -e 's/,//g' file.out

    -- Eric Hammond

Re: Removing Delimiters
by Aristotle (Chancellor) on Aug 14, 2003 at 22:09 UTC
    The only bit of specification we can go by here is
    I need to append all the files together again and remove the delimiters.
    With some code posted it may have been sufficient, but with neither code nor a more detailed explanation, we can't even begin to guess what you're trying to do. Please be more specific.

    Makeshifts last the longest.

Re: Removing Delimiters
by thor (Priest) on Aug 14, 2003 at 23:37 UTC
    Perhaps I'm reading too much into it, but it seems that you want the same data two different ways. If this is the case, modify your original script to output two streams while the data is already ripped apart. You'll save yourself time and a headache.
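A minimal sketch of the two-stream idea, assuming the fields are already sitting in an array the way @parsDat is in the original script. The helper name both_forms is hypothetical; in a real script each string would go to its own filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: return the delimited and the undelimited
# form of one parsed record.
sub both_forms {
    my @fields = @_;
    return ( join(',', @fields), join('', @fields) );
}

my ($delimited, $raw) = both_forms('060', '071905', '001206656');
print "$delimited\n";   # would go to the comma-delimited file
print "$raw\n";         # would go to the undelimited file
```

This only helps if the data does not change between the two outputs. If it gets edited in between, the delimited file has to be read back in.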

    thor

Re: Removing Delimiters
by brkstr (Novice) on Aug 14, 2003 at 22:29 UTC
    My input:
    060,071905,001206656,80,199,23,612,129,3,99,005168,
    060,071905,001366155,80,199,41,611,750,3,99,100471,
    060,071905,003326935,80,199,11,612,113,3,23,018042,
    I used this to define where the delimiters were needed:
    elsif ($type eq "060"){
        $parsDat[0]  = substr($input, 0, 3);    #Record type
        $parsDat[1]  = substr($input, 3, 6);    #Dist Id
        $parsDat[2]  = substr($input, 9, 9);    #id
        $parsDat[3]  = substr($input, 18, 2);   #fname
        $parsDat[4]  = substr($input, 20, 3);   #mname
        $parsDat[5]  = substr($input, 23, 2);   #lname
        $parsDat[6]  = substr($input, 25, 3);   #code
        $parsDat[7]  = substr($input, 29, 3);   #code
        $parsDat[8]  = substr($input, 32, 1);   #code
        $parsDat[9]  = substr($input, 33, 2);   #code
        $parsDat[10] = substr($input, 35, 6);   #code
        $parsDat[11] = substr($input, 41, 39);  #code
    I used this to place the delimiters:
    $output = join(",", @parsDat);
    print $handle "$output\n";
    As far as the input is concerned, I have about 16 different files to append into one.
    Thanks

    edited by ybiC: balanced <code> tags, s/tab/spacespacespace/

      Even though you seem to have trouble expressing yourself, I think I get the context: you start with some set of data (one or more files?) that is structured in a weird, ugly, unfriendly fixed-width format; the data set contains some errors, and people could correct these errors if the data were reformatted to be more readable in a meaningful way. But then, after the errors have been fixed, you need to put the corrected information back into the ugly, unfriendly fixed-width format.

      Well, that would be fine, so long as two conditions are met (one of which was mentioned in one of the other main replies on this thread):

      • The data fields themselves do not contain commas (this apparently is not a problem)
      • When corrections were made to the human-readable data, the field values did not become narrower or, even more importantly, wider.

      In other words, reverting back to the original format is easy, so long as each of the field values still has exactly the same number of characters as before.

      If that condition is met, then you don't need anything more than a regex substitution or a transliteration -- either of the following will do (let's assume that $line contains the comma-delimited, human-readable/spreadsheet-portable form):

      $line =~ s/,//g;
      # or, another way to get the same result:
      $line =~ tr/,//d;

      Now, if the corrected data happens to end up with wider values relative to the original data, then your script has to either reject the data with a warning, or die with a message, saying the data cannot be converted back because of a too-wide field value (and it should be specific: which input file, which line, which field, what its value is, how wide it's supposed to be).

      If the data has fewer characters than the original, you'd have to figure out whether you can pad these values, and if so, what is the proper way to do the padding (leading zeros? leading spaces? something else?) Or maybe you should just reject these as well.
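As a rough illustration of the padding and rejection ideas above, here is a sketch that assumes leading zeros are the right padding. That is only one of the options mentioned, and the thread does not say which one applies. The helper name pad_field is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: left-pad $value with zeros up to $width,
# and refuse values that are already too wide.
sub pad_field {
    my ($value, $width) = @_;
    die "value '$value' is wider than $width characters\n"
        if length($value) > $width;
    return ( '0' x ( $width - length($value) ) ) . $value;
}

print pad_field('5168', 6), "\n";   # 005168
```

For space padding, sprintf('%*s', $width, $value) would do much the same job with leading spaces.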

      This is all based on a guess about your task, but depending on what happens to the data in its "parsed" form, this is something you need to be very clear and careful about.

      A good way to incorporate field-width checks is to have an array that can drive both the initial parsing from the original file(s), and the width checking when "re-joining" the parsed data back into those ugly strings; eg, based on your code snippet:

      @widths = (3,6,9,2,3,2,3,3,1,2,6,39);

      # how to break up the original fixed-width data record:
      while (<OLD>) {
          chomp;  # keep the newline out of the last field
          my @out = ();
          for my $w (@widths) {
              push @out, substr( $_, 0, $w );  # pull off a field
              $_ = substr( $_, $w );           # remove it from $_
          }
          print NEW join( ",", @out ), "\n";
      }
      close OLD;
      close NEW;

      # in a different script now, but with the same values for @widths:
      # how to check field widths if/when NEW file has been "corrected"
      open( NEW, ... );       # read the formatted data
      open( FIXED, ">..." );  # re-write the old format with new content
      while (<NEW>) {
          chomp;
          my $out = "";
          my @val = split /,/;
          die "bad field count at line $.\n" unless (@val == @widths);
          for my $i (0..$#widths) {
              die "bad data at line $., column $i: $val[$i] (not $widths[$i] characters)\n"
                  unless ( length( $val[$i] ) == $widths[$i] );
              $out .= $val[$i];
          }
          print FIXED "$out\n";
      }

      It seems to me that you have provided a sample of your input for the second process and a segment of code which was used to generate that from the first process.

      Please provide a sample of your desired output from the second process.

      Note: For the sample code you provided, instead of a bunch of substr calls, you could probably use a single call to unpack like:

      @parsDat = unpack("a3a6a9a2a3a2a3a3a1a2a6a39", $input);
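One thing to watch when swapping substr for unpack: the posted substr offsets jump from 25 + 3 = 28 to 29, so the character at offset 28 is skipped, while a contiguous template does not skip anything. Whether that gap is intentional is not clear from the thread. The sketch below compares the two on one of the posted sample records. The helper names are hypothetical, and the "x" in the first template is an assumption made only to mirror the posted offsets.

```perl
use strict;
use warnings;

# Mirrors the posted substr offsets; "x" skips the byte at offset 28.
sub parse_by_offsets {
    my ($input) = @_;
    return unpack 'a3 a6 a9 a2 a3 a2 a3 x a3 a1 a2 a6 a39', $input;
}

# Same widths, but contiguous (no skipped byte).
sub parse_contiguous {
    my ($input) = @_;
    return unpack 'a3 a6 a9 a2 a3 a2 a3 a3 a1 a2 a6 a39', $input;
}

my $input = '06007190546184070880199116119050311054500';  # sample record from the thread

print join(',', parse_by_offsets($input)), "\n";
# 060,071905,461840708,80,199,11,611,050,3,11,054500,
print join(',', parse_contiguous($input)), "\n";
# 060,071905,461840708,80,199,11,611,905,0,31,105450,0
```

The two agree for the first seven fields and differ after that, so it is worth checking the record layout before choosing a template.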

      -- Eric Hammond

Re: Removing Delimiters
by brkstr (Novice) on Aug 15, 2003 at 15:19 UTC
    Let me try to get this even better

    I have a process in mind for completing this project. I may be wrong in my interpretation of it (due to my inexperience in this language), but nonetheless the process should be the same.

    I have input data like this:
    06007190546184070880199116119050311054500
    06007190546184075080199236129111399017893
    06007190546184107480199126119114399051273
    06007190546184109380240356129999399011655
    06007190546184115079199116119004322000440
    06007190546184115079199136119004311000012

    I have 16 different record types (they differ by the first three characters of the data file). Then I use the same code as previously shown to parse and put into files to import into Access to change/modify the data:
    elsif ($type eq "060"){
        $parsDat[0]  = substr($input, 0, 3);    #Record type
        $parsDat[1]  = substr($input, 3, 6);    #Dist Id
        $parsDat[2]  = substr($input, 9, 9);    #id
        $parsDat[3]  = substr($input, 18, 2);   #fname
        $parsDat[4]  = substr($input, 20, 3);   #mname
        $parsDat[5]  = substr($input, 23, 2);   #lname
        $parsDat[6]  = substr($input, 25, 3);   #code
        $parsDat[7]  = substr($input, 29, 3);   #code
        $parsDat[8]  = substr($input, 32, 1);   #code
        $parsDat[9]  = substr($input, 33, 2);   #code
        $parsDat[10] = substr($input, 35, 6);   #code
        $parsDat[11] = substr($input, 41, 39);  #code

    And...
    $output = join(",", @parsDat);
    print $handle "$output\n";

    The result I get is:
    060,071905,001206656,80,199,23,612,129,3,99,005168,
    060,071905,001366155,80,199,41,611,750,3,99,100471,
    060,071905,003326935,80,199,11,612,113,3,23,018042,
    060,071905,004506295,79,199,11,611,999,3,23,001000,
    060,071905,004506295,80,199,11,611,999,3,23,044175,
    060,071905,007507563,80,199,11,611,128,3,11,052500,
    060,071905,008364520,79,224,31,611,999,3,23,005000,
    060,071905,008364520,80,224,31,611,999,3,23,041820,

    So far this part works fine. My next issues are to get these files into Access (for modifying), then append all the files and remove the delimiters. Every field in every record has a fixed length, and those lengths should not change as a result of the manipulation done in Access.

    Thanks for the input, I hope this is a better post.

      You have indeed clearly explained the part of the process where you do not have a problem. Unfortunately, the part where you are asking for help is still unclear to me.

      Are you asking for help to "get these files into Access"? If so, you'll need to provide details on the Access schema so that somebody with Access knowledge can help.

      How can the data be modified in Access but "not changed by the manipulation in Access"?

      Is the data coming out of Access to be merged or are you just merging the original output files you already have?

      Have you tried to append all the files and remove the delimiters using my instructions in a previous post? Here's an even clearer sample:

      perl -pe 's/,//g' inputfile1 inputfile2 >outputfile
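The same append-and-strip step could also live inside a script rather than on the command line. This is a sketch under assumptions: the helper name append_without_commas and the script name merge.pl are hypothetical, and it presumes (as above) that no field value legitimately contains a comma.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every line of the named files to $out_fh,
# deleting commas along the way.
sub append_without_commas {
    my ($out_fh, @files) = @_;
    for my $file (@files) {
        open my $in, '<', $file or die "Cannot open $file: $!\n";
        while ( my $line = <$in> ) {
            $line =~ tr/,//d;          # remove every comma
            print {$out_fh} $line;
        }
        close $in;
    }
    return;
}

# Usage sketch: perl merge.pl tmp_060 tmp_061 > outputfile
append_without_commas( \*STDOUT, @ARGV ) if @ARGV;
```

The files are written out in the order they are named, so the order of the arguments decides the order of the records in the result.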

      -- Eric Hammond

        As far as help getting the files into Access goes, I have been manually converting them to .txt files and importing them into Access tables. If there is an easier way, I haven't gotten there yet. As for the "not changed..." part, I was trying to convey that the raw data has fixed field lengths and any changes to the data will be made to fit within the field.

        I am currently trying the suggestions you gave me now, I just haven't put it all together yet.
        Another question comes to mind, though: when I parse and make separate files, they come out named like "tmp_060". Is there a way within the script to set a name for each?

        Thank you
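On the file-naming question, one possible approach is a lookup table from record type to file name. This is only a sketch: the names in %name_for and the helper output_name are hypothetical, and only the "tmp_060" pattern comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical names; only the "tmp_<type>" pattern comes from the thread.
my %name_for = (
    '060' => 'type_060_records.txt',
    '061' => 'type_061_records.txt',
);

# Fall back to the existing tmp_<type> pattern for unlisted types.
sub output_name {
    my ($type) = @_;
    return exists $name_for{$type} ? $name_for{$type} : "tmp_$type";
}

print output_name('060'), "\n";   # type_060_records.txt
print output_name('999'), "\n";   # tmp_999
```

The returned name would then be used wherever the script currently opens its "tmp_..." output file.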