brkstr has asked for the wisdom of the Perl Monks concerning the following question:

I am in need of direction. I have been given a project to take an ASCII file of varied length and update or repair information within it. I have found documentation that shows me how to read the file (and number the lines), but I haven't found how to take the information read and place it into alternate files to update or repair (front-end wise). Finally, I have to take all the repaired files and aggregate them into the original file. Help is appreciated... Thanks

Re: ascii manipulation in perl
by sauoq (Abbot) on Aug 12, 2003 at 21:27 UTC
      As much as I want to learn Perl, I need to make sure that I can do what is needed. This is for work and unfortunately they are placing a tight deadline on doing this. I have read some documentation on Perl and got an idea that this language can work; I just need to be shown which way to look.

        I'm sorry, but that is like asking an auto dealer if a car can drive from one coast to another, before you take driving lessons...

        Of course Perl can do the job; so can C and a host of other languages. If you want help solving the problem, then give us a better definition of what the problem is.

        A sample of the different types of records would be nice.


        Peter @ Berghold . Net

        Sieze the cow! Bite the day!

        Nobody expects the Perl inquisition!

        Test the code? We don't need to test no stinkin' code!
        All code posted here is as is where is unless otherwise stated.

        Brewer of Belgian style Ales

Re: ascii manipulation in perl
by allolex (Curate) on Aug 13, 2003 at 08:02 UTC

    I'm sure you'll be able to do what you need with Perl. Go out and get davorg's book Data Munging with Perl; it describes what you need to know to do this sort of thing. You can pick up the book on-line as a PDF for USD 18.50 from the publisher, but the printed copy is nice to have.

    You will still need to think about how the records in your text file can be isolated from one another. For us to help you in this forum, we'd need to know whether the information after the three-digit record type is also significant, i.e. whether it also needs to be divided up into fields. From what you have told us already, you could use a function called unpack to do this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $template = 'A3A*';  # for unpack. it says you have some
                            # ASCII data of a fixed length of 3,
                            # and some more ASCII data until the
                            # end of the record (record means 'line' here)

    while (<DATA>) {
        my ($recordtype,$datastring) = unpack($template,$_);
        print "Record type: $recordtype\n$datastring\n\n" unless $recordtype eq '';
    }

    __DATA__
    090071905090405611071905001029842000281253P0223P0504011 80017
    090071905090405611071905001029842000291253P0223P0504011 80007
    03007190519912660000739900000026500
    03007190519912660000839900000011500
    040071905453901800xxxxxx x xxxxxxxx M4000
    040071905453901806xxxxxx xxxxxxxx F4131
    05007190545393434503187100
    05007190545393963405187100
    06007190545337000380199516129001399002650
    06007190545337182980240356129999399013784

    OUTPUT:

    Record type: 090
    071905090405611071905001029842000281253P0223P0504011 80017

    Record type: 090
    071905090405611071905001029842000291253P0223P0504011 80007

    Record type: 030
    07190519912660000739900000026500

    Record type: 030
    07190519912660000839900000011500

    Record type: 040
    071905453901800xxxxxx x xxxxxxxx M4000

    Record type: 040
    071905453901806xxxxxx xxxxxxxx F4131

    Record type: 050
    07190545393434503187100

    Record type: 050
    07190545393963405187100

    Record type: 060
    07190545337000380199516129001399002650

    Record type: 060
    07190545337182980240356129999399013784
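
    If the goal is to repair each record type in its own file and then put everything back together, something along these lines might serve as a starting point. It is only a sketch: the helper names split_by_type and merge_files, the "<type>.txt" file naming, and the tab-separated line-number prefix are all my own assumptions, not anything dictated by your data. Blank lines are skipped, so they would not survive the round trip.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy each record into "<dir>/<type>.txt", where
# <type> is the first three characters of the line. Every record is
# prefixed with its original line number and a tab so that the original
# order can be restored later.
sub split_by_type {
    my ($infile, $dir) = @_;
    my %out;    # record type => open output filehandle

    open my $in, '<', $infile or die "Cannot read $infile: $!";
    while (my $line = <$in>) {
        chomp $line;
        next if $line eq '';                 # blank lines are dropped
        my ($type) = unpack 'A3', $line;     # first three characters
        unless ($out{$type}) {
            open $out{$type}, '>', "$dir/$type.txt"
                or die "Cannot write $dir/$type.txt: $!";
        }
        # $. is the current line number of the input file
        print { $out{$type} } "$.\t$line\n";
    }
    close $_ for values %out;
    close $in;

    my @types = sort keys %out;
    return @types;
}

# Hypothetical helper: read the (repaired) per-type files back in and
# write the records to $outfile in their original line order.
sub merge_files {
    my ($outfile, @files) = @_;
    my @records;    # each element is [ line number, record text ]

    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot read $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            my ($lineno, $record) = split /\t/, $line, 2;
            push @records, [ $lineno, $record ];
        }
        close $fh;
    }

    open my $out, '>', $outfile or die "Cannot write $outfile: $!";
    print {$out} "$_->[1]\n" for sort { $a->[0] <=> $b->[0] } @records;
    close $out;
}
```

    You would call split_by_type('input.dat', 'work') first, edit the files under work/ while leaving the line-number prefix alone, and then call merge_files('repaired.dat', glob 'work/*.txt'). Writing to a new file rather than over the original keeps the untouched data around in case a repair goes wrong.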

    --
    Allolex

Re: ascii manipulation in perl
by Zaxo (Archbishop) on Aug 12, 2003 at 21:52 UTC

    Ok, from your clarification it sounds like you have a text database whose format was never really decided upon. I guess that a number of different ad hoc formats got appended.

    Your best approach will probably be to try to characterize the different formats with regular expressions. Fixed length formats can be decoded with unpack, and delimited fields with split. It will take a certain amount of eyeball inspection to know you have detected all the formats. The trouble with ad hoc formats is that they usually are inadequate.
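
    To make that concrete, here is a rough sketch of a format table: each entry pairs a regular expression that recognises a format with a decoder built on unpack or split. The field widths ('A3 A6 A*') and the pipe delimiter are invented for illustration, as are the names @formats and decode_line; you would replace them with whatever your inspection of the file turns up.

```perl
use strict;
use warnings;

# Each entry has a name, a regex that recognises the format, and a
# decoder that returns the fields as an array reference.
# ASSUMPTION: the patterns, field widths and delimiter are made up.
my @formats = (
    {
        name   => 'delimited',
        match  => qr/\|/,
        decode => sub { return [ split /\|/, $_[0] ] },
    },
    {
        name   => 'fixed',
        match  => qr/^\d{9}/,
        decode => sub { return [ unpack 'A3 A6 A*', $_[0] ] },
    },
);

# Try each format in turn; the first regex that matches wins.
# Lines that match nothing come back as 'unknown' so they can be
# collected and inspected by eye.
sub decode_line {
    my ($line) = @_;
    chomp $line;
    for my $format (@formats) {
        next unless $line =~ $format->{match};
        return ( $format->{name}, $format->{decode}->($line) );
    }
    return ( 'unknown', [$line] );
}
```

    Counting how many lines come back as 'unknown' is a cheap way to tell whether you have found all the formats yet.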

    Once you can detect and decode the whole file, use one of the CSV modules to write to a single, capable, standard format.
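
    As one possibility, the Text::CSV module from CPAN (not part of the core distribution, so it would need installing) can do the writing. This is a minimal sketch; write_csv is a name I made up, and it assumes each decoded record is already an array reference of fields.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not in the Perl core

# Hypothetical helper: write each row (an array reference of fields)
# to the given filehandle as one CSV line.
sub write_csv {
    my ($fh, @rows) = @_;
    my $csv = Text::CSV->new({ binary => 1, eol => "\n" })
        or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
    $csv->print($fh, $_) for @rows;
    return;
}
```

    Letting the module handle quoting and escaping means fields that contain commas or quotes should not corrupt the output.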

    After Compline,
    Zaxo

Re: ascii manipulation in perl
by CountZero (Bishop) on Aug 12, 2003 at 21:39 UTC

    On a practical level, perhaps you can show us a relevant part of the input file and the way this has to be repaired, so we can at least have some understanding of what you are asking.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Yes, do something like "this is the data to begin with and this is what I want the data to look like when I'm done".

      --
      Allolex

Re: ascii manipulation in perl
by brkstr (Novice) on Aug 12, 2003 at 22:33 UTC
    I meant to post this on the main thread.

    I grabbed a few lines of the file:

    090071905090405611071905001029842000281253P0223P0504011 80017
    090071905090405611071905001029842000291253P0223P0504011 80007
    03007190519912660000739900000026500
    03007190519912660000839900000011500
    040071905453901800xxxxxx x xxxxxxxx M4000
    040071905453901806xxxxxx xxxxxxxx F4131
    05007190545393434503187100
    05007190545393963405187100
    06007190545337000380199516129001399002650
    06007190545337182980240356129999399013784

    There are many more record types. By record type, I mean the first three characters of each line. This is why I had meant to place them into separate files for manipulation.

    thanks
      You really should spend a few minutes reviewing some of the Tutorials. I know you're on a tight timeline, but it will most definitely take longer if you don't know what you're doing, and no amount of advice will suffice. So unless someone is willing to just do the work for you, I'd suggest reading the following:

      The Basics
      Basic Input and Output
      File Input and Output
      String matching and Regular Expressions

      I know this seems like a lot to go through, but I'm sure you can get through it in less than an hour.

      Then when you have enough understanding you can ask more specific questions that I'm certain the Monks here will be happy to help you with.

      edited by ybiC: working links from textual URLs
