New to PERL - file format conversions to do

Mikhailoh has asked for the wisdom of the Perl Monks concerning the following question:

I'm an old Assembler and COBOL programmer. I have need to write some flat file conversion programs. I'll be reading one 7KB record at a time and parsing it into multiple records to be written, along with reformatting a few fields and generating a few more.

Just based on what I have described here, do you think Perl is a good tool for the job?

Thanks.

Update

I need to write some conversion programs that:

* Read a large record, about 7KB
* Parse the data into multiple record formats and write them out
* Generate record type fields (a '1' record is patient info, a '2' is lab sample data, a '3' is diagnosis information; at the end of each record is a 0 or a 1 indicating whether or not it is the last one of its type for this patient, so I have to keep track of where I am in the record arrays and whether or not I have a new patient in a record or a new test for the same one)
* Reformat several data fields, usually stripping out some characters

My question to all of you is: is Perl a good choice for this type of activity? It would be a perfect application for a COBOL program, but this facility has no COBOL compiler. Thanks
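Editorial sketch for the last bullet (stripping characters out of a field). This is only an illustration: the helper name `strip_chars` and the sample values are made up, and the real fields may need different rules.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every character listed in $unwanted from $field.
sub strip_chars {
    my ($field, $unwanted) = @_;
    return $field unless length $unwanted;   # nothing to strip

    # quotemeta keeps characters such as '-' or ')' literal inside the class
    my $class = quotemeta $unwanted;
    (my $clean = $field) =~ s/[$class]//g;
    return $clean;
}

print strip_chars('2006-06-07', '-'), "\n";          # prints 20060607
print strip_chars('(555) 123-4567', '() -'), "\n";   # prints 5551234567
```

For a fixed, known set of characters, `tr/-//d` on the field does the same job with less machinery.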

Material from duplicate node added as update and formatting added by GrandFather


Replies are listed 'Best First'.
Re: New to PERL - file format conversions to do by runrig (Abbot) on Jun 07, 2006 at 21:52 UTC
Given that you're dealing with 7KB records, I'm guessing that there are quite a few fields packed in there. I would probably use Parse-FixedLength for the job, but then, I'm biased, since I (re)wrote the module :-) And, to be balanced, there are alternatives mentioned in the docs (and in future updates I may also mention Parse-Binary and Parse-Binary-Iterative), and I do ~~confess~~ mention that this and other similar modules are just glorified substitutes for the perl built-in functions pack and unpack.

Update: Since you want to flag the last record for each patient/record-type (if I read that correctly), you will have to decide whether you want to keep records in memory (if there aren't too many records or distinct patient/record-types), or load the data into a database (even a lightweight database like DB_File or DBD::SQLite might be appropriate) and sort by patient/record-type/record-number(desc) to easily "flag" each "last" record. But know that (as already mentioned) perl is perfect for the job, and it would be best if you would start writing sample code and ask for help with specific problem areas (rather than me guessing at what you might have problems with, and which approach to take :-).
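A minimal sketch of the keep-it-in-memory option, assuming all output records for one patient fit in a list. The sub name `flag_last`, the `[type, payload]` pairs and the sample data are hypothetical, not the real layout; the point is only that the 0/1 flag can be filled in once the whole patient has been seen.

```perl
use strict;
use warnings;

# Hypothetical: each output record is [ $type, $payload ], in output order,
# and all records passed in belong to the same patient.
sub flag_last {
    my @records = @_;

    # Remember the position of the final record of each type;
    # later positions overwrite earlier ones.
    my %last_index;
    $last_index{ $records[$_][0] } = $_ for 0 .. $#records;

    my @lines;
    for my $i (0 .. $#records) {
        my ($type, $payload) = @{ $records[$i] };
        my $flag = $last_index{$type} == $i ? 1 : 0;
        push @lines, $type . $payload . $flag;   # type, data, last-of-type flag
    }
    return @lines;
}

my @out = flag_last(
    [ 1, 'SMITH JOHN' ],
    [ 2, 'GLUCOSE' ],
    [ 2, 'SODIUM' ],
    [ 3, 'DIABETES' ],
);
print "$_\n" for @out;
```

Here only the second '2' record gets a trailing 1, since it is the last lab record for the patient; the single '1' and '3' records are each last of their type.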
Re: New to PERL - file format conversions to do by roboticus (Chancellor) on Jun 07, 2006 at 22:41 UTC
Mikhailoh: I do this nearly every day at work. I'm always chopping up files and creating new files in order to figure out something interesting from the data. Perl is a great tool for this. You'll want to use the split function for delimited files; unpack or Parse-FixedLength, etc., as runrig suggested; and regular expressions (regexes) for parsing human-readable reports. Actually, I frequently use combinations of these techniques in the same file.

One suggestion: I've found it handy to write some simple domain-specific tools (such as a transaction dumper...) that accept some simple data format or can make some assumptions about the structure of the data. Then you can write quickie perl scripts to extract the appropriate data from your dataset to dump with your tools. I have lots of little chunks of code that do such tasks.

As an example, in my job, I frequently have to find merchant numbers in one file and match them with merchant numbers from another file. Our merchant numbers are always in 16-character fields, so I have a little program that accepts four parameters (Filename 1, MID column, Filename 2, MID column). It simply scans the first file and collects merchant numbers, then scans the second file and prints out hits. I'm not at work right now, but it goes something like this:

    #!/usr/bin/perl -w

    my $usage = "
    merch_match <FName1> <MID Col> <FName2> <MID Col>

    Prints all Merchant IDs found in File 1 that are
    also contained in File 2.

    ";

    my %MIDS;

    my $FName1  = shift or die $usage . "Missing FName1";
    my $MIDcol1 = shift or die $usage . "Missing MID COL #1";
    my $FName2  = shift or die $usage . "Missing FName2";
    my $MIDcol2 = shift or die $usage . "Missing MID COL #2";

    # Columns are given 1-based; substr uses 0-based string offsets
    $MIDcol1--; $MIDcol2--;

    # Gather MID numbers from File 1
    open(INF, '<', $FName1) or die $usage . "Can't open " . $FName1;
    while (<INF>) {
        my $M = substr($_, $MIDcol1, 16);
        $MIDS{$M} = 0;
    }
    close(INF);

    # Count how often each File 1 MID appears in File 2
    open(INF, '<', $FName2) or die $usage . "Can't open " . $FName2;
    while (<INF>) {
        my $M = substr($_, $MIDcol2, 16);
        ++$MIDS{$M} if exists $MIDS{$M};
    }
    close(INF);

    # Print match list: only MIDs that were seen in File 2
    for my $M (sort keys %MIDS) {
        print "$M\n" if $MIDS{$M};
    }

With this program, I can take nearly any of my data feeds and match the merchants with any other one, even though they contain the merchant number in different columns...

--roboticus
Re: New to PERL - file format conversions to do by Joost (Canon) on Jun 07, 2006 at 21:34 UTC
It seems to me that perl is right for this kind of job, yes :-) Depending on what you want to do with the data, these manpages that come with the perl distribution might be useful: open, split, pack. "What should it profit a man, if he should win a flame war, yet lose his cool?"
Re: (dup) Perl for flat file conversions? by Joost (Canon) on Jun 07, 2006 at 21:52 UTC
Side note: Please do not double post. You can edit your posts or add additional notes instead.

This is a simple example to get you started:

    $/ = \(7 * 1024);    # read fixed-size 7 KB records; the parentheses matter
    while (<>) {         # read one record into $_
        my $info      = substr($_, 0, 1000);     # put first 1000 bytes in $info
        my $lab       = substr($_, 1000, 1000);
        my $diagnosis = substr($_, 2000, 4999);  # offsets must stay inside the record
        my $last      = chop;                    # remove and return the last character
        # now do stuff with that data
    }

"What should it profit a man, if he should win a flame war, yet lose his cool?"
Re^2: (dup) Perl for flat file conversions? by runrig (Abbot) on Jun 07, 2006 at 22:33 UTC
If there are a lot of fields in the record (which admittedly has not yet been established), substr is what I definitely would not use (yuck).
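For comparison, a rough sketch of the unpack alternative: one template describes the whole record, so every field is extracted in a single call instead of one substr per field. The field names, widths and sample values below are invented for illustration; the real layout is not given in this thread.

```perl
use strict;
use warnings;

# Hypothetical layout: three fixed-width text fields, 8 + 12 + 4 bytes.
my @names    = qw(patient_id last_name test_code);
# 'A' reads a fixed-width text field and strips trailing spaces.
my $template = 'A8 A12 A4';

# Split one fixed-width record into a hash keyed by field name.
sub parse_record {
    my ($record) = @_;
    my %field;
    @field{@names} = unpack $template, $record;   # hash slice: one value per name
    return \%field;
}

# Build a sample record padded to the assumed widths.
my $sample = sprintf '%-8s%-12s%-4s', '00012345', 'SMITH', 'GLUC';
my $rec    = parse_record($sample);
print "$_=$rec->{$_}\n" for @names;
```

Adding a field then means adding one name and one template entry, rather than recomputing offsets for every substr call that follows it.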
Re: New to PERL - file format conversions to do by gellyfish (Monsignor) on Jun 07, 2006 at 21:31 UTC
Yes, Perl would be perfect for the job. /J\
Re: New to PERL - file format conversions to do by dsheroh (Monsignor) on Jun 07, 2006 at 21:36 UTC
A minor cultural note... Many people will give you a hard time for writing "PERL". "Perl" or "perl" is much safer. As for your actual question, I agree that Perl sounds like it should work pretty well for you.
Re: (dup) Perl for flat file conversions? by girarde (Hermit) on Jun 07, 2006 at 21:42 UTC
Perl would do this job very handily.
Re: (dup) Perl for flat file conversions? by OfficeLinebacker (Chaplain) on Jun 07, 2006 at 21:41 UTC
Mikhailoh, welcome! Though I admit that database has a structure far more complex than anything I have dealt with in Perl, I think it would be a fine language to use, assuming your system has enough memory (even then, you can just process one or a group of lines at a time). _________________________________________________________________________________ I like computer programming because it's like Legos for the mind.
Re: New to PERL - file format conversions to do by kwaping (Priest) on Jun 07, 2006 at 23:13 UTC
To simply answer your direct question, yes, Perl is an excellent tool for that kind of job. You could say Perl was "born to do it". --- It's all fine and dandy until someone has to look at the code.