in reply to New to PERL - file format conversions to do

Mikhailoh:

I do this nearly every day at work. I'm always chopping up files and creating new files in order to figure out something interesting from the data. Perl is a great tool for this.

You'll want to use the split function for delimited files; unpack or Parse::FixedLength, etc., for fixed-width files, as runrig suggested; and regular expressions (regexes) for parsing human-readable reports. In practice, I often combine these techniques on the same file.
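
To make those three techniques concrete, here is a minimal sketch of each one. The record layouts (a pipe-delimited line, a 16/8/10-character fixed-width line, and a "Merchant ... total:" report line), the sample values, and the sub names are all made up for illustration; your real files will need their own delimiter, field widths, and pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Delimited record: split on the delimiter.
# The pipe is escaped because split takes a regex.
sub parse_delimited {
    my ($line) = @_;
    chomp $line;
    return split /\|/, $line, -1;    # -1 keeps trailing empty fields
}

# Fixed-width record: unpack with one "A<width>" item per field.
# "A" also strips trailing spaces from each field.
# The widths 16, 8 and 10 are hypothetical.
sub parse_fixed {
    my ($line) = @_;
    chomp $line;
    return unpack 'A16 A8 A10', $line;
}

# Human-readable report line: a regex captures the interesting parts.
# Returns an empty list when the line does not match.
sub parse_report {
    my ($line) = @_;
    return $line =~ /^Merchant\s+(\d+)\s+total:\s+([\d.]+)/;
}

# Made-up sample records, one per technique
my @d = parse_delimited("4444000011112222|20060131|19.95\n");
my @f = parse_fixed("4444000011112222" . "20060131" . "19.95     \n");
my @r = parse_report("Merchant 4444000011112222 total: 19.95");

print join(',', @d), "\n";
print join(',', @f), "\n";
print join(',', @r), "\n";
```

Each sub returns a plain list of fields, so the three can be mixed freely when a single file contains, say, fixed-width detail lines under free-form header lines.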

One suggestion: I've found it handy to write a few simple domain-specific tools (such as a transaction dumper...) that accept a simple data format or make some assumptions about the structure of the data. Then you can write quick Perl scripts that extract the appropriate data from your dataset and feed it to those tools. I have lots of little chunks of code that do such tasks.

As an example, in my job, I frequently have to find merchant numbers in one file and match them with merchant numbers from another file. Our merchant numbers are always in 16-character fields, so I have a little program that accepts four parameters (Filename 1, MID column, Filename 2, MID column). It simply scans the first file and collects merchant numbers, then scans the second file and prints out hits. I'm not at work right now, but it goes something like this:

#!/usr/bin/perl -w
use strict;

my $usage = "
merch_match <FName1> <MID Col> <FName2> <MID Col>

Prints all Merchant IDs found in File 1 that are also contained in File 2.

";

my %MIDS;
my $FName1  = shift or die $usage . "Missing FName1";
my $MIDcol1 = shift or die $usage . "Missing MID COL #1";
my $FName2  = shift or die $usage . "Missing FName2";
my $MIDcol2 = shift or die $usage . "Missing MID COL #2";

# Columns are given 1-based, but substr uses 0-based string offsets
$MIDcol1--;
$MIDcol2--;

# Gather MID numbers from File 1
open(INF, '<', $FName1) or die $usage . "Can't open $FName1: $!";
while (<INF>) {
    my $M = substr($_, $MIDcol1, 16);
    $MIDS{$M} = 0;
}
close(INF);

# Count how often each File 1 MID appears in File 2
open(INF, '<', $FName2) or die $usage . "Can't open $FName2: $!";
while (<INF>) {
    my $M = substr($_, $MIDcol2, 16);
    ++$MIDS{$M} if exists $MIDS{$M};
}
close(INF);

# Print match list: only the MIDs that were also seen in File 2
for my $M (sort keys %MIDS) {
    print "$M\n" if $MIDS{$M} > 0;
}

With this program, I can take nearly any of my data feeds and match the merchants with any other one, even though they contain the merchant number in different columns...

--roboticus