Mikhailoh:

I do this nearly every day at work. I'm always chopping up files and creating new files in order to figure out something interesting from the data. Perl is a great tool for this.

You'll want to use the split function for delimited files; unpack or Parse::FixedLength, as runrig suggested, for fixed-width records; and regular expressions (regexes) for parsing human-readable reports. In practice, I frequently combine these techniques within the same file.
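To make those three approaches concrete, here is a minimal sketch of each. The record layouts (pipe delimiter, the 16/8/10 fixed-width template, and the "Merchant: ... Total: ..." report line) are made up for illustration, so adjust them to your own data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Delimited record: split on the delimiter (a pipe here, chosen for illustration).
sub parse_delimited {
    my ($line) = @_;
    chomp $line;
    return split /\|/, $line, -1;    # -1 keeps trailing empty fields
}

# Fixed-width record: unpack with a template. Hypothetical layout:
# 16-character merchant ID, 8-character date, 10-character amount.
sub parse_fixed {
    my ($line) = @_;
    chomp $line;
    return unpack 'A16 A8 A10', $line;    # 'A' strips trailing spaces
}

# Human-readable report line: capture the interesting pieces with a regex.
# Returns an empty list if the line does not match.
sub parse_report {
    my ($line) = @_;
    return $line =~ /^Merchant:\s+(\d+)\s+Total:\s+([\d.]+)/;
}

my @d = parse_delimited("4000123412341234|20240131|19.95\n");
my @f = parse_fixed("40001234123412342024013119.95     \n");
my @r = parse_report("Merchant: 4000123412341234  Total: 19.95");

print join(',', @d), "\n";
print join(',', @f), "\n";
print join(',', @r), "\n";
```

When one file mixes formats (say, a report header followed by fixed-width detail lines), a regex can decide which kind of line you are looking at, and split or unpack can then do the actual field extraction.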

One suggestion: I've found it handy to write some simple domain-specific tools (such as a transaction dumper...) that accept a simple data format or make some assumptions about the structure of the data. Then you can write quick Perl scripts that extract the appropriate data from your dataset and feed it to those tools. I have lots of little chunks of code that do such tasks.

As an example, in my job, I frequently have to find merchant numbers in one file and match them with merchant numbers from another file. Our merchant numbers are always in 16-character fields, so I have a little program that accepts four parameters (Filename 1, MID column, Filename 2, MID column). It simply scans the first file and collects merchant numbers, then scans the second file and prints out hits. I'm not at work right now, but it goes something like this:

#!/usr/bin/perl
use strict;
use warnings;
my $usage="
merch_match <FName1> <MID Col> <FName2> <MID Col>

Prints all Merchant IDs found in File 1 that are also
contained in File 2.
";

my %MIDS;
my $FName1 = shift or die $usage . "Missing FName1";
my $MIDcol1 = shift or die $usage . "Missing MID COL #1";
my $FName2 = shift or die $usage . "Missing FName2";
my $MIDcol2 = shift or die $usage . "Missing MID COL #2";

# substr uses 0-based string offsets
$MIDcol1--;
$MIDcol2--;

# Gather MID numbers from File 1
open(INF,'<',$FName1) or die $usage . "Can't open " . $FName1;
while (<INF>) {
    my $M = substr($_, $MIDcol1, 16);
    $MIDS{$M}=0;
}
close(INF);

# Count hits in File 2 for the MIDs collected from File 1
open(INF,'<',$FName2) or die $usage . "Can't open " . $FName2;
while (<INF>) {
    my $M = substr($_, $MIDcol2, 16);
    ++$MIDS{$M} if exists $MIDS{$M};
}
close(INF);

# Print match list: only MIDs that were also found in File 2
for my $M (sort keys %MIDS) {
    print "$M\n" if $MIDS{$M};
}

With this program, I can take nearly any of my data feeds and match the merchants with any other one, even though they contain the merchant number in different columns...
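As a rough illustration of the "different columns" point, the same matching logic can be sketched as a sub that works on in-memory lines. The sub name match_mids, the optional width argument, and the two sample feeds are invented for this example; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the sorted merchant IDs that occur in both sets of lines.
# $col1 and $col2 are 1-based column positions; $width defaults to 16.
sub match_mids {
    my ($lines1, $col1, $lines2, $col2, $width) = @_;
    $width ||= 16;
    my %seen;

    # Collect MIDs from the first set, skipping lines too short to hold one.
    for my $line (@$lines1) {
        next if length($line) < $col1 - 1 + $width;
        $seen{ substr($line, $col1 - 1, $width) } = 0;
    }

    # Count how often each collected MID shows up in the second set.
    for my $line (@$lines2) {
        next if length($line) < $col2 - 1 + $width;
        my $mid = substr($line, $col2 - 1, $width);
        $seen{$mid}++ if exists $seen{$mid};
    }

    my @hits = grep { $seen{$_} } keys %seen;
    return sort @hits;
}

# Made-up sample data: the MID starts at column 1 in feed A
# and at column 5 in feed B.
my @feed_a = ("4000123412341234 SALE", "4000999999999999 SALE");
my @feed_b = ("TXN 4000123412341234 19.95", "TXN 4000555555555555 5.00");

print "$_\n" for match_mids(\@feed_a, 1, \@feed_b, 5);
```

Skipping short lines is my own addition here; without it, substr returns undef (with a warning) whenever a line ends before the MID column.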

--roboticus

In reply to Re: New to PERL - file format conversions to do by roboticus
in thread New to PERL - file format conversions to do by Mikhailoh
