
Hi, I'm a bit rusty at this so I thought I'd seek advice from a higher source =). I've put together a script that reads in a data file and splits it into multiple files, grouping lines that share the same first 6 characters. In my case I'm processing NMEA data, i.e.

$GPVTG,156.08,T,,M,0.08,N,0.15,K,D*3E
$GPGGA,181908.20,3809.22198,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.2,0138*73
$GPVTG,156.13,T,,M,0.05,N,0.09,K,D*34
$GPGGA,181908.40,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.4,0138*7C
$GPVTG,284.88,T,,M,0.06,N,0.11,K,D*30
$GPGGA,181908.60,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,6.6,0138*7D
$GPVTG,1.72,T,,M,0.01,N,0.02,K,D*3F
$GPGGA,181908.80,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,6.8,0138*7D
$GPVTG,175.67,T,,M,0.06,N,0.11,K,D*3C
$GPGGA,181909.00,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.0,0138*7D
$GPVTG,357.02,T,,M,0.11,N,0.21,K,D*38
$GPZDA,181909.00,24,07,2008,00,00*65
$GPGGA,181909.20,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.2,0138*7D
$GPVTG,25.22,T,,M,0.06,N,0.11,K,D*09
$GPGGA,181909.40,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.4,0138*7D
$GPVTG,157.60,T,,M,0.06,N,0.12,K,D*38
$GPGGA,181909.60,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,3.6,0138*79
$GPVTG,49.76,T,,M,0.09,N,0.17,K,D*0B
$GPGGA,181909.80,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,3.8,0138*79
$GPVTG,304.77,T,,M,0.08,N,0.15,K,D*33
$GPGGA,181910.00,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,4.0,0138*76
$GPVTG,168.33,T,,M,0.08,N,0.15,K,D*3B
$GPGGA,181910.20,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,4.2,0138*76
$GPVTG,202.08,T,,M,0.16,N,0.29,K,D*3C

The above example would make 3 files (one each for $GPGGA, $GPVTG and $GPZDA).

I want the files produced to be of the format: <orig_filename_prefix>_<first 6 chars>.txt

So the file mydata.txt might split into mydata_$GPGGA.txt and mydata_$GPVTG.txt.

The script WORKS for <first 6 chars>.txt format but when I add the filename prefix it all goes to hell and gobbles up my main file.

I'd appreciate some hints on what obvious n00b mistake I've made this time.

Thanks, Paul =)

P.S. Here's the script:

#!/usr/bin/perl

my $file = shift;

bad_format() if ($file eq "" );

open FILE, $file or die "Could not open file [$file]\n";
print "file : $file\n";

#Uncomment this section and things go goofy
#$file =~ m/(\w+)\..*/;
#my $fname = $1;
###

my %files = ();    #  Hash with file prefix and handle

my $line;
while ($line = <FILE>) {
    $line =~ m/^(.{6}).*/;
    my $ffc = $1;

    if ($ffc ne "" ) {
        my $check = 0;
        foreach my $key(%files) {
            $check = 1 if ($ffc eq $key);
        }
        if ($check == 0) {
            print "Adding new handle : $ffc\n";
            local *FH; 
            open (FH, ">$ffc.txt") or die;
            #open (FH, ">$fname_$ffc.txt") or die;    # I want to save the file as this format
            
            $files{$ffc} = *FH;
        }
        

        my $f = $files{$ffc};
        print $f $line;    
        #print "writing to $key\n";

    }
}

while (my ($key, $value) = each (%files)) {
    print "Closing     $key\n";
    close $value;
}

close FILE;

sub bad_format {
        print "\nformat: split <file>\n\n";
        exit;
}
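
For anyone reading along, here is a minimal sketch of what the thread title ("RegExp eating my $1") most likely refers to. It assumes the input contains a blank line or a line shorter than 6 characters. A failed match does not reset $1, so $1 still holds the filename prefix captured earlier, and the script then opens ">mydata.txt" for writing, which truncates the input file. The helper names `prefixes_unchecked` and `prefixes_checked` are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the loop above: it reads $1 without
# checking whether the match succeeded.
sub prefixes_unchecked {
    my ($file, @lines) = @_;
    $file =~ m/(\w+)\..*/;          # successful match: $1 is now e.g. "mydata"
    my @prefixes;
    for my $line (@lines) {
        $line =~ m/^(.{6}).*/;      # fails on blank or short lines
        push @prefixes, $1;         # a failed match leaves the old $1 in place
    }
    return @prefixes;
}

# Hypothetical safer variant: a match in list context returns its captures,
# or an empty list on failure, so $prefix is undef when nothing matched.
sub prefixes_checked {
    my (@lines) = @_;
    my @prefixes;
    for my $line (@lines) {
        my ($prefix) = $line =~ m/^(.{6})/;
        push @prefixes, $prefix;
    }
    return @prefixes;
}

# A blank first line followed by one NMEA sentence (single quotes keep
# Perl from interpolating "$GPVTG" as a variable).
my @lines = ("\n", '$GPVTG,156.08,T,,M,0.08,N,0.15,K,D*3E');

print "unchecked: $_\n" for prefixes_unchecked('mydata.txt', @lines);
print "checked:   ", (defined $_ ? $_ : '(no match)'), "\n"
    for prefixes_checked(@lines);
```

With the blank line first, the unchecked version reports "mydata" as a prefix, which is the name the script would then open for writing. Capturing in list context, or wrapping the match in an `if`, avoids relying on a stale $1.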

UPDATE:

Thanks all for your comments. There were a number of mistakes I'd made and some nice alternate methods for doing things I hadn't seen. I hadn't worked with hashes of file handles before and used an older PM search to integrate the method I had, but Ikegami's direct assignment of the handle into the hash is much more elegant. Thanks for the help, here is the final script I ended up with.

#!/usr/bin/perl

# Splits a data file into unique files based on each line's first 6 characters
use warnings;
use strict;

my $file = shift;

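# Usage message inlined: bad_format() is not defined in this version of the script
die "\nformat: split <file>\n\n" if !defined $file || $file eq "";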

open FILE, $file or die "Could not open file [$file]\n";

my ($fname) = $file =~ m/(\w+)\..*/;
my %files = ();

while (my $line = <FILE>) {
    if ($line !~ /^\s*$/) {
        my $fc = substr($line, 0, 6); # first characters
        if (!exists $files{$fc}) {    
            open ($files{$fc}, ">$fname\_$fc.txt") or die;    
        }
        print {$files{$fc}} $line;            
    } 
}

while (my ($key, $value) = each (%files)) {
    print "Created     $fname\_$key.txt\n";
    close $value;
}
close FILE;
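
One more detail worth spelling out: in the commented-out open of the first script, "$fname_$ffc.txt" is parsed by Perl as the variable `$fname_` followed by `$ffc`, which is why the final script writes `$fname\_$fc`. Below is a hedged sketch of the same splitting idea using `${fname}` braces and three-argument `open`. The helper names `output_name` and `split_lines` are hypothetical, and taking the lines as a list (rather than reading from a file) is an assumption made to keep the example short.

```perl
use strict;
use warnings;

# Hypothetical helper: build the output name for one sentence type.
# The braces stop Perl from reading "$fname_" as a variable name.
sub output_name {
    my ($fname, $prefix) = @_;
    return "${fname}_${prefix}.txt";
}

# Hypothetical helper: append each line to the file for its first six
# characters, opening each output file only once. Returns the file names
# created, sorted by prefix.
sub split_lines {
    my ($fname, @lines) = @_;
    my %handle_for;                           # prefix => file handle
    for my $line (@lines) {
        next if $line =~ /^\s*$/;             # skip blank lines
        my $prefix = substr($line, 0, 6);
        if (!exists $handle_for{$prefix}) {
            my $name = output_name($fname, $prefix);
            # Three-argument open keeps the mode separate from the name.
            open($handle_for{$prefix}, '>', $name)
                or die "Could not open [$name]: $!\n";
        }
        print { $handle_for{$prefix} } $line;
    }
    for my $prefix (keys %handle_for) {
        close $handle_for{$prefix}
            or die "Could not close file for [$prefix]: $!\n";
    }
    return map { output_name($fname, $_) } sort keys %handle_for;
}
```

Calling `split_lines('mydata', @lines)` with the sample data from the question would be expected to create one file per sentence type in the current directory, for example mydata_$GPGGA.txt.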

In reply to RegExp eating my $1 - FIXED! by thekestrel
