Hi,
I'm a bit rusty at this so I thought I'd seek advice from a higher source =). I've got a script I put together whose aim is to read in a data file and create multiple files, grouping lines by their first 6 characters. In my case I'm processing NMEA data, e.g.:
$GPVTG,156.08,T,,M,0.08,N,0.15,K,D*3E
$GPGGA,181908.20,3809.22198,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.2,0138*73
$GPVTG,156.13,T,,M,0.05,N,0.09,K,D*34
$GPGGA,181908.40,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.4,0138*7C
$GPVTG,284.88,T,,M,0.06,N,0.11,K,D*30
$GPGGA,181908.60,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,6.6,0138*7D
$GPVTG,1.72,T,,M,0.01,N,0.02,K,D*3F
$GPGGA,181908.80,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,6.8,0138*7D
$GPVTG,175.67,T,,M,0.06,N,0.11,K,D*3C
$GPGGA,181909.00,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.0,0138*7D
$GPVTG,357.02,T,,M,0.11,N,0.21,K,D*38
$GPZDA,181909.00,24,07,2008,00,00*65
$GPGGA,181909.20,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.2,0138*7D
$GPVTG,25.22,T,,M,0.06,N,0.11,K,D*09
$GPGGA,181909.40,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,7.4,0138*7D
$GPVTG,157.60,T,,M,0.06,N,0.12,K,D*38
$GPGGA,181909.60,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,3.6,0138*79
$GPVTG,49.76,T,,M,0.09,N,0.17,K,D*0B
$GPGGA,181909.80,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,3.8,0138*79
$GPVTG,304.77,T,,M,0.08,N,0.15,K,D*33
$GPGGA,181910.00,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,4.0,0138*76
$GPVTG,168.33,T,,M,0.08,N,0.15,K,D*3B
$GPGGA,181910.20,3809.22197,N,09726.10823,W,2,10,0.9,453.7,M,-27.1,M,4.2,0138*76
$GPVTG,202.08,T,,M,0.16,N,0.29,K,D*3C
The above example would make 3 files (one each for $GPGGA, $GPVTG and $GPZDA).
I want the files produced to be of the format:
<orig_filename_prefix>_<first 6 chars>.txt
so file mydata.txt might split to
mydata_$GPGGA.txt and mydata_$GPVTG.txt
The script WORKS for <first 6 chars>.txt format but when I add the filename prefix it all goes to hell and gobbles up my main file.
I'd appreciate some hints on what obvious n00b mistake I've made this time.
Thanks Paul =)
p.s. Here's the script
#!/usr/bin/perl
my $file = shift;
bad_format() if ($file eq "" );
open FILE, $file or die "Could not open file [$file]\n";
print "file : $file\n";
#Uncomment this section and things go goofy
#$file =~ m/(\w+)\..*/;
#my $fname = $1;
###
my %files = (); # Hash with file prefix and handle
my $line;
while ($line = <FILE>) {
$line =~ m/^(.{6}).*/;
my $ffc = $1;
if ($ffc ne "" ) {
my $check = 0;
foreach my $key(%files) {
$check = 1 if ($ffc eq $key);
}
if ($check == 0) {
print "Adding new handle : $ffc\n";
local *FH;
open (FH, ">$ffc.txt") or die;
#open (FH, ">$fname_$ffc.txt") or die; # I want to save the file as this format
$files{$ffc} = *FH;
}
my $f = $files{$ffc};
print $f $line;
#print "writing to $key\n";
}
}
while (my ($key, $value) = each (%files)) {
print "Closing $key\n";
close $value;
}
close FILE;
sub bad_format {
print "\nformat: split <file>\n\n";
exit;
}
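Two details in the script above seem to explain the behaviour, and a minimal sketch of both is below. In "$fname_$ffc.txt" Perl parses $fname_ as the variable name, because an underscore is a legal identifier character, so the prefix is silently dropped. Separately, $1 keeps its value from the last successful match: once the filename match has run, a blank or short line that fails /^(.{6})/ leaves $1 holding the word captured from the filename, and the script then opens that name for writing, which would truncate the input file. The helpers out_name and first_six are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build "<prefix>_<tag>.txt".
# The braces in ${prefix} stop Perl from reading "$prefix_" as the variable name.
sub out_name {
    my ($prefix, $tag) = @_;
    return "${prefix}_$tag.txt";
}

# Hypothetical helper: return the first six characters of a line,
# or undef when the line has fewer than six characters before the newline.
# Capturing in list context avoids reading a stale $1 from an earlier match.
sub first_six {
    my ($line) = @_;
    my ($tag) = $line =~ /^(.{6})/;
    return $tag;
}

print out_name('mydata', '$GPGGA'), "\n";   # prints mydata_$GPGGA.txt
```

Running with "use strict" would also have flagged $fname_ as an undeclared variable at compile time.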
UPDATE:
Thanks all for your comments. There were a number of mistakes I'd made and some nice alternate methods for doing things I hadn't seen. I hadn't worked with hashes of file handles before and used an older PM search to integrate the method I had, but Ikegami's direct assignment of the handle into the hash is much more elegant. Thanks for the help, here is the final script I ended up with.
#!/usr/bin/perl
# Splits a data file into unique files based on each line's first 6 characters
use warnings;
use strict;
my $file = shift;
die "format: split <file>\n" unless defined $file && length $file;
open FILE, $file or die "Could not open file [$file]\n";
my ($fname) = $file =~ m/(\w+)\..*/;
my %files = ();
while (my $line = <FILE>) {
if ($line !~ /^\s*$/) {
my $fc = substr($line, 0, 6); # first characters
if (!exists $files{$fc}) {
open ($files{$fc}, ">$fname\_$fc.txt") or die;
}
print {$files{$fc}} $line;
}
}
while (my ($key, $value) = each (%files)) {
print "Created $fname\_$key.txt\n";
close $value;
}
close FILE;
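As a variation on the final script, the same idea can be wrapped in a sub that uses three-argument open and a lexical input handle, so that characters such as $ or > in a name are never interpreted as part of the open mode. This is a sketch under assumptions, not a drop-in replacement: split_by_prefix is a made-up name, and stripping the last extension to form the prefix is my guess at the intended rule.

```perl
use strict;
use warnings;

# Hypothetical sub: split $path into one file per six-character line prefix.
# Returns the sorted list of output filenames it created.
sub split_by_prefix {
    my ($path) = @_;

    # Assumed naming rule: drop the last ".ext" to get the output prefix
    (my $prefix = $path) =~ s/\.[^.\/\\]*\z//;

    open(my $in, '<', $path) or die "Could not open [$path]: $!\n";

    my (%out, %name);    # tag => handle, tag => filename
    while (my $line = <$in>) {
        # Skip blank or short lines instead of reusing an old capture
        (my ($tag) = $line =~ /^(.{6})/) or next;

        if (!exists $out{$tag}) {
            $name{$tag} = "${prefix}_$tag.txt";
            # open() creates the handle directly in the hash element
            open($out{$tag}, '>', $name{$tag})
                or die "Could not create [$name{$tag}]: $!\n";
        }
        print { $out{$tag} } $line;
    }
    close $in;

    for my $fh (values %out) {
        close $fh or die "Close failed: $!\n";
    }
    return sort values %name;
}

# Usage: perl split.pl mydata.txt
if (@ARGV) {
    print "Created $_\n" for split_by_prefix($ARGV[0]);
}
```

Because the output names contain a literal $, they need single quotes when used on a Unix shell command line.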