Split file based on field

calv1n has asked for the wisdom of the Perl Monks concerning the following question:

I have a file which has a header and some data lines:

    Usr1369***12556
    06-01-0101:00    1169 <snipped off>
    06-01-0101:00    2396 <snipped off>
    06-01-0103:12    1169 <snipped off>
    06-01-0103:12    2569 <snipped off>
    06-01-0301:00    1169 <snipped off>
    06-01-0301:00    2396 <snipped off>

What needs to be done is to split the file into multiple files based on the first field, such as 06-01-0101:00. Each output file repeats the header, so the output files will be:

    Usr1369***12556
    06-01-0101:00    1169 <snipped off>
    06-01-0101:00    2396 <snipped off>

    Usr1369***12556
    06-01-0103:12    1169 <snipped off>
    06-01-0103:12    2569 <snipped off>

    Usr1369***12556
    06-01-0301:00    1169 <snipped off>
    06-01-0301:00    2396 <snipped off>

I can get the header and one line by using:

    my @filedata = <IFILE_HANDLER>;
    my $head = shift(@filedata);
    my $line;

    foreach $line (@filedata) {
        # process $line here
    }

Then I get the output file handle and do:

    print (OFILE_HANDLER "$head");
    print (OFILE_HANDLER "$line");
    close (OFILE_HANDLER);

I haven't the faintest idea how to group the lines so that I get one file for each distinct value matching ^\d{2}-\d{2}-\d{4}:\d{2}. I would appreciate any help you guys could give.


Replies are listed 'Best First'.
Re: Split file based on field by holli (Abbot) on Jan 10, 2006 at 09:46 UTC
This is neither short nor elegant, but it works:

    use strict;
    use warnings;

    my $header = <DATA>;
    my %handles;

    while ( <DATA> )
    {
        my $handle;
        my $key = substr($_,0,13);

        # check if file is already open
        if ( $handles{$key} )
        {
            $handle = $handles{$key};
        }
        else
        {
            # if not, open it and store the handle
            my $name = $key;
            $name =~ s/://;
            open $handle, ">$name.txt" or die $!;
            print $handle $header;
            $handles{$key} = $handle;
        }

        print $handle $_;
    }

    __DATA__
    Usr1369***12556
    06-01-0101:00    1169 <snipped off>
    06-01-0101:00    2396 <snipped off>
    06-01-0103:12    1169 <snipped off>
    06-01-0103:12    2569 <snipped off>
    06-01-0301:00    1169 <snipped off>
    06-01-0301:00    2396 <snipped off>

holli, /regexed monk/
Re^2: Split file based on field by blazar (Canon) on Jan 10, 2006 at 13:40 UTC
Although both the subject and the description of the problem may suggest that he's after something like the code you suggest, I don't think he really wants to split his input into several different files. I think it's more reasonable to assume he's after something along the lines of:

    #!/usr/bin/perl

    use strict;
    use warnings;

    $\ = $, = "\n";
    chomp(my $head=<>);

    my ($last,@buffer);
    while (<>) {
        chomp;
        my $time = (split)[0];
        if ( $time eq ($last ||= $time) ) {
            push @buffer, $_;
        } else {
            print $head, @buffer;
            ($last,@buffer)=($time,$_);
        }
    }
    print $head, @buffer;

    __END__

Note that this assumes the input is ordered, which seems reasonable too, if you read the description carefully. Of course I do not claim it to be the cleanest or smartest way to do it, either.
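The ordering assumption mentioned above can be checked before relying on it. The following is only a minimal sketch: the sub name `first_ungrouped_key` is hypothetical (it does not come from the thread), and it assumes the key is the first whitespace-separated field of each data line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first key that
# shows up again after a different key has been seen, or undef when
# every key forms one contiguous run of lines.
sub first_ungrouped_key {
    my @lines = @_;
    my (%seen, $last);
    for my $line (@lines) {
        my ($key) = split ' ', $line;    # first whitespace-separated field
        next unless defined $key;        # skip blank lines
        next if defined $last && $key eq $last;
        return $key if $seen{$key}++;    # key belonged to an earlier run
        $last = $key;
    }
    return;
}

my @grouped = ("a 1\n", "a 2\n", "b 1\n");
my @mixed   = ("a 1\n", "b 1\n", "a 2\n");

for my $set (\@grouped, \@mixed) {
    my $bad = first_ungrouped_key(@$set);
    print defined $bad ? "not grouped (key $bad)\n" : "grouped\n";
}
```

If the check fails, an approach that keys its state on the field value (a hash of handles or a hash of lists) is the safer choice.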
Re: Split file based on field by McDarren (Abbot) on Jan 10, 2006 at 10:25 UTC
You could use a HoL: (Disclaimer: probably not the most optimal solution :)

    #!/usr/bin/perl -w
    use strict;

    my %data;
    my $header;

    while (<DATA>) {
        chomp;
        if (/Usr/) {
            $header = $_;
            next;
        }
        my ($timestamp, $rest) = $_ =~ /(\d{2}-\d{2}-\d{4}:\d{2})\s+(.*)/;
        push @{ $data{$header}{$timestamp} }, $rest;
    }

    foreach $header (keys %data) {
        foreach my $timestamp (keys %{$data{$header}}) {
            print "$header\n";
            print "$timestamp\t$_\n" for @{$data{$header}{$timestamp}};
        }
    }

    __DATA__
    Usr1369***12556
    06-01-0101:00    1169 <snipped off>
    06-01-0101:00    2396 <snipped off>
    06-01-0103:12    1169 <snipped off>
    06-01-0103:12    2569 <snipped off>
    06-01-0301:00    1169 <snipped off>
    06-01-0301:00    2396 <snipped off>

Output:

    Usr1369***12556
    06-01-0103:12 1169 <snipped off>
    06-01-0103:12 2569 <snipped off>
    Usr1369***12556
    06-01-0101:00 1169 <snipped off>
    06-01-0101:00 2396 <snipped off>
    Usr1369***12556
    06-01-0301:00 1169 <snipped off>
    06-01-0301:00 2396 <snipped off>

Cheers, Darren :)
Re: Split file based on field by sh1tn (Priest) on Jan 10, 2006 at 10:09 UTC
    my $temp;
    my $header = <DATA>;

    while( <DATA> ){
        my ($id, $data) = /(\S+)(.+)/;
        open my $fh, '>>', $id or die "cannot open file $id: $!";
        print $fh $header if $temp ne $id;
        print $fh $id, $data, $/;
        $temp = $id;
    }

    __DATA__
    Usr1369***12556
    06-01-0101:00    1169 <snipped off>
    06-01-0101:00    2396 <snipped off>
    06-01-0103:12    1169 <snipped off>
    06-01-0103:12    2569 <snipped off>
    06-01-0301:00    1169 <snipped off>
    06-01-0301:00    2396 <snipped off>
Re^2: Split file based on field by holli (Abbot) on Jan 10, 2006 at 11:10 UTC
Your solution has some flaws. It:

- breaks under Windows, because you try to use a ":" in a filename
- assumes the input is sorted and breaks if the input is not sorted
- opens/closes the file(s) for every(!) line of input data

holli, /regexed monk/
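One way to address all three points at once is sketched below. It is an illustration only: `filename_for_key` and `split_by_key` are hypothetical names, the `.txt` suffix and the underscore replacement are arbitrary choices, and the key is assumed to be the first whitespace-separated field. Handles are cached in a hash, so each file is opened once and unsorted input is handled too.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: build a portable file name from a key by replacing
# the characters Windows does not allow in file names.
sub filename_for_key {
    my ($key) = @_;
    (my $name = $key) =~ s{[<>:"/\\|?*]}{_}g;
    return "$name.txt";
}

# Hypothetical helper: read a header plus data lines from $in and write
# one file per key into $dir. Returns the sorted list of keys seen.
sub split_by_key {
    my ($in, $dir) = @_;
    my $header = <$in>;
    return unless defined $header;           # empty input: nothing to do

    my %handle_for;                          # key => open output handle
    while (my $line = <$in>) {
        my ($key) = split ' ', $line;
        next unless defined $key;            # skip blank lines
        my $fh = $handle_for{$key};
        unless ($fh) {
            my $path = "$dir/" . filename_for_key($key);
            open $fh, '>', $path or die "cannot open $path: $!";
            print {$fh} $header;             # header goes first, once per file
            $handle_for{$key} = $fh;
        }
        print {$fh} $line;
    }
    close $_ or die "close failed: $!" for values %handle_for;

    my @keys = sort keys %handle_for;
    return @keys;
}

# Demo on the sample data, written into a throwaway directory.
my $input = <<'END';
Usr1369***12556
06-01-0101:00    1169 <snipped off>
06-01-0103:12    1169 <snipped off>
06-01-0101:00    2396 <snipped off>
END

open my $in, '<', \$input or die "cannot read in-memory input: $!";
my $dir = tempdir(CLEANUP => 1);
print "$_ -> ", filename_for_key($_), "\n" for split_by_key($in, $dir);
```

With this mapping the key 06-01-0101:00 ends up in 06-01-0101_00.txt. Keeping one open handle per key is fine for a few distinct keys; with very many keys the per-process limit on open files would become a concern.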
Re^3: Split file based on field by sh1tn (Priest) on Jan 10, 2006 at 12:43 UTC
> breaks under windows, because you try to use a ":" in a filename

":" can be changed

> assumes the input is sorted and breaks if the input is not sorted.

yes, it is sorted

> opens/closes the file(s) for every! line of input data

yes, I can see
Re: Split file based on field by erniep (Sexton) on Jan 10, 2006 at 12:48 UTC
If you need to keep the header associated with the data records, then I would create a temp file in which the header record is appended to each data record. Example: 06-01-0301:00 1169 Usr1369***12556. Take the temp file and sort it on the data-record portion of each record. Then read the sorted records into an array, looking for a break on the data (if prior ne current, write the array to a file). There are a million other ways to do this, and I'm sure many are more efficient. Hope this helps.
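The sort-then-break idea can be sketched as follows. This is a simplified variant, not the exact procedure described: it sorts in memory instead of through a temp file and keeps the header in a separate variable rather than appending it to each record. The sub name `group_by_control_break` is hypothetical, and the key is assumed to be the first whitespace-separated field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group data lines by their first field using a
# sort followed by a control break. Returns a list of array refs of the
# form [ $key, @lines_for_that_key ].
sub group_by_control_break {
    my @lines = grep { /\S/ } @_;            # ignore blank lines

    # Decorate each line with its key and original position, so that
    # lines sharing a key stay in input order after sorting.
    my @decorated = map { [ (split ' ', $lines[$_])[0], $_, $lines[$_] ] }
                    0 .. $#lines;
    my @sorted = sort { $a->[0] cmp $b->[0] or $a->[1] <=> $b->[1] }
                 @decorated;

    my (@groups, $prior);
    for my $rec (@sorted) {
        my ($key, undef, $line) = @$rec;
        if (!defined $prior or $key ne $prior) {   # break: key changed
            push @groups, [$key];
            $prior = $key;
        }
        push @{ $groups[-1] }, $line;
    }
    return @groups;
}

# Demo: unsorted sample data; each group is printed under the header.
my ($header, @data) = split /^/, <<'END';
Usr1369***12556
06-01-0301:00    1169 <snipped off>
06-01-0101:00    1169 <snipped off>
06-01-0301:00    2396 <snipped off>
06-01-0101:00    2396 <snipped off>
END

for my $group (group_by_control_break(@data)) {
    my ($key, @lines) = @$group;
    print $header, @lines, "\n";
}
```

Writing each group to its own file instead of printing it would only change the body of the final loop.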