calv1n has asked for the wisdom of the Perl Monks concerning the following question:

I have a file with a header line followed by some data lines:

    Usr1369***12556
    06-01-0101:00 1169 <snipped off>
    06-01-0101:00 2396 <snipped off>
    06-01-0103:12 1169 <snipped off>
    06-01-0103:12 2569 <snipped off>
    06-01-0301:00 1169 <snipped off>
    06-01-0301:00 2396 <snipped off>

I need to split the file into multiple files based on the first field, such as 06-01-0101:00. So the output files will be:

    Usr1369***12556
    06-01-0101:00 1169 <snipped off>
    06-01-0101:00 2396 <snipped off>

    Usr1369***12556
    06-01-0103:12 1169 <snipped off>
    06-01-0103:12 2569 <snipped off>

    Usr1369***12556
    06-01-0301:00 1169 <snipped off>
    06-01-0301:00 2396 <snipped off>

I can get the header and one line at a time by using

    my @filedata = <IFILE_HANDLER>;
    my $head = shift(@filedata);
    foreach my $line (@filedata) {
        # each data line is handled here
    }

Then I get the output file handle and do

    print OFILE_HANDLER $head;
    print OFILE_HANDLER $line;
    close(OFILE_HANDLER);

I haven't the faintest idea how to group the lines so that I get one file for each distinct value matching ^\d{2}-\d{2}-\d{4}:\d{2}. I would appreciate any help you guys could give.

Replies are listed 'Best First'.
Re: Split file based on field
by holli (Abbot) on Jan 10, 2006 at 09:46 UTC
    This is neither short nor elegant, but it works:

        use strict;
        use warnings;

        my $header = <DATA>;
        my %handles;

        while ( <DATA> ) {
            my $handle;
            my $key = substr($_,0,13);

            # check if file is already open
            if ( $handles{$key} ) {
                $handle = $handles{$key};
            }
            else {
                #if not, open it and store the handle
                my $name = $key;
                $name =~ s/://;
                open $handle, ">$name.txt" or die $!;
                print $handle $header;
                $handles{$key} = $handle;
            }
            print $handle $_;
        }

        __DATA__
        Usr1369***12556
        06-01-0101:00 1169 <snipped off>
        06-01-0101:00 2396 <snipped off>
        06-01-0103:12 1169 <snipped off>
        06-01-0103:12 2569 <snipped off>
        06-01-0301:00 1169 <snipped off>
        06-01-0301:00 2396 <snipped off>


    holli, /regexed monk/

      Although both the subject and the description of the problem may suggest that he's after something like the code you suggest, I don't think he really wants to split his input into several different files. I think it's more reasonable to assume he's after something along the lines of:

          #!/usr/bin/perl

          use strict;
          use warnings;

          $\ = $, = "\n";
          chomp(my $head=<>);

          my ($last,@buffer);
          while (<>) {
              chomp;
              my $time = (split)[0];
              if ( $time eq ($last ||= $time) ) {
                  push @buffer, $_;
              } else {
                  print $head, @buffer;
                  ($last,@buffer)=($time,$_);
              }
          }
          print $head, @buffer;

          __END__

      Note that this assumes the input is ordered, which seems reasonable too, if you read the description carefully. Of course I do not claim it to be the cleanest or smartest way to do it, either.
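
If it is not certain that the input really is ordered, a quick check before relying on the buffering code above may help. The following is only a minimal sketch and not part of the original reply. The sub name `first_regrouped_key` is hypothetical, and it assumes the key is the first whitespace-separated field, as in the code above. It returns the first key that reappears after a different key has been seen, or undef if every key forms one contiguous run.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first key that shows up again after
# another key has intervened, or undef if each key is one contiguous run.
sub first_regrouped_key {
    my @lines = @_;
    my %closed;    # keys whose run has already ended
    my $last;      # key of the previous non-blank line
    for my $line (@lines) {
        my $key = (split ' ', $line)[0];
        next unless defined $key;              # skip blank lines
        $closed{$last} = 1 if defined $last && $key ne $last;
        return $key if $closed{$key};          # key seen in an earlier run
        $last = $key;
    }
    return;        # undef in scalar context: input is grouped
}

# Usage: the first key comes back after a different key has been seen
my @sample = (
    "06-01-0101:00 1169\n",
    "06-01-0103:12 1169\n",
    "06-01-0101:00 2396\n",
);
my $bad = first_regrouped_key(@sample);
print defined $bad ? "not grouped: $bad reappears\n" : "input is grouped\n";
```

If the check reports a key, a hash-based approach that does not depend on ordering is the safer choice.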

Re: Split file based on field
by McDarren (Abbot) on Jan 10, 2006 at 10:25 UTC
    You could use a HoL:

    (Disclaimer: probably not the optimal solution :)

        #!/usr/bin/perl -w
        use strict;

        my %data;
        my $header;

        while (<DATA>) {
            chomp;
            if (/Usr/) {
                $header = $_;
                next;
            }
            my ($timestamp, $rest) = $_ =~ /(\d{2}-\d{2}-\d{4}:\d{2})\s+(.*)/;
            push @{ $data{$header}{$timestamp} }, $rest;
        }

        foreach $header (keys %data) {
            foreach my $timestamp (keys %{$data{$header}}) {
                print "$header\n";
                print "$timestamp\t$_\n" for @{$data{$header}{$timestamp}};
            }
        }

        __DATA__
        Usr1369***12556
        06-01-0101:00 1169 <snipped off>
        06-01-0101:00 2396 <snipped off>
        06-01-0103:12 1169 <snipped off>
        06-01-0103:12 2569 <snipped off>
        06-01-0301:00 1169 <snipped off>
        06-01-0301:00 2396 <snipped off>

    Output:

        Usr1369***12556
        06-01-0103:12 1169 <snipped off>
        06-01-0103:12 2569 <snipped off>
        Usr1369***12556
        06-01-0101:00 1169 <snipped off>
        06-01-0101:00 2396 <snipped off>
        Usr1369***12556
        06-01-0301:00 1169 <snipped off>
        06-01-0301:00 2396 <snipped off>

    Cheers,
    Darren :)
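
The groups in the output above come out in a different order from the input, because `keys` returns hash keys in no particular order. If the original order matters, one option is to record each timestamp the first time it is seen. A minimal sketch, assuming a single header line followed by data lines whose first field is the timestamp; the sub name `group_in_input_order` is hypothetical and not taken from the post:

```perl
use strict;
use warnings;

# Hypothetical helper: groups data lines by their first field and returns
# one chunk of text per group (header first), in order of first appearance.
sub group_in_input_order {
    my ($header, @lines) = @_;
    my (%group, @order);
    for my $line (@lines) {
        my ($timestamp) = split ' ', $line;
        next unless defined $timestamp;                     # skip blank lines
        push @order, $timestamp unless $group{$timestamp};  # first sighting
        push @{ $group{$timestamp} }, $line;
    }
    return map { join '', $header, @{ $group{$_} } } @order;
}

# Usage with lines that still carry their trailing newlines
my @chunks = group_in_input_order(
    "Usr1369***12556\n",
    "06-01-0103:12 1169\n",
    "06-01-0101:00 1169\n",
    "06-01-0103:12 2569\n",
);
print "$_\n" for @chunks;    # a blank line separates the groups
```

Sorting the keys (`sort keys %...`) is a simpler alternative when lexical order is good enough.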

Re: Split file based on field
by sh1tn (Priest) on Jan 10, 2006 at 10:09 UTC
        my $temp;
        my $header = <DATA>;

        while( <DATA> ){
            my ($id, $data) = /(\S+)(.+)/;
            open my $fh, '>>', $id or die "cannot open file $id: $!";
            print $fh $header if $temp ne $id;
            print $fh $id, $data, $/;
            $temp = $id;
        }

        __DATA__
        Usr1369***12556
        06-01-0101:00 1169 <snipped off>
        06-01-0101:00 2396 <snipped off>
        06-01-0103:12 1169 <snipped off>
        06-01-0103:12 2569 <snipped off>
        06-01-0301:00 1169 <snipped off>
        06-01-0301:00 2396 <snipped off>


      Your solution has some flaws. It
      • breaks under Windows, because you try to use a ":" in a filename
      • assumes the input is sorted and breaks if the input is not sorted
      • opens/closes the file(s) for every! line of input data


      holli, /regexed monk/
            breaks under Windows, because you try to use a ":" in a filename
      • ":" can be changed
            assumes the input is sorted and breaks if the input is not sorted
      • yes, it is sorted
            opens/closes the file(s) for every! line of input data
      • yes, I can see
Re: Split file based on field
by erniep (Sexton) on Jan 10, 2006 at 12:48 UTC
    If you need to keep the header associated with the data records, then I would create a temp file in which the header record is appended to each data record. Example: 06-01-0301:00 1169 Usr1369***12556. Sort the temp file on the data-record portion of each record, then read the sorted records into an array, looking for a break in the data (if prior ne current) {write array to a file}. There are a million other ways to do this, and I'm sure many are more efficient. Hope this helps.
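
The steps described above can be sketched roughly as follows. This is an illustration only, under a few assumptions: the records are held in memory rather than in a temp file, Perl's built-in `sort` stands in for an external sort, and the groups are returned instead of being written to files. The sub name `control_break_groups` is hypothetical.

```perl
use strict;
use warnings;

# Sketch of the decorate / sort / control-break idea, done in memory
# instead of through a temp file (an assumption made to keep it short).
# Returns a list of array refs, one per group of records sharing a key.
sub control_break_groups {
    my ($header, @records) = @_;
    chomp($header, @records);

    # Decorate: append the header to every non-blank data record.
    my @decorated = map { "$_ $header" } grep { /\S/ } @records;

    # Plain string sort; the key is the leading field, so equal keys
    # end up next to each other.
    my @sorted = sort @decorated;

    # Control break: start a new group whenever the key changes.
    my (@groups, $prior);
    for my $rec (@sorted) {
        my ($current) = split ' ', $rec;
        if (!defined $prior || $prior ne $current) {
            push @groups, [];
            $prior = $current;
        }
        push @{ $groups[-1] }, $rec;
    }
    return @groups;
}

# Usage: print each group, separated by a blank line
my @groups = control_break_groups(
    "Usr1369***12556\n",
    "06-01-0301:00 1169\n",
    "06-01-0101:00 2396\n",
    "06-01-0301:00 2396\n",
);
print join("\n", @$_), "\n\n" for @groups;
```

Each array ref could then be written to its own file. The sort also makes this approach independent of the input order, at the cost of holding all records in memory.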