tariqahsan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

What would be a good way to read a fixed-length data file
and, based on the value in the first column, generate separate output files?

Sample input file would be like -

T001 Test1 012354 Abcde
T001 Test1 013456 bcdef
T002 Test2 024567 xxxxx
T001 Test1 012354 yyyyy
T003 Test3 02345 cdefg
T002 Test2 000000 56789

For the above input I expect to get 3 files, T001.out, T002.out and T003.out,
each containing the records with the matching first-column value.

Thanks


Replies are listed 'Best First'.
Re: create separate output files based on the matched values
by ikegami (Patriarch) on Sep 27, 2005 at 21:31 UTC

    unpack is great for fixed width fields.

    # @fields = unpack('a5 a6 a7 a5', $_);
    # @fields = unpack('a4 x1 a5 x1 a6 x1 a5', $_);

    open(my $fh_in, '<', ...)
       or die("Can't open input file: $!\n");

    while (<$fh_in>) {
       chomp;
       my ($file_name, $rest) = unpack('a4 x1 a*', $_);
       $file_name .= '.out';
       open(my $fh_out, '>>', $file_name)
          or die("Can't open $file_name for append: $!\n");
       print $fh_out ("$rest\n");
    }

    By the way, your data is not fixed width. The 5th record is shorter.
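
    As a rough illustration of the point about the shorter record, the sketch below applies the second template from the comment above to two of the sample lines. The variable names are only for illustration, and the second line is assumed to appear in the file exactly as posted.

```perl
use strict;
use warnings;

# Template: 4-char key, skip 1, 5-char label, skip 1, 6-char number, skip 1, 5-char text.
my $template = 'a4 x1 a5 x1 a6 x1 a5';

# A record that matches the layout splits cleanly.
my @fields = unpack($template, 'T001 Test1 012354 Abcde');
print join('|', @fields), "\n";

# The fifth sample record has only 5 characters in its third column, so the
# third field swallows the separator and the last field loses its first character.
my @short = unpack($template, 'T003 Test3 02345 cdefg');
print join('|', @short), "\n";
```

    If the real file does contain short records like this, splitting on whitespace is probably the safer choice than a fixed template.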

Re: create separate output files based on the matched values
by GrandFather (Saint) on Sep 27, 2005 at 21:35 UTC

    The following does the trick. Note that the line ending for the created files is Windows rather than Mac or *nix style.

    use warnings;
    use strict;

    my %files;

    while (<DATA>) {
        my ($name, $data) = /^(\w+)\s+(.*)/;
        last if ! defined $data || ! length $data;
        open $files{$name}, '>', "$name.out" if (! defined $files{$name});
        syswrite $files{$name}, $data . "\r\n";
    }

    close $files{$_} for (keys %files);

    __DATA__
    T001 Test1 012354 Abcde
    T001 Test1 013456 bcdef
    T002 Test2 024567 xxxxx
    T001 Test1 012354 yyyyy
    T003 Test3 02345 cdefg
    T002 Test2 000000 56789

    Perl is Huffman encoded by design.
      Are you sure you will get CR LF at the line end on every system? read perldoc perlport:
             In most operating systems, lines in files are terminated by newlines.
             Just what is used as a newline may vary from OS to OS.  Unix
             traditionally uses "\012", one type of DOSish I/O uses "\015\012",
             and Mac OS uses "\015".

             Perl uses "\n" to represent the "logical" newline, where what is
             logical may depend on the platform in use.  In MacPerl, "\n" always
             means "\015".  In DOSish perls, "\n" usually means "\012", but when
             accessing a file in "text" mode, STDIO translates it to (or from)
             "\015\012", depending on whether you're reading or writing.  Unix
             does the same thing on ttys in canonical mode.  "\015\012" is
             commonly referred to as CRLF.
      
      

      $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print

        I don't think that the line endings are a particular problem for the OP. The trick is generating the various output files.

        However, replacing the syswrite with:

        my $fh = $files{$name};
        print $fh "$data\n";

        fixes the problem. What do you expect from a reply written before morning coffee :).


        Perl is Huffman encoded by design.
      What if I want to put a header line for each of the
      generated files?

      What's the best way to do this using this script?

      Thanks for the help!

        Change:

        open $files{$name}, '>', "$name.out" if (! defined $files{$name});

        to

        if (! defined $files{$name}) {
            open $files{$name}, '>', "$name.out";
            syswrite $files{$name}, "This is a header line for file $name.out\r\n";
        }

        Perl is Huffman encoded by design.
Re: create separate output files based on the matched values
by izut (Chaplain) on Sep 27, 2005 at 21:43 UTC

    If I got your specs correctly, and your data is separated by spaces, you can split the line and use a hash to store the opened filehandles. This code should work:

    Update: Updated split - Thanks Skeeve.

    open my $fh_input, "<", "input.txt" or die "$!";

    my %fh = ();

    while (<$fh_input>) {
        chomp;
        my ($filename, $content) = split /\s+/, $_, 2;
        my $fh = undef;
        $fh = $fh{$filename} if defined $fh{$filename};
        if (not defined $fh{$filename}) {
            open $fh, ">", "$filename.out" or die "$!";
            $fh{$filename} = $fh;
        }
        else {
            $fh = $fh{$filename};
        }
        print $fh $content, "\n";
    }

    foreach (keys %fh) {
        close $fh{$_};
    }


    Igor S. Lopes - izut
    surrender to perl. your code, your rules.
      I haven't looked at all of your code, but it fails here:
      my ($filename, $content) = split /\s+/;
      You will lose everything but the first 2 columns. You should have used:
      my ($filename, $content) = split /\s+/,$_,2;
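
      A small sketch of the difference, using one of the sample records. The variable names here are hypothetical and chosen only for the comparison.

```perl
use strict;
use warnings;

my $line = 'T001 Test1 012354 Abcde';

# Without a limit of 2, the second variable receives only the second column.
my ($name_a, $rest_a) = split /\s+/, $line;

# With a limit of 2, the line is split once and the second variable
# holds everything after the first run of whitespace.
my ($name_b, $rest_b) = split /\s+/, $line, 2;

print "$rest_a\n";
print "$rest_b\n";
```

      The first print shows only Test1, while the second shows the rest of the record, which is what should end up in the output file.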

      $\=~s;s*.*;q^|D9JYJ^^qq^\//\\\///^;ex;print