azool has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to split a single large file into several slightly less large files, based on the first word on each line. It works fine, for the most part. I would like to restore the original field separator, but it's not working. What am I doing wrong?
#!/usr/bin/perl
while ($line = <>){
    @record = split(/\|/, $line);
    $fields = @record - 1;
    if ( open(OUTFILE, ">>$record[0]") ) {
        $" = "\|";
        print @record[1..$fields];
    } else {
        die("cannot open $record[0]");
    }
}

Replies are listed 'Best First'.
Re: $" seems not to be working
by jdporter (Paladin) on Mar 26, 2003 at 02:36 UTC
    Paladin is right - you need to set $, , not $" .

    Now I'd like to suggest a few changes (improvements?) to your code. I mean, TIMTOWTDI - but this is how I might write it...
    use IO::File;
    use strict;
    $, = "|";
    my %fh; # key=filename; val=filehandle
    while (<>) {
        my @record = split /\|/;
        my $word1 = shift @record;
        $fh{$word1} ||= IO::File->new("> $word1")
            or die "Error opening $word1 for writing: $!\n";
        $fh{$word1}->print( @record );
    }
    What's different?
    1. use strict;
    2. declare all variables with my, to pass use strict;
    3. Setting $, outside the loop. (Might even want to local it.)
    4. Since you're using the first field for one purpose, and printing the remaining fields, shift it off the array. Then you can print the whole array (what's left).
    5. Only open each output file once, and cache the filehandle in a hash. One convenient way to do this is with IO::File handles.
    6. include $! in the open() error message.
    7. use $_ as the loop iterator / input line. That's what it's for, and makes the code less cluttered.
    Now for one further optimization: split the line directly to the desired variables:
    while (<>) {
        my( $word1, @record ) = split /\|/;
    And to get really too clever:
    ( $fh{$word1} ||= IO::File->new("> $word1")
        or die "Error opening $word1 for writing: $!\n"
    )->print( @record );

    Not meant as a criticism of your code; just offered as a different perspective.

    jdporter
    The 6th Rule of Perl Club is -- There is no Rule #6.

Re: $" seems not to be working
by Paladin (Vicar) on Mar 26, 2003 at 01:48 UTC
    $" is the list separator for arrays interpolated into double quoted strings. $, is the output field separator for print. See perlvar for more info.
      To make it absolutely clear: you have to set $, to this effect, for the syntax you've used here, or you set $" and put the array slice in quotes. BTW you'd better use local on those variables, as they are globals, and they might influence the rest of your script, even the working of modules.
      { local $, = "|"; print @record[1..$fields]; }
      or
      { local $" = "|"; print "@record[1..$fields]"; }
      The default value of $" is a space, of $, it's the empty string — or undef, same thing here (no warnings).
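The difference between the two variables can be seen in isolation. This is only a minimal sketch: the helper names `printed_with` and `interpolated_with` are hypothetical, and the in-memory filehandle is there just so the printed output can be inspected.

```perl
use strict;
use warnings;

# Hypothetical helper: print a list to an in-memory filehandle with $,
# set to $sep, and return what was written.
sub printed_with {
    my ($sep, @list) = @_;
    my $buf = '';
    open my $out, '>', \$buf or die "Cannot open in-memory handle: $!";
    {
        local $, = $sep;       # output field separator, used by print
        print {$out} @list;
    }
    close $out;
    return $buf;
}

# Hypothetical helper: interpolate a list into a string with the list
# separator set to $sep.
sub interpolated_with {
    my ($sep, @list) = @_;
    local $" = $sep;           # list separator, used in "@list"
    return "@list";
}

my @fields = qw(a b c);
print printed_with('|', @fields), "\n";        # a|b|c
print interpolated_with('|', @fields), "\n";   # a|b|c
```

Because both settings are wrapped in `local`, the rest of the program still sees the defaults afterwards.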

      Note that the subscript of an array or a hash, or a ditto slice, is still evaluated as Perl code, at the time of interpolation. Thus, you can do calculations etc. there. Like:

      @number = qw(zero one two three four five six seven eight nine ten);
      print "The sum of $number[1] and $number[2] is $number[1+2].\n";

      The module Interpolation makes use of this little fact.

Re: $" seems not to be working
by BrowserUk (Patriarch) on Mar 26, 2003 at 03:21 UTC

    Small point, but worth knowing, you could replace

    $fields = @record -1;

    with $fields = $#record;
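As a quick check that the two forms agree, here is a small sketch. The helper name `last_index_two_ways` and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last index of a list computed both ways.
sub last_index_two_ways {
    my @r = @_;
    my $by_count = @r - 1;   # array in numeric context gives its length
    my $by_index = $#r;      # index of the last element
    return ($by_count, $by_index);
}

my @record = split /\|/, "alpha|beta|gamma";
my ($by_count, $by_index) = last_index_two_ways(@record);
print "$by_count $by_index\n";   # 2 2
```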

    That said, it seems wasteful to break the whole line into its constituent fields, just so that you can access the first field, especially as you are going to re-join all the other bits back together in order to write them back out. You can avoid the problem you are having, and both simplify and speed up your code by only breaking the line into the two parts of interest something like this.

    (I've assumed the omitted OUTFILE on the print statement was a transcription error.)

    #!/usr/bin/perl
    while ($line = <>){
        my($first, $rest) = $line =~ m[(^.*?)\|(.*$)];
        if ( open(OUTFILE, ">>$first") ) {
            # (.*$) stops before the trailing newline, so add it back
            print OUTFILE "$rest\n";
        } else {
            die("cannot open $first");
        }
    }

    One further possibility is that as coded, your script re-opens the output file for every line. Whilst I am not sure how costly (or not) re-opening a file that you haven't closed is, you might be better to avoid it by building a hash to hold the file handles.

    #! perl -slw
    use strict;
    my %fhs;
    while(<>) {
        my ($first, $rest) = m[(^.*?)\|(.*$)];
        open $fhs{$first}, '>>', $first
            or die "Couldn't open $first:$!"
            unless $fhs{$first};
        print { $fhs{$first} } $rest;
    }

    The only slightly unusual thing with the above code is the need to wrap curlies {} around the filehandle to tell the compiler that you want to use the hash element referenced as a filehandle and aren't trying to print to STDOUT but forgot the comma.
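To see the block form of print on its own, here is a minimal sketch that writes to in-memory buffers instead of real files. The helper name `route_pairs` and its input format (name/text pairs) are assumptions made only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes [name, text] pairs and appends each text to a
# per-name in-memory buffer, caching one filehandle per name in a hash.
sub route_pairs {
    my @pairs = @_;
    my (%buf, %fhs);
    for my $pair (@pairs) {
        my ($name, $text) = @$pair;
        unless ($fhs{$name}) {
            $buf{$name} = '';
            open $fhs{$name}, '>', \$buf{$name}
                or die "Couldn't open buffer for $name: $!";
        }
        # The curlies mark the hash element as the filehandle;
        # without them this line does not parse as a print to that handle.
        print { $fhs{$name} } $text;
    }
    close $_ for values %fhs;
    return %buf;
}

my %out = route_pairs(['a', "1\n"], ['b', "2\n"], ['a', "3\n"]);
print "a: $out{a}";   # a: 1 (then 3 on the next line)
print "b: $out{b}";   # b: 2
```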


    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
      <nitpick>

      Might as well make the code as generic as possible ...

      #! perl -slw
      use strict;
      my %fhs;
      my $d = '|';
      while(<>) {
          my ($first, $rest) = m[(^.*?)\Q${d}\E(.*$)];
          open $fhs{$first}, '>>', $first
              or die "Couldn't open $first:$!"
              unless $fhs{$first};
          print { $fhs{$first} } $rest;
      }
      Of course, if you didn't need those curlies, the code would look even better ...
      #! perl -slw
      use strict;
      use IO::File;
      my %fhs;
      my $d = '|';
      while(<>) {
          my ($first, $rest) = m[(^.*?)\Q${d}\E(.*$)];
          my $fh = $fhs{$first} ||= do {
              # IO::File->new takes the filename first, then the mode
              IO::File->new($first, '>>')
                  || die "Couldn't open $first:$!"
          };
          $fh->print($rest);
      }
      </nitpick>

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: $" seems not to be working
by bbfu (Curate) on Mar 26, 2003 at 03:25 UTC

    If you use the 3rd arg to split, you shouldn't need to worry about recombining the line. Also, your code doesn't print anything to OUTFILE, it always prints to STDOUT.

    #!/usr/bin/perl
    while ($line = <>){
        my ($file, $record) = split /\|/, $line, 2;
        if ( open(OUTFILE, ">>$file") ) {
            print OUTFILE $record;
        } else {
            die("cannot open $file");
        }
    }
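The effect of the limit argument is easy to check in isolation. In this sketch the helper name `split_first_field` and the sample line are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a line into its first field and the untouched
# remainder. The limit of 2 stops split after the first separator.
sub split_first_field {
    my ($line) = @_;
    return split /\|/, $line, 2;
}

my ($file, $record) = split_first_field("fruit|apple|red|sweet\n");
print "$file\n";   # fruit
print $record;     # apple|red|sweet
```

The remainder keeps its original separators and its trailing newline, so it can be written out as-is without touching `$,` or `$"`.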

    Some stylistic tips: You can use the -n switch to perl to get an implicit while(<>){} loop around your code, and you can use the short-cut nature of or to simplify your conditional:

    #!/usr/bin/perl -n
    my ($file, $rec) = split /\|/, $_, 2; # Good idea, jdporter and BrowserUk
    unless (exists $fh{$file}) {
        open $fh{$file}, ">> $file"
            or die "Can't open $file for append: $!\n";
    }
    # curlies are needed to print to a filehandle stored in a hash element
    print { $fh{$file} } $rec;

    Or, as a one-liner:

    perl -ne '($f,$r) = split /\|/, $_, 2; exists $fh{$f} or open $fh{$f}, ">> $f" or die "Cannot open $f for append: $!\n"; print {$fh{$f}} $r;'

    Update: I liked jdporter and BrowserUk's idea about caching the filehandles, so I added it. :)

    bbfu
    Black flowers blossom
    Fearless on my breath