Grygonos has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I have an EBCDIC file that needs the last byte of each line chopped. The remaining bytes are then written to a new filehandle. When I compare my script's output with the original file, I find something rather strange. The file is mixed up, not in a "the words are reading right to left" kind of way: whole sections of the file are in the wrong order, even though each section is perfectly formatted internally (it is a fixed-width file, and all data still lines up within an individual section). My code is below. Any help is GREATLY appreciated. I assume Perl is doing something here that is cooler than I am, because I don't get it.
#!/Perl/bin/perl
use strict;
use warnings;

my $line;
open (ORIGINAL,"<1.txt") or die "$!";
open (NEWFILE,">new_1.txt") or die "$!";
while(!eof ORIGINAL) {
    sysread ORIGINAL,$line,5201,0;
    chop $line;
    print NEWFILE $line;
}
close(NEWFILE);
close(ORIGINAL);
I also tried syswrite rather than print, but it yielded the same results.

Replies are listed 'Best First'.
Re: EBCDIC File I/O
by MonkE (Hermit) on May 17, 2006 at 13:30 UTC
    According to the sysread documentation, mixing the unbuffered sysread() function with buffered I/O functions such as eof() will cause weirdness. Instead, try checking the return value of sysread(): it returns 0 at end of file.
      That did it.. 100% the problem.. thanks. In the past I had used read to accomplish the same task.. never realized eof was buffered. Incredibly interesting behavior though. Thanks much.
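For later readers, here is a minimal sketch of the fix described above: loop on the return value of sysread() and keep eof() out of the picture. The sub name chop_records_sysread and its arguments are made up for illustration, and the sketch assumes a regular file, where sysread() normally returns the full record until the file runs out.

```perl
use strict;
use warnings;

# Hypothetical helper: copy fixed-width records, dropping the last byte of each.
# Only sysread/syswrite touch the handles, so no buffered call
# (eof, read, print, <FH>) is mixed in.
sub chop_records_sysread {
    my ($in_path, $out_path, $reclen) = @_;

    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot open $out_path: $!";
    binmode $in;     # EBCDIC data is binary as far as Perl is concerned
    binmode $out;

    my $count = 0;
    while (1) {
        my $line = '';
        my $got  = sysread($in, $line, $reclen);
        die "sysread failed: $!" unless defined $got;   # undef means error
        last if $got == 0;                              # 0 means end of file
        chop $line;                                     # drop the last byte
        defined syswrite($out, $line) or die "syswrite failed: $!";
        $count++;
    }

    close $out or die "Cannot close $out_path: $!";
    close $in;
    return $count;   # number of records copied
}
```

For the file in the question, the call would be something like chop_records_sysread('1.txt', 'new_1.txt', 5201).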
Re: EBCDIC File I/O
by bass_warrior (Beadle) on May 17, 2006 at 18:47 UTC
    Why mess with a read or sysread?
    while(<ORIGINAL>) { chop $_; print NEWFILE $_; }
      Because it's a fixed-width file, with no line terminators. This will work however (as a filter):
      local $/ = \5201; while (<>) { chop; print; }
      'man perlvar' for details.
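A sketch of the same record-length trick with explicit filehandles instead of a filter, assuming the 5201-byte records from the original post. The sub name chop_records_readline is hypothetical. Everything here is buffered I/O (readline and print), so nothing is mixed with sysread().

```perl
use strict;
use warnings;

# Hypothetical helper: read fixed-length records via $/ and drop the last
# byte of each one.
sub chop_records_readline {
    my ($in_path, $out_path, $reclen) = @_;

    open(my $in,  '<:raw', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>:raw', $out_path) or die "Cannot open $out_path: $!";

    # A reference to an integer makes <$in> return records of up to that
    # many bytes instead of lines (see perlvar).
    local $/ = \$reclen;

    my $count = 0;
    while (my $rec = <$in>) {
        chop $rec;              # drop the last byte of the record
        print {$out} $rec;
        $count++;
    }

    close $out or die "Cannot close $out_path: $!";
    close $in;
    return $count;              # number of records copied
}
```

Localizing $/ inside the sub keeps the change from leaking into any other code that reads line by line.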
      He would not want to use a simple "while (<ORIGINAL>) {" because newline is different for EBCDIC. It's 0x15 instead of 0x0A, so the lines don't come out right. (One could always set $/, the input record separator, but if the records are fixed length the choice between setting $/ and using a fixed length read would be a matter of clarity and/or taste. IMO.)
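For completeness, a sketch of the "set $/" alternative for a file whose records really do end in the EBCDIC newline. It takes 0x15 as the terminator, as stated above; the sub name read_ebcdic_lines is hypothetical. If the records can contain binary fields, a 0x15 byte could also turn up inside the data, in which case the fixed-length read is probably the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return the records of a file delimited by the
# EBCDIC newline (assumed here to be 0x15), without the terminator.
sub read_ebcdic_lines {
    my ($path) = @_;

    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";

    local $/ = "\x15";          # input record separator: EBCDIC NL
    my @records;
    while (my $rec = <$in>) {
        chomp $rec;             # chomp removes a trailing $/, i.e. the 0x15
        push @records, $rec;
    }

    close $in;
    return @records;
}
```

Note that chomp strips whatever $/ currently holds, so it removes the 0x15 terminator here rather than a 0x0A.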