in reply to string gets front truncated

Try this:
    #!/usr/bin/perl
    use strict;
    use warnings;

    die "Usage: $0 input.file\n" unless ( @ARGV == 1 and -f $ARGV[0] );

    my $input_file = shift;
    my $input_byte_count = -s $input_file;

    my $whole_file;  # declared out here so it is still in scope after the block
    {
        local $/;  # set input_record_separator to undef -- switch to "slurp" mode
        open( my $in, '<', $input_file ) or die "$input_file: $!";
        $whole_file = <$in>;
        close $in;
    }
    if ( length( $whole_file ) < $input_byte_count ) {
        die "$0: Perl can't read all of $input_file";
    }
    elsif ( length( $whole_file ) > $input_byte_count ) {
        die "$0: Either -s $input_file is lying, or bytes were added during read\n";
    }
    else {
        warn "$0: got exactly $input_byte_count bytes from $input_file\n";
    }

    # now, what was it you need to do with $whole_file? ...
    # to normalize all whitespace to " " (making it just one long line):
    $_ = $whole_file;  # s/// and print work on $_ by default
    s/\s+/ /g;
    print;
Note that after removing all the line breaks, you might have trouble seeing the whole thing in any sort of shell window. If the input is valid HTML, the output should be perfectly viewable in a web browser.

Re^2: string gets front truncated
by hsfrey (Beadle) on Jul 30, 2008 at 04:20 UTC
    Thanks! I tried that and got the same problem, though it seemed pre-truncated at a different place.
      Um, if you "tried that", I assume you mean that you ran the code as I posted it. I know that what I posted compiles without errors or warnings; I don't have your data to test it on, but when I save the script as "j.pl" and run it on any file I have like this:
      j.pl some.file
      I consistently get a message like this printed to STDERR:
      j.pl: got exactly nnn bytes from some.file
      where "nnn" turns out to be the actual size of the data file provided as a command-line arg.

      So what sort of message did it print to STDERR when you ran it? If the message was "got exactly nnnnn bytes from your.file", and your.file happens to have nnnnn bytes, then the read was successful, and you are simply having trouble viewing all the data -- that is, the problem is not in the perl script, but instead would be in your display tool, and in how that tool handles this data stream.

      It could perhaps be something in the data file itself that is causing your display tool (terminal window? browser? something else?) to behave in some unexpected way -- e.g. some unexpected control byte is causing it to overwrite or otherwise erase/obliterate part of the data that is being given to it for display.

      Consider looking for other ways to inspect the data so you can see what is going on. Redirect the perl script output to a file, edit that file with some trustworthy editor (emacs, vi, or somesuch), view it with some sort of hex-dump tool, etc.
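
As a rough sketch of that kind of inspection, something like the following could list any control bytes in the file along with their byte offsets. The helper name find_control_bytes and the choice of which bytes count as "suspicious" are my own assumptions, not anything from the script above; ordinary whitespace (tab, line feed, carriage return) is deliberately skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): return a list of
# [offset, byte value] pairs for every control byte in $data, ignoring
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
sub find_control_bytes {
    my ($data) = @_;
    my @hits;
    while ( $data =~ /([\x00-\x08\x0B\x0C\x0E-\x1F\x7F])/g ) {
        # pos() is the offset just past the one-byte match
        push @hits, [ pos($data) - 1, ord($1) ];
    }
    return @hits;
}

# Only read a file when one is named on the command line.
if ( @ARGV ) {
    my $file = shift;
    open( my $fh, '<', $file ) or die "$file: $!";
    binmode $fh;                          # raw bytes, no newline translation
    my $data = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    printf "offset %d: 0x%02X\n", @$_ for find_control_bytes($data);
}
```

If this prints nothing, the file contains none of those bytes, and the display problem probably lies elsewhere (very long lines, for instance). If it does print offsets, those are the places to look at with a hex-dump tool.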

      If you ran the script as I posted it, then there should be no line breaks in the output -- just spaces. For fun, you could try changing that last line before the print statement; instead of this:

      s/\s+/ /g;
      do this:
      s/\s+/\n/g;
      to put each non-space token on a separate line. If you have something like GNU "less" (or Unix "more") for paging through a long file in a terminal window, that should convince you that the perl script is not losing any of the data.