in reply to DATA munging data

The data seems to show up correctly in vi, but a tail of the file discovers the same data perl gives when reading <DATA>.

And the data that two out of three tools find is what, exactly? I don't think anyone has taken a guess at what is going wrong because you didn't give us enough information to go on.

I suggest you use a tool that dumps all bytes such as "cat -v" or "od" (and tell us what you find if you still can't figure it out).
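
If you would rather stay inside Perl, a rough equivalent of an "od"-style dump can be sketched as follows. The helper name dump_bytes is made up for illustration, and the sample bytes are the ones quoted later in this thread; treat it as a minimal sketch, not a replacement for the real tools.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): show every byte of a
# string as two hex digits so control characters cannot hide.
sub dump_bytes {
    my ($string) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $string;
}

# Sample bytes taken from the thread: 12 34 79 54 2
my $sample = pack 'C*', 12, 34, 79, 54, 2;
my $dump   = dump_bytes($sample);

print "$dump\n";    # prints "0c 22 4f 36 02"
```

Running the same helper on what you wrote and on what you read back should make any difference obvious.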

And yes, if you use binmode, Perl can read and write all possible byte values, even to its own scripts (it won't parse many byte values in many places in a script, but after the __END__ or __DATA__ tag, arbitrary bytes should not be a problem).
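
As a quick way to convince yourself of that, here is a small round-trip sketch. It uses a temporary file (via the core File::Temp module) rather than the script's own DATA section, which is my substitution to keep the example self-contained; the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Every possible byte value, 0 through 255, in one string.
my $all_bytes = pack 'C*', 0 .. 255;

# Write the bytes out with binmode so no layer translates them.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} $all_bytes;
close $out or die "close failed: $!";

# Read them back, again with binmode, slurping the whole file at once.
open my $in, '<', $path or die "open failed: $!";
binmode $in;
my $read_back = do { local $/; <$in> };
close $in;

print $read_back eq $all_bytes ? "round trip ok\n" : "bytes were altered\n";
```

On Linux binmode makes no difference for plain files, but calling it keeps the code portable to platforms where text mode translates line endings.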

Also, although Perl has been gaining more and more abilities to deal with Unicode characters, I'm not aware of any operating systems where Perl would be reading or writing Unicode characters unless you went out of your way to tell Perl to do that. But you also didn't tell us what operating system this was on, so I can't say whether it is one I know anything about or not.

        - tye (but my friends call me "Tye")

Re: (tye)Re: DATA munging data
by jynx (Priest) on May 10, 2001 at 23:24 UTC

    D'oh! i did forget that the OS would be important. However, i did say that a sample string of numbers (of which i would get characters) is:
       12 34 79 54 2

    Maybe i should have said that these characters are:
       ^L " O 6 ^B
    (that is, <ctrl>-L, double quote, capital "oh", the digit 6, and <ctrl>-B)

    For the record, i'm using an i686 Linux Red Hat 6.1 box. The output code could look something like:

    print map {chr} (12,34,79,54,75,8,2);
    And the input code like:
    @array = <DATA>;
    In these examples, the problem exists, and binmode has not been used.

    After further testing it seems that if the character for backspace comes up in the sequence, then it deletes the previous character in the string before getting into the array, which explains some of my results. i'll be doing further testing on reading character by character to see if it resolves that issue.

    Sorry for the lack of information, i think i'm kind of known for it... :-{

    ,xnaht
    jynx

      Yes, you said what you were trying to write, which is the same as what "vi" saw. But you did not say what Perl and "tail" read back and how it was different.

      If you are expecting just plain "tail" to display binary data such as chr(8), then you are mistaken. If you are checking what Perl reads back via a simple
          print $string;
      then you'll have the same problem.
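
To see why a plain print is misleading, something along these lines may help. The helper name visible is invented here, and it only approximates "cat -v" (unlike cat -v, it also rewrites tabs and newlines), so take it as a sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: render control characters in caret notation
# (^H for backspace and so on) instead of sending them to the terminal.
sub visible {
    my ($string) = @_;
    (my $out = $string) =~ s/([\x00-\x1f\x7f])/'^' . chr(ord($1) ^ 64)/ge;
    return $out;
}

# The bytes from the output code earlier in the thread.
my $string = join '', map { chr } (12, 34, 79, 54, 75, 8, 2);

print length($string), "\n";     # prints 7: the backspace deleted nothing
print visible($string), "\n";    # prints ^L"O6K^H^B
```

The "K" before the backspace is still in the string; it is only the terminal that backs up over it when the raw bytes are printed.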

      You are writing the data with something like:
          print map {chr} @array;
      which can be rewritten as:
          print pack "C*", @array;
      so you should be extracting the data after you read it with something like:
          @array= unpack "C*", $string;
      You said you are using:
          @array = <DATA>;
      which will split the data into "lines" based on the value of $/, so this probably isn't working too well. So you either need to set $/ (probably to undef) or use something like read (or perhaps sysread).
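
Here is a small sketch of the difference. An in-memory filehandle stands in for DATA, and I have added a byte with value 10 (a newline) to the sample so the line splitting is visible; both of those are my assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Sample bytes; the 10 is added for illustration because it is "\n".
my @array  = (12, 34, 10, 79, 54, 2);
my $string = pack 'C*', @array;

# Reading in list context splits on $/ (newline by default).
open my $by_line, '<', \$string or die "open failed: $!";
binmode $by_line;
my @lines = <$by_line>;
close $by_line;

# Slurping with $/ undefined returns everything as a single string.
open my $whole, '<', \$string or die "open failed: $!";
binmode $whole;
my $raw = do { local $/; <$whole> };
close $whole;

my @decoded = unpack 'C*', $raw;

print scalar(@lines), " line(s) when read by line\n";    # prints 2
print "@decoded\n";                                       # prints 12 34 10 79 54 2
```

With line-based reading, @array ends up holding two strings rather than six numbers, which is probably not what the rest of the code expects.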

      If you have a recent version of Perl, then another alternative is to set $/ to \1 (a reference to the scalar value 1) to tell Perl to read in fixed-length records of 1 byte each, but then you'll still have to unpack the value out of those 1-byte strings so I'd just do this:
          @array= unpack "C*", do { local($/); <DATA> };
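
For comparison, the fixed-length-record approach might look roughly like this. Again an in-memory filehandle stands in for DATA (my assumption, to keep the sketch self-contained), and the sample bytes are the ones from earlier in the thread.

```perl
use strict;
use warnings;

my $string = pack 'C*', 12, 34, 79, 54, 2;

# In-memory filehandle used here in place of DATA.
open my $fh, '<', \$string or die "open failed: $!";
binmode $fh;

my @array;
{
    # A reference to an integer makes <$fh> return records of that many bytes.
    local $/ = \1;
    while (defined(my $record = <$fh>)) {
        # Each record is a 1-byte string; unpack turns it back into a number.
        push @array, unpack 'C', $record;
    }
}
close $fh;

print "@array\n";    # prints 12 34 79 54 2
```

It works, but it is more code for the same result as the one-line slurp and unpack.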

              - tye (but my friends call me "Tye")