DATA munging data

jynx has asked for the wisdom of the Perl Monks concerning the following question:

Hello again,

It is my hope that this isn't a simple FAQ, but i can't seem to find anything in Q&A and the Camel didn't seem to have what i was looking for either. My basic problem involves the fact that encoded data after __END__ isn't being read correctly by the DATA filehandle.

i'm encoding lists of numbers into the character for each value, then printing it into __END__ as a caching technique to retrieve the next time the program is run (so long as the same parameters are passed in, which can be checked). i have done test after test to make sure the data is being written correctly, and can only surmise that it is. The data seems to show up correctly in vi, but a tail of the file discovers the same data perl gives when reading <DATA>.

So let's say i have a string like:
12 34 79 54 2
i can write the characters associated with those values by appending to $0, or at least so it seems.

So the questions at hand are:
1) Is the information being written correctly?
2) Is the information being read correctly?

Is it possible for perl to write all characters (low and high ordinal values) to output files correctly? It is my guess that the characters are being written ASCII or Unicode and then read the opposite (which would account for it), but i'm not certain and don't know where to look for finding out.

Thanks in advance,
jynx


(tye)Re: DATA munging data by tye (Sage) on May 10, 2001 at 08:59 UTC
"The data seems to show up correctly in vi, but a tail of the file discovers the same data perl gives when reading `<DATA>`."

And the data that two out of three tools finds is what exactly? I don't think anyone has taken a guess at what is going wrong because you didn't give us enough information to go on. I suggest you use a tool that dumps all bytes such as "cat -v" or "od" (and tell us what you find if you still can't figure it out).

And yes, if you use binmode, Perl can read and write all possible byte values, even to its own scripts (it won't parse many byte values in many places in a script, but after the __END__ or __DATA__ tag, arbitrary bytes should not be a problem).

Also, although Perl has been gaining more and more abilities to deal with Unicode characters, I'm not aware of any operating systems where Perl would be reading or writing Unicode characters unless you went out of your way to tell Perl to do that. But you also didn't tell us what operating system this was on, so I can't say whether it is one I know anything about or not.

- tye (but my friends call me "Tye")
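If "od" isn't handy, the same kind of byte dump can be sketched in Perl itself. This is only an illustration: `hex_dump` is a hypothetical helper name, not something from the thread, and the sample bytes are the values given in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): render every byte of a string
# as two hex digits, so control characters cannot hide from the viewer.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# The sample values from the question, turned into raw bytes.
my $sample = pack 'C*', 12, 34, 79, 54, 2;

print hex_dump($sample), "\n";    # prints "0c 22 4f 36 02"
```

Dumping what was written and what was read back this way makes the two directly comparable, whatever the terminal does with control characters.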
Re: (tye)Re: DATA munging data by jynx (Priest) on May 10, 2001 at 23:24 UTC
D'oh! i did forget that the OS would be important. However, i did say that a sample string of numbers (of which i would get characters) is:
12 34 79 54 2
Maybe i should have said that these characters are:
^L " O 6 ^B
(that is, <ctrl>-L, double quote, capital "oh", the digit 6, and <ctrl>-B)

For the record, i'm using an i686 Linux Red Hat 6.1 box. The output code could look something like:
`print map {chr} (12,34,79,54,75,8,2);`
And the input code like:
`@array = <DATA>;`
In these examples, the problem exists, and binmode has not been used.

After further testing it seems that if the character for backspace comes up in the sequence, then it deletes the previous character in the string before getting into the array, which explains some of my results. i'll be doing further testing on reading character by character to see if it resolves that issue.

Sorry for the lack of information, i think i'm kind of known for it... :-{

,xnaht
jynx
(tye)Re2: DATA munging data by tye (Sage) on May 11, 2001 at 01:50 UTC
Yes, you said what you were trying to write, which is the same as what "vi" saw. But you did not say what Perl and "tail" read back and how it was different.

If you are expecting just plain "tail" to display binary data such as chr(8), then you are mistaken. If you are checking what Perl reads back via a simple
`print $string;`
then you'll have the same problem.

You are writing the data with something like:
`print map {chr} @array;`
which can be rewritten as:
`print pack "C*", @array;`
so you should be extracting the data after you read it with something like:
`@array= unpack "C*", $string;`

You said you are using:
`@array = <DATA>;`
which will split the data into "lines" based on the value of $/, so this probably isn't working too well. So you either need to set $/ (probably to undef) or use something like read (or perhaps sysread).

If you have a recent version of Perl, then another alternative is to set $/ to \1 (a reference to the scalar value 1) to tell Perl to read in fixed-length records of 1 byte each, but then you'll still have to unpack the value out of those 1-byte strings, so I'd just do this:
`@array= unpack "C*", do { local($/); <DATA> };`

- tye (but my friends call me "Tye")
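A minimal round-trip sketch of the approach described above, under one loud assumption: an in-memory filehandle stands in for DATA so the example runs on its own. A real script would read `<DATA>` instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values from the follow-up post, including 8 (backspace).
my @values = (12, 34, 79, 54, 75, 8, 2);

# Writing side: one byte per value.
my $bytes = pack 'C*', @values;

# Stand-in for the DATA handle (an assumption made for this demo only):
# an in-memory filehandle over the same bytes.
open my $fh, '<', \$bytes or die "cannot open in-memory handle: $!";
binmode $fh;

# Reading side: undefine $/ locally so the whole stream is slurped at once
# instead of being split into "lines".
my $raw = do { local $/; <$fh> };
close $fh;

my @decoded = unpack 'C*', $raw;
print "@decoded\n";    # prints "12 34 79 54 75 8 2"
```

The backspace survives as an ordinary byte; it only appears to delete its neighbour when the raw string is printed to a terminal.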
Re: DATA munging data by premchai21 (Curate) on May 10, 2001 at 07:20 UTC
You might want to try some persistence modules, e.g. Data::Dumper, Data::Denter, and/or Storable. These will allow you to serialize a perl data structure to a string, which can be written to a file, and then read it back in later and deserialize it back to the structure. You can then use an external file, which is safer as you can't accidentally overwrite pieces of your script. Also, the modules will cope if you must store more complex structures later, whereas a simple flat file will not.

Update: Fixed as per chipmunk's reply.
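As a rough sketch of the external-file idea using Storable's `store` and `retrieve`: the cache path here is a throwaway temporary directory chosen only so the example is self-contained, and the file name `numbers.cache` is hypothetical. A real script would pick a stable location.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical cache location for the demo; cleaned up on exit.
my $dir        = tempdir(CLEANUP => 1);
my $cache_file = File::Spec->catfile($dir, 'numbers.cache');

my @numbers = (12, 34, 79, 54, 75, 8, 2);

# Serialize the list to the external file.
store \@numbers, $cache_file or die "could not store to $cache_file";

# On a later run, load it back as an array reference.
my $restored = retrieve($cache_file);
print "@$restored\n";    # prints "12 34 79 54 75 8 2"
```

Because the numbers are stored as a Perl structure and not as raw characters, values such as 8 or 10 need no special handling on the way back in.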
Re: Re: DATA munging data by chipmunk (Parson) on May 10, 2001 at 07:25 UTC
I tend to agree that saving the data to a separate file is a better solution. However, it's not true that __END__ doesn't work with <DATA>. Either __END__ or __DATA__ may be used to signify the end of the code and the beginning of data which may be read from the special DATA filehandle.

perldata:
`The tokens __END__ and __DATA__ may be used to indicate the logical end of the script before the actual end of file. Any following text is ignored, but may be read via a DATA filehandle: main::DATA for __END__, or PACKNAME::DATA (where PACKNAME is the current package) for __DATA__.`

And an example:

    #!perl
    while (<DATA>) {
        print;
    }
    __END__
    Hello, world!

produces:

    Hello, world!
Re: DATA munging data by perlmonkey (Hermit) on May 10, 2001 at 07:23 UTC
Well, there is some info on __DATA__ and __END__ in perldata and more in SelfLoader which might help. I don't think perl does any conversions unless you are on Windows, in which case you might need to use binmode. Without code, I don't have any other ideas though.

