in reply to Storing state of execution
I've used Storable with great success. Another method I've taken to lately is using JSON, which stores in plain text, but is cross-language (I can write in Perl/Python/insert-language-here, then open it back up with any other one). You could also use Data::Dumper to store and retrieve state (Perl only).
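A minimal sketch of the JSON approach, assuming the state fits into plain hashes, arrays, strings and numbers. The `$state` structure and its keys are made up for illustration, and JSON::PP (in core since Perl 5.14) stands in for whichever JSON module you prefer:

```perl
use strict;
use warnings;
use JSON::PP;    # core module; JSON::XS offers the same encode/decode interface

# Hypothetical program state; the keys are invented for this example.
my $state = { step => 3, done => [ 'fetch', 'parse' ], retries => 0 };

# canonical() sorts the keys, so the same state always yields the same text.
my $json = JSON::PP->new->canonical->pretty;
my $text = $json->encode($state);    # plain text, readable from other languages

# Later (or in another program): parse the text back into a Perl structure.
my $restored = $json->decode($text);
print "resuming at step $restored->{step}\n";
```

In a real program `$text` would be written to a file and read back on the next run.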
Re^2: Storing state of execution
by SuicideJunkie (Vicar) on Dec 09, 2015 at 15:05 UTC
I would add that which one you want depends a lot on the data you need to store. If the data is simple and modestly sized, then JSON (or YAML) would probably be best. If your data includes reference loops or binary data, or if the data structure is large (speed becomes an issue) then Storable would probably be best. Dumper is more of a middle ground: it fits if you need the output to be coder-readable and the data structure is complex, but only if you can absolutely trust the source of the data when you read it back. I'd only recommend it as debug output, since reading it back in involves running arbitrary Perl code.
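To illustrate the point about reference loops, here is a small sketch with a made-up parent/child structure. It assumes the core modules Storable and JSON::PP; other JSON encoders may report the problem differently:

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);
use JSON::PP;

# A small structure with a reference loop: the child points back at its parent.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;

# Storable keeps the loop intact across a freeze/thaw cycle.
my $copy      = thaw( freeze($parent) );
my $loop_kept = $copy->{child}{parent} == $copy ? 1 : 0;

# JSON has no notation for loops, so the encoder gives up with an error.
my $json_ok = eval { JSON::PP->new->encode($parent); 1 } ? 1 : 0;

print "Storable kept the loop: $loop_kept\n";
print "JSON encoded the loop:  $json_ok\n";
```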
by tye (Sage) on Dec 09, 2015 at 16:01 UTC
"or if the data structure is large (speed becomes an issue) then Storable would probably be best."

That doesn't match my memory. A quick test showed JSON::XS taking just over 1/3 of the time of Storable (and producing almost exactly the same number of bytes of output).

Using JSON has other advantages. And I consider forcing one to stick to simple data to be one of them.

- tye
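A rough way to repeat such a test yourself. This is only a sketch, not the test used above: the data set is invented, the iteration count is small, and it falls back to the much slower pure-Perl JSON::PP when JSON::XS is not installed, so the numbers will differ from machine to machine:

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use Storable qw(nfreeze);

# Prefer JSON::XS when installed; otherwise fall back to core JSON::PP.
my $json_class = eval { require JSON::XS; 1 }
    ? 'JSON::XS'
    : do { require JSON::PP; 'JSON::PP' };
my $json = $json_class->new;

# Made-up test data: 2000 small records.
my $data = [ map { +{ id => $_, name => "item$_", tags => [ 1 .. 5 ] } } 1 .. 2000 ];

my $count      = 10;
my $t_json     = timeit( $count, sub { $json->encode($data) } );
my $t_storable = timeit( $count, sub { nfreeze($data) } );

my $json_bytes     = length( $json->encode($data) );
my $storable_bytes = length( nfreeze($data) );

printf "%-9s %s (%d bytes)\n", $json_class, timestr($t_json),     $json_bytes;
printf "%-9s %s (%d bytes)\n", 'Storable',  timestr($t_storable), $storable_bytes;
```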
Re^2: Storing state of execution
by afoken (Chancellor) on Dec 10, 2015 at 05:56 UTC
One big problem of Storable is that its exact file format depends on the perl version and on the machine perl was compiled for. Changing the processor architecture and/or the perl version is asking for trouble.

Data::Dumper generates executable perl code that has to be parsed back into the program using string eval. That works, sure, but it is a security nightmare: Imagine someone inserting system "rm -rf /" into the saved dump.

Data::Dumper does not dump everything, sometimes, it just generates dummy code:
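A minimal sketch of that behaviour with default settings. The `$state` hash is made up for illustration; the code block originally posted here may have looked different:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order in the output

# Hypothetical state that happens to contain a code reference.
my $state = { name => 'job', callback => sub { return 42 } };
my $dump  = Dumper($state);
print $dump;

# By default the sub body is replaced with a placeholder; the real code is gone.
my $has_dummy = $dump =~ /sub \{ "DUMMY" \}/ ? 1 : 0;
print "placeholder found: $has_dummy\n";
```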
JSON, XML, and YAML don't have those problems. They simply don't allow code references, and they are all independent of the perl version and the processor architecture.

XML can't store binary data, because some characters (0x00) are not allowed in XML, not even in escaped form. You have to resort to using a hex dump, base64 or quoted-printable encoding. XML stores some data multiple times (opening and closing tags contain the element name), wasting more disk space than other formats.

JSON has data types (string, number, array, key-value pairs, booleans, and null alias undef). It lacks some higher data types, most commonly a date and time type. Usually, one uses strings or key-value pairs ("objects") for that, but you could also use a number (counting days or seconds since an epoch value). Reading back JSON with dates in strings or objects requires some knowledge about the data. You need to know if a string is a date in disguise or just a string.

JSON does not define comments. Some JSON parsers allow comments. JSON::XS accepts shell-style # comments in its relaxed mode, but that does not fit into a Javascript context (from which JSON is derived). Javascript has /* */ and // comments; those would make the most sense to use in JSON.

YAML: I can't get it into my head. There are at least two or three ways to represent the same information, and some just don't make sense to me. I try to avoid YAML.

Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
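The points about dates and binary data in JSON can be sketched like this. The record, its field names and the ISO 8601 date layout are assumptions made for the example; only core modules are used:

```perl
use strict;
use warnings;
use JSON::PP;
use MIME::Base64 qw(encode_base64 decode_base64);
use Time::Piece;

# Hypothetical record: a date kept as a string and binary data wrapped in base64.
my $record = {
    created => '2015-12-10T05:56:00',
    payload => encode_base64( "\x00\x01\xff", '' ),    # '' = no trailing newline
    note    => 'just a string',
};

my $text = JSON::PP->new->canonical->encode($record);
my $back = JSON::PP->new->decode($text);

# The parser returns plain strings; the reader has to know which fields to convert.
my $created = Time::Piece->strptime( $back->{created}, '%Y-%m-%dT%H:%M:%S' );
my $payload = decode_base64( $back->{payload} );

printf "year %d, %d payload bytes\n", $created->year, length($payload);
```

Nothing in the JSON text marks `created` as a date or `payload` as base64. That knowledge lives in the reading program.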
by choroba (Cardinal) on Dec 10, 2015 at 09:04 UTC
"Data::Dumper does not dump everything, sometimes, it just generates dummy code"

Unless you specify
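A sketch of what is presumably meant here, on the assumption that the option in question is `$Data::Dumper::Deparse`, which asks B::Deparse to turn code references back into source text. The `double` sub is made up for illustration:

```perl
use strict;
use warnings;
use Data::Dumper;

# Assumption: the setting referred to is $Data::Dumper::Deparse.
$Data::Dumper::Deparse = 1;

my $dump = Dumper( { double => sub { return 2 * $_[0] } } );
print $dump;

# Reading it back still means a string eval, so only do this with trusted data.
# 'no strict vars' is needed because the dump assigns to an undeclared $VAR1.
my $restored = do { no strict 'vars'; eval $dump };
die "eval failed: $@" unless $restored;

my $answer = $restored->{double}->(21);
print "restored sub returns $answer\n";
```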
by afoken (Chancellor) on Dec 11, 2015 at 20:16 UTC
But the deparsed code does not contain all required information in all cases:
Output:
Yes, this is constructed. But it shows that deparsing the sub reference is not sufficient to restore all state after a Data::Dumper-eval cycle. The state of $n is lost, creating two colliding IDs. It gets even worse without the state feature:
Output:
On the other hand, complaining loudly is better than just generating repeated IDs. Stupidly removing use strict and use warnings from the code hides the error, and results in worse behaviour:
Output:
Alexander
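The lost-state problem can be reproduced with a plain closure. This is a sketch under assumptions, not the code originally posted: the ID generator is made up, and `$Data::Dumper::Deparse` is assumed to be the option in use:

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Deparse = 1;

# Hypothetical ID generator: the counter $n lives in the closure, not in the sub's text.
my $next_id = do {
    my $n = 0;
    sub { return ++$n };
};
$next_id->() for 1 .. 2;    # IDs 1 and 2 have been handed out

my $dump = Dumper($next_id);    # contains the sub's source, but not the value of $n

# Dump/eval cycle: depending on the pragmas in the deparsed text, this either
# fails to compile or yields a sub whose counter starts over.
my $restored = do { no strict 'vars'; eval $dump };
my $result   = $restored ? $restored->() : 'eval failed';

my $original_next = $next_id->();
print "original continues with: $original_next\n";
print "restored copy gives:     $result\n";
```

Either way, the restored copy does not continue where the original left off.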
by Discipulus (Canon) on Dec 10, 2015 at 08:44 UTC
Do you have experience with Sereal as well? L*

PS: What they say about their module is definitely intriguing! See Sereal Comparison Graphs. L*

There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
by afoken (Chancellor) on Dec 11, 2015 at 19:31 UTC
"But is not worth to mention also Sereal ?"

Never used it. I stumbled over Sereal some time ago, then forgot it, because I did not need it. It looks quite promising, and has similarities to various other binary formats (like BSON, BJSON, MessagePack). All of those formats promise compact data storage and easy parsing. But you lose one big advantage of text-based file formats: You cannot simply read them using less, your favorite web browser, or your favorite text editor. You need a converter and/or a special viewer.

If storage size or data transfer volume is an issue, the text-based formats can usually be compressed quite well, resulting in sizes similar to binary formats.

As usual, Wikipedia has a big list, containing both binary and text-based formats: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats

Alexander
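A small sketch of compressing a text format, assuming gzip via the core IO::Compress modules. The data set is invented and deliberately repetitive, so real-world ratios will vary:

```perl
use strict;
use warnings;
use JSON::PP;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Made-up, repetitive records, as state dumps often are.
my $data = [ map { +{ id => $_, status => 'pending', owner => 'nobody' } } 1 .. 1000 ];
my $text = JSON::PP->new->canonical->encode($data);

# Compress the JSON text in memory, then unpack it again to check the round trip.
gzip( \$text => \my $packed )       or die "gzip failed: $GzipError";
gunzip( \$packed => \my $unpacked ) or die "gunzip failed: $GunzipError";

printf "plain: %d bytes, gzipped: %d bytes\n", length($text), length($packed);
```

The compressed file is no longer readable with a text editor, but standard tools such as zcat or zless can unpack it without a special viewer.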
by Anonymous Monk on Dec 10, 2015 at 07:38 UTC
afoken: "Data::Dumper generates executable perl code that has to be parsed back into the program using string eval. That works, sure, but it is a security nightmare: Imagine someone inserting system "rm -rf /" into the saved dump."

Try Re: Perl data notation (PSON/JSON/SafestUndumper), Re: Safe.pm: Which parameter for permit_only? (Safest Undumper of Data::Dumper)
by afoken (Chancellor) on Dec 11, 2015 at 18:52 UTC
I prefer to have a storage format that by definition cannot contain executable code instead of relying on a filter that tries to prevent malicious code execution inside a string eval. One bug in Safe and the "SafestUndumper" is no longer safe, but instead happily executes malicious code.

Also, the "non-executable" formats force the programmer to use a parser. There is no way to accidentally or intentionally use a string eval on those formats. So, who would intentionally use a string eval on untrusted code?

A little bit of bean counting: Actually, every storage format that can contain strings can - in theory - also contain executable Perl code. But when reading back formats like XML or JSON, an explicit string eval on an extracted string is required, and that string eval is not present in the library reading the file format (or, at least, it should not be present).

Oh, and string eval means more than just eval $string:

And finally: Any Javascript compiler/interpreter must be able to read and execute JSON, as it is a very restricted subset of Javascript/ECMAScript. That also means that using Javascript's eval (always a string eval) to read JSON is a tempting, but stupid idea, on the same level as using Perl's string eval to read Data::Dumper output. Since ECMAScript Fifth Edition (2009), there is a special JSON parser embedded in the Javascript environment (see https://github.com/douglascrockford/JSON-js/blob/master/README).

Alexander
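The difference between the two kinds of format can be shown with a harmless payload. This is only a sketch: the tampered dump and the flag variable `$side_effect` are made up for the demonstration, and JSON::PP stands in for any JSON parser:

```perl
use strict;
use warnings;
use JSON::PP;

our $side_effect = 0;    # flag that the smuggled code will try to set

# A tampered Data::Dumper-style dump: valid Perl, with extra code smuggled in.
# The payload is harmless here; it only sets a flag.
my $dump = q{$VAR1 = { user => 'alice', level => do { $main::side_effect = 1; 5 } };};
my $from_dump = do { no strict 'vars'; eval $dump };    # the string eval runs the payload

# The same kind of text inside JSON is only ever a string; the parser never runs it.
my $json_text = q({"user":"alice","level":"do { $main::side_effect = 2; 5 }"});
my $from_json = JSON::PP->new->decode($json_text);

print "flag after both reads: $side_effect\n";
print "JSON level field: $from_json->{level}\n";
```

The flag is set by the string eval alone. To make the JSON payload run, the program would have to call eval on the extracted string explicitly.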
by stevieb (Canon) on Dec 10, 2015 at 07:26 UTC
++ That's a spectacular explanation.