EchoAngel has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks! I have an idea for a script or module to parse files that look like the example below. I parse these files a lot, and I thought loading them into a hash structure would make it easier to pull out the data I want at a known nesting depth.
    library(ABC) {
        date : 04/10/04;
        revision : 1.5;
        conditions () {
            weather : bright ;
            temperature : 55 C;
            visible : heavy \
                fog;
        }
        location(work);
        etc ...
    }
The thing is, I don't know how many levels of nested {} there are in a file (i.e. there may be further subdivisions inside conditions), so I was thinking of a recursive function to do this. In the end, I want a hash like the following, i.e. what a call to Data::Dumper would show:
    library(ABC)
        -> date -> value -> 04/10/04
        -> revision -> value -> 1.5
        -> conditions
            -> weather -> value -> bright
            -> temperature -> value -> 55
            -> visible -> value -> heavy fog
        -> location(work) -> value -> 1
I wrote some scrap code somewhere, but I'd like to know what you think of the idea (good or bad) and where I could improve it.
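
One possible shape for the recursive function described above, as a minimal sketch rather than a finished parser. The sub names `tokenize` and `parse_block` are made up for illustration, and the grammar is only guessed from the sample: statements end in `;`, blocks are wrapped in `{}`, a trailing backslash continues a line, and an entry without a colon gets the value 1. Unlike the wished-for output, it keeps the unit on the temperature (`55 C`).

```perl
use strict;
use warnings;
use Data::Dumper;

# Break the text into tokens: '{', '}', ';', or a run of anything else.
sub tokenize {
    my ($text) = @_;
    $text =~ s/\\\n\s*//g;                       # join lines continued with a trailing backslash
    my @tokens = $text =~ /([{};]|[^{};]+)/g;
    s/^\s+|\s+$//g for @tokens;                  # trim surrounding whitespace
    return grep { length } @tokens;              # drop tokens that were only whitespace
}

# Consume tokens from the front of @$tokens and return a hash ref for one block.
# $depth is 0 at the top level and grows by one for each nested '{'.
sub parse_block {
    my ($tokens, $depth) = @_;
    my %node;
    while (@$tokens) {
        my $tok = shift @$tokens;
        if ($tok eq '}') {
            die "unexpected '}'\n" unless $depth;
            return \%node;                       # end of the current block
        }
        die "unexpected '$tok'\n" if $tok eq '{' or $tok eq ';';
        my $next = shift @$tokens;
        die "unexpected end of input after '$tok'\n" unless defined $next;
        if ($next eq '{') {
            (my $name = $tok) =~ s/\s*\(\s*\)\z//;   # "conditions ()" becomes "conditions"
            $node{$name} = parse_block($tokens, $depth + 1);
        }
        elsif ($next eq ';') {
            # Split on the first colon only, so values may contain colons.
            my ($name, $value) = split /\s*:\s*/, $tok, 2;
            $node{$name} = { value => defined $value ? $value : 1 };
        }
        else {
            die "expected '{' or ';' after '$tok'\n";
        }
    }
    die "missing '}'\n" if $depth;
    return \%node;
}

my $input = <<'END';
library(ABC) {
    date : 04/10/04;
    revision : 1.5;
    conditions () {
        weather : bright ;
        temperature : 55 C;
        visible : heavy \
            fog;
    }
    location(work);
}
END

my @tokens = tokenize($input);
my $tree   = parse_block(\@tokens, 0);

$Data::Dumper::Sortkeys = 1;
print Dumper($tree);
```

Because each `{` triggers a fresh call to `parse_block`, the nesting depth of the file does not need to be known in advance. Values that themselves contain `{`, `}` or `;` are not handled by this sketch.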

Replies are listed 'Best First'.
Re: Recursive Function To Parse File For Generating Hash
by NetWallah (Canon) on Nov 05, 2004 at 05:58 UTC
    This looks so much like XML that I'd recommend writing a simple line-by-line token parser to convert it to XML, then storing the result in something like an XML::Simple object. Depending on the size of the file, the XML could stay in memory.

    Then you can use one of several XML modules, to extract, query and manipulate the data.
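
    A rough sketch of such a line-by-line converter, assuming one statement per line as in the sample. The sub names `to_xml` and `xml_escape` and the attribute name `arg` (used for the text inside the parentheses) are invented for this illustration and are not part of any module.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Convert the brace format to an XML string, one input line at a time.
sub to_xml {
    my ($text) = @_;
    $text =~ s/\\\n\s*//g;                 # join lines continued with a trailing backslash
    my (@open, @out);                      # @open is the stack of unclosed element names
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;         # skip blank lines
        my $pad = '  ' x @open;
        if (my ($name, $arg) = $line =~ /^\s*(\w+)\s*\(([^)]*)\)\s*\{\s*$/) {
            # name(arg) {   opens a nested element
            my $attr = length $arg ? sprintf(' arg="%s"', xml_escape($arg)) : '';
            push @out, "$pad<$name$attr>";
            push @open, $name;
        }
        elsif ($line =~ /^\s*\}\s*$/) {
            # }   closes the most recently opened element
            my $name = pop @open;
            die "unbalanced '}'\n" unless defined $name;
            push @out, ('  ' x @open) . "</$name>";
        }
        elsif (my ($key, $value) = $line =~ /^\s*(\w+)\s*:\s*(.*?)\s*;\s*$/) {
            # name : value;
            push @out, sprintf '%s<%s>%s</%s>', $pad, $key, xml_escape($value), $key;
        }
        elsif (my ($flag, $farg) = $line =~ /^\s*(\w+)\s*\(([^)]*)\)\s*;\s*$/) {
            # name(arg);   becomes an empty element
            push @out, sprintf '%s<%s arg="%s"/>', $pad, $flag, xml_escape($farg);
        }
        else {
            die "cannot parse line: $line\n";
        }
    }
    die "missing '}' for <$open[-1]>\n" if @open;
    return join "\n", @out;
}

my $sample = <<'END';
library(ABC) {
    date : 04/10/04;
    revision : 1.5;
    conditions () {
        weather : bright ;
        temperature : 55 C;
        visible : heavy \
            fog;
    }
    location(work);
}
END

my $xml = to_xml($sample);
print "$xml\n";
```

    The resulting string could then be handed to an XML module, for example XML::Simple's XMLin, which accepts XML text as well as a file name.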

        Earth first! (We'll rob the other planets later)

Re: Recursive Function To Parse File For Generating Hash
by saintmike (Vicar) on Nov 05, 2004 at 08:16 UTC
    You could certainly write a parser using Parse::RecDescent, but coming up with your own format to describe your data is almost always a bad idea.

    Why? Because the devil's in the detail. What happens if a value contains a literal "{"? You'll have to escape it with something like "\". What if your data contains a literal "\", then? How do you break lines? How to add comments?
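
    To make the escaping problem concrete, here is a small sketch of what a home-grown format ends up needing. It assumes a convention the original format may not have: a backslash makes the following character literal. The sub name `split_statements` is invented for the example.

```perl
use strict;
use warnings;

# Split on ';' but treat a backslash as escaping whatever follows it,
# so "\;" and "\\" are data rather than syntax.
sub split_statements {
    my ($text) = @_;
    my @fields;
    # Each field is a run of ordinary characters or backslash-escaped pairs,
    # terminated by an unescaped ';'.
    while ($text =~ /\G((?:[^\\;]|\\.)*);/gs) {
        my $field = $1;
        $field =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        $field =~ s/\\(.)/$1/gs;       # drop the escaping backslash, keep the character
        push @fields, $field;
    }
    return @fields;
}

# A single-quoted here-doc keeps every backslash exactly as typed.
my $raw = <<'END';
note : a\;b;
path : C:\\tmp;
brace : \{;
END

my @fields = split_statements($raw);
print "$_\n" for @fields;
```

    Even this tiny case needs one rule for the separator, one for the escape character itself, and a decision about what a backslash before an ordinary character means, which is the kind of detail an established format has already settled.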

    These kinds of problems have been solved 1000 times already. A format like YAML should be expressive enough for your needs, so why not use something that's been proven to work?

      Not to deviate from the original topic, nor to disparage any of the points in your excellent response...

      but there are other alternatives beside using 'escape characters' as a method to distinguish delimiters from message content.

      YAML, for example, allows indentation.

      PERL, for example, allows user-definable alternate delimiters

          print q/"Hello-World"/;
          print q~"Hello/World~;
          print q!"Hello-/-World"!;
          print q§"Hello-/-World!"§;

      True, the devil is still in there, since you still have to make sure that your 'message-space' characters are always orthogonal to your 'delimiter-space' characters. Just making the point that 1) "escape characters" (as a typographical and programming convention) *suck*; and 2) this suckage is yet another of the problems already solved by Perl, YAML and Ruby (and others, if there are any).

Re: Recursive Function To Parse File For Generating Hash
by tachyon (Chancellor) on Nov 05, 2004 at 08:37 UTC

    I do not recommend this but you can possibly do it very simply. As noted it is already XMLish or even hashish. You can reformat it into a hash quite easily.....

    String eval is dangerous but if you own the files and it's your box.....

        local $/;
        my $data = <DATA>;
        # unroll line continuations
        $data =~ s/\\\n\s+//g;
        # munge
        $data =~ s/'/\\'/g;    # escape out quote char
        $data =~ s/}/},/g;
        $data =~ s/\s*{/' => {/g;
        $data =~ s/\s*:\s*/' => '/g;
        $data =~ s/\s*;/',/g;
        $data =~ s/^(\s*)(?=\w)/$1'/mg;
        print $data;
        my $var = eval "{ $data }";
        use Data::Dumper;
        print Dumper $var;
        __DATA__
        library(ABC) {
            date : 04/10/04;
            revision : 1.5;
            conditions () {
                weather : o'bright ;
                temperature : 55 C;
                visible : heavy \
                    fog;
            }
            location(work) : 1;
        }

    There is no doubt that Parse::RecDescent could do a better job. But probably not in fewer lines.

    cheers

    tachyon

Re: Recursive Function To Parse File For Generating Hash
by gaal (Parson) on Nov 05, 2004 at 08:09 UTC
    Or YAML, as occasionally bears mention...