AdriftOnMemoryBliss has asked for the wisdom of the Perl Monks concerning the following question:

Howdy Monks,

I have a database of strings that each fit into one of a series of defined format classes. Here are some examples, along with a description of each format:

t(1;3)(q15;p13)  ==> t([int], [int])([p|q][int], [p|q][int])
inv(1)(p13p11.1) ==> inv([int])([p|q][int][p|q][int])
+3               ==> +[int]

I need to be able to "round-trip" from these formats. That is, I need to be able to take a formatted string and determine:

  1. which class it falls into
  2. what are the values for each of the parameters

I also need to be able to take those same two pieces of data (the class and the parameter values) and create the formatted string -- that's what I mean by "round-tripping".

I have the regexes and logic set up for the first part, but is there a way to combine that with the round-tripping? For example, can I use a regex like the one below to create a new string by giving it values for $1, $2, $3, $4?

qr/^der\((.{1,2});(.{1,2})\)\(([p|q].*);([p|q].*)\)$/

Alternately (and perhaps better), is there another way to get this done that stores the formats and "logic" in a single place? Any ideas/comments very much appreciated!

Re: Parsing/Deparsing a Formatted String
by ikegami (Patriarch) on Jun 15, 2005 at 19:57 UTC

    A regexp cannot be used to format data, but you could pair each regexp with a printf format string:

    use strict;
    use warnings;

    # Each class pairs a parsing regexp with the printf format that rebuilds it.
    # [pq] is used rather than [p|q], which would also match a literal '|'.
    my %patterns = (
        t   => [ qr/^t\((\d+);(\d+)\)\(([pq]\d+);([pq]\d+)\)$/, 't(%d;%d)(%s;%s)' ],
        inv => [ qr/^inv\((\d+)\)\(([pq]\d+)([pq]\d+)\)$/,      'inv(%d)(%s%s)'   ],
        '+' => [ qr/^\+(\d+)$/,                                  '+%d'             ],
    );

    my @parsed;
    foreach (
        't(1;3)(q15;p13)',
        'inv(1)(p13p11)',
        '+3',
    ) {
        # Try every class; on a match, record the class and the captured fields.
        foreach my $class (keys(%patterns)) {
            my @fields;
            @fields = /$patterns{$class}[0]/
                and push(@parsed, [ $class, \@fields ]);
        }
    }

    # Rebuild each string from its class and fields.
    foreach (@parsed) {
        my ($class, $fields) = @$_;
        printf($patterns{$class}[1], @$fields);
        print(" is of class $class.\n");
    }

    __END__
    t(1;3)(q15;p13) is of class t.
    inv(1)(p13p11) is of class inv.
    +3 is of class +.
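To make the round trip explicit, the same table can be wrapped in two small helpers. This is only a sketch under assumptions: parse_string() and format_fields() are hypothetical names, the table is the one from the reply above with [pq] as the character class, and classes are tried in sorted order so the result is deterministic.

```perl
use strict;
use warnings;

# Each class pairs a parsing regexp with the sprintf template that rebuilds it.
# The table follows the reply above; the helper names below are hypothetical.
my %patterns = (
    t   => [ qr/^t\((\d+);(\d+)\)\(([pq]\d+);([pq]\d+)\)$/, 't(%d;%d)(%s;%s)' ],
    inv => [ qr/^inv\((\d+)\)\(([pq]\d+)([pq]\d+)\)$/,      'inv(%d)(%s%s)'   ],
    '+' => [ qr/^\+(\d+)$/,                                  '+%d'             ],
);

# Return (class, field list) for the first class whose regexp matches.
sub parse_string {
    my ($string) = @_;
    foreach my $class (sort keys %patterns) {
        my @fields = $string =~ $patterns{$class}[0];
        return ($class, @fields) if @fields;
    }
    return;    # no class matched
}

# Rebuild the formatted string from a class name and its fields.
sub format_fields {
    my ($class, @fields) = @_;
    return sprintf($patterns{$class}[1], @fields);
}

foreach my $string ('t(1;3)(q15;p13)', 'inv(1)(p13p11)', '+3') {
    my ($class, @fields) = parse_string($string);
    my $rebuilt = format_fields($class, @fields);
    print "$string -> $class [@fields] -> $rebuilt\n";
}
```

One caveat with this design: %d normalizes integers, so an input such as '+03' would come back as '+3'. If exact reproduction matters, capturing the digits and formatting them with %s avoids that.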
Re: Parsing/Deparsing a Formatted String
by GrandFather (Saint) on Jun 15, 2005 at 22:57 UTC

    If you have formal specifications for all the formats, then you can parse those specifications to generate the matching regexes and printf format strings.

    Sketch code would look something like:

    my $format = ...;
    my $printfStr  = '';
    my $regexStr   = '';
    my $paramCount = 0;

    while (length $format) {
        # Peel off the next chunk: a [placeholder] or a run of literal text.
        # (s/// returns a count, not the captures, so read the chunk from $1.)
        $format =~ s/^(\[.*?\]|[^\[]+)// or last;
        my $chunk = $1;

        if ($chunk =~ /^\[int\]/) {    # match an int
            $printfStr .= "%d";
            $regexStr  .= "(\\d+)";
            ++$paramCount;
        }
        elsif ...
        else {                         # match the text
            $printfStr .= $chunk;
            $regexStr  .= "\Q$chunk\E";
        }
    }

    Note that you may need to handle variable whitespace, case insensitivity, and maybe even nesting of format string elements.
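A runnable version of that idea might look like the sketch below. It rests on assumptions not in the thread: the spec notation is taken from the original question with semicolons as separators, only the [int] and [p|q] placeholders are known, and compile_format() and %placeholder are hypothetical names. Any other bracketed text is treated as a literal, and an unclosed '[' is not handled.

```perl
use strict;
use warnings;

# Hypothetical placeholder table: token => [regex fragment, sprintf directive].
my %placeholder = (
    '[int]' => [ '(\d+)',  '%d' ],
    '[p|q]' => [ '([pq])', '%s' ],
);

# Turn one format spec into a matching regexp and a sprintf template.
sub compile_format {
    my ($spec) = @_;
    my ($regex, $template) = ('', '');

    # Split the spec into bracketed placeholders and the literal text between them.
    foreach my $chunk ($spec =~ /\[[^\]]*\]|[^\[]+/g) {
        if (my $known = $placeholder{$chunk}) {
            $regex    .= $known->[0];
            $template .= $known->[1];
        }
        else {
            (my $literal = $chunk) =~ s/%/%%/g;    # escape '%' for sprintf
            $regex    .= quotemeta $chunk;
            $template .= $literal;
        }
    }
    return (qr/^$regex$/, $template);
}

my ($re, $template) = compile_format('t([int];[int])([p|q][int];[p|q][int])');
my @fields = 't(1;3)(q15;p13)' =~ $re;
print "fields:  @fields\n";
print "rebuilt: ", sprintf($template, @fields), "\n";
```

With this spec the arm letter and the band number come back as separate fields (1, 3, q, 15, p, 13), and feeding them to the generated template rebuilds t(1;3)(q15;p13). The format definition then lives in one place, which is what the original question asked for.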


    Perl is Huffman encoded by design.
Re: Parsing/Deparsing a Formatted String
by kral (Monk) on Jun 16, 2005 at 09:46 UTC
    Maybe you could use Parse::RecDescent for this.
    You can find good documentation in Parse::RecDescent::FAQ.
    The module may be a little oversized for this task, but IMHO it's the best way to write clear code.
    ----------
    kral
    (I apologise for my english!)