TrekNoid has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, oh high ones...

I've been working on this for a while today, and I must just be missing something obvious

I'm receiving a string of characters that contains occasional 'special characters' (two slashes with an alpha character between them) that I need to convert back to their regular form.

Specifically, I need to convert:

\F\ to | \S\ to ^ \T\ to & \R\ to ~ \E\ to \
If you've ever worked with HL7 messaging protocol, you may have seen this before. (Basically, these five characters are message control characters, so the \x\ notation is how it lets you know to replace them with the reserved character)

Anyways, I started with the following:

$msgu =~ s/\\F\\/\|/g; $msgu =~ s/\\S\\/\^/g; $msgu =~ s/\\T\\/\&/g; $msgu =~ s/\\R\\/\~/g; $msgu =~ s/\\E\\/\\/g;
And that *almost* works perfectly. Where it doesn't work is if there's two of these special characters grouped around a normal character... For example, the string:
"AB\T\F\S\CD"
should translate to:
"AB&F^CD"
but it ends up as:
"AB\T|S\CD"
because I'm doing the global match on \F\ first. And since I can get these in any order, there's no way to put them in an order that always works.

So, what I *really* need is a way to do this 'left-to-right' where it finds the first \x\ character, and then translates it (if it's one of the valid 5), and then continue from there. Doing whichever comes up next in the string

Any advice for me to try or place I should be looking?

Trek

Replies are listed 'Best First'.
Re: Pattern Matching, left-to-right
by antirice (Priest) on Aug 20, 2004 at 20:11 UTC

    Perhaps something like...

    my %map = ( F => "|", S => "^", T => "&", R => "~", E => "\\" ); $msgu =~ s/\\([FSTRE])\\/$map{$1}/g;

    antirice    
    The first rule of Perl club is - use Perl
    The
    ith rule of Perl club is - follow rule i - 1 for i > 1

        I get AB&F^CD when I run it, which the OP says is what should happen. By his description of the task the code is also exactly what he's after. Which case do you have in mind that it fails for?

        Makeshifts last the longest.

Re: Pattern Matching, left-to-right
by ysth (Canon) on Aug 20, 2004 at 20:13 UTC
    %trans = qw/ F | S ^ T & R ~ E \\ /; $string =~ s/\\([FSTRE])\\/$trans{$1}/g;
•Re: Pattern Matching, left-to-right
by merlyn (Sage) on Aug 20, 2004 at 20:05 UTC
    Untested:
    $string =~ s/(\\(?:[FSTRE]\\)+)/ my $x = $1; $x =~ tr#FSTRE\\#|^&-\\#d; $x; /ge;

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.


    update: Ooops... I saw the \T\F\S\ in the source string, and figured that this was supposed to translate to &^|, when in fact the example says it translates to &F|. Ignore me. Other answers are better.
Re: Pattern Matching, left-to-right
by ambrus (Abbot) on Aug 20, 2004 at 22:10 UTC

    The easy way is of course to use a DFA matcher.

    For example, with flex:

    $ cat hl.l %option noyywrap %% \\F\\ putchar('|'); \\S\\ putchar('^'); \\T\\ putchar('&'); \\R\\ putchar('~'); \\E\\ putchar('\\'); $ flex hl.l && gcc -lfl -O lex.yy.c -o hl $ ./hl AB\T\F\S\CD AB&F^CD $

    There must be lots of DFA regexp packages available for perl on CPAN.

Re: Pattern Matching, left-to-right
by Aristotle (Chancellor) on Aug 21, 2004 at 15:01 UTC

    Here's another way to do it, with distinct patterns instead of a hash map:

    while( m/ \G .*? (?= \\[FSTRE]\\ ) /gx ) { my $pos = pos; s{ \G \\F\\ }{ "|" }egx or s{ \G \\S\\ }{ "^" }egx or s{ \G \\T\\ }{ "&" }egx or s{ \G \\R\\ }{ "~" }egx or s{ \G \\E\\ }{ "\\" }egx; pos() = $pos + 1; }

    In this code, m//g does the actual work of finding the control sequences in the string. The trick is to anchor all patterns with \G, which makes sure each pattern starts off where the last successful pattern stopped matching. That way you go through the string left-to-right.

    In case of s///g, this means s///g will usually be replacing exactly one single occurence, despite the /g. \G \\S\\ will only match multiple times if the control sequences appears multiple times back-to-back as in the string \S\\S\\S\.

    Unfortunately, if a s///g matches at least once, it also resets the end-of-last-successful-match position when it eventually fails. Therefore, the next match would normally start over at the beginning of the string, leading to recursive replacement problems with \E\S\ getting doubly translated to ^ instead of becoming just \S\ as per the spec. That explains the manual bookkeeping with pos, which queries and sets that position. With m//, this is elegantly avoidable by use of the /c modifier which means "do not reset end-of-last-position on failure".

    All that said and done, for this problem, this solution is both much less efficient and harder to understand than the hash map based ones given by others. I post this merely as a trivial demonstration of \G, which really shines when the patterns you want to match in a coordinated fashion are very non-uniform, unlike the ones in this case. m//gc is how you build true parsers in Perl.

    Makeshifts last the longest.

      fun with perl:

      substr($_, pos() - 3, 3) =~ tr/FSTRE\\/|^&~\\/d while m/\G.*?\\[FSTRE]\\/gs;
      Note: s modifier is required in the regexp

      Update: regexp was simplified

      substr($_, pos() - 3, 3) =~ tr/FSTRE\\/|^&~\\/d while m/\\[FSTRE]\\/g;

        Yep, although that wasn't really the point of my excercise. :-) I'd write that one a little differently:

        my %xlat = ( F => '|', S => '^', T => '&', R => '~' E => '\\', ); while( m/ \G .*? \\([FSTRE])\\ /gsx ){ substr $_, pos() - 3, 3, $xlat{ $1 }; }

        Update: same change as in parent node update applies.

        Makeshifts last the longest.

Re: Pattern Matching, left-to-right
by chance (Beadle) on Aug 20, 2004 at 21:38 UTC
    My bad.

    All I was shooting for was that if you want 'left to right' and you are having trouble getting something to work, I've often found that making explicitly 'left to right' (by putting it a while loop) can help me understand it. Once understanding has occurred, it can usually be re-written in a cooler way.

      I don't know how 1 while s/// is clearer than s///g.

      What's worse, it does not work correctly:

      my %map = ( F => "|", S => "^", T => "&", R => "~", E => "\\" ); $_ = "\\E\\S\\"; s/\\([FSTRE])\\/$map{$1}/g; print "s///g: [$_]\n"; $_ = "\\E\\S\\"; 1 while s/\\([FSTRE])\\/$map{$1}/; print "1 while s///: [$_]\n"; __END__ s///g: [\S\] 1 while s///: [^]

      \E\S\ gets translated to \S\, as per the OP's spec, but the 1 while s/// solution then blithely goes on to translate that as well. Oops.

      (Note that there's a bunch of minor mistakes in your code. The last slash in your s/// is missing and (FSTRE) should be ([FSTRE]).)

      Makeshifts last the longest.

Re: Pattern Matching, left-to-right
by ambrus (Abbot) on Aug 20, 2004 at 20:41 UTC

    I offer a simple solution. I suppose that there are backslashes in the original string that are not part of an escape sequence you want to transform, this won't work at least not this way.

    s/(\\.)\\/$1@/g; s/\\F@/|/g; s/\\S@/^/g; s/\\T@/&/g; s/\\R@/-/g; s/\\E@/\\/g;

      Don't know if it's legal but what about this situation?

      $msgu = "\\F\\\\F@";

      antirice    
      The first rule of Perl club is - use Perl
      The
      ith rule of Perl club is - follow rule i - 1 for i > 1

Re: Pattern Matching, left-to-right
by Prior Nacre V (Hermit) on Aug 21, 2004 at 13:50 UTC

    Tested successfully with the input you supplied:

    Update

    The last version did what I claimed but was still bad. Fair enough, thanks for spotting the problem Aristotle. The last version is behind the Read More .... The new version hopefully does what is wanted; here's my tests before and after changes:


    Update 2

    But that version was the same solution that others had :-(

    Here's a third version that doesn't use regexes. TMTOWTDI!


    # hl7_conv use strict; use warnings; my $input = q(AB\T\F\S\CD\E\E\E\R\R\R); my $gen_out = ''; my $exp_out = q(AB&F^CD\\E\\R~R); my %convs = ( E => '\\', F => '|', R => '~', S => '^', T => '&', ); $gen_out = hl7_replace($input); print "INPUT: $input\n"; print "GEN_OUT: $gen_out\n"; print "EXP_OUT: $exp_out\n"; print $0, ': ', $gen_out eq $exp_out ? 'SUCCESS!' : ' Z z . c8o, ', +"\n"; exit 0; sub hl7_replace { my $in = shift; my @input = split //, $in; my @output = (); for (my $i = 0; $i <= $#input; ++$i) { if ($input[$i] eq "\\" && ($i + 2) <= $#input && $convs{$input[$i + 1]} && $input[$i + 2] eq "\\") { push @output, $convs{$input[$i + 1]}; $i += 2; } else { push @output, $input[$i]; } } return join '', @output; }

    Regards,

    PN5

      If you had tested it with more input, you'd have found it is wrong.

      You wrote:

      foreach my $key (keys %convs) { $in =~ s/$convs{$key}/$key/g; }

      The OP wrote:

      And that *almost* works perfectly. Where it doesn't work is if there's two of these special characters grouped around a normal character... [...] And since I can get these in any order, there's no way to put them in an order that always works.

      Since you stored the patterns as values in a hash, the order in which they're substituted gets randomized such that with the input you tested with, the code happens to work.

      Makeshifts last the longest.

        Thanks - fixed - see update.

        Regards,

        PN5

Re: Pattern Matching, left-to-right
by Anonymous Monk on Aug 21, 2004 at 12:18 UTC
    Greetings, oh high ones...
    No drugs allowed in monastery