PerlMonks  

C strings, unescaping of

by Anonymous Monk
on Sep 30, 2013 at 13:47 UTC (id://1056362)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm parsing (C-style) strings in strace logs, but the problem itself is rather generic. To unescape the strings I've tried:

sub trx { (my $x = $_[0]) =~ tr/rnbaftv/\r\n\b\a\f\t\013/; $x; }
$s =~ s{\\([0-7]{1,3})|\\(.)}{defined $1 ? chr oct $1 : trx $2}eg;

Or like this:
  $s =~ s{\\([0-7]{1,3})|\\(.)}{defined $1 ? chr oct $1 : (grep ~y/rnbaftv/\r\n\b\a\f\t\013/, "$2")[0]}eg;

Now the question is: since this seems like such a simple, common task, isn't there a better, more concise way of doing it? Thank you!
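For anyone who wants to try the first version, here is a self-contained sketch that wraps it in a sub. The wrapper name unescape_c and the sample string are made up for illustration. The tr/// and the s///e are taken unchanged from the post above.

```perl
use strict;
use warnings;

# trx() and the substitution are taken unchanged from the question;
# only the wrapper sub unescape_c() is new (a made-up name).
sub trx { (my $x = $_[0]) =~ tr/rnbaftv/\r\n\b\a\f\t\013/; $x }

sub unescape_c {
    my ($s) = @_;
    # \NNN (1-3 octal digits) -> chr(oct NNN); any other \X -> trx(X),
    # which maps the C letters and leaves \" and \\ as " and \.
    $s =~ s{\\([0-7]{1,3})|\\(.)}{defined $1 ? chr oct $1 : trx $2}eg;
    return $s;
}

# A made-up sample with literal backslashes, as strace would print it.
my $raw = 'a\tb\n\"q\"\001\v';
print join(' ', unpack '(H2)*', unescape_c($raw)), "\n";
# prints: 61 09 62 0a 22 71 22 01 0b
```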

Re: C strings, unescaping of
by daxim (Curate) on Sep 30, 2013 at 14:05 UTC
    Use Encode::Escape::ASCII. Under the hood it does the same, but your maintenance programmer gets a readable API and documentation.
      That package does not implement C-style escapes. The \v is missing. Furthermore, it looks rather suspect. Observe:
      $ echo -n 'v\v\134aa' | perl -e 'use Encode::Escape::ASCII; print decode "ascii-escape", $_ for <>;' | hexdump -C
      00000000  76 76 07 61                                       |vv.a|
      00000004

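      This does not look right to me at all! The \v came out as a plain v, and the backslash produced by \134 seems to have been decoded a second time together with the following a, yielding BEL (07).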
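For comparison, here is a minimal single-pass decoder. The names %ctrl and c_unescape are hypothetical. It produces what I would expect for the same input. s///g never rescans text it has already substituted, so the backslash produced by \134 cannot pair up with the following a.

```perl
use strict;
use warnings;

# Hypothetical single-pass decoder: octal escapes via chr(oct),
# C letter escapes via %ctrl, anything else (\" \\ ...) as itself.
my %ctrl = (r => "\r", n => "\n", b => "\b", a => "\a",
            f => "\f", t => "\t", v => "\013");

sub c_unescape {
    my ($s) = @_;
    # s///g resumes scanning after each replacement, so a backslash
    # produced by \134 is never combined with the next character.
    $s =~ s{\\(?:([0-7]{1,3})|(.))}
           {defined $1 ? chr oct $1 : $ctrl{$2} // $2}ge;
    return $s;
}

my $in = 'v\v\134aa';    # same characters as in the echo test
print join(' ', unpack '(H2)*', c_unescape($in)), "\n";
# prints: 76 0b 5c 61 61
```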

Re: C strings, unescaping of
by Marshall (Canon) on Sep 30, 2013 at 14:53 UTC
    That is rather odd sounding to me.
    sub trx { (my $x = $_[0]) =~ tr/rnbaftv/\r\n\b\a\f\t\013/; $x; }

    The reason is that I've never seen a 'C' trace log with such an ASCII syntax. This doesn't seem to make sense to me.

    These \r\n\b\a\f\t\013 are mostly "space" characters as far as Perl's regex \s is concerned. \r, \n, \f and \t always match \s, and \013 (vertical tab) does since Perl 5.18. The exceptions are the bell character \a and the backspace \b.
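The following is a quick way to check which of these characters \s actually matches on a given perl. The labels are only for the printout. On Perls older than 5.18, the \013 line reports "not \s".

```perl
use strict;
use warnings;

# The characters produced by trx(), with printable labels.
my @chars = (
    ['\r', "\r"], ['\n', "\n"], ['\b', "\b"], ['\a', "\a"],
    ['\f', "\f"], ['\t', "\t"], ['\013', "\013"],
);

for my $pair (@chars) {
    my ($label, $ch) = @$pair;
    printf "%-5s %s\n", $label, ($ch =~ /\s/ ? 'matches \s' : 'not \s');
}
```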

    I haven't seen two lines like:
    line1nline2n
    How do you differentiate between the "n" between the lines and the "n" within the "line"?

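      That function is used inside the substitution; there are actually two lines of code. The s/// matches a backslash followed by either up to three octal digits or a single character, and passes only that character to trx, which does the actual replacement. So in line1\nline2\n only the n after each backslash is translated; the n in "line" is left alone. At first it confused me too :)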
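Here is a small sketch of that two-step process on a line1\nline2\n string. The string is written with literal backslashes, the way strace prints it.

```perl
use strict;
use warnings;

# trx() and the s///e are copied from the original question.
sub trx { (my $x = $_[0]) =~ tr/rnbaftv/\r\n\b\a\f\t\013/; $x }

# strace shows a newline inside a string as backslash + "n";
# the "n" in "line" has no backslash in front of it.
my $s = 'line1\nline2\n';

# The regex only matches a backslash plus what follows, and only the
# captured character after the backslash ($2) is handed to trx().
$s =~ s{\\([0-7]{1,3})|\\(.)}{defined $1 ? chr oct $1 : trx $2}eg;
print $s;   # prints "line1" and "line2" on separate lines
```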
Re: C strings, unescaping of
by AnomalousMonk (Archbishop) on Oct 01, 2013 at 18:30 UTC

    Since the two-character literal sequence '\v' (backslash, "v") seems to be the fly in the ointment (Perl's double-quoted strings have no \v escape, so eval would turn it into a plain "v"), why not handle it as the sole exceptional case? The code is simpler, but the replacement still needs /e code evaluation involving an eval, and I haven't run any Benchmark comparisons to see whether it's actually faster. It always pays to be suspicious when /e or eval are involved.

    >perl -wMstrict -le
    "use Test::More 'no_plan';
     use Test::NoWarnings;
     ;;
     my $s = join '', map qq{$_\\$_$_\\$_\\$_$_}, qw(r n b a f t v);
     my $o = join '', 'x', map qq{\\${_}y\\$_\\${_}x}, qw(0 7 10 77 100 377);
     print qq{'$s'};
     print qq{'$o'};
     print '';
     ;;
     for ($s, $o) {
       s{ ( \\ (?: [0-7]{1,3} | [rnbaft])) | \\v }
        { $1 ? eval qq{qq{$1}} : qq{\013} }xmsge;
       }
     print raw_hex($s);
     print raw_hex($o);
     print '';
     ;;
     ok $s eq qq{r\rr\r\rrn\nn\n\nnb\bb\b\bba\aa\a\aaf\ff\f\fft\tt\t\ttv\013v\013\013v};
     ok $o eq qq{x\000y\000\000x\007y\007\007x\010y\010\010x\077y\077\077x\100y\100\100x\377y\377\377x};
     ;;
     ;;
     sub raw_hex { return join ' ', unpack '(H2)*', $_[0]; }
    "
    'r\rr\r\rrn\nn\n\nnb\bb\b\bba\aa\a\aaf\ff\f\fft\tt\t\ttv\vv\v\vv'
    'x\0y\0\0x\7y\7\7x\10y\10\10x\77y\77\77x\100y\100\100x\377y\377\377x'

    72 0d 72 0d 0d 72 6e 0a 6e 0a 0a 6e 62 08 62 08 08 62 61 07 61 07 07 61 66 0c 66 0c 0c 66 74 09 74 09 09 74 76 0b 76 0b 0b 76
    78 00 79 00 00 78 07 79 07 07 78 08 79 08 08 78 3f 79 3f 3f 78 40 79 40 40 78 ff 79 ff ff 78

    ok 1
    ok 2
    ok 3 - no warnings
    1..3
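The sketch below wraps the substitution above in a reusable sub. unescape_eval is a made-up name. It also shows why \v needs the special case: Perl's double-quoted strings have no \v escape, so eval turns "\v" into a plain v. The accompanying "Unrecognized escape" warning is silenced here.

```perl
use strict;
use warnings;

# Perl's double-quoted strings have no \v escape, so eval of "\v"
# just gives "v" (the "Unrecognized escape" warning is silenced here).
my $plain_v = do { no warnings; eval q{"\v"} };
print "eval of \"\\v\" gives '$plain_v'\n";

# The substitution from the post above, wrapped in a sub.
sub unescape_eval {
    my ($s) = @_;
    $s =~ s{ ( \\ (?: [0-7]{1,3} | [rnbaft])) | \\v }
           { $1 ? eval qq{qq{$1}} : qq{\013} }xmsge;
    return $s;
}

print join(' ', unpack '(H2)*', unescape_eval('x\ty\v\101')), "\n";
# prints: 78 09 79 0b 41
```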
      Eval is avoidable:
      {
          my %T = (
              (map {chr() => chr} 0..0377),               # \X -> X
              (map {sprintf("%o",$_) => chr} 0..07),      # octal, 1 digit
              (map {sprintf("%02o",$_) => chr} 0..077),   # octal, 2 digits
              (map {sprintf("%03o",$_) => chr} 0..0377),  # octal, 3 digits
              (split //, "r\rn\nb\ba\af\ft\tv\013"),      # letter escapes
          );
          sub unescape { s/\\([0-7]{1,3}|.)/$T{$1}/g for @_ }  # in place
      }

      I am saddened no-one plays golf in this Monastery.
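Here is a usage sketch for the %T table approach. The table is repeated so the snippet stands alone; it is written with explicit $_ but is otherwise the same. Later pairs override earlier ones, so "0".."7" and the C letters get their escape meanings. unescape modifies its arguments in place through @_ aliasing, so it must be given variables, not string literals. The sample strings are made up.

```perl
use strict;
use warnings;

{
    # The lookup table from the post above, with explicit $_.
    my %T = (
        (map { chr($_) => chr($_) } 0 .. 0377),             # \X   -> X
        (map { sprintf("%o",   $_) => chr($_) } 0 .. 07),   # \7   -> chr 7
        (map { sprintf("%02o", $_) => chr($_) } 0 .. 077),  # \77  -> chr 63
        (map { sprintf("%03o", $_) => chr($_) } 0 .. 0377), # \377 -> chr 255
        (split //, "r\rn\nb\ba\af\ft\tv\013"),              # C letters
    );
    # Unescapes each argument in place (through @_ aliasing).
    sub unescape { s/\\([0-7]{1,3}|.)/$T{$1}/g for @_ }
}

# Made-up sample strings, written with literal backslashes.
my @lines = ('say \"hi\"\n', 'tab\there', 'back\\\\slash', '\0\12\101');
unescape(@lines);
print join(' ', unpack '(H2)*', $_), "\n" for @lines;
```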

      Thanks! The eval makes it about 10 times slower. And don't forget \" and \\.

      s{\\((v)|[0-7]{1,3}|.)}{$2 ? "\013" : eval qq{qq{$&}}}eg;
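One way to reproduce a timing comparison like this is the core Benchmark module's cmpthese. The sketch below pits the eval one-liner against an eval-free hash lookup. The sample input, the sub names and the %ctrl table are assumptions, and the numbers will vary by machine and perl version. On Perls older than 5.20, the mere presence of $& in the program also slows the table version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up input: a run of mixed escapes, written with literal backslashes.
my $input = 'a\tb\n\"c\"\\\\d\101\v' x 50;

# The eval-based one-liner from the post above, wrapped in a sub.
sub via_eval {
    (my $s = $_[0]) =~ s{\\((v)|[0-7]{1,3}|.)}{$2 ? "\013" : eval qq{qq{$&}}}eg;
    return $s;
}

# An eval-free variant: octal via chr(oct), letters via a small hash,
# anything else (\" and \\ included) passed through as itself.
my %ctrl = (r => "\r", n => "\n", b => "\b", a => "\a",
            f => "\f", t => "\t", v => "\013");
sub via_table {
    (my $s = $_[0]) =~ s{\\(?:([0-7]{1,3})|(.))}
                        {defined $1 ? chr oct $1 : $ctrl{$2} // $2}eg;
    return $s;
}

# Make sure both produce the same bytes before timing them.
die "results differ\n" unless via_eval($input) eq via_table($input);

# Run each for at least 1 CPU second and print relative rates.
cmpthese(-1, {
    eval  => sub { via_eval($input) },
    table => sub { via_table($input) },
});
```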

        s{\\((v)|[0-7]{1,3}|.)}{$2 ? "\013" : eval qq{qq{$&}}}eg;

        Part of the enormous speed penalty may be attributable not only to eval but also to the use of the $& match variable (see "Variables related to regular expressions" in perlvar). In Perls before 5.20, merely mentioning $& anywhere puts the brakes on not just an individual regex but on every regex match in the application!
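Here is a sketch of the same substitution that avoids $& by rebuilding the escape from $1; "\\" . $1 is exactly what $& held. unescape_no_amp is a hypothetical name, and I have not benchmarked this variant.

```perl
use strict;
use warnings;

# Same substitution as quoted above, but the escape is rebuilt from
# $1 ("\\" . $1 is exactly what $& held), so $& is never mentioned.
sub unescape_no_amp {
    (my $s = $_[0]) =~ s{\\((v)|[0-7]{1,3}|.)}{$2 ? "\013" : eval qq{qq{\\$1}}}eg;
    return $s;
}

# Literal backslashes in a made-up sample.
print join(' ', unpack '(H2)*', unescape_no_amp('\"x\"\v\101')), "\n";
# prints: 22 78 22 0b 41
```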

Node Type: perlquestion [id://1056362]
Approved by ww
Front-paged by ww