in reply to Partial Xor in string

Format strings can be simple or hard depending on how you look at them. If you need to parse the placeholders for their semantic meaning, that's a little harder. But if all you need to do is detect a placeholder, it's pretty easy, I think: find any % character not preceded by a \ backslash, and continue until a whitespace character or the end of the format string.

split, when used with capturing parens, returns both the items split out and the delimiters. Consider the following:

    use strict;
    use warnings;

    my $pattern = qr{(?<!\\)(%[^\s]+)};
    my @strings = (
        '%.02f foo', 'foo %.02f', '%f.02f', 'foo %0.02f bar',
        '\%f.02f', 'foo \%.02f', '\%.02f foo', 'foo \%.02f bar',
    );

    foreach my $string (@strings) {
        my @parts = split /$pattern/, $string;
        print "String: $string. Parts: ", (map {'(' . $_ . ')'} @parts), "\n";
        my $c = 1;
        foreach my $part (@parts) {
            print $c++ % 2 ? "\tComponent: $part\n" : "\tDelimiter: $part\n";
        }
    }

The output is:

    String: %.02f foo. Parts: ()(%.02f)( foo)
        Component:
        Delimiter: %.02f
        Component:  foo
    String: foo %.02f. Parts: (foo )(%.02f)
        Component: foo
        Delimiter: %.02f
    String: %f.02f. Parts: ()(%f.02f)
        Component:
        Delimiter: %f.02f
    String: foo %0.02f bar. Parts: (foo )(%0.02f)( bar)
        Component: foo
        Delimiter: %0.02f
        Component:  bar
    String: \%f.02f. Parts: (\%f.02f)
        Component: \%f.02f
    String: foo \%.02f. Parts: (foo \%.02f)
        Component: foo \%.02f
    String: \%.02f foo. Parts: (\%.02f foo)
        Component: \%.02f foo
    String: foo \%.02f bar. Parts: (foo \%.02f bar)
        Component: foo \%.02f bar

Given that, the following will do what you want:

    use strict;
    use warnings;

    my $pattern = qr{(?<!\\)(%[^\s]+)};
    my @strings = (
        '%.02f foo', 'foo %.02f', '%f.02f', 'foo %0.02f bar',
        '\%f.02f', 'foo \%.02f', '\%.02f foo', 'foo \%.02f bar',
    );

    # Random key; long enough for these short test strings.
    my $key = join q(), map chr rand 255, 0 .. 2048;

    foreach my $string (@strings) {
        my $pos   = 0;    # current offset into the key
        my @parts = split /$pattern/, $string;
        my $c;
        my @enc;
        foreach my $part (@parts) {
            my $enc = $part;
            # Even-indexed parts are plain text (XOR them);
            # odd-indexed parts are captured placeholders (leave them alone).
            if (!($c++ % 2)) {
                my $len = length $part;
                $enc = $enc ^ substr $key, $pos, $len;
                $pos += $len;
            }
            push @enc, $enc;
        }
        my $estring = join q{}, @enc;
        print "String: ($string) => Encoded: ($estring)\n";
    }

It's pretty inelegant code but I've got kids jabbering nearby as I type. :) Also we haven't dealt with the fact that the input could be larger than the key length, so it's a pretty basic solution, but it's enough to get the idea. Here's some sample output:

String: (%.02f foo) => Encoded: (%.02f$
                                       �)
String: (foo %.02f) => Encoded: (b�]%.02f)
String: (%f.02f) => Encoded: (%f.02f)
String: (foo %0.02f bar) => Encoded: (b�]%0.02f�5�)
String: (\%f.02f) => Encoded: (XH�S�S2)
String: (foo \%.02f) => Encoded: (b�]�Dz��z)
String: (\%.02f foo) => Encoded: (XH�M�t��s)
String: (foo \%.02f bar) => Encoded: (b�]�Dz��zG�^
)
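
As for input longer than the key, one possible way to handle it is to wrap around the key with a modulus. This is only a sketch: xor_with_key is a hypothetical helper name, not something from the code above, and reusing key bytes like this weakens whatever protection the XOR was giving you.

```perl
use strict;
use warnings;

# Hypothetical helper: XOR $text against $key one character at a time,
# wrapping around to the start of the key when the text runs past it.
# $pos is the offset into the key at which to start.
sub xor_with_key {
    my ($text, $key, $pos) = @_;
    my $klen = length $key;
    my $out  = '';
    for my $i (0 .. length($text) - 1) {
        my $k = substr $key, ($pos + $i) % $klen, 1;
        $out .= substr($text, $i, 1) ^ $k;
    }
    return $out;
}

# A deliberately short key, so the wrap-around is exercised.
my $key   = 'abc';
my $plain = 'hello world';
my $enc   = xor_with_key($plain, $key, 0);
my $dec   = xor_with_key($enc,   $key, 0);
print $dec eq $plain ? "round trip ok\n" : "round trip failed\n";
```

Because XOR is its own inverse, calling the helper a second time with the same key and starting offset gives back the original text.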

There is a problem with the entire premise, however. While it's pretty easy to parse the original string looking for placeholders, the encrypted string could break our simple parsing rules, so to take it back to an unencrypted string becomes nearly impossible to do reliably. While in the original string a % that is not preceded by a \ represents the start of a placeholder, and a space or end of string represents the end, the encryption process could inject %, \, and spaces anywhere in the encrypted string.
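
To make that concrete, here is a tiny sketch with byte values picked by hand (they are illustrative, not taken from the random key above). An ordinary letter XORed with an ordinary key byte can come out as a % or a space:

```perl
use strict;
use warnings;

# 'a' is 0x61 and 'D' is 0x44; 0x61 ^ 0x44 == 0x25, which is '%'.
my $percent = 'a' ^ 'D';

# 'A' is 0x41 and 'a' is 0x61; 0x41 ^ 0x61 == 0x20, which is a space.
my $space = 'A' ^ 'a';

print "XOR produced a percent sign\n" if $percent eq '%';
print "XOR produced a space\n"        if $space   eq ' ';
```

A decoder scanning the encoded string has no way to tell such a manufactured % from the start of a real placeholder.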

Also, just because split works here doesn't make it ideal. I would probably prefer an s/$pattern/$replacement/eg substitution regexp, though it would still suffer from the fundamental problem that encrypted text and placeholders are difficult to tell apart at decryption time.
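
A rough sketch of what the substitution approach could look like, assuming the same backslash-based placeholder rule as above. The sub name encode_outside_placeholders is made up for illustration, and the key is cycled with a modulus purely to keep the example short:

```perl
use strict;
use warnings;

# Same placeholder rule as the split version, without the capture group.
my $placeholder = qr{(?<!\\)%[^\s]+};

# Hypothetical helper: copy placeholders through untouched and XOR
# every other character against the (cycled) key.
sub encode_outside_placeholders {
    my ($string, $key) = @_;
    my $pos  = 0;
    my $klen = length $key;
    (my $out = $string) =~ s{($placeholder)|(.)}{
        defined($1)
            ? $1                                      # placeholder: keep as-is
            : $2 ^ substr($key, $pos++ % $klen, 1)    # anything else: XOR one char
    }egs;
    return $out;
}

my $encoded = encode_outside_placeholders('foo %.02f bar', "\x01");
print "Encoded: ($encoded)\n";
```

The alternation tries the placeholder first at each position and falls back to a single character, so the whole string is consumed in one pass and there is no odd/even bookkeeping. It does nothing about the decoding ambiguity described above.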


Dave

Re^2: Partial Xor in string
by AnomalousMonk (Archbishop) on Sep 18, 2016 at 20:42 UTC
    ... any % character not preceded by a \ backslash ...

    I don't really understand the context of kepler's OPed question, but if that's supposed to be an (s)printf format specifier, shouldn't that be "any % character not preceded by a %"?

    c:\@Work\Perl\monks\Denis.Beurive>perl -wMstrict -le
    "printf qq{\%d x %%d y %d \n}, 123, 543;
     printf qq{\%0.2f x %%0.2f y %0.2f \n}, 12.34, 54.32;
    "
    123 x %d y 543
    12.34 x %0.2f y 54.32


    Give a man a fish:  <%-{-{-{-<

      Yes, you're correct, I was thinking that % is escaped with a backslash, not another %. Memory lapse.

      In that case the pattern might look like this:

      split /(?<!%)(%[^%\s]+)/
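
A quick sketch of how that pattern behaves with split (the test strings are made up for illustration):

```perl
use strict;
use warnings;

# A % that is neither preceded nor immediately followed by another %.
my $pattern = qr{(?<!%)(%[^%\s]+)};

for my $string ('foo %d bar', 'foo %%d bar', '100%% of %s') {
    my @parts = split /$pattern/, $string;
    print "String: $string. Parts: ", (map { "($_)" } @parts), "\n";
}
```

With this pattern '%d' and '%s' are split out as delimiters, while the doubled '%%' stays inside the surrounding text. It is still only a heuristic: a literal %% immediately followed by a real placeholder, as in '%%%d', would not be detected.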

      I'm still a little curious what problem we're really solving. Why parse and encrypt a sprintf format specifier in the first place? A format string should be considered code, not data accepted from the outside world. What's the underlying need here?

      perl -E 'my $f = shift @ARGV; say sprintf $f, "foo"' '%9999999999s'

      (consumes 1.3GB RAM)


      Dave