kepler has asked for the wisdom of the Perl Monks concerning the following question:

I posted earlier a question regarding XOR in Perl strings that was quickly solved. Now I have another. Let's assume I have a string of the type "hello %f.2f world". I want to xor everything but any regex that seems like [\%0-9a-z\.\ \=]. So in this case "hello" and "world" should be "xored" but not the rest. Is there an easy way to do this?

Kind regards

Kepler

Replies are listed 'Best First'.
Re: Partial Xor in string
by davido (Cardinal) on Sep 18, 2016 at 16:45 UTC

    Format strings can be simple or hard depending on how you look at them. If you need to parse the placeholders for their semantic meaning, that's a little harder. But if all you need to do is detect a placeholder, it's pretty easy, I think: find any % character not preceded by a \ backslash, and continue until a space character or the end of the format string.

    split when used with capturing parens returns both the items split out, and the delimiters. Consider the following:

    use strict;
    use warnings;

    my $pattern = qr{(?<!\\)(%[^\s]+)};

    my @strings = (
        '%.02f foo',
        'foo %.02f',
        '%f.02f',
        'foo %0.02f bar',
        '\%f.02f',
        'foo \%.02f',
        '\%.02f foo',
        'foo \%.02f bar',
    );

    foreach my $string (@strings) {
        my @parts = split /$pattern/, $string;
        print "String: $string. Parts: ", (map {'(' . $_ . ')'} @parts), "\n";
        my $c = 1;
        foreach my $part (@parts) {
            print $c++ % 2 ? "\tComponent: $part\n" : "\tDelimiter: $part\n";
        }
    }

    The output is:

    String: %.02f foo. Parts: ()(%.02f)( foo)
        Component:
        Delimiter: %.02f
        Component:  foo
    String: foo %.02f. Parts: (foo )(%.02f)
        Component: foo
        Delimiter: %.02f
    String: %f.02f. Parts: ()(%f.02f)
        Component:
        Delimiter: %f.02f
    String: foo %0.02f bar. Parts: (foo )(%0.02f)( bar)
        Component: foo
        Delimiter: %0.02f
        Component:  bar
    String: \%f.02f. Parts: (\%f.02f)
        Component: \%f.02f
    String: foo \%.02f. Parts: (foo \%.02f)
        Component: foo \%.02f
    String: \%.02f foo. Parts: (\%.02f foo)
        Component: \%.02f foo
    String: foo \%.02f bar. Parts: (foo \%.02f bar)
        Component: foo \%.02f bar

    Given that, the following will do what you want:

    use strict;
    use warnings;

    my $pattern = qr{(?<!\\)(%[^\s]+)};

    my @strings = (
        '%.02f foo',
        'foo %.02f',
        '%f.02f',
        'foo %0.02f bar',
        '\%f.02f',
        'foo \%.02f',
        '\%.02f foo',
        'foo \%.02f bar',
    );

    my $key = join q(), map chr rand 255, 0 .. 2048;

    foreach my $string (@strings) {
        my $pos = 0;
        my @parts = split /$pattern/, $string;
        my $c;
        my @enc;
        foreach my $part (@parts) {
            my $enc = $part;
            if (!($c++ % 2)) {
                my $len = length $part;
                $enc = $enc ^ substr $key, $pos, $len;
                $pos += $len;
            }
            push @enc, $enc;
        }
        my $estring = join q{}, @enc;
        print "String: ($string) => Encoded: ($estring)\n";
    }

    It's pretty inelegant code but I've got kids jabbering nearby as I type. :) Also we haven't dealt with the fact that the input could be larger than the key length, so it's a pretty basic solution, but it's enough to get the idea. Here's some sample output:

    String: (%.02f foo) => Encoded: (%.02f$
                                           �)
    String: (foo %.02f) => Encoded: (b�]%.02f)
    String: (%f.02f) => Encoded: (%f.02f)
    String: (foo %0.02f bar) => Encoded: (b�]%0.02f�5�)
    String: (\%f.02f) => Encoded: (XH�S�S2)
    String: (foo \%.02f) => Encoded: (b�]�Dz��z)
    String: (\%.02f foo) => Encoded: (XH�M�t��s)
    String: (foo \%.02f bar) => Encoded: (b�]�Dz��zG�^
    )
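The caveat about input longer than the key could be handled by cycling through the key. What follows is only a minimal sketch: the helper name xor_segment and its offset argument are hypothetical and not part of the code above, and the short key is chosen purely to force the wrap-around.

```perl
use strict;
use warnings;

# Hypothetical helper: XOR $text with $key, starting at key offset $pos and
# wrapping around to the start of the key whenever its end is reached.
sub xor_segment {
    my ($text, $key, $pos) = @_;
    my $klen = length $key;
    die "key must not be empty\n" unless $klen;
    my $stream = join q{},
        map { substr $key, ($pos + $_) % $klen, 1 } 0 .. length($text) - 1;
    return $text ^ $stream;
}

my $key     = 'abc';            # deliberately shorter than the input
my $plain   = 'hello world';
my $encoded = xor_segment($plain,   $key, 0);
my $decoded = xor_segment($encoded, $key, 0);
print $decoded eq $plain ? "round trip ok\n" : "round trip failed\n";
```

Because XOR is its own inverse, applying the same helper with the same key and offset recovers the original segment. Reusing a short key this way is easy to break, so this addresses the length problem only.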
    

    There is a problem with the entire premise, however. While it's pretty easy to parse the original string looking for placeholders, the encrypted string could break our simple parsing rules, so to take it back to an unencrypted string becomes nearly impossible to do reliably. While in the original string a % that is not preceded by a \ represents the start of a placeholder, and a space or end of string represents the end, the encryption process could inject %, \, and spaces anywhere in the encrypted string.
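To make that concern concrete, here is a small sketch with a hand-picked key (an assumption for illustration only, not a realistic key) that turns the first plaintext character into a %. Re-splitting the encoded string then finds a "placeholder" that was never in the original.

```perl
use strict;
use warnings;

my $pattern = qr{(?<!\\)(%[^\s]+)};

my $plain = 'hx %d';
# Key chosen by hand for illustration: 'h' ^ 'M' is '%' (0x68 ^ 0x4D == 0x25);
# the NUL bytes leave 'x' and the space unchanged.
my $key = "M\0\0";

my @parts    = split /$pattern/, $plain;            # ('hx ', '%d')
my $encoded  = ($parts[0] ^ $key) . $parts[1];      # '%x %d'
my @reparsed = split /$pattern/, $encoded;          # ('', '%x', ' ', '%d')

printf "plain parts: %d, encoded parts: %d\n", scalar @parts, scalar @reparsed;
```

The decoder would treat %x as a placeholder and leave it alone, so the original 'h' could not be recovered without extra information such as the segment boundaries.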

    Also, just because split works here doesn't make it ideal. I would probably prefer a s/$pattern/$replacement/eg substitution regexp, though it would still suffer from the fundamental problem of making encrypted text and placeholders difficult to distinguish for decryption.


    Dave

      ... any % character not preceded by a \ backslash ...

      I don't really understand the context of kepler's OPed question, but if that's supposed to be an (s)printf format specifier, shouldn't that be "any % character not preceded by a %"?

      c:\@Work\Perl\monks\Denis.Beurive>perl -wMstrict -le
      "printf qq{\%d x %%d y %d \n}, 123, 543;
       printf qq{\%0.2f x %%0.2f y %0.2f \n}, 12.34, 54.32;
      "
      123 x %d y 543
      12.34 x %0.2f y 54.32


      Give a man a fish:  <%-{-{-{-<

        Yes, you're correct, I was thinking that % is escaped with a backslash, not another %. Memory lapse.

        In that case the pattern might look like this:

        split /(?<!%)(%[^%\s]+)/
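A quick sketch of how that revised pattern behaves on a few inputs; the test strings are made up for illustration.

```perl
use strict;
use warnings;

my $pattern = qr{(?<!%)(%[^%\s]+)};

for my $string ('foo %0.2f bar', '100%% done', 'x %%d y %d') {
    my @parts = split /$pattern/, $string;
    print "String: $string. Parts: ", (map { "($_)" } @parts), "\n";
}
```

The %% in the second and third strings is left inside the ordinary text, and only the real %d and %0.2f are split out as delimiters. One limitation of the lookbehind: a placeholder that directly follows a literal %%, as in '%%%d', is not detected.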

        I'm still a little curious what problem we're really solving. Why parse and encrypt a sprintf format specifier in the first place? A format string should be considered code, not data accepted from the outside world. What's the underlying need here?

        perl -E 'my $f = shift @ARGV; say sprintf $f, "foo"' '%9999999999s'

        (consumes 1.3GB RAM)


        Dave

Re: Partial Xor in string
by AnomalousMonk (Archbishop) on Sep 18, 2016 at 21:11 UTC
    I posted earlier a question regarding XOR in perl strings ...

    To help make clear the context of XOR here, could you please post a link to the prior (and possibly related) thread?

    ... I have a string of the type "hello %f.2f world". I want to xor everything but any regex that seems like [\%0-9a-z\.\ \=]. So in this case "hello" and "world" shoul be "xored" but not the rest.

    I don't understand this. Do you want to do some string-xor operation(s) on some parts of the string that do not match the [\%0-9a-z\.\ \=] character class? In that case, I can't see any character that doesn't match; the entire string matches, including 'hello' and 'world'.

    If you have two parts of a string and you xor them together, what's supposed to happen to the result? WRT string-xoring, if you have two unequal-length strings to xor, what is supposed to happen? (I'm sure I have other areas of misunderstanding.)
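For reference on the unequal-length case, Perl's string ^ treats the shorter operand as if it were padded with zero bits, so the result is as long as the longer operand. A minimal sketch, assuming the bitwise feature is not enabled (with it, ^ is numeric and ^. is the string form):

```perl
use strict;
use warnings;

my $long   = 'abc';
my $short  = 'A';
my $result = $long ^ $short;    # string XOR, since both operands are strings

# 'a' ^ 'A' is a space (0x61 ^ 0x41 == 0x20); 'b' and 'c' are XORed with
# implicit zero bits and so come through unchanged.
printf "length %d: (%s)\n", length($result), $result;
```

So the tail of the longer string passes through unencrypted whenever the key is too short, which is what happens in the code earlier in the thread if the input outruns the key.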

    Update: Oh, wait... Maybe I should have read Newest Nodes bottom-to-top instead of vice-versa. Does this pertain to Encrypt/decrypt string in C and Perl?


    Give a man a fish:  <%-{-{-{-<