
Format strings can be simple or hard depending on how you look at them. If you need to actually parse the placeholders for their semantic meaning, that's a little harder. But if all you need to do is detect a placeholder, it's pretty easy, I think: find any % character not preceded by a \ (backslash), and continue until a whitespace character or the end of the format string.

When split is used with capturing parens, it returns both the items split out and the delimiters. Consider the following:

use strict;
use warnings;

my $pattern = qr{(?<!\\)(%[^\s]+)};


my @strings = (
    '%.02f foo',
    'foo %.02f',
    '%f.02f',
    'foo %0.02f bar',
    '\%f.02f',
    'foo \%.02f',
    '\%.02f foo',
    'foo \%.02f bar',
);

foreach my $string (@strings) {
    my @parts = split /$pattern/, $string;
    print "String: $string.  Parts: ", (map {'(' . $_ . ')'} @parts), "\n";
    my $c = 1;
    foreach my $part (@parts) {
        print $c++ % 2 ? "\tComponent: $part\n" : "\tDelimiter: $part\n";
    }
}

The output is:

String: %.02f foo.  Parts: ()(%.02f)( foo)
    Component: 
    Delimiter: %.02f
    Component:  foo
String: foo %.02f.  Parts: (foo )(%.02f)
    Component: foo 
    Delimiter: %.02f
String: %f.02f.  Parts: ()(%f.02f)
    Component: 
    Delimiter: %f.02f
String: foo %0.02f bar.  Parts: (foo )(%0.02f)( bar)
    Component: foo 
    Delimiter: %0.02f
    Component:  bar
String: \%f.02f.  Parts: (\%f.02f)
    Component: \%f.02f
String: foo \%.02f.  Parts: (foo \%.02f)
    Component: foo \%.02f
String: \%.02f foo.  Parts: (\%.02f foo)
    Component: \%.02f foo
String: foo \%.02f bar.  Parts: (foo \%.02f bar)
    Component: foo \%.02f bar

Given that, the following will do what you want:

use strict;
use warnings;

my $pattern = qr{(?<!\\)(%[^\s]+)};


my @strings = (
    '%.02f foo',
    'foo %.02f',
    '%f.02f',
    'foo %0.02f bar',
    '\%f.02f',
    'foo \%.02f',
    '\%.02f foo',
    'foo \%.02f bar',
);

# 2049 random bytes to XOR against (rand 256 so that byte 255 can occur too).
my $key = join q(), map { chr int rand 256 } 0 .. 2048;

foreach my $string (@strings) {
    my $pos = 0;
    my @parts = split /$pattern/, $string;
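    my $c = 0;
    my @enc;
    foreach my $part (@parts) {
        my $enc = $part;
        # Even-numbered parts are ordinary text and get XORed; odd-numbered
        # parts are the captured placeholders and pass through untouched.
        if (!($c++ % 2)) {
            my $len = length $part;
            $enc = $enc ^ substr $key, $pos, $len;
            $pos += $len;
        }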
        push @enc, $enc;
    }
    my $estring = join q{}, @enc;
    print "String: ($string) => Encoded: ($estring)\n";
}

It's pretty inelegant code, but I've got kids jabbering nearby as I type. :) Also, we haven't dealt with the fact that the input could be longer than the key, so it's a pretty basic solution, but it's enough to get the idea. Here's some sample output:

String: (%.02f foo) => Encoded: (%.02f$
                                       �)
String: (foo %.02f) => Encoded: (b�]%.02f)
String: (%f.02f) => Encoded: (%f.02f)
String: (foo %0.02f bar) => Encoded: (b�]%0.02f�5�)
String: (\%f.02f) => Encoded: (XH�S�S2)
String: (foo \%.02f) => Encoded: (b�]�Dz��z)
String: (\%.02f foo) => Encoded: (XH�M�t��s)
String: (foo \%.02f bar) => Encoded: (b�]�Dz��zG�^
)
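
On the key-length caveat: one way to cope, sketched here and not part of the code above, is to wrap around to the start of the key when you run off its end. The helper name xor_with_cycled_key is made up for illustration, and keep in mind that reusing a short key this way makes the XOR much weaker.

```perl
use strict;
use warnings;

# Hypothetical helper: XOR $text against $key starting at key offset $pos,
# wrapping around to the start of the key whenever its end is reached.
sub xor_with_cycled_key {
    my ($text, $key, $pos) = @_;
    my $klen = length $key;
    die "key must not be empty\n" unless $klen;
    my $out = '';
    for my $i (0 .. length($text) - 1) {
        my $k = substr($key, ($pos + $i) % $klen, 1);
        $out .= substr($text, $i, 1) ^ $k;    # string XOR, one character at a time
    }
    return $out;
}

my $short_key = 'abc';            # deliberately shorter than the text
my $plain     = 'hello world';
my $encoded   = xor_with_cycled_key($plain,   $short_key, 0);
my $decoded   = xor_with_cycled_key($encoded, $short_key, 0);
print $decoded eq $plain ? "round trip ok\n" : "round trip failed\n";
```

In the loop above you would call it in place of the substr-based XOR and keep advancing $pos by the length of each part, just as before.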

There is a problem with the entire premise, however. While it's pretty easy to parse the original string looking for placeholders, the encrypted string could break our simple parsing rules, so taking it back to an unencrypted string becomes nearly impossible to do reliably. In the original string, a % that is not preceded by a \ marks the start of a placeholder, and whitespace or the end of the string marks its end. But the encryption process could inject %, \, and whitespace anywhere in the encrypted string, and it can just as easily turn the whitespace that ended a placeholder into some other character.
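
To make that concrete, here is a small contrived demonstration using the same pattern. The key 'DDD' is an assumption picked on purpose so that the collision happens ('a' XOR 'D' is '%'); with a random key it would only happen some of the time, which is arguably worse.

```perl
use strict;
use warnings;

my $pattern = qr{(?<!\\)(%[^\s]+)};

my $plain = 'a b';            # contains no placeholder at all
my $key   = 'DDD';            # contrived key chosen to trigger the collision
my $enc   = $plain ^ $key;    # string XOR; the result is '%d&'

my @plain_parts = split /$pattern/, $plain;
my @enc_parts   = split /$pattern/, $enc;

print "Plain:   ($plain) splits into ", scalar(@plain_parts), " part(s)\n";
print "Encoded: ($enc) splits into ",   scalar(@enc_parts),   " part(s)\n";
```

The plain string comes back as a single component, but the encoded one comes back as an empty component followed by a bogus "placeholder", so a decoder following the same rules would leave the whole thing untouched instead of XORing it back.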

Also, just because split works here doesn't make it ideal. I would probably prefer a s/$pattern/$replacement/eg substitution, though it would still suffer from the fundamental problem of making encrypted text and placeholders difficult to distinguish for decryption.
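
For what it's worth, a substitution-based version might look something like the sketch below. The sub name xor_outside_placeholders is invented for this example, it assumes the key is at least as long as the text, and the fixed "\x01" key is only there so the output is printable and repeatable.

```perl
use strict;
use warnings;

# Hypothetical helper: XOR everything except placeholders in a single pass.
# Assumes $key is at least as long as the non-placeholder text.
sub xor_outside_placeholders {
    my ($string, $key) = @_;
    my $pos = 0;    # next unused offset into the key
    $string =~ s{
        ( (?<!\\) % \S+ )   # group 1: an unescaped placeholder, kept verbatim
      | ( . )               # group 2: any other single character, XORed
    }{
        defined($1) ? $1 : ($2 ^ substr($key, $pos++, 1))
    }gsex;
    return $string;
}

my $fixed_key = "\x01" x 32;    # fixed key for a repeatable, printable demo
for my $string ('foo %.02f bar', '\%.02f') {
    my $encoded = xor_outside_placeholders($string, $fixed_key);
    print "String: ($string) => Encoded: ($encoded)\n";
}
```

With that key the first string encodes to gnn!%.02f!c`s. The space that used to end the placeholder has become a !, so running the result back through the same sub treats everything from the % to the end of the string as one placeholder and never decodes the tail. Same fundamental problem, just with less code.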

Dave

In reply to Re: Partial Xor in string by davido
in thread Partial Xor in string by kepler
