comment on

Hello.

Yesterday I was solving problem in a contest trying to use regexes. The statement of the problem is here: http://codeforces.com/contest/633/problem/C. There are some performance limits: time and memory.

How would you solve this problem?
Shortly saying, the main problem is to use some words from given dictionary (each word zero or more times) to cover a given long string without overlays. For example, we have dictionary - qw { cb ae cba ed } and string 'cbaed'. We can cover it with two words: 'cba' and 'ed'.
I used embedded code in regex - /(?{ ... })/ and array to save submatches. And seems that if 'multi_captures' would be implemented in regex then solution was easier. Mine solution looks so:

chomp( (undef, $_, undef, @words) = <> );

%restore_case = map { lc, $_ } @words;

$list = join '|', map { scalar reverse lc } @words;

m/^ (?: ($list | $ | (?{ pop @submatches }) (?!) ) 
(?{ push @submatches, $^N }) )+ $/x;

pop @submatches;

print join ' ', map { $restore_case{ scalar reverse } } @submatches
[download]

Or you can see it with some tests here on a platform - http://codeforces.com/contest/633/submission/16382698
Explanation:

m/^ 
    (?: 
        ( $list  # dictionary joined with '|'
            | $  # if end of line then success, so don't pop array
            | (?{ pop @submatches }) (?!)  # pop array and suddenly fa
+il
        ) 
        (?{ push @submatches, $^N })  # after every matched word, push
+ it to array
        # $^N - last captured group
    )+ 
$/x;
[download]

I wanted to discuss: are there other solutions using regexes, and is it possible to increase performance?

I've read nice node - Advanced techniques with regex quantifiers. And tried to use '$^R' instead of array '@submatches' and it memory-limited on the contest platform - http://codeforces.com/contest/633/submission/16383676 . For me it seems that using push/pop is better than using arrays with $^R.
Later I took an example from topic about Advanced techniques:

":aa2bb4cc6dd8" =~ / (:)
                     (?{ [] })   # initialize $^R
                     (?:
                         (\w\w) (\d)
                         
                         # add captures to $^R:
                         (?{ [@{$^R}, [$2, $3]] })
                     )*
                   /x;
[download]

And rewrote it in push/pop array manner:

my @R;
":aa2bb4cc6dd8" =~ / (:)
                     (?:
                         (\w\w) (\d)
                         
                         (?{ push @R, [$2, $3] })
                     )*
                   /x;
[download]

And it seems to work much more faster on bigger inputs. I was measuring on ideone.com (semicolon removed):
This code worked 1.3 s (string x 1e3) (http://ideone.com/zMOyLS):

use Data::Dumper;
sub dd { print Dumper(shift) };

("aa2bb4cc6dd8" x 1e3) =~ /
                     (?{ [] })   # initialize $^R
                     (?:
                         (\w\w) (\d)
                         
                         # add captures to $^R:
                         (?{ [@{$^R}, [$1, $2]] })
                     )*
                   /x;
                   
dd @{ $^R }[ -1 ]
[download]

And this code worked 0.06 s (string x 5e3) (http://ideone.com/n3cKVr):

use Data::Dumper;
sub dd { print Dumper(shift) };

my @R = ();

("aa2bb4cc6dd8" x 5e3) =~ /
                     (?:
                         (\w\w) (\d)
                         
                         # push captures to @R:
                         (?{ push @R, [$1, $2] })
                     )*
                   /x;
                   
dd @R[ -1 ]
[download]

both output:

$VAR1 = [
          'dd',
          '8'
        ];
[download]

In reply to Multi_captures needed by rsFalse

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.