Hello.

Yesterday I was solving problem in a contest trying to use regexes. The statement of the problem is here: http://codeforces.com/contest/633/problem/C. There are some performance limits: time and memory.

How would you solve this problem?
Shortly saying, the main problem is to use some words from given dictionary (each word zero or more times) to cover a given long string without overlays. For example, we have dictionary - qw { cb ae cba ed } and string 'cbaed'. We can cover it with two words: 'cba' and 'ed'.
I used embedded code in regex - /(?{ ... })/ and array to save submatches. And seems that if 'multi_captures' would be implemented in regex then solution was easier. Mine solution looks so:
chomp( (undef, $_, undef, @words) = <> ); %restore_case = map { lc, $_ } @words; $list = join '|', map { scalar reverse lc } @words; m/^ (?: ($list | $ | (?{ pop @submatches }) (?!) ) (?{ push @submatches, $^N }) )+ $/x; pop @submatches; print join ' ', map { $restore_case{ scalar reverse } } @submatches
Or you can see it with some tests here on a platform - http://codeforces.com/contest/633/submission/16382698
Explanation:
m/^ (?: ( $list # dictionary joined with '|' | $ # if end of line then success, so don't pop array | (?{ pop @submatches }) (?!) # pop array and suddenly fa +il ) (?{ push @submatches, $^N }) # after every matched word, push + it to array # $^N - last captured group )+ $/x;
I wanted to discuss: are there other solutions using regexes, and is it possible to increase performance?

I've read nice node - Advanced techniques with regex quantifiers. And tried to use '$^R' instead of array '@submatches' and it memory-limited on the contest platform - http://codeforces.com/contest/633/submission/16383676 . For me it seems that using push/pop is better than using arrays with $^R.
Later I took an example from topic about Advanced techniques:
":aa2bb4cc6dd8" =~ / (:) (?{ [] }) # initialize $^R (?: (\w\w) (\d) # add captures to $^R: (?{ [@{$^R}, [$2, $3]] }) )* /x;
And rewrote it in push/pop array manner:
my @R; ":aa2bb4cc6dd8" =~ / (:) (?: (\w\w) (\d) (?{ push @R, [$2, $3] }) )* /x;
And it seems to work much more faster on bigger inputs. I was measuring on ideone.com (semicolon removed):
This code worked 1.3 s (string x 1e3) (http://ideone.com/zMOyLS):
use Data::Dumper; sub dd { print Dumper(shift) }; ("aa2bb4cc6dd8" x 1e3) =~ / (?{ [] }) # initialize $^R (?: (\w\w) (\d) # add captures to $^R: (?{ [@{$^R}, [$1, $2]] }) )* /x; dd @{ $^R }[ -1 ]
And this code worked 0.06 s (string x 5e3) (http://ideone.com/n3cKVr):
use Data::Dumper; sub dd { print Dumper(shift) }; my @R = (); ("aa2bb4cc6dd8" x 5e3) =~ / (?: (\w\w) (\d) # push captures to @R: (?{ push @R, [$1, $2] }) )* /x; dd @R[ -1 ]
both output:
$VAR1 = [ 'dd', '8' ];

In reply to Multi_captures needed by rsFalse

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.