Hello.
Yesterday I was solving problem in a contest trying to use regexes. The statement of the problem is here:
http://codeforces.com/contest/633/problem/C.
There are some performance limits: time and memory.
How would you solve this problem?
Shortly saying, the main problem is to use some words from given dictionary (each word zero or more times) to cover a given long string without overlays. For example, we have dictionary - qw { cb ae cba ed } and string 'cbaed'. We can cover it with two words: 'cba' and 'ed'.
I used embedded code in regex -
/(?{ ... })/ and array to save submatches. And seems that if 'multi_captures' would be implemented in regex then solution was easier. Mine solution looks so:
chomp( (undef, $_, undef, @words) = <> );
%restore_case = map { lc, $_ } @words;
$list = join '|', map { scalar reverse lc } @words;
m/^ (?: ($list | $ | (?{ pop @submatches }) (?!) )
(?{ push @submatches, $^N }) )+ $/x;
pop @submatches;
print join ' ', map { $restore_case{ scalar reverse } } @submatches
Or you can see it with some tests here on a platform -
http://codeforces.com/contest/633/submission/16382698
Explanation:
m/^
(?:
( $list # dictionary joined with '|'
| $ # if end of line then success, so don't pop array
| (?{ pop @submatches }) (?!) # pop array and suddenly fa
+il
)
(?{ push @submatches, $^N }) # after every matched word, push
+ it to array
# $^N - last captured group
)+
$/x;
I wanted to discuss: are there other solutions using regexes, and is it possible to increase performance?
I've read nice node -
Advanced techniques with regex quantifiers. And tried to use '$^R' instead of array '@submatches' and it memory-limited on the contest platform -
http://codeforces.com/contest/633/submission/16383676 . For me it seems that using push/pop is better than using arrays with $^R.
Later I took an example from topic about Advanced techniques:
":aa2bb4cc6dd8" =~ / (:)
(?{ [] }) # initialize $^R
(?:
(\w\w) (\d)
# add captures to $^R:
(?{ [@{$^R}, [$2, $3]] })
)*
/x;
And rewrote it in push/pop array manner:
my @R;
":aa2bb4cc6dd8" =~ / (:)
(?:
(\w\w) (\d)
(?{ push @R, [$2, $3] })
)*
/x;
And it seems to work much more faster on bigger inputs. I was measuring on ideone.com (semicolon removed):
This code worked 1.3 s (string x 1e3) (
http://ideone.com/zMOyLS):
use Data::Dumper;
sub dd { print Dumper(shift) };
("aa2bb4cc6dd8" x 1e3) =~ /
(?{ [] }) # initialize $^R
(?:
(\w\w) (\d)
# add captures to $^R:
(?{ [@{$^R}, [$1, $2]] })
)*
/x;
dd @{ $^R }[ -1 ]
And this code worked 0.06 s (string x 5e3) (
http://ideone.com/n3cKVr):
use Data::Dumper;
sub dd { print Dumper(shift) };
my @R = ();
("aa2bb4cc6dd8" x 5e3) =~ /
(?:
(\w\w) (\d)
# push captures to @R:
(?{ push @R, [$1, $2] })
)*
/x;
dd @R[ -1 ]
both output:
$VAR1 = [
'dd',
'8'
];
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
| |
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.