More sexeger

OK, I think far too little attention has been devoted to the sexeger(tm) (coined just a while back, it was recently trademarked by a numbered company out of New York in 2001 as applied to computer software -- though I'm not sure who would try to market a product with a name like that! ;) ... Also a patent is pending in the United States -- what's confusing is the application talks about "artistic license" and has a strange demonic looking camel stamp in the applicant space - odd really.)

As is explained on an unnamed saints website and previously brought to the attention of the monks, sexeger are primarily useful as a way to increase the speed of regular expression matches. It is demonstrated that a speed increase of many orders of magnitude is possible through the proper application of sexeger.

However, others may simply find it useful, in cases where obfuscation is desired or they wish to make their code even less maintainable, a sort of perl programming poison pill strategy. It's easy to demonstrate your mastery by listing your code backward, but not _exactly_ backward. See in sexeger speak, regular expressions preserve their grouping and other logical elements while reversing the strings in each element and where appropriate ordering them likewise in reverse. An inappropriate reversal is the character class. Due to their commutative-like properties, there is no need to reverse these classes. A better obfuscation strategy would be to take reoccuring classes and randomly permute their elements so as to make them harder to decipher at first glance and/or break out into sub-expressions.

Now as explained, there are many cases where sexegers are apropos and helpful, however it has also come to my attention that there may similarly be cases, where employing them would benefit the cause of obfuscation more so than any performance increase:

#4p1s0: # Already optimal matching forward
   (?:   a  (?: ble | n(?:ce|t) | te | l )
     |   e  (?: ment | n(?:ce|t) | r )
     |   i  (?: ble | sm | ti | ve | ze | c )
     |   ment
     |   ous?
   )
#4p0s1r: # No performance gains matching in reverse
   (?:   (?: elb | (?:ec|t)n | et | l )  a
     |   (?: tnem | (?:ec|t)n | r )  e
     |   (?: elb | ms | it | ev | ez | c )  i
     |   tnem
     |   s?uo
   )
#4p1s0r: # Likewise this is not an optimal match, although:
   (?:   ci
     |   e  (?: cn[ae] | lb[ai] | ta | vi | zi )
     |   iti
     |   la
     |   msi
     |   re
     |   suo   
     |   tn  (?: eme? | [ae] )
     |   uo
   )
#4p0s1: # It's better than this:
   (?:ic|(?:[ae]nc|[ai]bl|at|iv|iz)e|iti|al|ism|er|ous|(?:e?me|[ae])nt
+|ou)
[download]

One can analyse the above matches like follows:
It can be said that each element in a match group with sub groups is a root or lowest common factor of the group. Thus the match knows if it finds this common root of the sub-group it has also found one of the match elements of the sub-group -- and if not, it can discard these from the solution set immediately. The less the amount of such factors at the base level of a match group, usually the better the performance. An equation might summarise the total time required to perform a match. And I'll leave this as an open challenge, as I haven't perfected this to a science yet. So far, I've just been doing trial and error benchmarking using the well-known Benchmark by Jarkko Hietaniemi and using my common sense knowledge (I'm no master regexer) of the regex engine.

Also consider:

#3p0s1: # Less optimal
(?:   (?: icat | ativ | aliz )  e
  |   iciti
  |   (?: ica | fu )  l
  |   ness
)
#3p1s0r: # More optimal - performance gains in reverse
(?:   e  (?: taci | vita | zila )
  |   itici
  |   l  (?: aci | uf )
  |   ssen
)
[download]

In the above it is obvious that the latter is a good sexeger due to it's suffix heavy commonality. Remember that in a sexeger a prefix becomes a suffix and a suffix a prefix.

So far there is no way to automatically generate all sexeger given the regexes wishing to be transformed. However on the site above, the unnamed saint, has indicated work is in progress on just such a tool. Until then for complex regexes, hand reversal can prove to be both instructional and fun. Usually it takes a very small amount of time to do, once you can get over the initial disorientation and accidental typing of (:?) instead of (?:). My strategy is to do like follows now:

   (?:abcd|efg|won|I|[now]|know|my|regexes)
   (?:abcdcba|efgfe|wonow|[now]|I|knowonk|mym|regexesexeger)
   (?x-ism: dcba|gfe |     now |I| [own] |wonk|y m|sexeger)
[download]

Yes indeed, it is all very fun. I could do this for hours on end, and make a day out of it. Joy! ... umm.. But I just had to cheat. :)

As the above code blurbs might have you gather, I have been using a helper script to create these different forms of the same search list. As a new way to create word search regexes employing segexer, I'll show expressions like these can be automatically generated like so:

from helper.pl:

#!/usr/bin/perl
use Regex::PreSuf;
my @step4list   =qw(
   al able ance ant ate
   ence ent ement er
   ible ic ism iti ive ize
   ment
   ou ous
);
grep $_=reverse, @step4list;
print "4p1s0r:", presuf ({suffixes=>0}, @step4list), "\n";
print "4p1s1r:", presuf ({suffixes=>1}, @step4list), "\n";
print "4p0s1r:", presuf ({prefixes=>0}, @step4list), "\n";
[download]

This makes use of Regex::PreSuf ALSO by the Finnish perl hacker Jarkko Hietaniemi <jhi@iki.fi>. Regex compression is a new way of looking at multiword searching -- instead of iterating over a list -- try using an optimal regex match for the word list. You may be pleasantly surprised by the results.

Just wait a sEcond though:
And now I'll give credit where it's due. Only one
Person is insane enough to come up with such an intentionally conFusing way of doing things...
He is the bringer of obFuscated code, short perl quips, and eye-straining regexes.
Yes that's right, sexeger was coined by, this Perl hacker: ... well you can probably guess who it is by now.

--darksym

Edit by dws to add <readmore> tag

Comment on More sexeger Select or Download Code

Replies are listed 'Best First'.
Re: More sexeger by darksym (Beadle) on Apr 23, 2002 at 18:54 UTC
One possible weakness with my argument above: These perhaps aren't the best demonstration cases. Japhy's cases are much more indepth. When you benchmark the above 3* example, it doesn't always work out more optimal. (See below) With sexegers, I've found there isn't always a 100% clear cut case where it's faster than regexes, simply because there are a lot of variables such as: If you just want a match that doesn't return any results, for instance to indicate with boolean value if the match is true; whether or not you take into account the assignment of the match to a variable - which in the case of sexeger means you have to reverse it back to it's original form at some point; if you have to do real-time reversal to the string your searching _on_ (implicit in my example is the assumption this is done ahead of time, hence mine isn't the best example) - Japhy refers to this as caching; if you do the reversal of the match ouput to it's regular form during or after the match stage - buffering until later nets huge gains, where this is possible. The overhead sometimes nullifies or makes the gains of sexegers negative. So it really does take some benchmarking if there isn't a very obvious difference such as with the case with the 4* examples above (quite obvious upfront.) None of those example sexegers will run faster than 4p1s0 no matter what you do to them. # Pre-reversed input strings, boolean match: Benchmark: timing 100000 iterations of regexes_3p0s1, sexeger_3p1s0... regexes_3p0s1: 9 wallclock secs ( 7.05 usr + 0.00 sys = 7.05 CPU) @ + 14174.97/s (n=100000) sexeger_3p1s0: 6 wallclock secs ( 6.16 usr + 0.00 sys = 6.16 CPU) @ + 16243.65/s (n=100000) Rate regexes_3p0s1 sexeger_3p1s0 regexes_3p0s1 14175/s -- -13% sexeger_3p1s0 16244/s 15% -- # OTF-reversed input strings, non-reversing output: Benchmark: timing 100000 iterations of regexes_3p0s1, sexeger_3p1s0... regexes_3p0s1: 8 wallclock secs ( 7.05 usr + 0.00 sys = 7.05 CPU) @ + 14174.97/s (n=100000) sexeger_3p1s0: 10 wallclock secs ( 7.90 usr + 0.00 sys = 7.90 CPU) @ + 12660.73/s (n=100000) Rate sexeger_3p1s0 regexes_3p0s1 sexeger_3p1s0 12661/s -- -11% regexes_3p0s1 14175/s 12% -- # Pre-reversed input strings, non-reversing ouput: Benchmark: timing 100000 iterations of regexes_3p0s1, sexeger_3p1s0... regexes_3p0s1: 16 wallclock secs (14.16 usr + 0.02 sys = 14.19 CPU) @ + 7048.46/s (n=100000) sexeger_3p1s0: 15 wallclock secs (12.84 usr + 0.00 sys = 12.84 CPU) @ + 7785.89/s (n=100000) Rate regexes_3p0s1 sexeger_3p1s0 regexes_3p0s1 7048/s -- -9% sexeger_3p1s0 7786/s 10% -- # Pre-reversed input strings, reversing ouput Benchmark: timing 100000 iterations of regexes_3p0s1, sexeger_3p1s0... regexes_3p0s1: 18 wallclock secs (14.26 usr + 0.00 sys = 14.26 CPU) @ + 7013.70/s (n=100000) sexeger_3p1s0: 23 wallclock secs (15.14 usr + 0.00 sys = 15.14 CPU) @ + 6604.75/s (n=100000) Rate sexeger_3p1s0 regexes_3p0s1 sexeger_3p1s0 6605/s -- -6% regexes_3p0s1 7014/s 6% -- # Pre-reversed input strings, non-reversing backreferenced output Benchmark: timing 100000 iterations of regexes_3p0s1, sexeger_3p1s0... regexes_3p0s1: 20 wallclock secs (15.84 usr + 0.00 sys = 15.84 CPU) @ + 6311.64/s (n=100000) sexeger_3p1s0: 17 wallclock secs (14.61 usr + 0.00 sys = 14.61 CPU) @ + 6844.92/s (n=100000) Rate regexes_3p0s1 sexeger_3p1s0 regexes_3p0s1 6312/s -- -8% sexeger_3p1s0 6845/s 8% -- # Pre-reversed input strings, reversing backreferenced output: Benchmark: timing 100000 iterations of regexes_3p0s1, sexeger_3p1s0... regexes_3p0s1: 18 wallclock secs (15.79 usr + 0.00 sys = 15.79 CPU) @ + 6333.50/s (n=100000) sexeger_3p1s0: 20 wallclock secs (16.42 usr + 0.00 sys = 16.42 CPU) @ + 6089.44/s (n=100000) Rate sexeger_3p1s0 regexes_3p0s1 sexeger_3p1s0 6089/s -- -4% regexes_3p0s1 6333/s 4% -- [download] #!/usr/bin/perl use Benchmark qw(timethese cmpthese ); my %step3list = ( icate => 'ic', ative => '', alize => 'al', iciti => 'ic', ical => 'ic', ful => '', ness => '', foo => 'bar' ); my @step3list_reg = keys %step3list; my @step3list_sex = map $_=reverse, (keys %step3list); my @step4list =qw( al able ance ant ate ence ent ement er ible ic ism iti ive ize ment ou ous boo ); my @step4list_sex = map $_=reverse, @step4list; sub COUNT (){ 100_000 } #sub COUNT (){ 1 } my $ta = timethese (COUNT, { regexes_4p0s1 => sub { my $match; foreach (@step4list){ #print "$_: "; #if (($match) = /^ if (/^ ( a (?: ble \| n(?:ce\|t) \| te \| l ) \| e (?: ment \| n(?:ce\|t) \| r ) \| i (?: ble \| sm \| ti \| ve \| ze \| c ) \| ment \| ous? ) $/x) { #print "matches expected set using regexes. $match:$`\|$&\|$+\|$ +'\n"; } else { #print "doesn't match expected set using regexes.\n"; } } }, sexeger_4p0s1 => sub { my $match; foreach (@step4list_sex){ #print "$_: "; #if (($match) = /^ if (/^ #4p0s1r: # No performance gains matching in reverse ( (?: elb \| (?:ec\|t)n \| et \| l ) a \| (?: tnem \| (?:ec\|t)n \| r ) e \| (?: elb \| ms \| it \| ev \| ez \| c ) i \| tnem \| s?uo ) $/x) { #$match = reverse $match; #print "matches expected set using sexeger. $match:$`\|$&\|$+\|$ +'\n"; } else { #print "doesn't match expected set using sexeger.\n"; } } }, sexeger_4p1s1 => sub { my $match; foreach (@step4list_sex){ #print "$_: "; #if (($match) = /^ if (/^ #4p1s1r: # However, this has significant performance gains ( (?: e(?:lb\|cn\|t) \| tn \| l ) a \| (?: tnem \| (?:ec\|t)n \| r ) e \| (?: e(?:lb\|[vz]) \| it \| ms \| c ) i \| tnem \| s?uo ) $/x) { #$match = reverse $match; #print "matches expected set using sexeger. $match:$`\|$&\|$+\|$ +'\n"; } else { #print "doesn't match expected set using sexeger.\n"; } } }, }); cmpthese($ta); my $tb = timethese (COUNT, { regexes_3p0s1 => sub { my $match; foreach (@step3list_reg){ #print "$_: "; #if (($match) = /^ if (/^ #3p0s1: # Less optimal #( (?: (?: icat \| ativ \| aliz ) e \| iciti \| (?: ica \| fu ) l \| ness ) $/x) { #$match = $1; #print "matches expected set using regexes. $match:$`\|$&\|$+\|$ +'\n"; } else { #print "doesn't match expected set using regexes.\n"; } } }, sexeger_3p1s0 => sub { my $match; #foreach (@step3list_reg){ foreach (@step3list_sex){ #print "$_: "; #if (reverse=~/^ #if (($match) = /^ if (/^ #3p1s0r: # More optimal - performance gains in reverse #( (?: e (?: taci \| vita \| zila ) \| itici \| l (?: aci \| uf ) \| ssen ) $/x) { #$match = $1; #$match = reverse $1; #$match = reverse $match; #print "matches expected set using sexeger. $match:$`\|$&\|$+\|$ +'\n"; } else { #print "doesn't match expected set using sexeger.\n"; } } } }); cmpthese($tb); __END__ =head1 NAME sextest.pl - Test Sexeger performance =head1 DESCRIPTION An example script to test various sexeger performance attributes. =head1 AUTHOR "darksym" :) =head1 Benchmark Example Ouput Benchmark: timing 100000 iterations of regexes_4p0s1, sexeger_4p0s1, s +exeger_4p1 s1... regexes_4p0s1: 23 wallclock secs (18.66 usr + 0.01 sys = 18.66 CPU) @ + 5357.89/s (n=100000) sexeger_4p0s1: 33 wallclock secs (26.89 usr + 0.03 sys = 26.92 CPU) @ + 3714.45/s (n=100000) sexeger_4p1s1: 29 wallclock secs (24.48 usr + 0.02 sys = 24.49 CPU) @ + 4082.93/s (n=100000) Rate sexeger_4p0s1 sexeger_4p1s1 regexes_4p0s1 sexeger_4p0s1 3714/s -- -9% -31% sexeger_4p1s1 4083/s 10% -- -24% regexes_4p0s1 5358/s 44% 31% -- Benchmark: timing 100000 iterations of regexes_3p0s1, sexeger_3p1s0... regexes_3p0s1: 8 wallclock secs ( 7.10 usr + 0.00 sys = 7.10 CPU) @ + 14081.41/ s (n=100000) sexeger_3p1s0: 6 wallclock secs ( 6.26 usr + -0.01 sys = 6.25 CPU) @ + 16000.00/ s (n=100000) Rate regexes_3p0s1 sexeger_3p1s0 regexes_3p0s1 14081/s -- -12% sexeger_3p1s0 16000/s 14% -- =cut [download]	[reply] [d/l] [select]
Re: More sexeger by darksym (Beadle) on Apr 23, 2002 at 14:21 UTC
Acutally the topic was suppose to be: m/sexeger\|regexes/m but I could fit the \| in. :)	[reply]


No such thing as a small change
	PerlMonks