http://qs1969.pair.com?node_id=161313

OK, I think far too little attention has been devoted to the sexeger(tm) (coined just a while back, it was recently trademarked by a numbered company out of New York in 2001 as applied to computer software -- though I'm not sure who would try to market a product with a name like that! ;) ... Also a patent is pending in the United States -- what's confusing is the application talks about "artistic license" and has a strange demonic looking camel stamp in the applicant space - odd really.)

As explained on an unnamed saint's website, and previously brought to the attention of the monks, sexeger are primarily useful as a way to speed up regular expression matches. It is demonstrated there that a speed increase of many orders of magnitude is possible through the proper application of sexeger.
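For anyone who has not met the idea before, here is a minimal sketch of the basic trick, not taken from the saint's site: reverse the input, match a reversed pattern from the front, then reverse whatever was captured. The sub names and the sample sentence are made up for illustration, and no claim is made here about which version is faster on your data.

```perl
use strict;
use warnings;

# Find the last run of digits by matching forward and anchoring at the end.
sub last_number_forward {
    my ($text) = @_;
    return $text =~ /(\d+)\D*\z/ ? $1 : undef;
}

# Same result via a sexeger: reverse the input, match the reversed
# pattern anchored at the start, then reverse the capture.
sub last_number_sexeger {
    my ($text) = @_;
    my $reversed = reverse $text;    # scalar context reverses the string
    return $reversed =~ /\A\D*(\d+)/ ? scalar reverse($1) : undef;
}

my $sample = "order 12 shipped 3456 units";
print "forward: ", last_number_forward($sample), "\n";
print "sexeger: ", last_number_sexeger($sample), "\n";
```

Both subs return the same digits for the sample string; the second one simply lets the regex engine start at what used to be the end.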

However, others may simply find them useful in cases where obfuscation is desired or they wish to make their code even less maintainable, a sort of Perl programming poison pill strategy. It's easy to demonstrate your mastery by listing your code backward, but not _exactly_ backward. See, in sexeger speak, regular expressions preserve their grouping and other logical elements, while the strings in each element are reversed and, where appropriate, the elements themselves are ordered in reverse. One inappropriate reversal is the character class. Due to their commutative-like properties (the order of characters inside a class does not change what it matches), there is no need to reverse these classes. A better obfuscation strategy would be to take recurring classes and randomly permute their elements so as to make them harder to decipher at first glance, and/or break them out into sub-expressions.

Now, as explained, there are many cases where sexegers are apropos and helpful. However, it has also come to my attention that there may similarly be cases where employing them would benefit the cause of obfuscation more than it would bring any performance increase:
#4p1s0:
# Already optimal matching forward
(?: a (?: ble | n(?:ce|t) | te | l ) | e (?: ment | n(?:ce|t) | r ) | i (?: ble | sm | ti | ve | ze | c ) | ment | ous? )

#4p0s1r:
# No performance gains matching in reverse
(?: (?: elb | (?:ec|t)n | et | l ) a | (?: tnem | (?:ec|t)n | r ) e | (?: elb | ms | it | ev | ez | c ) i | tnem | s?uo )

#4p1s0r:
# Likewise this is not an optimal match, although:
(?: ci | e (?: cn[ae] | lb[ai] | ta | vi | zi ) | iti | la | msi | re | suo | tn (?: eme? | [ae] ) | uo )

#4p0s1:
# It's better than this:
(?:ic|(?:[ae]nc|[ai]bl|at|iv|iz)e|iti|al|ism|er|ous|(?:e?me|[ae])nt|ou)

One can analyse the above matches as follows:
It can be said that each element in a match group with sub-groups is a root, or lowest common factor, of the group. Thus the match knows that if it finds this common root of the sub-group, it has also found one of the match elements of the sub-group, and if not, it can discard these from the solution set immediately. The fewer such factors at the base level of a match group, usually the better the performance. An equation might summarise the total time required to perform a match, and I'll leave this as an open challenge, as I haven't perfected this to a science yet. So far, I've just been doing trial and error benchmarking using the well-known Benchmark by Jarkko Hietaniemi and my common sense knowledge (I'm no master regexer) of the regex engine.

Also consider:
#3p0s1:
# Less optimal
(?: (?: icat | ativ | aliz ) e | iciti | (?: ica | fu ) l | ness )

#3p1s0r:
# More optimal - performance gains in reverse
(?: e (?: taci | vita | zila ) | itici | l (?: aci | uf ) | ssen )

In the above it is obvious that the latter is a good sexeger due to its suffix-heavy commonality. Remember that in a sexeger a prefix becomes a suffix and a suffix a prefix.
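If you would rather measure than take my word for it, here is a rough sketch of the kind of trial and error run I mean, using the core Benchmark module. The word list is made up for illustration, the anchors (\z and \A) are my addition so both patterns test "ends with one of these suffixes", and the words are reversed once up front, so the cost of reversing is not counted. Your numbers will depend on your perl and your data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# The two step 3 patterns from above, anchored so both ask the same question.
my $forward = qr/(?: (?: icat | ativ | aliz ) e | iciti | (?: ica | fu ) l | ness ) \z/x;
my $sexeger = qr/\A (?: e (?: taci | vita | zila ) | itici | l (?: aci | uf ) | ssen )/x;

# Hypothetical sample words; substitute your own list.
my @words = qw(
    triplicate formative formalize electriciti electrical
    hopeful goodness camel regex monastery
);
my @reversed = map { scalar reverse $_ } @words;

# Count how many words end in one of the suffixes, matching forward.
sub count_forward {
    my $hits = grep { /$forward/ } @words;
    return $hits;
}

# Same count, matching the sexeger against the pre-reversed words.
sub count_sexeger {
    my $hits = grep { /$sexeger/ } @reversed;
    return $hits;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    forward => \&count_forward,
    sexeger => \&count_sexeger,
});
```

Checking that both subs return the same count before trusting the timings is a cheap way to catch a botched reversal.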


So far there is no way to automatically generate a sexeger from the regex you wish to transform. However, on the site above, the unnamed saint has indicated that work is in progress on just such a tool. Until then, for complex regexes, hand reversal can prove to be both instructional and fun. Usually it takes very little time to do, once you get over the initial disorientation and the accidental typing of (:?) instead of (?:). My strategy now goes as follows:
(?:abcd|efg|won|I|[now]|know|my|regexes)
(?:abcdcba|efgfe|wonow|[now]|I|knowonk|mym|regexesexeger)
(?x-ism: dcba|gfe | now |I| [own] |wonk|y m|sexeger)
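Hand reversal is easy to get subtly wrong, so a quick sanity check helps. The sketch below takes the first and last patterns above and confirms that a word matches the forward regex exactly when its reversal matches the sexeger. The sub names and the two extra test words (now, perl) are mine, added only for illustration.

```perl
use strict;
use warnings;

my $forward = qr/(?:abcd|efg|won|I|[now]|know|my|regexes)/;
my $sexeger = qr/(?x-ism: dcba|gfe | now |I| [own] |wonk|y m|sexeger)/;

# True if the whole word matches the forward pattern.
sub matches_forward {
    my ($word) = @_;
    return $word =~ /\A$forward\z/ ? 1 : 0;
}

# True if the whole reversed word matches the hand-reversed pattern.
sub matches_reversed {
    my ($word) = @_;
    my $backward = reverse $word;    # scalar context reverses the string
    return $backward =~ /\A$sexeger\z/ ? 1 : 0;
}

# The two columns should agree on every line.
for my $word (qw(abcd efg won I n o w know my regexes now perl)) {
    printf "%-8s forward=%d reversed=%d\n",
        $word, matches_forward($word), matches_reversed($word);
}
```

Note that [now] and [own] behave identically, which is the character class point made earlier, and that "y m" only works because whitespace is ignored under the x flag.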


Yes indeed, it is all very fun. I could do this for hours on end, and make a day out of it. Joy! ... umm.. But I just had to cheat. :)

As the above code blurbs might lead you to gather, I have been using a helper script to create these different forms of the same search list. As a new way to create word search regexes employing sexeger, I'll show how expressions like these can be generated automatically:

from helper.pl:
#!/usr/bin/perl
use strict;
use warnings;
use Regex::PreSuf;

my @step4list = qw(
    al able ance ant ate ence ent ement er
    ible ic ism iti ive ize ment ou ous
);

# Reverse every word in place, so each suffix becomes a prefix
$_ = reverse $_ for @step4list;

print "4p1s0r:", presuf({ suffixes => 0 }, @step4list), "\n";
print "4p1s1r:", presuf({ suffixes => 1 }, @step4list), "\n";
print "4p0s1r:", presuf({ prefixes => 0 }, @step4list), "\n";

This makes use of Regex::PreSuf, also by the Finnish Perl hacker Jarkko Hietaniemi <jhi@iki.fi>. Regex compression is a new way of looking at multiword searching: instead of iterating over a list, try using an optimal regex match for the word list. You may be pleasantly surprised by the results.
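To make the "loop versus one regex" contrast concrete, here is a small sketch using only core Perl. Be aware that it builds a plain alternation with join, not the factored pattern Regex::PreSuf would produce, so it illustrates the shape of the idea rather than the compression itself. The sub names and the sample words are hypothetical.

```perl
use strict;
use warnings;

my @suffixes = qw(
    al able ance ant ate ence ent ement er
    ible ic ism iti ive ize ment ou ous
);

# Approach 1: iterate over the list and compare each suffix in turn.
sub ends_with_suffix_loop {
    my ($word) = @_;
    for my $suffix (@suffixes) {
        next if length($word) < length($suffix);
        return 1 if substr($word, -length($suffix)) eq $suffix;
    }
    return 0;
}

# Approach 2: one regex for the whole list (plain alternation, unfactored).
my $alternation = join '|', map { quotemeta($_) } @suffixes;
my $suffix_re   = qr/(?:$alternation)\z/;

sub ends_with_suffix_regex {
    my ($word) = @_;
    return $word =~ $suffix_re ? 1 : 0;
}

for my $word (qw(adjustment revival probate camel monk)) {
    printf "%-12s loop=%d regex=%d\n",
        $word, ends_with_suffix_loop($word), ends_with_suffix_regex($word);
}
```

Swapping the joined alternation for the output of presuf should leave the results unchanged while giving the engine less to try, though that is worth benchmarking on your own word list.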

Just wait a sEcond though:
And now I'll give credit where it's due. Only one
Person is insane enough to come up with such an intentionally conFusing way of doing things...
He is the bringer of obFuscated code, short perl quips, and eye-straining regexes.
Yes that's right, sexeger was coined by, this Perl hacker: ... well you can probably guess who it is by now.

--darksym
