in reply to Re^2: Multidimensional regular expressions
in thread Multidimensional regular expressions
I have sometimes thought about generic regular expressions that could work on any sequence, not just a sequence of characters (useful for writing correlation rules, etc.). Today it also struck me that it should be possible to apply the same concepts to more than one dimension. I started to address it as a toy project, began thinking through what it would really mean, and decided it was getting a bit deep, to say the least. So I thought I might use google to find what else is out there. This little short discussion was the top hit. And though I'm a perl fan, I wasn't even thinking about or looking for anything related to perl.
Syntax is a completely separate problem to solve. You can address the multi-dim regex problem by thinking in terms of the fundamental components of a regular expression and how it functions with those components. So, for example, you have stuff like these, which could be generalized to no longer involve characters:
- Atoms (take up positions in the sequence, make up the lowest-level of rudimentary comparison -- recursion stops here)
- Groups (both capturing and non-capturing, the point is to allow all the regex-matching features to be divided up into a heirarchy of pieces)
- Quantifiers (applied to atoms or groups -- and these are usually as simple as a range from minimum-required to maximum-needed)
- Assertions (typically are considered as tied to positions between atoms, or on the ends of a sequence, and the assertions do not advance position during processing except for their own temporary, internal matching evaluation)
So you would still have all of that in generalized regular expressions, and in multi-dimensional ones as well.
The dimensionality aspect really does have to do with the directionality of advancement of the matching position within the target sequence. I had thought of this independently before coming across this very old posting, and I think it really is a key insight. For any particular search space (regardless of the type of the atoms), you could build a regular expression that includes a new type of component, which is a direction change. It serves as both an assertion (that it is possible at that position to move in the indicated direction), and as a control for the matching engine to advance position differently.
Direction changes need not limited to right-angles in an array (of any dimensionality). Instead, a direction change could apply to any possible directed relationship among atoms. Consider a tree structure, for example. What direction is the default direction? Perhaps there isn't one. Maybe a "direction" in this case is really an entire set of possible relationships, each of which might need to be tested, such as moving to a child node. Some of that sort of multi-dimensional relationship path has already been addressed in the world of XPath expressions, which fundamentally deal with a tree structure.
I should be able to have a search space that defines the fundamental operations: comparing two atoms, evaluating an assertion at a given position, enumerating all possible directions of a given type at a given position, and advancing in some specific direction at a given position (remember that the possibility should be tested first as an assertion, but with multiple possibilities, one need only have a non-empty set). All of this being defined, the regular expression could be built piece by piece using the same structure we use today for character strings, using function calls on a builder object if no syntax is invented to make it simpler. The point of all this is that, once you've built all that, the regular expression matching engine should still work pretty much the exact same way that string-based regex matching does. It's all the same thing.
This is what I think "multi-dimensional regular expressions" are. Now, if anybody is still out there seeing this ancient thread, does it exist anywhere? :-)