in reply to Reverse regexp a regexp?

freakingwildchild:

In general, a regular expression can match an infinite number of different strings, so you'll have to figure out how to constrain the set to something manageable. As a degenerate case, consider the regex /.*/: what are you planning on going backwards to? How do you make the choices? Once you can answer those questions, you'll be able to make some progress towards what you want.

...roboticus

Re^2: Reverse regexp a regexp?
by Anonymous Monk on Feb 12, 2010 at 20:27 UTC
    The other degenerate case is /\Qstring1\E|\Qstring2\E|...|\QstringN\E/.
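
    A minimal sketch of that degenerate case, written in Python rather than Perl, with re.escape standing in for \Q...\E. The function name and the sample strings are made up for illustration. It shows the opposite extreme from /.*/: the pattern matches exactly the strings it was built from and generalizes to nothing else.

```python
import re

def literal_alternation(strings):
    """Build a regex that matches exactly the given strings and nothing else."""
    # re.escape neutralizes metacharacters, much like \Q...\E in Perl
    return re.compile("|".join(re.escape(s) for s in strings))

# Hypothetical sample strings, chosen to contain regex metacharacters
pattern = literal_alternation(["a.b", "c+d", "foo"])

matches_known = pattern.fullmatch("a.b") is not None     # one of the inputs
matches_similar = pattern.fullmatch("aXb") is not None   # looks similar, was never listed
```

    Here matches_known is true and matches_similar is false, because the escaped dot only matches a literal dot.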
Re^2: Reverse regexp a regexp?
by freakingwildchild (Scribe) on Feb 13, 2010 at 23:19 UTC
    There are a few examples in my later posts; most URL structures are SEO-friendly, which makes it easy to spot the differences between them.
    Detecting those subtle differences doesn't need much code. It needs fairly strict rules that must hold before the routine accepts its own results, plus some creativity and knowledge about where to start.

    I've tried a few approaches, including heuristics, fuzzy matching, and Regexp::Assemble, but none has given really good results yet.

    Currently my best approach is keeping a hash of known sites and their structure, but I'd like to build that hash automatically instead of by hand, because the Internet is rather big ;)
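
    One way such a table might be filled automatically, sketched in Python under loud assumptions: every sample URL for a site has the same number of path segments, and a segment that varies between samples is a wildcard. The function names, the table layout, and the example.com URLs are all hypothetical; the thread does not show the real hash. This illustrates the "strict rules" idea and is not a tested solution.

```python
import re
from urllib.parse import urlsplit

def path_segments(url):
    """Return the non-empty path segments of a URL."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]

def derive_pattern(urls):
    """Derive one path regex from sample URLs that share a path depth."""
    rows = [path_segments(u) for u in urls]
    # Strict rule: refuse to guess when the samples disagree on depth
    if not rows or len({len(r) for r in rows}) != 1:
        raise ValueError("need samples with the same number of path segments")
    parts = []
    for column in zip(*rows):
        values = set(column)
        if len(values) == 1:
            parts.append(re.escape(column[0]))  # constant segment: keep it literal
        elif all(v.isdigit() for v in values):
            parts.append(r"\d+")                # varies, but always numeric
        else:
            parts.append(r"[^/]+")              # varies freely: any single segment
    return "^/" + "/".join(parts) + "/?$"

def build_site_table(urls_by_site):
    """Map each site to the pattern derived from its sample URLs."""
    return {site: derive_pattern(urls) for site, urls in urls_by_site.items()}

# Hypothetical samples for a hypothetical site
samples = {
    "example.com": [
        "http://example.com/articles/2010/perl-regex-tips",
        "http://example.com/articles/2009/url-matching",
    ]
}
table = build_site_table(samples)
```

    For these two samples the derived pattern is ^/articles/\d+/[^/]+/?$. With only a couple of samples per site, a segment that merely happens to be constant gets frozen as a literal, so more samples give a safer pattern.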