As far as I understand, the ability to backreference in Perl's regular expressions makes this problem difficult at best. NDFAs (Non-Deterministic Finite Automata) can be represented as a set of points (states) with certain lines (transitions) connecting them. (Note: if you don't understand that, then I'm not going to explain it. It's not that I can't... it's that I don't really have time right now. It's fairly complicated.) I read an article recently (can't remember where) describing Perl's regex engine as constructing NDFAs and working through them. The article described backreferencing as placing little recorders in certain places, with record, stop and play buttons scattered about. I think this breaks the NDFA analogy: a finite automaton has no memory beyond its current state, so it can't replay whatever text it happened to record earlier. That makes the rest of your argument null and void.
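To make the recorder picture concrete, here is a minimal sketch. It uses Python rather than Perl purely for illustration (Python's `re` module accepts the same `\1` backreference syntax), and the pattern and the name `is_doubled` are my own, not taken from the article. The pattern matches any word written twice in a row, which is a standard example of a language that no finite automaton can recognize, since the machine would have to remember an arbitrarily long first half.

```python
import re

# (\w+) is the "record" button; \1 is the "play" button.
# Together they match strings of the form ww (some word repeated twice).
DOUBLED = re.compile(r"^(\w+)\1$")


def is_doubled(s: str) -> bool:
    """Return True if s is a non-empty word written twice in a row."""
    return DOUBLED.match(s) is not None


if __name__ == "__main__":
    for candidate in ["abcabc", "abcabd", "aa", "aba"]:
        print(candidate, is_doubled(candidate))
```

The engine handles this by backtracking over possible lengths for the recorded group, which is exactly the kind of work a plain NDFA walk doesn't do.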
All right... I was about to start going into a flashback of my programming theory courses (I started thinking about context-free grammars, Turing machines, regular languages and NP-completeness. It was ugly.) but I'll spare everyone that. In short... I think that if this were possible, and reasonably easy, someone would have done it by now. But if you think you can create something that'll do a reasonably good job, go for it and let me know how it goes.
jeff
Update: Thank you, BlaisePascal, for confirming my thoughts (see his point #2).