in reply to Re^2: Using Perl to snip the end off of HTML
in thread Using Perl to snip the end off of HTML
Pay more attention to hv's critique (++!) than to my ugly code... and apologies for not writing the regexen in extended format with explanatory comments. (If you need that, I may be able to produce it, but not quickly; my workload heated up big time yesterday.) But back to hv's wisdom vs. mine: I'm (maybe) halfway decent at translating structure such as you showed into a minimally working regex, but I'm already busy trying to internalize his advice, which appears to be very good.
That said, I am not entirely sure all his alternatives are applicable to your project: note that we appear to differ in our understandings of your intent.
For example, if (as I read it) you wish to lose the blockquote tags but keep the editorial content they surround, then the regexen MUST (TTBOMK) work on a single string holding the whole text, rather than line by line, because both the tags and the emails' 'editorial contents' span multiple '\n'. Even so, I think his method of building the string is a large improvement over mine.
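To illustrate the point about working on one string, here is a minimal sketch in extended (/x) format. It is not the code from earlier in the thread: the sub name strip_blockquote_tags and the sample text are made up for illustration, and it assumes the blockquotes are not nested.

```perl
use strict;
use warnings;

# Hypothetical helper: drop blockquote tags, keep the text between them.
# Assumes blockquotes are not nested; a real parser is safer for that.
sub strip_blockquote_tags {
    my ($html) = @_;
    $html =~ s{
        <blockquote\b[^>]*>   # opening tag, with any attributes
        (.*?)                 # enclosed content (the s flag lets . cross newlines)
        </blockquote\s*>      # closing tag
    }{$1}gisx;
    return $html;
}

# Made-up sample: the quoted section spans several lines.
my $sample = join "\n",
    '<p>Hi</p>',
    '<blockquote type="cite">',
    'line one',
    'line two',
    '</blockquote>',
    '<p>Bye</p>',
    '';

print strip_blockquote_tags($sample);
```

Fed one line at a time, the pattern would never see an opening and a closing tag together, which is why the whole text has to be in a single scalar first.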
Re parsers: I believe earlier comments mentioned several, which may or may not facilitate your work.