Here’s my take on the requirement (and, admittedly, it’s only a guess): Identify sentences containing the target string 23456 and delete them, leaving the other sentences — together with their surrounding markup — unchanged.
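I came up with this algorithm:
1. Read the whole text into a string.
2. Make a copy of the string with the <p> and <br> tags stripped out.
3. Split the copy into sentences.
4. For each sentence containing 23456, delete that sentence from the original string.
5. Collapse any runs of spaces left behind.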
The trickiest part of this is identifying sentences. Here’s my solution:
#! perl
use strict;
use warnings;

# Slurp the whole DATA section into one string
my $original = do { local $/; <DATA>; };

# Work on a copy with the <p> and <br> tags stripped (the /r flag needs Perl 5.14+)
my $string = $original =~ s{ < /? (?: p | br) > }{}grx;

# A sentence ends at '.', '?' or '!' followed by whitespace, or at end of text
my @sentences = $string =~ m{ ( .*? (?: [.?!] \s+ | \z) ) }gsx;
chomp @sentences;

# Delete each sentence containing the target; \Q...\E makes the match literal
$original =~ s{\Q$_\E}{} for grep { /23456/ } @sentences;

# Collapse runs of spaces
$original =~ s{ +}{ }g;
print $original;

__DATA__
<p>Find more business news at facebook.com and twitter.com. Text BUSINESS to 23456 for breaking business news text alerts on your mobile phone. Text JOBS to 23456 for job alerts.</p>
<p>Want your news even faster? Text NEWS to 23456 to sign up for breaking news text alerts. See for a complete list of alerts.</p>
<br>SIGN UP FOR MOBILE NEWS ALERTS! Get your news on the go, text NEWS to 23456
Output:
16:49 >perl 1326_SoPW.pl
<p>Find more business news at facebook.com and twitter.com. </p>
<p>Want your news even faster? See for a complete list of alerts.</p>
<br>SIGN UP FOR MOBILE NEWS ALERTS!
16:49 >
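For reuse, the same idea can be wrapped in a pair of subs. This is only a sketch: the names split_sentences and remove_sentences are made up for illustration and are not part of the script above, and the sample text is invented. It assumes, as above, that a sentence ends at '.', '?' or '!' followed by whitespace, or at the end of the text. It matches each sentence literally via \Q...\E, so a sentence containing characters such as ( ) or + is not treated as a pattern.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper: split text into sentences. A sentence ends at
# '.', '?' or '!' followed by whitespace, or at the end of the text.
sub split_sentences {
    my ($text) = @_;
    # grep drops the empty match that /g produces at the very end
    return grep { length } $text =~ m{ ( .*? (?: [.?!] \s+ | \z ) ) }gsx;
}

# Hypothetical helper: delete every sentence containing $target from
# $html, leaving the <p> and <br> tags where they were.
sub remove_sentences {
    my ($html, $target) = @_;

    # Find sentences in a tag-free copy of the text
    (my $plain = $html) =~ s{ < /? (?: p | br ) > }{}gx;

    for my $sentence (split_sentences($plain)) {
        next unless index($sentence, $target) >= 0;
        $sentence =~ s{ \s+ \z }{}x;    # leave the trailing whitespace in place
        $html =~ s{\Q$sentence\E}{};    # \Q...\E: treat the sentence as literal text
    }

    $html =~ s{ [ ]{2,} }{ }gx;         # collapse runs of spaces
    return $html;
}

print remove_sentences(
    "<p>Visit example.com today. Text NEWS to 23456 now. Bye!</p>\n",
    '23456',
);
```

With the sample input shown, this prints <p>Visit example.com today. Bye!</p>. Note that "example.com" is not split, because the dot is not followed by whitespace.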
Hope that helps,
Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,
In reply to Re: regex help by Athanasius, in thread regex help by Anonymous Monk