in reply to regex help
Here’s my take on the requirement (and, admittedly, it’s only a guess): Identify sentences containing the target string 23456 and delete them, leaving the other sentences — together with their surrounding markup — unchanged.
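I came up with this algorithm:

1. Make a copy of the original text with the <p> and <br> tags removed.
2. Split that copy into sentences.
3. For each sentence containing 23456, delete it from the original text.
4. Collapse any runs of spaces left behind.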
The trickiest part of this is identifying sentences. Here’s my solution:
    #! perl
    use strict;
    use warnings;

    # Slurp the whole input as a single string
    my $original = do { local $/; <DATA>; };

    # Work on a copy with the <p>, </p>, <br> tags stripped
    my $string = $original =~ s{ < /? (?: p | br) > }{}grx;

    # A sentence ends at [.?!] followed by whitespace, or at end of input
    my @sentences = $string =~ m{ ( .*? (?: [.?!] \s+ | \z) ) }gsx;
    chomp @sentences;

    # \Q...\E makes each sentence match literally ('.' and '?' are regex metacharacters)
    $original =~ s{\Q$_\E}{} for grep { /23456/ } @sentences;

    # Collapse runs of spaces left by the deletions
    $original =~ s{ +}{ }g;

    print $original;

    __DATA__
    <p>Find more business news at facebook.com and twitter.com. Text BUSINESS to 23456 for breaking business news text alerts on your mobile phone. Text JOBS to 23456 for job alerts.</p>
    <p>Want your news even faster? Text NEWS to 23456 to sign up for breaking news text alerts. See for a complete list of alerts.</p>
    <br>SIGN UP FOR MOBILE NEWS ALERTS! Get your news on the go, text NEWS to 23456
Output:
    16:49 >perl 1326_SoPW.pl
    <p>Find more business news at facebook.com and twitter.com. </p>
    <p>Want your news even faster? See for a complete list of alerts.</p>
    <br>SIGN UP FOR MOBILE NEWS ALERTS!
    16:49 >
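To see the sentence-splitting step on its own, here is a minimal sketch that applies the same pattern to a short made-up string. The helper name split_sentences and the sample text are only illustrative assumptions, not part of the script above. It shows that a dot inside "facebook.com" does not end a sentence, because the pattern requires whitespace after the punctuation.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration: split text into sentences with
# the same pattern as the script above.
sub split_sentences {
    my ($text) = @_;

    # A sentence is the shortest run of characters ending in [.?!] plus
    # trailing whitespace, or running to the end of the string.
    my @sentences = $text =~ m{ ( .*? (?: [.?!] \s+ | \z) ) }gsx;

    # The final match at end-of-string can be empty; drop it.
    return grep { length } @sentences;
}

# Made-up sample text (an assumption, chosen to resemble the real data).
my @s = split_sentences(
    'Visit facebook.com and twitter.com. Text NEWS to 23456 now! Done'
);

print "[$_]\n" for @s;
```

Each sentence keeps its trailing whitespace, which is why the script above can remove a sentence from the original without leaving a stray gap in the middle of a paragraph.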
Hope that helps,
Athanasius <°(((>< contra mundum | Iustus alius egestas vitae, eros Piratica,