Re: Regex capture between word and punctuation

negzero7,

Since you are new to the Monastery, please read "How do I compose an effective node title?" Thus far, all of your nodes have had meaningless titles.

Now to your question...

I believe this code satisfies your conditions of:

Line ending in period, comma, or question mark.
Line containing case-insensitive word "it"

#!/usr/bin/env perl
use warnings;
use strict;

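# Take the input file name from the command line.
my $file = shift @ARGV;
die "Usage: $0 FILE\n" unless defined $file;

open my $test_fh, '<', $file or die "Cannot open file '$file': $!\n";
while (<$test_fh>) {
    # Capture everything after the word "it", up to (but not including)
    # the trailing period, comma, or question mark.
    if (/\b[Ii]t\b(.*)[.,?]$/) {
        print "$1\n";
    }
}
close $test_fh;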

Update: driver8 astutely points out that my solution incorrectly excludes the ending punctuation. To capture the punctuation, as the OP desires, change the regex to:

    if (/\b[Ii]t\b(.*[.,?])$/) {
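To see the difference between the two regexes side by side, here is a minimal sketch. The helper name capture_after_it and the sample lines are made up for illustration; they are not part of the original script.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Return the text captured after the word "it", or undef if the line
# does not match. $keep_punct selects the updated regex.
# (capture_after_it is a hypothetical helper, not from the original post.)
sub capture_after_it {
    my ($line, $keep_punct) = @_;
    my $re = $keep_punct
        ? qr/\b[Ii]t\b(.*[.,?])$/    # punctuation inside the capture group
        : qr/\b[Ii]t\b(.*)[.,?]$/;   # punctuation outside the capture group
    my ($captured) = $line =~ $re;
    return $captured;
}

# Hypothetical sample lines, invented for illustration only.
for my $line ("Is it raining today?", "I saw it, then left.", "The item is red.") {
    my $without = capture_after_it($line, 0);
    my $with    = capture_after_it($line, 1);
    next unless defined $with;    # "item" fails the \b test, so that line is skipped
    print "without: '$without'  with: '$with'\n";
}
```

The first two lines match both regexes, and only the second form keeps the trailing punctuation. The third line prints nothing, because the "it" in "item" is not followed by a word boundary.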


Re^2: Regex capture between word and punctuation by negzero7 (Sexton) on Mar 18, 2008 at 17:18 UTC
Hey, nicely composed, toolic. I apologize for the lack of decent thread posts, but thanks to everyone for posting anyway.

I have a few questions about the code you posted:

What does '<' mean/do in the open line?
Any chance you can explain the pieces in your regular expression? I can follow most of it, but I think some elements are new to me.
Re^3: Regex capture between word and punctuation by toolic (Bishop) on Mar 18, 2008 at 17:41 UTC
What does '<' mean/do in the open line?

Rather than just giving you the answer, perhaps it would be more useful to you if I showed you some ways of finding the answer. There is free documentation both online at open and at your *nix command prompt:

perldoc -f open

Read about the 3-argument form of open, specifically the MODE.

Any chance you can explain the pieces in your regular expression?

Browse through perlretut. YAPE::Regex::Explain can explain it better than I can:

The regular expression:

(?-imsx:\b[Ii]t\b(.*)[.,?]$)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  [Ii]                     any character of: 'I', 'i'
----------------------------------------------------------------------
  t                        't'
----------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  [.,?]                    any character of: '.', ',', '?'
----------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Here is how I got that explanation:

#!/usr/bin/env perl
use warnings;
use strict;
use YAPE::Regex::Explain;

my $re = '\b[Ii]t\b(.*)[.,?]$';
my $parser = YAPE::Regex::Explain->new($re);
print $parser->explain;
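On the '<' question above: once you have read the MODE section of perldoc -f open, a small experiment may help it stick. This is only a sketch; the scratch file name open_mode_demo.txt is made up for illustration. It writes a file with the '>' mode and then reads it back with the '<' mode.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical scratch file name, used only for this illustration.
my $file = "open_mode_demo.txt";

# '>' opens for writing, creating the file or truncating an existing one.
open my $out_fh, '>', $file or die "Cannot open $file for writing: $!\n";
print {$out_fh} "Is it working?\n";
close $out_fh or die "Cannot close $file: $!\n";

# '<' opens an existing file for reading only.
open my $in_fh, '<', $file or die "Cannot open $file for reading: $!\n";
my $first_line = <$in_fh>;
close $in_fh;

print $first_line;
unlink $file or warn "Could not remove $file: $!\n";
```

Keeping the mode as its own argument, separate from the file name, means a file name that happens to begin with a character such as '>' cannot change how the file is opened.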
Re^4: Regex capture between word and punctuation by negzero7 (Sexton) on Mar 18, 2008 at 18:04 UTC
Wow, I had no idea Perl could do that. I have so much to learn! Thanks for taking the time to show me that.