in reply to Regex capture between word and punctuation

negzero7,

Since you are new to the Monastery, please read "How do I compose an effective node title?". Thus far, all of your nodes have had meaningless titles.

Now to your question...

I believe this code satisfies your conditions:

#!/usr/bin/env perl
use warnings;
use strict;

open my $test_fh, '<', @ARGV or die "Can not open file $!\n";
while (<$test_fh>) {
    if (/\b[Ii]t\b(.*)[.,?]$/) {
        print "$1\n";
    }
}
close $test_fh;

Update: driver8 astutely points out that my solution incorrectly excludes the ending punctuation. To capture the punctuation, as the OP desires, change the regex to:

if (/\b[Ii]t\b(.*[.,?])$/) {
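
For anyone comparing the two patterns, here is a minimal sketch that runs both against a few in-memory lines instead of a file. The sample sentences and the names @lines, @results, $without and $with are made up for illustration; they are not from the OP's data.

```perl
#!/usr/bin/env perl
use warnings;
use strict;

# Hypothetical sample input; the OP's real data file is not shown in this thread.
my @lines = (
    "I think it is raining.\n",
    "Is it cold outside, or is it warm?\n",
    "No match on this line\n",
);

my @results;    # pairs of [ capture without punctuation, capture with punctuation ]

for my $line (@lines) {
    # Original pattern: the ending punctuation sits outside the capture group.
    my ($without) = $line =~ /\b[Ii]t\b(.*)[.,?]$/;

    # Updated pattern: the ending punctuation is inside the capture group.
    my ($with) = $line =~ /\b[Ii]t\b(.*[.,?])$/;

    # Skip lines that do not match at all.
    next unless defined $with;

    push @results, [ $without, $with ];
    print "without: '$without'\n";
    print "   with: '$with'\n";
}
```

Note that the match starts at the first standalone "it" on the line, and the greedy .* then runs to the last punctuation mark before the end of the line, so a second "it" later in the same line ends up inside the capture.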

Re^2: Regex capture between word and punctuation
by negzero7 (Sexton) on Mar 18, 2008 at 17:18 UTC
    Hey nicely composed toolic. I apologize for the lack of decent thread posts, but thanks to everyone for posting anyway.

    I have a few questions about the code you posted:

    What does '<' mean/do in the open line?

    Any chance you can explain the pieces in your regular expression? I can follow most of it but some elements are new to me I think.

      What does '<' mean/do in the open line?
      Rather than just giving you the answer, perhaps it would be more useful to you if I showed you some ways of finding the answer. There is free documentation both online (see the documentation for open) and at your *nix command prompt:
      perldoc -f open
      Read about the 3-argument form of open, specifically the MODE.
      Any chance you can explain the pieces in your regular expression?
      Browse through perlretut. YAPE::Regex::Explain can explain it better than I can:
      The regular expression:

      (?-imsx:\b[Ii]t\b(.*)[.,?]$)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        \b                       the boundary between a word char (\w) and
                                 something that is not a word char
      ----------------------------------------------------------------------
        [Ii]                     any character of: 'I', 'i'
      ----------------------------------------------------------------------
        t                        't'
      ----------------------------------------------------------------------
        \b                       the boundary between a word char (\w) and
                                 something that is not a word char
      ----------------------------------------------------------------------
        (                        group and capture to \1:
      ----------------------------------------------------------------------
          .*                       any character except \n (0 or more times
                                   (matching the most amount possible))
      ----------------------------------------------------------------------
        )                        end of \1
      ----------------------------------------------------------------------
        [.,?]                    any character of: '.', ',', '?'
      ----------------------------------------------------------------------
        $                        before an optional \n, and the end of the
                                 string
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

      Here is how I got that explanation:

      #!/usr/bin/env perl
      use warnings;
      use strict;
      use YAPE::Regex::Explain;

      my $re     = '\b[Ii]t\b(.*)[.,?]$';
      my $parser = YAPE::Regex::Explain->new($re);
      print $parser->explain;
        wow, I had no idea perl could do that. I have so much to learn! Thanks for taking the time to show me that.