in thread Regex capture between word and punctuation

Hey, nicely composed, toolic. I apologize for the lack of decent thread posts, but thanks to everyone for posting anyway.

I have a few questions about the code you posted:

What does '<' mean/do in the open line?

Any chance you can explain the pieces in your regular expression? I can follow most of it, but I think some elements are new to me.

Re^3: Regex capture between word and punctuation
by toolic (Bishop) on Mar 18, 2008 at 17:41 UTC
    What does '<' mean/do in the open line?
    Rather than just giving you the answer, perhaps it would be more useful to you if I showed you some ways of finding the answer. Free documentation is available both online, under open in perlfunc, and at your *nix command prompt:
    perldoc -f open
    Read about the 3-argument form of open, specifically the MODE.
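    Once you have read that section, a minimal sketch like the following may help you experiment with the MODE argument. It is only an illustration: the file name sample_input.txt and the sample sentence are made up for this example and do not come from the code earlier in the thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical file name, used only for this illustration.
my $file = 'sample_input.txt';

# MODE '>' opens the file for writing, creating or truncating it.
open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} "It is raining.\n";
close($out) or die "Cannot close $file: $!";

# MODE '<' opens the same file for reading only.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my @lines = <$in>;
close($in);

print "Read ", scalar(@lines), " line(s)\n";

# Remove the scratch file again.
unlink $file;
```

    Passing the MODE as its own argument keeps it separate from the file name, so a name that happens to start with a character such as '>' cannot change how the file is opened.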
    Any chance you can explain the pieces in your regular expression?
    Browse through perlretut. YAPE::Regex::Explain can explain it better than I can:
    The regular expression:

    (?-imsx:\b[Ii]t\b(.*)[.,?]$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
      [Ii]                     any character of: 'I', 'i'
    ----------------------------------------------------------------------
      t                        't'
    ----------------------------------------------------------------------
      \b                       the boundary between a word char (\w) and
                               something that is not a word char
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .*                       any character except \n (0 or more times
                                 (matching the most amount possible))
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      [.,?]                    any character of: '.', ',', '?'
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    Here is how I got that explanation:

    #!/usr/bin/env perl
    use warnings;
    use strict;
    use YAPE::Regex::Explain;

    my $re = '\b[Ii]t\b(.*)[.,?]$';
    my $parser = YAPE::Regex::Explain->new($re);
    print $parser->explain;
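    To see the pattern itself in action, here is a small sketch that runs it against a few strings. The sample sentences are invented for illustration and are not taken from the original poster's data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# The same pattern that was passed to YAPE::Regex::Explain.
my $re = qr/\b[Ii]t\b(.*)[.,?]$/;

# Made-up sample strings, for illustration only.
my @samples = (
    'I think it is raining today.',  # "it" as a whole word, ends in '.'
    'Italy is warm.',                # "It" is part of a longer word
    'It is raining',                 # no trailing punctuation
);

my @captures;
for my $s (@samples) {
    if ($s =~ $re) {
        # $1 holds whatever (.*) captured between "it" and the final punctuation.
        push @captures, $1;
        print "matched:  '$s' captured '$1'\n";
    }
    else {
        push @captures, undef;
        print "no match: '$s'\n";
    }
}
```

    Only the first string should match. The second fails because \b requires "It" to stand alone as a word, and the third fails because nothing from [.,?] sits at the end of the string. The captured text keeps the space that follows "it", since .* starts immediately after the word boundary.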
      Wow, I had no idea Perl could do that. I have so much to learn! Thanks for taking the time to show me that.