in reply to Help for regex

So it's impossible for a newline, whitespace, a comma (or other significant delimiter), quotes, escape sequences, or other tags to be embedded in the ID? That being the case, this seems simple enough:

if( $string =~ m/<ID>([^<]+)<\/ID>/ ) { print "$1\n"; }

It gets a lot more complicated if the input turns out to be more complex.
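
To make that caveat concrete, here is a minimal sketch that runs the same pattern over a few made-up inputs. The sample strings and the extract_id helper are my own assumptions for illustration; the original question doesn't show the real data.

```perl
use strict;
use warnings;

# Hypothetical sample inputs; the real data was never posted.
my @samples = (
    '<ID>12345</ID>',             # simple case
    '<rec><ID>ab 12</ID></rec>',  # ID embedded in other markup
    '<ID></ID>',                  # empty ID: [^<]+ needs at least one character
    '<ID>12<b>3</b></ID>',        # nested tag: [^<]+ stops at '<', and </ID> does not follow
);

# Hypothetical helper: return the captured ID, or undef when nothing matches.
sub extract_id {
    my ($string) = @_;
    return $string =~ m/<ID>([^<]+)<\/ID>/ ? $1 : undef;
}

for my $string (@samples) {
    my $id = extract_id($string);
    print defined $id ? "matched: $id\n" : "no match: $string\n";
}
```

The first two samples match; the empty ID and the nested tag do not, which is the sort of input where a real parser starts to look better than a regex.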

If you haven't done so already, please spend an hour with perlretut. After that you'll wonder why you needed to ask.

Update: Added a backslash. ;)


Dave

Re^2: Help for regex
by Anonymous Monk on Apr 01, 2012 at 05:37 UTC

    Can you please explain "(^<+)"?

      Certainly. [^...] is a negated character class. If [...] allows you to enumerate what characters WILL match at a given position, [^...] allows you to say 'match any character except for these characters, at this position'.

      Negated character classes are discussed in perlretut under the heading Using character classes.

      + is a quantifier. Quantifiers are discussed in perlretut. It says to match one or more characters that meet the criteria of the preceding character class. And the (...) are capturing parentheses. Capturing parens are discussed in perlretut. They say to capture whatever happens to match the pattern within. Since this is the first capture, it will be placed in $1.

      Putting it all together: Match anything that is not '<', as many characters as possible, and capture them into $1. $1 and other capture variables are discussed in perlretut.
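
      If it helps to see the pieces in isolation, here is a small sketch. The input strings and variable names are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

my $text = 'abc<def';    # made-up input for illustration

# [^<]+ takes one or more characters that are not '<', as many as possible,
# so the match runs from the start of the string up to (not including) '<'.
my ($run) = $text =~ m/([^<]+)/;
print "with +:    $run\n";

# Without the quantifier, the negated class matches exactly one character.
my ($one) = $text =~ m/([^<])/;
print "without +: $one\n";

# Captures are numbered by their opening parenthesis, left to right:
# the first pair fills $1, the second pair fills $2.
my ($tag, $body);
if ( '<ID>42</ID>' =~ m/<(\w+)>([^<]+)/ ) {
    ($tag, $body) = ($1, $2);
    print "tag: $tag, body: $body\n";
}
```

      With the quantifier the capture is 'abc'; without it, just 'a'. In the last match $1 holds 'ID' and $2 holds '42'.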

      Now would be a good time to follow my suggestion to read perlretut. You are looking to learn about regexes, right? It should take about an hour or two to get the basics.


      Dave

      The delimiters matter, so
      use YAPE::Regex::Explain;
      print YAPE::Regex::Explain->new( qr{<ID>([^<]+)</ID>} )->explain;
      __END__
      The regular expression:

      (?-imsx:<ID>([^<]+)</ID>)

      matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        <ID>                     '<ID>'
      ----------------------------------------------------------------------
        (                        group and capture to \1:
      ----------------------------------------------------------------------
          [^<]+                    any character except: '<' (1 or more
                                   times (matching the most amount
                                   possible))
      ----------------------------------------------------------------------
        )                        end of \1
      ----------------------------------------------------------------------
        </ID>                    '</ID>'
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------