argos has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to pull out the URL value found in the REDIRECTURL element of this XML. Here is what I have so far. It seems to work, but maybe there is a better way?

#!/usr/bin/perl
use warnings;
use strict;

my $data;
$data = <<XML;
<PHC_LOGIN>\r
<VERSION>3.0</VERSION>\r
<PARTNERID>8747</PARTNERID>\r
<USERLOGIN>john.doe.</USERLOGIN>\r
<ERROR_CODE>0</ERROR_CODE>\r
<ERROR_DESCRIPTION>Login Successful</ERROR_DESCRIPTION>\r
<SESSIONID>c12168e0-e34e-4c27-b4a6-be235ff7fa41</SESSIONID>\r
<REDIRECTURL>https://example.com/_bers/log.asp?SID=c12168e0-e34e-4c27-b4a6-be235ff7fa41</REDIRECTURL>\r
</PHC_LOGIN>
XML

if ($data =~ /(https:\/\/[^\s]+[$<])/ ) {
    print "Found: $1\n";
}

Re: regex pattern matches
by Anonymous Monk on Aug 22, 2014 at 12:29 UTC

    Use an XML parser module; XML::LibXML is one of several.
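For comparison, here is a minimal sketch of the parser approach with XML::LibXML. It assumes the module has been installed from CPAN (it is not part of core Perl) and uses a trimmed copy of the question's XML. The XPath expression /PHC_LOGIN/REDIRECTURL is written for this particular document layout. A parser also decodes entities such as &amp; in a query string, which a plain regex capture would leave encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;    # CPAN module, not core Perl

# Trimmed copy of the PHC_LOGIN response from the question.
my $xml = <<'XML';
<PHC_LOGIN>
<ERROR_CODE>0</ERROR_CODE>
<SESSIONID>c12168e0-e34e-4c27-b4a6-be235ff7fa41</SESSIONID>
<REDIRECTURL>https://example.com/_bers/log.asp?SID=c12168e0-e34e-4c27-b4a6-be235ff7fa41</REDIRECTURL>
</PHC_LOGIN>
XML

# Parse the whole string into a DOM document.
my $doc = XML::LibXML->load_xml(string => $xml);

# findvalue() evaluates the XPath expression and returns its string value.
my $url = $doc->findvalue('/PHC_LOGIN/REDIRECTURL');
print "Found: $url\n";
```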

    If you insist on using a regex, update it to capture the value between the start and end tags:

    use v5.16.0;
    my $tag    = 'tag';
    my $string = '... <tag>value</tag>';
    my $parse  = qr{<$tag>(?<val>.*?)</$tag>}s;
    $string =~ $parse and say 'found: ', $+{'val'};
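A minimal sketch applying the same named-capture idea to the question's data. The helper first_tag_value is a hypothetical name used only for illustration, and \Q...\E is added so that a tag name containing regex metacharacters is matched literally. Like any regex approach, it assumes the element is not nested, commented out, or wrapped in CDATA.

```perl
#!/usr/bin/perl
use v5.16.0;    # enables strict and the say feature
use warnings;

# Hypothetical helper: return the text of the first <$tag>...</$tag> pair, or undef.
sub first_tag_value {
    my ($text, $tag) = @_;
    # \Q...\E quotes regex metacharacters in the tag name;
    # .*? is non-greedy, and /s lets it cross newlines.
    return $text =~ m{<\Q$tag\E>(?<val>.*?)</\Q$tag\E>}s ? $+{val} : undef;
}

my $data = <<'XML';
<PHC_LOGIN>
<ERROR_CODE>0</ERROR_CODE>
<REDIRECTURL>https://example.com/_bers/log.asp?SID=c12168e0-e34e-4c27-b4a6-be235ff7fa41</REDIRECTURL>
</PHC_LOGIN>
XML

my $url = first_tag_value($data, 'REDIRECTURL');
say 'found: ', $url // 'nothing';
```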
      "Use an XML parser" is the correct answer. Which one you use is up to you. I personally recommend mirod's XML::Twig.
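A minimal sketch of the XML::Twig approach, assuming the module has been installed from CPAN. The handler keyed on REDIRECTURL is called once for each such element after it has been fully parsed, which is a common Twig style for pulling a few values out of a larger document.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;    # CPAN module, not core Perl

# Trimmed copy of the PHC_LOGIN response from the question.
my $xml = <<'XML';
<PHC_LOGIN>
<SESSIONID>c12168e0-e34e-4c27-b4a6-be235ff7fa41</SESSIONID>
<REDIRECTURL>https://example.com/_bers/log.asp?SID=c12168e0-e34e-4c27-b4a6-be235ff7fa41</REDIRECTURL>
</PHC_LOGIN>
XML

my $url;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Receives the twig and the completed REDIRECTURL element.
        REDIRECTURL => sub {
            my ($t, $elt) = @_;
            $url = $elt->text;
        },
    },
);
$twig->parse($xml);
print "Found: $url\n";
```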
Re: regex pattern matches
by vinoth.ree (Monsignor) on Aug 22, 2014 at 12:16 UTC
    Hi,

    You are not capturing the value of the REDIRECTURL element in your if statement; you are capturing only the REDIRECTURL tag name, so $1 will contain REDIRECTURL. You need to change the if statement to:

    if ($data =~ m|<REDIRECTURL>(.*)</REDIRECTURL>|) {
        print "Found: $1\n";
    }

    Update:

    As suggested by johngg, I modified the if statement.


    All is well
      The angle brackets ('<' and '>') are not regex metacharacters, so they need not be escaped. Choosing a pattern delimiter other than '/' would mean that nothing needs escaping, which makes the pattern a little easier on the eye.

      $ perl -E '
      $text = q{asd<TAG>dsyuhf</TAG>urrh};
      $item = $1 if $text =~ m{<TAG>([^<]+)</TAG>};
      say $item;'
      dsyuhf
      $

      Note also that I capture ([^<]+) rather than (.*), because (.*) is greedy and runs on to the last </TAG> on the line:

      $ perl -E '
      $text = q{asd<TAG>dsyuhf</TAG>urrh<TAG>dsvnnhf</TAG>urubb};
      $item = $1 if $text =~ m{<TAG>(.*)</TAG>};
      say $item;'
      dsyuhf</TAG>urrh<TAG>dsvnnhf
      $
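A small side-by-side sketch of the three captures on johngg's two-element string, plus an empty element. It suggests that (.*?) and ([^<]+) agree on ordinary values but differ on an empty element, where ([^<]+) needs at least one character and so finds no match at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two <TAG> elements on one line, as in johngg's example.
my $text = q{asd<TAG>dsyuhf</TAG>urrh<TAG>dsvnnhf</TAG>urubb};

# Greedy: .* runs on to the LAST </TAG> on the line.
my ($greedy)  = $text =~ m{<TAG>(.*)</TAG>};

# Non-greedy: .*? stops at the FIRST </TAG>.
my ($lazy)    = $text =~ m{<TAG>(.*?)</TAG>};

# Negated class: [^<]+ cannot step over the '<' of </TAG>.
my ($negated) = $text =~ m{<TAG>([^<]+)</TAG>};

print "greedy:  $greedy\n";
print "lazy:    $lazy\n";
print "negated: $negated\n";

# An empty element: .*? can match zero characters, [^<]+ cannot.
my $empty      = q{<TAG></TAG>};
my $lazy_empty = $empty =~ m{<TAG>(.*?)</TAG>}  ? "matched '$1'" : 'no match';
my $neg_empty  = $empty =~ m{<TAG>([^<]+)</TAG>} ? "matched '$1'" : 'no match';
print "empty, lazy:    $lazy_empty\n";
print "empty, negated: $neg_empty\n";
```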

      I hope this is useful.

      Cheers,

      JohnGG

      johngg has already mentioned the possible issue, but it seems you did not notice. I would suggest that it is safer to use a non-greedy quantifier:
      if ($data =~ m|<REDIRECTURL>(.*?)</REDIRECTURL>|) {
          print "Found: $1\n";
      }
      just in case there are two <REDIRECTURL>...</REDIRECTURL> pairs of tags (assuming this is possible).
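If several REDIRECTURL pairs really can occur, here is a minimal sketch that collects all of them by using the non-greedy pattern with the /g modifier in list context. The input string is made up for illustration; the question's data contains only one pair.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input with two REDIRECTURL elements on one line.
my $data = '<R><REDIRECTURL>https://a.example/1</REDIRECTURL>'
         . '<REDIRECTURL>https://b.example/2</REDIRECTURL></R>';

# In list context, m//g returns the $1 capture from every successful match.
my @urls = $data =~ m|<REDIRECTURL>(.*?)</REDIRECTURL>|g;

print "Found: $_\n" for @urls;
```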

      I see your logic. That looks better than what I came up with, as it clearly defines where the match starts and ends. What I really want is whatever is found between those two points, which you capture with the parentheses. Thanks!

Re: regex pattern matches
by roboticus (Chancellor) on Aug 22, 2014 at 12:19 UTC

    argos:

    That's nice. Did you have a question?

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.