in reply to Re: Is some string a legal XML Name?
in thread Is some string a legal XML Name?

It just plain doesn't match anything! Looking at the module, I think it's a UTF-8 issue. E.g.
$Ideographic = '(?:\xE3\x80[\x87\xA1-\xA9]|\xE4(?:[\xB8-\xBF][\x80-\xBF])|\xE5(?:[\x80-\xBF][\x80-\xBF])|\xE6(?:[\x80-\xBF][\x80-\xBF])|\xE7(?:[\x80-\xBF][\x80-\xBF])|\xE8(?:[\x80-\xBF][\x80-\xBF])|\xE9(?:[\x80-\xBD][\x80-\xBF]|\xBE[\x80-\xA5]))';
That is, \xE3 followed by \x80 and \x87 are the three individual bytes of a UTF-8 encoded character, and as separate bytes they won't match a real \x{3007} character in the string.
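
To make the byte-versus-character point concrete, here is a minimal sketch. It assumes a perl recent enough to ship the core Encode module (later than the 5.6 of this thread), and the names $bytes_re and matches are made up for the illustration. It applies the first alternative of $Ideographic to the same character held two ways: once as a single character, once as its UTF-8 octets.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Byte-oriented pattern: the three UTF-8 octets of U+3007, which is the
# first character covered by the first alternative of $Ideographic.
my $bytes_re = qr/\xE3\x80\x87/;

# U+3007 (IDEOGRAPHIC NUMBER ZERO) held as ONE character.
my $char = "\x{3007}";

# The same character held as its three UTF-8 octets.
my $octets = encode('UTF-8', $char);

# Hypothetical helper: 1 if the byte-oriented pattern matches, else 0.
sub matches {
    my ($string) = @_;
    return $string =~ $bytes_re ? 1 : 0;
}

print "character string: ", matches($char),   "\n";   # prints 0
print "octet string:     ", matches($octets), "\n";   # prints 1
```

The pattern only ever sees \xE3, \x80 and \x87 as three separate code points, so it can match the encoded form but never the decoded character.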

However, my test data are plain ASCII characters, and the match fails for them too, even though the pattern starts $BaseChar = '(?:[a-zA-Z]|\xC3[\x80-\x9.... So I don't know everything that's wrong with it, but it doesn't match at all (see below) when use utf8 is in effect.

    use strict;
    use warnings;
    use utf8;  # comment this line out and it matches
    use XML::RegExp;

    my $name = 'timestamp';  # contains plain ASCII letters only!
    my $result = $name =~ /^$XML::RegExp::Name$/o;
    print "result is $result\n";

Re: XML::RegExp doesn't work (Re: Is some string a legal XML Name?)
by mirod (Canon) on Jun 08, 2001 at 20:57 UTC

    This looks weird.

    XML::RegExp was released way before 5.6.0 though, so it is not that surprising that it does not deal that well with the utf8 pragma. Do you really need it around if you only deal with US ASCII characters?

    Incidentally, the definition of Name includes regular characters:

        $Letter = "(?:$BaseChar|$Ideographic)";
        $NameChar = "(?:[-._:]|$Letter|$Digit|$CombiningChar|$Extender)";
        $Name = "(?:(?:[:_]|$Letter)$NameChar*)";
      I know that $BaseChar includes regular characters. So the byte/char mode difference doesn't explain why it doesn't work at all when the name to match is pure ASCII.

      I'm not dealing only with US ASCII. I'm reading UTF-8. This example was a simple case showing it fail.
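
      Since the patterns are written against UTF-8 bytes, one thing to try is handing them octets rather than characters. The sketch below assumes a perl with the core Encode module. $letter and $name_like are a hypothetical stand-in built only from the fragments quoted in this thread, not the real $XML::RegExp::Name, and matches_as_octets is a made-up helper. Whether this avoids the failure seen under use utf8 on 5.6 is not established here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for $XML::RegExp::Name, assembled only from the
# fragments quoted in this thread; the real pattern is far larger.
my $letter    = '(?:[a-zA-Z]|\xE3\x80[\x87\xA1-\xA9])';
my $name_like = "(?:(?:[:_]|$letter)(?:[-._:]|$letter)*)";

# Hypothetical helper: encode a character string to UTF-8 octets, then
# apply a byte-oriented pattern to those octets.
sub matches_as_octets {
    my ($string, $pattern) = @_;
    my $octets = encode('UTF-8', $string);
    return $octets =~ /^$pattern$/ ? 1 : 0;
}

print matches_as_octets('timestamp',  $name_like), "\n";   # prints 1
print matches_as_octets("a\x{3007}",  $name_like), "\n";   # prints 1
print matches_as_octets('1abc',       $name_like), "\n";   # prints 0
```

      Pure ASCII input is unchanged by the encoding step, so this only makes a difference for characters above \x7F.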

      —John