John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

I could write a regex using the specification at w3.org, but it seems to me that something like that should already exist in the XML:: modules somewhere.

Could someone tell me where that function may be hiding?

—John

Replies are listed 'Best First'.
Re: Is some string a legal XML Name?
by mirod (Canon) on Jun 07, 2001 at 09:11 UTC

    You will find a regexp named Name in XML::RegExp, which is part of libxml-enno.

    You can use it this way:

    $name =~ /^$XML::RegExp::Name$/o
      It just plain doesn't match anything! Looking at the module, I think it's a UTF-8 issue. E.g.
      $Ideographic = '(?:\xE3\x80[\x87\xA1-\xA9]|\xE4(?:[\xB8-\xBF][\x80-\xBF])|\xE5(?:[\x80-\xBF][\x80-\xBF])|\xE6(?:[\x80-\xBF][\x80-\xBF])|\xE7(?:[\x80-\xBF][\x80-\xBF])|\xE8(?:[\x80-\xBF][\x80-\xBF])|\xE9(?:[\x80-\xBD][\x80-\xBF]|\xBE[\x80-\xA5]))';
      That is, \xE3 followed by \x80 \x87 are the three individual bytes of a UTF-8 encoded string, and they won't match a real \x{3007} character in the string.
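      To illustrate the byte-versus-character point, here is a minimal sketch. It assumes a Perl with the core Encode module (5.8 or later), and the pattern is only the first alternative of $Ideographic, not the whole regexp; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+3007 (IDEOGRAPHIC NUMBER ZERO) as a single character.
my $char = "\x{3007}";

# The same code point as a string of three UTF-8 bytes.
my $bytes = encode('UTF-8', $char);

# Byte-oriented fragment: the first alternative of $Ideographic.
my $byte_pattern = qr/\A\xE3\x80[\x87\xA1-\xA9]\z/;

# Show the encoded bytes in hex.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "bytes: $hex\n";

# The pattern matches the encoded bytes but not the one-character string.
my $bytes_match = $bytes =~ $byte_pattern ? 1 : 0;
my $char_match  = $char  =~ $byte_pattern ? 1 : 0;
print "byte string matches:      $bytes_match\n";
print "character string matches: $char_match\n";
```

      The byte string matches and the character string does not, which is the mismatch described above. It does not by itself explain why a plain ASCII name fails.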

      However, my test data are normal ASCII range characters, and that doesn't succeed either, though it starts $BaseChar = '(?:[a-zA-Z]|\xC3[\x80-\x9.... So I don't know everything that's wrong with it, but it doesn't work at all (see below) if utf8 is used.

      use strict;
      use warnings;
      use utf8;  # comment this line out and it matches
      use XML::RegExp;

      my $name = 'timestamp';  # contains plain ASCII letters only!
      my $result = $name =~ /^$XML::RegExp::Name$/o;
      print "result is $result\n";

        This looks weird.

        XML::RegExp was released way before 5.6.0 though, so it is not that surprising that it does not deal that well with the utf8 pragma. Do you really need it around if you only deal with US ASCII characters?

        Incidentally, the definition of Name includes regular characters:

        $Letter = "(?:$BaseChar|$Ideographic)";
        $NameChar = "(?:[-._:]|$Letter|$Digit|$CombiningChar|$Extender)";
        $Name = "(?:(?:[:_]|$Letter)$NameChar*)";
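        If only US-ASCII names matter, the definition above can be narrowed by hand. The following is a rough sketch, not code from XML::RegExp: is_ascii_xml_name is a hypothetical helper, and it deliberately rejects every non-ASCII letter, digit, combining character and extender that the full Name production allows.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::RegExp): ASCII subset of Name only.
# First character: a letter, '_' or ':'.
# Remaining characters: letters, digits, '.', '-', '_' or ':'.
sub is_ascii_xml_name {
    my ($s) = @_;
    my $ok = defined $s && $s =~ /\A[A-Za-z_:][A-Za-z0-9._:-]*\z/;
    return $ok ? 1 : 0;
}

# Try a few candidates, valid and invalid.
for my $candidate ('timestamp', 'xs:element', '_id', '2fast', 'has space', '') {
    my $verdict = is_ascii_xml_name($candidate) ? 'ok' : 'not a Name';
    print "'$candidate': $verdict\n";
}
```

        The sketch anchors with \A and \z rather than ^ and $, because $ also matches just before a trailing newline and would let "timestamp\n" through.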
Re: Is some string a legal XML Name?
by Vynce (Friar) on Jun 07, 2001 at 02:38 UTC

    what do you mean by a legal XML name? a legal name for an XML element? or an Attribute? or a document? or a DTD? or for XML itself? or for namespaces? or...

    ahem, sorry, got carried away there.

    but could you clarify?

      To clarify, follow the HREF in the posting and it points you right to the grammar item in question. The 'name' nonterminal is used for both element and attribute names.