Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello.
I have to deal with large amounts of multilingual data in XML format that looks something like this:
<FILE>
  <FOO xml:lang="en">
    <BAR/>
  </FOO>
  <FOO xml:lang="ru">
    <BAR/>
  </FOO>
</FILE>
At any given time, the user of the CGI script needs to see the data in only one of the languages, which he selects (say, from a form). Within the code, I have a variable that stores the selected language, e.g.:
my $lang = "en";
When I parse the XML (I'm using XML::Parser), I need only the blocks whose xml:lang attribute matches $lang. How do I do that?
So far, I haven't been able to extract the xml:lang attribute using XML::Parser. It seems to ignore attributes with the xml: prefix.
Thanks for your help!

Re: Parsing xml:lang attribute
by bart (Canon) on Jul 25, 2010 at 23:00 UTC
    So far, I haven't been able to extract the xml:lang attribute using XML::Parser. It seems to ignore attributes with the xml: prefix.
    My experience doesn't match yours. I can clearly see the attribute 'xml:lang' in the list of attributes.
    use XML::Parser;
    use Data::Dumper;

    # Dump the element name and attribute list passed to every Start event
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub { print Dumper [ @_[1 .. $#_] ] },
        },
    );
    $parser->parse(\*DATA);

    __DATA__
    <FILE>
      <FOO xml:lang="en">
        <BAR/>
      </FOO>
      <FOO xml:lang="ru">
        <BAR/>
      </FOO>
    </FILE>
    Result:
    $VAR1 = [ 'FILE' ];
    $VAR1 = [ 'FOO', 'xml:lang', 'en' ];
    $VAR1 = [ 'BAR' ];
    $VAR1 = [ 'FOO', 'xml:lang', 'ru' ];
    $VAR1 = [ 'BAR' ];
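    Since the attribute does arrive in the Start handler, filtering by $lang with XML::Parser alone comes down to tracking whether you are inside a block in another language. Below is a minimal sketch under stated assumptions: an element without xml:lang inherits its parent's state, text content is what you want to keep, and tags are compared case-insensitively. The helper name texts_for_lang and the Hello/Privet sample data are hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns the non-blank text chunks that lie in blocks
# whose xml:lang matches $want. An element without xml:lang inherits the
# state of its parent; language tags are compared case-insensitively.
sub texts_for_lang {
    my ($xml, $want) = @_;
    my @skip = (0);    # stack: 1 while inside a block in another language
    my @texts;

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $elem, %attr) = @_;
                my $skip = $skip[-1];
                if (!$skip && defined $attr{'xml:lang'}) {
                    $skip = lc($attr{'xml:lang'}) eq lc($want) ? 0 : 1;
                }
                push @skip, $skip;
            },
            End  => sub { pop @skip },
            # Char may deliver one text node in several pieces
            Char => sub {
                my ($expat, $text) = @_;
                push @texts, $text if !$skip[-1] && $text =~ /\S/;
            },
        },
    );
    $parser->parse($xml);
    return @texts;
}

my $xml = '<FILE><FOO xml:lang="en"><BAR>Hello</BAR></FOO>'
        . '<FOO xml:lang="ru"><BAR>Privet</BAR></FOO></FILE>';
my $lang = 'en';
print "$_\n" for texts_for_lang($xml, $lang);    # prints "Hello"
```

The stack keeps the approach streaming: nothing is held in memory except the state for the current element path, which matters for large files.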
    Anyway, you might be interested in a parser that supports XPath, as that can make picking out just the content you want a little easier.

    Although personally, I think XML::Parser is fine... :)
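    For the XPath route, XPath 1.0 already has a function for exactly this attribute: lang() checks the nearest xml:lang on the node or its ancestors. The sketch below assumes XML::LibXML is installed; foo_text_for_lang, the whitelist regex and the sample data are hypothetical, and the regex is there only because $lang may come straight from a form before being placed into the XPath string.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: text of every FOO element whose language matches $lang.
sub foo_text_for_lang {
    my ($xml, $lang) = @_;
    # $lang may come straight from a CGI form, so accept only characters
    # that can appear in a language tag before putting it into XPath.
    die "bad language tag: $lang\n"
        unless $lang =~ /\A[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*\z/;
    my $doc = XML::LibXML->load_xml(string => $xml);
    # XPath 1.0 lang() reads xml:lang (inherited from ancestors too),
    # compares case-insensitively and lets lang('en') match "en-GB".
    return map { $_->textContent } $doc->findnodes("//FOO[lang('$lang')]");
}

my $xml = '<FILE><FOO xml:lang="en"><BAR>Hello</BAR></FOO>'
        . '<FOO xml:lang="ru"><BAR>Privet</BAR></FOO></FILE>';
print "$_\n" for foo_text_for_lang($xml, 'ru');    # prints "Privet"
```

Unlike a plain `@xml:lang = '...'` comparison, lang() also picks up regional variants, which may or may not be what you want.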

Re: Parsing xml:lang attribute
by happy.barney (Friar) on Jul 26, 2010 at 07:16 UTC
    Try XSLT:
    use strict;
    use warnings;
    use XML::LibXSLT;
    use XML::LibXML;

    my $LANG = 'en';

    my $xslt       = XML::LibXSLT->new();
    # XML file to filter, named on the command line
    my $source     = XML::LibXML->load_xml(location => shift);
    my $style_doc  = XML::LibXML->load_xml(string => do { local $/; <DATA> });
    my $stylesheet = $xslt->parse_stylesheet($style_doc);

    # Stylesheet parameters are XPath expressions, hence the extra quotes
    my $results = $stylesheet->transform($source, lang => "'$LANG'");
    print $stylesheet->output_as_bytes($results);

    $LANG    = 'ru';
    $results = $stylesheet->transform($source, lang => "'$LANG'");
    print $stylesheet->output_as_bytes($results);

    __DATA__
    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="xml"/>
      <xsl:param name="lang"/>
      <xsl:template match="@* | node()">
        <xsl:copy>
          <xsl:apply-templates select="@* | node()[not(@xml:lang)] | node()[@xml:lang = $lang]">
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>
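    One caveat with `lang => "'$LANG'"`: the quotes are added by hand because transform() parameters are XPath expressions, so a value containing an apostrophe (for example, one typed into the CGI form) breaks the expression or changes its meaning. A sketch assuming your XML::LibXSLT provides the xpath_to_string() helper, which turns plain strings into quoted XPath literals; filter_lang and the sample data are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXSLT;

# The same copy-and-filter stylesheet, inlined with a heredoc.
my $xsl = <<'XSL';
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>
  <xsl:param name="lang"/>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()[not(@xml:lang)] | node()[@xml:lang = $lang]"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
XSL

my $stylesheet = XML::LibXSLT->new->parse_stylesheet(
    XML::LibXML->load_xml(string => $xsl));

# Hypothetical helper: keep only the blocks whose xml:lang equals $lang.
sub filter_lang {
    my ($xml, $lang) = @_;
    my $source = XML::LibXML->load_xml(string => $xml);
    # xpath_to_string() wraps each value as an XPath string literal,
    # so quotes in a form-supplied $lang cannot break the expression.
    my $result = $stylesheet->transform(
        $source, XML::LibXSLT::xpath_to_string(lang => $lang));
    return $stylesheet->output_as_chars($result);
}

my $xml = '<FILE><FOO xml:lang="en"><BAR/></FOO>'
        . '<FOO xml:lang="ru"><BAR/></FOO></FILE>';
print filter_lang($xml, 'ru');    # keeps only the ru block
```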
Re: Parsing xml:lang attribute
by intel (Beadle) on Jul 25, 2010 at 22:56 UTC
    Well, I'm sure there's an elegant way to do it with the module, which you should look into and definitely use, but if you just want to get the language value into a $lang variable, there are lots of ways to do that:
    use strict;
    use warnings;

    open my $fh, '<', 'file' or die "could not open file: $!\n";
    while (my $line = <$fh>) {
        # A line may hold several xml:lang attributes, or none at all
        while ($line =~ /xml:lang\s*=\s*"([^"]*)"/g) {
            my $lang = $1;
            print "$lang\n";
        }
    }
    close $fh;
    It's not the best way to solve the problem, and it's definitely not the shortest way to write it, but it works and makes sense (to me, anyway).