Re: XML::Parser - Usage of &

XML::Parser shouldn't be ignoring the "Company A&" part; I think what you'll find is that it treats the title as three pieces of character data:

Company A
&
B Information

And it will treat these as three separate parse events. Quick demonstration:

use 5.010;
use strict;
use warnings;

use XML::Parser;

my $in_title;
my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { $in_title++ if $_[1] eq 'Title' },
        End   => sub { $in_title-- if $_[1] eq 'Title' },
        Char  => sub { say "CHAR: $_[1]" if $in_title },
    },
);

$parser->parse(<<'XML');
<Document>
    <Title>Company A&amp;B Information</Title>
    <Abstract>Foo</Abstract>
</Document>
XML

XML::Parser is very bare-bones, and regards translating those parse events into a useful data structure as very much your job.

Personally I prefer DOM-based XML parsers, such as XML::LibXML, which parse the entire file into a tree and allow you to manipulate and navigate that tree using the same DOM interface that web browsers expose to JavaScript.
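As a rough illustration of the DOM approach, here is a minimal sketch assuming XML::LibXML is installed. The helper name titles_from_xml is made up for this example; load_xml, findnodes and textContent are the XML::LibXML calls it relies on. Because the whole document is parsed into a tree first, each title comes back as one string and there is nothing to stitch together.

```perl
use 5.010;
use strict;
use warnings;

use XML::LibXML;

# Hypothetical helper (not from the original post): returns the text
# content of every <Title> element found in an XML string.
sub titles_from_xml {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);

    # textContent returns the element's text with entities already decoded.
    return map { $_->textContent } $doc->findnodes('//Title');
}

say "TITLE: $_" for titles_from_xml(<<'XML');
<Document>
    <Title>Company A&amp;B Information</Title>
    <Abstract>Foo</Abstract>
</Document>
XML
```

With the sample document this should print a single line, "TITLE: Company A&B Information".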

package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name

Re^2: XML::Parser - Usage of & by sumeetgrover (Monk) on Feb 20, 2013 at 11:19 UTC
You are right, the parser is indeed treating the title as:

1. Company A
2. &
3. B Information

Does that mean our code needs to put these three pieces together and save them as one single title? Many thanks for your help!
Re^3: XML::Parser - Usage of & by tobyink (Canon) on Feb 20, 2013 at 12:35 UTC
I'm guessing that right now the code (you haven't posted any, so the best I can do is guess!) in the Char handler is saving only the last bit of character data, and then when the End handler sees the end of the Title element, it does something with that. Maybe something like this:

use 5.010;
use strict;
use warnings;

use XML::Parser;

my ($got_title, $in_title);
my $parser = XML::Parser->new(
    Handlers => {
        Start => sub { $in_title++ if $_[1] eq 'Title' },
        End   => sub { $in_title--, say "GOT TITLE: $got_title" if $_[1] eq 'Title' },
        # Each Char event overwrites the previous one, so only the last piece survives.
        Char  => sub { $got_title = $_[1] if $in_title },
    },
);

$parser->parse(<<'XML');
<Document>
    <Title>Company A&amp;B Information</Title>
    <Abstract>Foo</Abstract>
    <Title>Company X&amp;Y Information</Title>
    <Abstract>Bar</Abstract>
</Document>
XML

Instead, you want the Char handler to accumulate the pieces of character data, either by appending to a string or by pushing onto an array/arrayref, and use the Start and End handlers to signal when to start and stop accumulating. For example:

use 5.010;
use strict;
use warnings;

use XML::Parser;

my (@got_title, $in_title);
my $parser = XML::Parser->new(
    Handlers => {
        # Reset the accumulator whenever a new Title element opens.
        Start => sub { $in_title++, @got_title = () if $_[1] eq 'Title' },
        # Join with the empty string so no spaces creep in around the "&".
        End   => sub { $in_title--, say 'GOT TITLE: ', join('', @got_title) if $_[1] eq 'Title' },
        Char  => sub { push @got_title, $_[1] if $in_title },
    },
);

$parser->parse(<<'XML');
<Document>
    <Title>Company A&amp;B Information</Title>
    <Abstract>Foo</Abstract>
    <Title>Company X&amp;Y Information</Title>
    <Abstract>Bar</Abstract>
</Document>
XML
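For the string-appending option mentioned above, a minimal sketch might look like the following. The collect_titles helper and its variable names are illustrative only and not from the original code; the handlers are the same Start, Char and End handlers of XML::Parser used earlier in the thread.

```perl
use 5.010;
use strict;
use warnings;

use XML::Parser;

# Hypothetical helper: returns the complete text of each <Title> element
# by appending every Char event to a single string.
sub collect_titles {
    my ($xml) = @_;
    my (@titles, $title, $in_title);

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                return unless $_[1] eq 'Title';
                $in_title = 1;
                $title    = '';    # start a fresh accumulator
            },
            Char => sub { $title .= $_[1] if $in_title },
            End  => sub {
                return unless $_[1] eq 'Title';
                $in_title = 0;
                push @titles, $title;    # the title is complete here
            },
        },
    );

    $parser->parse($xml);
    return @titles;
}

say "GOT TITLE: $_" for collect_titles(<<'XML');
<Document>
    <Title>Company A&amp;B Information</Title>
    <Abstract>Foo</Abstract>
    <Title>Company X&amp;Y Information</Title>
    <Abstract>Bar</Abstract>
</Document>
XML
```

Appending to a string avoids the question of how to join the pieces afterwards, which matters here because the "&" arrives as its own Char event.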
Re^4: XML::Parser - Usage of & by sumeetgrover (Monk) on Feb 20, 2013 at 14:53 UTC
Thank you! This is exactly the type of bug fix I will be implementing in my code. All makes sense now.
Re^3: XML::Parser - Usage of & by runrig (Abbot) on Feb 20, 2013 at 19:57 UTC
You should probably be using a higher-level parser than XML::Parser. Maybe XML::Rules or XML::Twig. Or possibly XML::Simple or XML::LibXML.