in reply to XML::Parser problems

I would think that the expat string which I get (XML::Parser::Expat=HASH(0x819a53c)) might tell that, but I didn't see any reference to it in perldoc.
String? That's the default string representation of an object (my $obj = XML::Parser::Expat->new(...); print "$obj";). What you're looking for is the XML::Parser::Expat methods current_line, current_column, and current_byte.
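
For what it's worth, here is a minimal sketch of calling those as methods on the Expat object that every handler receives as its first argument. The sample document and the @positions array are made up for illustration; only the three method names come from the XML::Parser::Expat docs.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical collector: one [line, column, byte] triple per text chunk.
my @positions;

my $parser = XML::Parser->new(
    Handlers => {
        Char => sub {
            my ($expat, $string) = @_;
            return unless $string =~ /\S/;    # skip whitespace-only chunks
            push @positions,
                [ $expat->current_line, $expat->current_column, $expat->current_byte ];
        },
    },
);

# Made-up three-line document; the text sits on line 2.
$parser->parse("<doc>\n  <text>hello</text>\n</doc>");

printf "line %d, column %d, byte %d\n", @$_ for @positions;
```

Calling them as methods ($expat->current_line) rather than as fully qualified functions does the same thing and is a bit easier to read.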

MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
** The third rule of perl club is a statement of fact: pod is sexy.

Re^2: XML::Parser problems
by Hena (Friar) on Jul 01, 2005 at 07:08 UTC
    I was guessing that it would've been a hash. Well, not enough experience with Perl's objects :).

    But the functions were in there somewhere. Thanks. Well, this gets more complicated. My text function now looks like this.
    sub text (@) {
    #    shift @_;
        if ($text && $_[1]=~/\S/) {
    #        $UNIQ{$com}{$_[1]}++;
    #        $i++;
            if ($str) {
                print XML::Parser::Expat::current_line($_[0]),",",XML::Parser::Expat::current_column($_[0]),"\n";
                print "'$str','$_[1]'\n$com\n";exit;
            }
            $str .= $_[1];
        }
    }
    And the result of the run is this.
    $ zcat data/uniprot_sprot.xml.gz | ./get_sp_fields.pl
    26745,17
    'Involve','d in the presentation of foreign antigens to the immune system'
    function
    
    And lines 26744-26746 of the XML are:
      <comment type="function">
        <text>Involved in the presentation of foreign antigens to the immune system</text>
      </comment>
    
    So is there a bug in XML::Parser, since the text section is split across two calls to the text sub? Or am I missing something here...

      Sorry to reply with a RTFM, but this is what the FM reads (emphasis added):

      Char (Expat, String)
      This event is generated when non-markup is recognized. The non-markup sequence of characters is in String. A single non-markup sequence of characters may generate multiple calls to this handler. Whatever the encoding of the string in the original document, this is given to the handler in UTF-8.

      Note that AFAIK all XML parsers behave like this, so that you can parse documents even if they contain chunks of text that are bigger than the available memory.

      Also, the XML::Parser review mentions this and gives you a way to get all the data.

      Update: the Perl XML FAQ also mentions this.
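
      A minimal sketch of the usual workaround, assuming you only care about <text> elements: append every Char chunk to a buffer and only look at the buffer once the End event fires. The $buffer and @texts names and the inline sample document are mine, not taken from the review.

      ```perl
      use strict;
      use warnings;
      use XML::Parser;

      my $buffer = '';    # text collected so far for the open <text> element
      my @texts;          # complete contents of each <text> element

      my $parser = XML::Parser->new(
          Handlers => {
              Start => sub {
                  my ($expat, $element) = @_;
                  $buffer = '' if $element eq 'text';    # start a fresh buffer
              },
              Char => sub {
                  my ($expat, $string) = @_;
                  # May be called several times for one run of text, so append.
                  $buffer .= $string if $expat->in_element('text');
              },
              End => sub {
                  my ($expat, $element) = @_;
                  push @texts, $buffer if $element eq 'text';    # now complete
              },
          },
      );

      # Sample input modelled on the lines quoted above.
      $parser->parse('<comment type="function"><text>Involved in the presentation of foreign antigens to the immune system</text></comment>');

      print "'$_'\n" for @texts;
      ```

      However many pieces the parser hands to Char, the End handler only ever sees the joined string.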

        No need to say sorry. If it is RTFM then it is RTFM, as I do seem to be missing something :). In fact, by now I was already combining the strings here myself (as a way to get around the problem), which is what that review link describes.

        I guess this gets filed under "we live and learn".