cmilfo has asked for the wisdom of the Perl Monks concerning the following question:

I searched the back dated material but was unable to find this question.

I am processing an XML file that has binary material in it (the binary material is outside the global tags). Basically, I've made my input record separator the closing global XML tag </AuditData>. I then strip off everything up to the opening global XML tag. This works really well (the problem isn't big enough to require a solution using any of the wonderful XML parsers Perl offers - no sarcasm, I love using XML::Parser and XML::Twig). Now for the problem.

One of the bytes in the binary data is a ^Z (<CTRL> Z). The while loop ends when it reaches this character. Under Linux and Tru64, the program runs great (I develop on Slackware 8.0). But under NT -- which is where the program will reside -- it exits. Is there a way around this? My guess is that there is a special character that acts as the end-of-file marker for whatever OS you're on.

Thank you!
Casey

Replies are listed 'Best First'.
Re: End Of File Marker
by myocom (Deacon) on Aug 21, 2001 at 23:56 UTC

    You might try setting binmode on your filehandle before you read from it. It will certainly let you read past the EOF marker under NT, but it may have other side effects. Only tests on your specific systems/files will tell...
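
    A minimal sketch of what that might look like, combined with the record-separator approach from the question. The sample data, the temporary file, and the read_records helper are made up for illustration, and the opening tag is assumed to be a plain <AuditData> with no attributes; the original code was not posted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small sample file: binary junk (including ^Z, i.e. \x1A)
# around two XML records. This stands in for the real input file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\x00\x1Ajunk<AuditData><a>1</a></AuditData>",
                "\x1A\x01more<AuditData><a>2</a></AuditData>";
close $tmp_fh or die "close: $!";

# Hypothetical helper: return every <AuditData>...</AuditData> block in a file.
sub read_records {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    binmode $in;                  # raw bytes: no ^Z-as-EOF, no CRLF translation
    local $/ = '</AuditData>';    # each read stops after a closing tag
    my @records;
    while (my $chunk = <$in>) {
        # Drop everything before the opening tag; skip chunks that have none
        # (for example, trailing binary data after the last record).
        next unless $chunk =~ s/\A.*?(?=<AuditData>)//s;
        push @records, $chunk;
    }
    close $in;
    return @records;
}

my @records = read_records($path);
print scalar(@records), " records\n";
```

    The binmode call has to come before the first read from the handle. On Linux and Tru64 it should make no difference, so the same script can run on all three systems.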

    Strictly speaking, though, it sounds like your "XML file that has binary material in it" isn't an XML file. It's a file that happens to contain an XML string in it. A minor-sounding (but very important) distinction.

    "One word of warning: if you meet a bunch of Perl programmers on the bus or something, don't look them in the eye. They've been known to try to convert the young into Perl monks." - Frank Willison
Re: End Of File Marker
by John M. Dlugosz (Monsignor) on Aug 22, 2001 at 00:36 UTC
    Using binmode will turn off all such special processing. So 0's won't be thrown away, ^Z won't mean EOF, etc. However, it won't normalize your EOLN characters, either.

    So, after cutting out the passage of interest, if the code that processes the XML is thrown off by the presence of \r characters (it shouldn't be since it's not line oriented and \r is lumped together as "whitespace"), use tr/\r//d to zap them.
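
    As a small illustration of that last step (the record string here is an invented example of what a binmode read of a CRLF file might hand back):

```perl
use strict;
use warnings;

# Hypothetical record as it might look after reading a CRLF file in binmode.
my $record = "<AuditData>\r\n  <a>1</a>\r\n</AuditData>";

# tr with the /d modifier deletes every listed character in place
# and returns the number of characters it removed.
my $removed = ($record =~ tr/\r//d);

print "removed $removed carriage returns\n";
```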

    —John

Re: End Of File Marker
by Cine (Friar) on Aug 21, 2001 at 23:45 UTC
    CTRL-Z is the EOF marker in DOS/Win/NT, like CTRL-D is in *nix, but I didn't think it was being used anymore...
    Sorry I can't help you with your specific problem...

    T I M T O W T D I
Re: End Of File Marker
by John M. Dlugosz (Monsignor) on Aug 22, 2001 at 00:33 UTC
    The ^Z character exists because in CP/M, directory entries only held the number of sectors; nowhere did they store an exact length. So the only way to find the real end of a file was with a marker.

    The ^D used for a similar purpose in Unix is needed for things that are not files. How do you know you reached the end of a stream of text coming from a terminal?

    I suppose that's not needed anymore with pipes/sockets/etc using out-of-band signaling. But on a serial connection with only 2 pins hooked up, there is no way to indicate the end of transmission other than in-band.

    I don't know if the EOF marker character is visible itself, or just sets the eof flag and gets eaten.

    Even today, type perl on a command line and start typing. How do you indicate you're done? You don't want to actually disconnect the transport layer of your console, since that will affect the underlying shell session as well. Typing the sentinel character is the way it works on many systems.

    —John

Re: End Of File Marker
by dailylemma (Scribe) on Aug 21, 2001 at 23:39 UTC
    Perhaps eof() would help. Also, it might help if the relevant code was available.