
For starters, XML is a binary format. binmode definitely won't hurt anything.

binmode doesn't just disable :crlf; it disables any :encoding too.
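Here is a minimal sketch of that, using core Perl only. It writes U+00E9 through an :encoding(UTF-8) layer, calls binmode with no layer argument, and then writes the same character again. The in-memory handle and the variable names exist only for this illustration.

```perl
use strict;
use warnings;

# Write into an in-memory buffer so the exact bytes can be inspected.
my $buf = '';
open(my $fh, '>:encoding(UTF-8)', \$buf) or die "open: $!";

print {$fh} "\x{E9}";   # U+00E9 is encoded by :encoding(UTF-8) as C3 A9

binmode($fh);           # no layer given: pops :encoding (and :crlf, if any)

print {$fh} "\x{E9}";   # now written as the single byte E9

close($fh) or die "close: $!";

print join(' ', map { sprintf '%02X', ord } split //, $buf), "\n";
# prints: C3 A9 E9
```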

You probably should always use binmode or equivalent (e.g. use open), either to remove layers* when you want to ensure the bytes are unmolested*, or to add some when you want to output text.
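The following sketch covers both directions, again with core Perl only. $xml_bytes is hypothetical data that stands in for already-encoded output such as $doc->toString. $text is an ordinary character string. Both are written to in-memory buffers so that the results can be checked.

```perl
use strict;
use warnings;

# Hypothetical bytes that are already encoded; they must reach the
# output unchanged.
my $xml_bytes = qq{<?xml version="1.0" encoding="UTF-8"?>\n<a>\xC3\xA9</a>\n};

# A Perl character string that still needs encoding on output.
my $text = "caf\x{E9}\n";

my ($bytes_out, $text_out) = ('', '');

# Bytes: remove layers so nothing is translated or re-encoded.
open(my $raw, '>', \$bytes_out) or die "open: $!";
binmode($raw);
print {$raw} $xml_bytes;
close($raw) or die "close: $!";

# Text: add an encoding layer so characters are encoded as UTF-8.
open(my $txt, '>', \$text_out) or die "open: $!";
binmode($txt, ':encoding(UTF-8)');
print {$txt} $text;
close($txt) or die "close: $!";
```

For the text side, the open pragma (e.g. `use open qw(:std :encoding(UTF-8));`) can set that default for handles opened in its lexical scope, so you don't need a binmode call on each one.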

* These may be added via $ENV{PERLIO}, via -C, or by Perl itself, as is the case for :crlf on Windows.
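To see which layers a handle actually has, you can call PerlIO::get_layers before and after binmode. This is a rough sketch. The exact lists it prints depend on the platform and on how perl was started, for example with PERLIO set in the environment or with -CO, which marks STDOUT as UTF-8.

```perl
use strict;
use warnings;

# Whatever layers STDOUT picked up implicitly (PERLIO, -C, platform default).
my @before = PerlIO::get_layers(*STDOUT);

binmode(STDOUT);    # same as binmode(STDOUT, ':raw')

my @after = PerlIO::get_layers(*STDOUT);

print "before: @before\n";
print "after:  @after\n";
```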

Re^5: Search and replace again
by crashtest (Curate) on Apr 20, 2010 at 18:17 UTC

    "XML is a binary format" - not to nitpick (except I will), but that's just not right. From http://www.w3.org/XML/:

    Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879).
    (emphasis mine)

    Obviously, at some point the XML text has to be encoded to a binary format. XML::LibXML's toString method (when called on a document) does do that, so at that point you are indeed dealing with binary data and should turn off any PerlIO layers on your output handle, as you did in your example. I didn't realize that $doc->toString returned binary data.

      I'm not going to argue that XML isn't text. At some levels, it is definitely valid to think of XML as a text format. (It's human readable and human editable, after all.)

      But the topic at hand is far lower level, and such details do matter. Let's compare HTML (a text format) and XML (a binary format).

      |                    | HTML | XML |
      |--------------------|------|-----|
      | MIME type          | text | application (binary) |
      | Character encoding | External to document | Embedded in document |
      | Parser             | The document must be decoded prior to being given to the parser, or information allowing the parser to do so must be provided to the parser. | The document cannot be decoded prior to being given to the parser, because the document must be parsed to determine its encoding. |
      | Generator          | The document must be returned unencoded, or the generator must indicate which encoding was used to encode it. | The encoding must be chosen before the document is generated, so the text in the document is already encoded. |
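      This is a simplified Perl sketch of the Parser and Generator rows for XML. generate_xml and sniff_xml_encoding are hypothetical helpers written only for this illustration. They are not part of XML::LibXML or any other module. The sniffer handles only ASCII-compatible encodings and skips the byte order mark detection that real parsers perform.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Generator side (hypothetical helper): the encoding is chosen up front,
# written into the declaration, and the whole document is encoded with it.
# No escaping of & or < is done; illustration only.
sub generate_xml {
    my ($encoding, $text) = @_;    # $text is a Perl character string
    my $doc = qq{<?xml version="1.0" encoding="$encoding"?>\n<a>$text</a>\n};
    return encode($encoding, $doc);    # result is bytes, not text
}

# Parser side (hypothetical, simplified): the declared encoding has to be
# read out of the bytes before they can be decoded. Assumes an
# ASCII-compatible encoding; real parsers also check for a byte order mark
# (XML 1.0, Appendix F) and handle UTF-16 and EBCDIC.
sub sniff_xml_encoding {
    my ($bytes) = @_;
    return $2
        if $bytes =~ /\A<\?xml[^>]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1/;
    return 'UTF-8';    # default when there is no declaration or BOM
}

my $bytes = generate_xml('ISO-8859-1', "caf\x{E9}");
my $enc   = sniff_xml_encoding($bytes);    # must look inside the bytes first
my $text  = decode($enc, $bytes);          # only now can the text be decoded

print "$enc\n";    # prints: ISO-8859-1
```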

      Your definition may differ. This is the one I was using.