Well, there are already a number of XML parsers available for Perl.

XML::Parser, XML::TreeBuilder and XML::Simple, to name three. In fact, when I search CPAN for modules matching /^XML/ I get no fewer than 341 matches!

And given the nature of your question I suspect (no offense intended) that you probably aren't going to come up with something superior.

To answer your question, however: XML files _are_ text files. That's part of their charm. When one is displayed in IE it is rendered in a relatively intuitive and simple format, but the way IE renders it may be subtly different from the way it is actually laid out in the file. This includes showing the attributes of a tag on one line. That has nothing to do with how the document is stored in the file, nor with how you open it. An example:

<?xml version="1.0" encoding="ISO-8859-1"?>
<foo bar="baz">
  <weird a='1' b='2' c='3' />
  <nested > text </nested>
  <empty_nest> </empty_nest>
</foo>

Renders like this in IE:

<?xml version="1.0" encoding="ISO-8859-1" ?>
- <foo bar="baz">
    <weird a="1" b="2" c="3" />
    <nested>text</nested>
    <empty_nest />
  </foo>
Note that the <empty_nest></empty_nest> pair has been collapsed into an empty-element tag, <empty_nest />, so what you see in IE is only an abstract representation of what is in the file.
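
To make the "it is just text" point concrete, here is a minimal sketch using nothing but core Perl. The line layout of the sample is my own assumption (attributes spread over several lines), and the string is opened as an in-memory file only to keep the snippet self-contained; for a real file you would pass its name to open instead.

```perl
use strict;
use warnings;

# Sample document; the line breaks are an assumed layout, chosen to
# show that attributes need not sit on one line in the file.
my $xml = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<foo bar="baz">
  <weird a='1'
         b='2'
         c='3' />
  <nested > text </nested>
  <empty_nest>
  </empty_nest>
</foo>
XML

# Open the string as if it were a file. With a real file this would be
# open(my $fh, '<', $filename), where $filename is whatever you have.
open(my $fh, '<', \$xml) or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close($fh);

# Plain text, read line by line like any other text file.
printf "%d lines of plain text\n", scalar @lines;
print "line $_: $lines[$_ - 1]" for 1 .. @lines;
```

Reading it this way gets you the text exactly as stored, but a line is not a meaningful unit of XML: the <weird /> tag here spans three of them.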

Most of this follows from the very nature of XML and markup languages in general. Normally they aren't line oriented but rather stream oriented, where the stream is composed of tags and data. And to be honest, because of this flexibility, writing a correct parser for them is non-trivial.

If this is a project for fun or learning, then you have much research to do. If you are doing this because you didn't know there were already excellent XML parsers, then I would say have a trawl through CPAN and don't waste your time.
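
As a rough illustration of "stream of tags and data", here is a sketch using XML::Parser (from CPAN, not core Perl). The events() helper and both sample strings are my own, made up for this post. It feeds the parser the same document in two different layouts and compares the resulting event streams, skipping whitespace-only text.

```perl
use strict;
use warnings;
use XML::Parser;    # CPAN module, must be installed separately

# Hypothetical helper: return the parse events for a document as a
# list of strings, ignoring character data that is only whitespace.
sub events {
    my ($xml) = @_;
    my @events;
    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my ($expat, $tag, %attr) = @_;
                push @events, join ' ', "start:$tag",
                    map { "$_=$attr{$_}" } sort keys %attr;
            },
            End  => sub { push @events, "end:$_[1]" },
            Char => sub {
                my ($expat, $text) = @_;
                push @events, "text:$text" if $text =~ /\S/;
            },
        },
    );
    $parser->parse($xml);
    return @events;
}

# The same document, once on a single line...
my $one_line = q{<foo bar="baz"><weird a='1' b='2' c='3' />}
             . q{<nested>text</nested><empty_nest></empty_nest></foo>};

# ...and once spread over several lines, with an empty-element tag.
my $many_lines = <<'XML';
<foo bar="baz">
  <weird a='1'
         b='2'
         c='3' />
  <nested>text</nested>
  <empty_nest />
</foo>
XML

my @a = events($one_line);
my @b = events($many_lines);

print "$_\n" for @a;
print "same event stream\n" if "@a" eq "@b";
```

The parser never hands you lines, only start tags, end tags and text, which is why the line breaks and the <empty_nest /> shorthand make no difference to what comes out.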

HTH

Yves / DeMerphq
---
Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)


In reply to Re: Reading a .XML file by demerphq
in thread Reading a .XML file by Anonymous Monk
