Thanks, a very good point. A SAX parser is the robust way to implement "creative file scanning", and I just didn't think of it. But the point regarding this alternative remains true: namely, that as you start doing additional tasks beyond reading and appending, it'll get progressively harder to get it done.

Regardless, since the goal of the project is to learn new technologies, maybe the best approach would be this: do a little reading, and a lot of thinking, about how XML document types can be used to represent various structures, and then consider what kinds of structural relationships will exist within the journal data. Then choose a data representation, and write a schema or DTD (even if no validation is needed, it's good practice). Finally, play with the various tools including both kinds of parsers. I'd even suggest poking around with a q&d "parser" that builds on something like

my ($tag, $attrs, $body) = /<(\w+)\s+(.*?)>(.*?)</\1>/s;
to see why it is disrecommended by so many people.

--roundboy


In reply to Re: Re: Re: XML and file size by roundboy
in thread XML and file size by silent11

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.