Anonymous Monk:

No-one else seems to have mentioned the perils of parsing XML with regular expressions, so I guess I'll do so. It's all fine so long as the XML continues to come in to you formatted as your example, or if you control both ends of the data feed.

However, with third-party data feeds, sooner or later they'll change the formatting and give you a headache. For example, suppose the data comes in like this:

<breakfast_menu>
<food><name>Belgian Waffles</name><price>$5.95</price> <description>Two of our famous Belgian Waffles with plenty of real maple syrup</description> <calories>650</calories> </food>
<food><name>Strawberry Belgian Waffles</name><price>$7.95 </price><description>Light Belgian waffles covered with strawberries and whipped cream </description><calories>900</calories> </food>
<food><name>Berry-Berry Belgian Waffles </name> <price>$8.95</price> <description>Light Belgian waffles covered with an assortment of fresh berries and whipped cream</description><calories>900</calories> </food>
<food> <name>French Toast</name> <price>$4.50</price> <description>Thick slices made from our homemade sourdough bread</description> <calories>600</calories> </food>
<food> <name> Homestyle Breakfast</name> <price>$6</price> <description>Two eggs, bacon or sausage, toast, and our ever-popular hash browns</description> <calories>950</calories> </food>
<food><name>Robot Cogs</name><price>$123.456</price></food>
<food><name>Berries &amp; More Berries Waffles</name><price>11.5</price></food>
</breakfast_menu>

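Here, you'll find several things that can cause you some trouble: the whitespace and layout between tags is inconsistent, there's stray whitespace inside some <name> and <price> elements, some <food> records have no <description> or <calories>, one name contains an entity (&amp;), and the prices aren't consistently formatted ($6, $123.456, and 11.5 with no dollar sign).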

So you'll find that you'll get awful results with your code:

$ perl pm1208325_proc_xml.pl ugly.xml
Homestyle Breakfast 4.50
Berries &amp; More Berries Waffles 123.456
French Toast 8.95
Strawberry Belgian Waffles 5.95

Notice that due to the ugliness I added to the XML file, the output is not only ugly, but wrong!

Not only are some items missing from the output, but since you're using separate arrays to keep your values, any parsing error on one of the values knocks your arrays out of sync, so the wrong prices appear on some items.
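To make the out-of-sync problem concrete, here is a small sketch. It is not the code from the original question; the three-item sample, the "call us" price, and the names parallel_arrays and per_record are assumptions made up for illustration. The second version is still regex-based (so it is no substitute for a real parser); it only shows how keeping each record together limits the damage to the one bad record.

```perl
use strict;
use warnings;

# Hypothetical sample: the second price has no digits, so the price
# pattern below will not match it.
my $xml = <<'XML';
<food><name>Belgian Waffles</name><price>$5.95</price></food>
<food><name>Robot Cogs</name><price>call us</price></food>
<food><name>French Toast</name><price>$4.50</price></food>
XML

# Fragile: two independent arrays, each filled by its own regex.
# One failed price match shifts every later price up by one slot.
sub parallel_arrays {
    my ($text) = @_;
    my @names  = $text =~ m{<name>([^<]+)</name>}g;
    my @prices = $text =~ m{<price>\$([\d.]+)</price>}g;
    return (\@names, \@prices);
}

# Sturdier: grab each <food> as a unit, then pull its fields out,
# so a bad field only affects the record it belongs to.
sub per_record {
    my ($text) = @_;
    my @records;
    while ($text =~ m{<food>(.*?)</food>}gs) {
        my $food = $1;
        my ($name)  = $food =~ m{<name>\s*(.*?)\s*</name>}s;
        my ($price) = $food =~ m{<price>\s*\$?([\d.]+)\s*</price>}s;
        push @records, { name => $name, price => $price };
    }
    return @records;
}

my ($names, $prices) = parallel_arrays($xml);
print "Parallel arrays:\n";
for my $i (0 .. $#$names) {
    printf "  %-16s %s\n", $names->[$i], $prices->[$i] // '(missing)';
}

print "Per record:\n";
for my $rec (per_record($xml)) {
    printf "  %-16s %s\n", $rec->{name}, $rec->{price} // '(missing)';
}
```

With the parallel arrays, Robot Cogs picks up French Toast's price and French Toast ends up with none; with one hash per record, only Robot Cogs is missing its price.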

There are other headaches you can get into when dealing with XML files, too. So you may want to learn one of the XML handling libraries. It's a little bit of a pain at first, but once you're used to it, these sorts of issues just magically go away. Then you can use the time you're not wrestling XML data to handle the other issues, like formatting values!

I used XML::Twig and whipped something up and it displays:

$ perl ex_Xml_Twig_pm1208325.pl ugly.xml
Belgian Waffles $5.95
Berries & More Berries Waffles $11.50
Berry-Berry Belgian Waffles $8.95
French Toast $4.50
Homestyle Breakfast $6.00
Robot Cogs $123.46
Strawberry Belgian Waffles $7.95
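The script itself isn't reproduced here, so the following is only a minimal sketch of what an XML::Twig approach could look like. It is not the original ex_Xml_Twig_pm1208325.pl: the function names parse_menu and format_menu, the cut-down sample data, and the price-cleanup rule (strip everything except digits and the decimal point) are all assumptions made for illustration. It needs XML::Twig installed from CPAN.

```perl
use strict;
use warnings;
use List::Util qw(max);
use XML::Twig;

# Cut-down, deliberately untidy sample (an assumption, not the full feed).
my $sample = <<'XML';
<breakfast_menu> <food><name>Belgian Waffles</name><price>$5.95</price>
</food> <food> <name>
Homestyle Breakfast</name> <price>$6</price> </food>
<food><name>Robot Cogs</name><price>$123.456</price></food>
<food><name>Berries &amp; More Berries Waffles</name><price>11.5</price></food>
</breakfast_menu>
XML

# Parse the XML text and return one [name, price] pair per <food>.
sub parse_menu {
    my ($xml) = @_;
    my @items;
    my $twig = XML::Twig->new(
        twig_handlers => {
            food => sub {
                my ($t, $food) = @_;
                # Trimmed text drops the stray whitespace around values;
                # the parser has already turned &amp; into a plain &.
                my $name  = $food->first_child_trimmed_text('name');
                my $price = $food->first_child_trimmed_text('price');
                $price =~ s/[^\d.]//g;    # keep digits and decimal point only
                push @items, [ $name, $price ]
                    if length($name) && length($price);
                $t->purge;                # free what has been handled so far
            },
        },
    );
    $twig->parse($xml);
    return @items;
}

# Sort by name and return aligned "name  $price" lines.
sub format_menu {
    my @items = sort { $a->[0] cmp $b->[0] } @_;
    return () unless @items;
    my $width = max map { length $_->[0] } @items;
    return map { sprintf "%-*s  \$%.2f\n", $width, $_->[0], $_->[1] } @items;
}

print format_menu(parse_menu($sample));
```

Because name and price are read from the same <food> element inside one handler, a record with an odd or missing field can't shift the values of its neighbours, and the formatting is left to a single sprintf.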

...roboticus

When your only tool is a regular expression, all XML problems look insurmountable.


In reply to Re: aligning text by roboticus
in thread aligning text by Anonymous Monk
