I'm getting on in years and, to be honest, I'm much more familiar with BASIC than Perl (I cut my teeth on a TRS-80 Model I)! But I have come up against a problem and am wondering if Perl might provide a simple solution.

To simplify the issue as much as possible: I have an XML file that I want to read line by line and selectively copy to another target XML file. The file will have a format something like this (pay attention to the <label_a ...> ... </label_a> blocks):

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <...other lines that must be copied...>
    <label_x data1="somevalue" data2="someothervalue" data3="anothervalue">
      <label_y="somevalue">
        <label_z>a value</label_z>
        <label_z>a value</label_z>
        <label_z>a value</label_z>
        <label_z>a value</label_z>
      </label_y>
      <label_y="somevalue">
        <label_z>a value</label_z>
        <label_z>a value</label_z>
        <label_z>a value</label_z>
        <label_z>a value</label_z>
      </label_y>
      <label_a timea="20140623203000 -0400" timeb="20140623210000 -0400" id="must_match_this">
        <label_b="data">data_of_variable_number_of_lines_and_indentations</label_b>
        <label_b="more_data">data_of_variable_number_of_lines_and_indentations</label_b>
        <label_c>
          <label_d>Some_data_may_be_indented_further</label_d>
        </label_c>
        <label_b="still_more_data">data_of_variable_number_of_lines_and_indentations</label_b>
      </label_a>
      <label_a timea="20140623210000 -0400" timeb="20140623220000 -0400" id="must_match_this">
        <label_b="data">data_of_variable_number_of_lines_and_indentations</label_b>
        <label_b="more_data">data_of_variable_number_of_lines_and_indentations</label_b>
        <label_c>
          <label_d>Some_data_may_be_indented_further</label_d>
        </label_c>
        <label_b="still_more_data">data_of_variable_number_of_lines_and_indentations</label_b>
      </label_a>
    </labelx>

So with that in mind, here's what I need to have happen:

Everything BEFORE the first label_a block must be copied to the new file.

If a label_a block is to be copied, then everything from the <label_a ...> line through the </label_a> line (including the lines with those tags) must be copied. If the block is not to be copied, then nothing from that block should be copied. The data between those tags may vary in length and have different indentation levels.
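The keep-or-skip rule above amounts to a small state machine: copy every line until a <label_a ...> opening tag appears, decide once for the whole block, then either copy or drop lines until the matching </label_a>. Below is a minimal sketch under stated assumptions: it is line-based rather than a real XML parser, it assumes each <label_a ...> opening tag sits on a single line and that label_a blocks are never nested, and the sub name filter_blocks and the sample ids keep_me/drop_me are hypothetical. For arbitrary XML a proper parser (for example XML::Twig) would be more robust.

```perl
use strict;
use warnings;

# Copy lines from $in to $out, keeping or dropping each <label_a ...> ...
# </label_a> block as a unit. $keep is a code ref that receives the opening
# <label_a ...> line and returns true if the whole block should be copied.
# Everything outside label_a blocks (header, trailer) is always copied.
sub filter_blocks {
    my ($in, $out, $keep) = @_;
    my $in_block = 0;   # are we inside a <label_a> block?
    my $copy     = 1;   # should lines of the current block be written?
    while (my $line = <$in>) {
        if (!$in_block && $line =~ /<label_a\b/) {
            $in_block = 1;
            $copy     = $keep->($line);   # decide once, on the opening line
        }
        print {$out} $line if !$in_block || $copy;
        $in_block = 0 if $in_block && $line =~ m{</label_a>};
    }
}

# Demo with in-memory filehandles; the ids here are made up for illustration.
my $xml = <<'XML';
<label_x>
  <label_a timea="20140623203000 -0400" id="keep_me">
    <label_b>one</label_b>
  </label_a>
  <label_a timea="20140623210000 -0400" id="drop_me">
    <label_b>two</label_b>
  </label_a>
</labelx>
XML

open my $in,  '<', \$xml    or die $!;
my $result = '';
open my $out, '>', \$result or die $!;
filter_blocks($in, $out, sub { $_[0] =~ /id="keep_me"/ });
close $out;
print $result;
```

For real files you would open them by name instead (the names here are hypothetical), e.g. `open my $in, '<', 'input.xml' or die $!;` and `open my $out, '>', 'output.xml' or die $!;`. Because the lines are copied byte for byte, the ISO-8859-1 encoding passes through unchanged. The $keep callback is where the id, hour and day-of-week tests would go.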

The decision to copy the block should be based on two things. The first is the "timea" value in each <label_a ...> line, which looks something like this:

20140623203000 -0400

This breaks down to 2014-06-23 20:30:00 with a GMT offset of -04:00.
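Splitting a timea value into those fields is a single regular expression match in Perl. The sketch below assumes the value is always exactly 14 digits, whitespace, then a signed four-digit offset, as in the example; the sub name parse_timea is hypothetical.

```perl
use strict;
use warnings;

# Split a timea value such as "20140623203000 -0400" into its parts.
# Returns an empty list if the string is not in the expected format.
sub parse_timea {
    my ($s) = @_;
    my ($y, $mon, $d, $h, $min, $sec, $sign, $oh, $om) =
        $s =~ /^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)\s+([+-])(\d\d)(\d\d)$/
        or return;
    return (year => $y, month => $mon, day => $d,
            hour => $h, minute => $min, second => $sec,
            offset => "$sign$oh:$om");
}

my %t = parse_timea("20140623203000 -0400");
printf "%s-%s-%s %s:%s:%s (GMT offset %s)\n",
    @t{qw(year month day hour minute second offset)};
# prints: 2014-06-23 20:30:00 (GMT offset -04:00)
```

The captured fields are strings such as "20", but Perl converts them automatically when compared numerically, so a test like `$t{hour} >= 18` works directly.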

The second is the "id" value which is on the same line and will contain one of a few specific strings. What I want to do is this:

If the id string matches a particular value, and the hour is within a certain range (say 18 to 20), and the date falls on one of certain days of the week (say Monday, Wednesday, or Friday), then I want to copy the entire label_a block to the other file. If any of those conditions is not met, then I want to try another similar test using different id, hour, and day-of-week values. If none of the tests results in the block being copied, then I want the entire label_a block to be skipped (not copied). The same series of tests should then run on the next label_a block, looping until the end of the file.
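The "try one test, then another, and skip the block if none match" logic fits naturally into a list of rules checked in order. The sketch below makes some assumptions. The rule values are placeholders: only must_match_this comes from the sample data, and another_id is hypothetical. The hour range is treated as inclusive on the hour, so 20:30 counts as hour 20. Days are numbered 0 (Sunday) to 6 (Saturday), which is the convention Perl's gmtime/localtime use.

```perl
use strict;
use warnings;

# Hypothetical rule list: each rule names an id, an inclusive hour range,
# and the allowed days of week (0 = Sunday ... 6 = Saturday).
# Replace these with the real values.
my @rules = (
    { id => 'must_match_this', hours => [18, 20], days => [1, 3, 5] }, # Mon, Wed, Fri
    { id => 'another_id',      hours => [ 6,  9], days => [0, 6]    }, # Sun, Sat
);

# Return 1 if ANY rule satisfies all three conditions, else 0.
# Rules are tried in order; a rule that fails any condition is skipped.
sub should_copy {
    my ($id, $hour, $wday) = @_;
    for my $r (@rules) {
        next unless $id eq $r->{id};
        next unless $hour >= $r->{hours}[0] && $hour <= $r->{hours}[1];
        next unless grep { $_ == $wday } @{ $r->{days} };
        return 1;
    }
    return 0;
}

print should_copy('must_match_this', 20, 1) ? "copy\n" : "skip\n";  # copy
print should_copy('must_match_this', 21, 1) ? "copy\n" : "skip\n";  # skip
print should_copy('another_id',       7, 6) ? "copy\n" : "skip\n";  # copy
```

Keeping the rules as data rather than nested IF statements means adding another id/hour/day combination is a one-line change.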

There will not be any stray lines between label_a blocks; after the last label_a block comes only the closing </labelx> tag at the end of the file.

The only difficult part in BASIC would be calculating the day of the week from the "timea" string; otherwise I could have this coded in BASIC in about an hour. But I am not very familiar with Perl, I have no idea how to select the XML blocks and write them selectively to the new file, and I definitely don't know how to extract the day of the week from that date string. Could one of you kind monks please help get me started in the right direction?
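Perl's core Time::Local module handles the day-of-week calculation. The sketch below rests on one assumption: that "20140623203000 -0400" is local wall-clock time, so the weekday wanted is that of the calendar date 2014-06-23 and the offset can be ignored. It converts that date to an epoch value with timegm and reads the weekday back with gmtime, both in UTC, so the time zone of the machine running the script cannot shift the date. The sub name timea_wday is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAY_NAMES = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);

# Day of week (0 = Sunday .. 6 = Saturday) of the calendar date at the
# start of a timea string. The GMT offset is deliberately ignored.
sub timea_wday {
    my ($timea) = @_;
    my ($y, $m, $d) = $timea =~ /^(\d{4})(\d\d)(\d\d)/
        or die "Unrecognised timea value: $timea\n";
    my $epoch = timegm(0, 0, 0, $d, $m - 1, $y);   # midnight UTC; month is 0-based
    return (gmtime $epoch)[6];                     # element 6 of gmtime is the weekday
}

my $wday = timea_wday("20140623203000 -0400");
print "$wday $DAY_NAMES[$wday]\n";   # prints: 1 Monday
```

If you would rather have the weekday of the moment in some other zone (for example UTC), you would need to apply the offset first; for "which local day was this" the offset is not needed.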


In reply to How can I keep or discard certain blocks of an XML file based on first line of block? by Anonymous Monk
