Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I'm getting on in years and, to be honest, I'm much more familiar with BASIC than Perl (I cut my teeth on a TRS-80 Model I)! But I've come up against a problem and am wondering whether Perl might provide a simple solution.
To simplify the issue as much as possible: I have an XML file that I want to read line by line and selectively copy to another (target) XML file. The file will have a format something like this (pay attention to the <label_a ...> ... </label_a> blocks):
<?xml version="1.0" encoding="ISO-8859-1"?>
<...other lines that must be copied...>
<label_x data1="somevalue" data2="someothervalue" data3="anothervalue">
  <label_y="somevalue">
    <label_z>a value</label_z>
    <label_z>a value</label_z>
    <label_z>a value</label_z>
    <label_z>a value</label_z>
  </label_y>
  <label_y="somevalue">
    <label_z>a value</label_z>
    <label_z>a value</label_z>
    <label_z>a value</label_z>
    <label_z>a value</label_z>
  </label_y>
  <label_a timea="20140623203000 -0400" timeb="20140623210000 -0400" id="must_match_this">
    <label_b="data">data_of_variable_number_of_lines_and_indentations</label_b>
    <label_b="more_data">data_of_variable_number_of_lines_and_indentations</label_b>
    <label_c>
      <label_d>Some_data_may_be_indented_further</label_d>
    </label_c>
    <label_b="still_more_data">data_of_variable_number_of_lines_and_indentations</label_b>
  </label_a>
  <label_a timea="20140623210000 -0400" timeb="20140623220000 -0400" id="must_match_this">
    <label_b="data">data_of_variable_number_of_lines_and_indentations</label_b>
    <label_b="more_data">data_of_variable_number_of_lines_and_indentations</label_b>
    <label_c>
      <label_d>Some_data_may_be_indented_further</label_d>
    </label_c>
    <label_b="still_more_data">data_of_variable_number_of_lines_and_indentations</label_b>
  </label_a>
</label_x>
So with that in mind, here's what I need to have happen:
Everything BEFORE the first label_a block must be copied to the new file.
If a label_a block is to be copied, then everything from <label_a ...> through </label_a> (including the lines containing those tags) must be copied. If the block is not to be copied, then nothing from that block should be copied. The data between those tags may vary and may have different indentation levels.
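One possible starting point for the line-by-line copy, sketched in Perl. It assumes (as in the sample) that each <label_a ...> opening tag fits on one line and that label_a blocks are never nested. Lines outside any label_a block are copied straight through; note that this also copies the closing tag at the end, which is an assumption on my part so the output stays well-formed. Each block is buffered and written only if a caller-supplied test on its opening line returns true. The name filter_blocks and the keep_me/drop_me test data are hypothetical.

```perl
use strict;
use warnings;

# Copy every line from $in to $out, except that each <label_a ...> ... </label_a>
# block is held back and written only if $keep->($opening_line) returns true.
# Assumes each opening tag is on one line and label_a blocks are not nested.
sub filter_blocks {
    my ($in, $out, $keep) = @_;
    my @block;           # lines of the label_a block being read
    my $in_block = 0;    # true between <label_a ...> and </label_a>

    while (my $line = <$in>) {
        $in_block = 1 if !$in_block && $line =~ /<label_a\b/;
        if (!$in_block) {
            print {$out} $line;    # outside any label_a block: copy as-is
            next;
        }
        push @block, $line;
        if ($line =~ m{</label_a>}) {
            # The whole block is judged by its opening line only.
            print {$out} @block if $keep->($block[0]);
            @block    = ();
            $in_block = 0;
        }
    }
}

# Demonstration with in-memory "files"; for real files, open handles on
# file names instead (e.g. open my $in, '<', 'input.xml' or die ...).
my $xml = <<'XML';
<root>
<label_a id="keep_me">
  <x>1</x>
</label_a>
<label_a id="drop_me">
  <x>2</x>
</label_a>
</root>
XML

open my $in, '<', \$xml or die "Cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot open output: $!";
filter_blocks($in, $out, sub { $_[0] =~ /id="keep_me"/ });
close $out;
print $result;
```

Run as-is, this prints the <root> and </root> lines plus only the keep_me block. The decision logic lives entirely in the callback, so the copying loop never has to change when the tests do.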
The decision to copy the block should be based on two things. The first is the "timea" value in each <label_a ...> line, which looks something like this:
20140623203000 -0400
This breaks down to 2014-06-23 20:30:00 with a GMT offset of -04:00.
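For the day of the week, one option in Perl is the core Time::Local module: split the 14 digits with a regex, pass them to timegm(), and read the weekday back with gmtime(). Treating the fields as UTC is a deliberate choice in this sketch: it yields the weekday of the local date exactly as written in the string, without applying the -0400 offset and without the machine's own time zone or DST getting involved. If the weekday should instead be judged in GMT, the offset would have to be applied first. The name parse_stamp is hypothetical.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

my @DAY_NAMES = qw(Sunday Monday Tuesday Wednesday Thursday Friday Saturday);

# Split a stamp such as "20140623203000 -0400" into its parts.
# Returns (year, month, day, hour, minute, second, offset, weekday) with
# weekday 0 = Sunday .. 6 = Saturday, or an empty list if it does not parse.
sub parse_stamp {
    my ($stamp) = @_;
    my ($y, $mo, $d, $h, $mi, $s, $off) =
        $stamp =~ /^(\d{4})(\d\d)(\d\d)(\d\d)(\d\d)(\d\d)\s*([+-]\d{4})?$/
        or return;

    # timegm() reads the fields as UTC, so neither the -0400 offset nor the
    # local machine's time zone/DST shifts the date; gmtime() then returns the
    # weekday of exactly the date written in the stamp.
    my $wday = (gmtime timegm($s, $mi, $h, $d, $mo - 1, $y))[6];
    return ($y, $mo, $d, $h, $mi, $s, $off, $wday);
}

my @f = parse_stamp('20140623203000 -0400');
printf "%s-%s-%s %s:%s:%s offset %s is a %s\n", @f[0 .. 6], $DAY_NAMES[$f[7]];
```

For the example value this prints "2014-06-23 20:30:00 offset -0400 is a Monday". Months are passed to timegm() as $mo - 1 because Perl's time functions count months from 0.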
The second is the "id" value, which is on the same line and will contain one of a few specific strings. Here is what I want to do:
IF the ID string matches a particular value, THEN IF the hour is within a certain range (say 18 to 20), THEN IF the date falls on one of certain specific days of the week (say Monday, Wednesday, or Friday), I want to copy the entire label_a block to the other file. If not ALL of those conditions are met, I want to try another similar test using different ID, hour, and day-of-week values. If NONE of the tests results in the block being copied, I want the entire label_a block to be skipped (not copied), and the same series of tests run on the next label_a block, looping until the end of the file.
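This cascade of tests maps naturally onto a table of rules checked in order: the first rule whose id, hour range, and weekday all match means the block is copied, and if none match the block is skipped. A minimal Perl sketch, assuming hour ranges are inclusive (so 18 to 20 covers 18:00 through 20:59) and using Perl's weekday numbering (0 = Sunday ... 6 = Saturday). The second rule, the id 'another_id', and the name should_copy are made-up placeholders.

```perl
use strict;
use warnings;

# Hypothetical rule table, tried in order. Hour ranges are inclusive
# and weekdays use Perl's numbering: 0 = Sunday .. 6 = Saturday.
my @RULES = (
    { id => 'must_match_this', hours => [18, 20], days => [1, 3, 5] },  # Mon/Wed/Fri
    { id => 'another_id',      hours => [6, 9],   days => [0, 6]    },  # Sun/Sat
);

# True if any rule matches on all three conditions; false means skip the block.
sub should_copy {
    my ($id, $hour, $wday) = @_;
    for my $rule (@RULES) {
        next unless $id eq $rule->{id};
        my ($lo, $hi) = @{ $rule->{hours} };
        next unless $hour >= $lo && $hour <= $hi;
        return 1 if grep { $_ == $wday } @{ $rule->{days} };
    }
    return 0;
}

print should_copy('must_match_this', 20, 1) ? "copy\n" : "skip\n";   # Monday, 20:xx
print should_copy('must_match_this', 21, 1) ? "copy\n" : "skip\n";   # Monday, 21:xx
```

This prints "copy" and then "skip". Adding another test is just another row in @RULES; the checking code stays the same.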
There will not be any stray lines between label_a blocks; after the last one comes the closing </label_x> tag at the end of the file.
The only part of this that would be difficult in BASIC is calculating the day of the week from the "timea" string. Other than that, I could have this coded in BASIC in about an hour. But I'm not that familiar with Perl: I have no idea how to do the XML block selection and selective writing to the new file, and I definitely don't know how to extract the day of the week from that date string. Could one of you kind monks please help get me started in the right direction?
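Putting the pieces together, a complete filter could look like the sketch below. It is one possible approach, not the only one, and it rests on several assumptions: each <label_a ...> opening tag is on one line, label_a blocks are not nested, hour ranges are inclusive, the weekday is taken from the local date as written (offset not applied), and the rule values and the script name filter.pl are hypothetical. It works on lines with regexes rather than with an XML parser; if the real file is well-formed XML, a parser such as XML::Twig would be more robust. Because the bytes are copied without being decoded, the ISO-8859-1 content and the XML declaration pass through unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical rules, tried in order. Hours are inclusive;
# weekdays are 0 = Sunday .. 6 = Saturday.
my @RULES = (
    { id => 'must_match_this', hours => [18, 20], days => [1, 3, 5] },  # Mon/Wed/Fri
    { id => 'another_id',      hours => [6, 9],   days => [0, 6]    },  # Sun/Sat
);

# Decide, from a <label_a ...> opening line, whether its block is copied.
sub keep_block {
    my ($open_line) = @_;
    my ($id)    = $open_line =~ /\bid="([^"]*)"/    or return 0;
    my ($stamp) = $open_line =~ /\btimea="([^"]*)"/ or return 0;
    my ($y, $mo, $d, $h) = $stamp =~ /^(\d{4})(\d\d)(\d\d)(\d\d)/ or return 0;

    # Weekday of the date as written in the stamp (the offset is not applied).
    my $wday = (gmtime timegm(0, 0, 0, $d, $mo - 1, $y))[6];

    for my $rule (@RULES) {
        next unless $id eq $rule->{id};
        next unless $h >= $rule->{hours}[0] && $h <= $rule->{hours}[1];
        return 1 if grep { $_ == $wday } @{ $rule->{days} };
    }
    return 0;    # no rule matched: skip this block
}

# Copy lines from $in to $out; buffer each label_a block and copy it
# only if keep_block() approves its opening line.
sub filter_blocks {
    my ($in, $out) = @_;
    my @block;    # non-empty while inside a label_a block
    while (my $line = <$in>) {
        if (!@block && $line !~ /<label_a\b/) {
            print {$out} $line;    # outside any block: copy unchanged
            next;
        }
        push @block, $line;
        if ($line =~ m{</label_a>}) {
            print {$out} @block if keep_block($block[0]);
            @block = ();
        }
    }
    warn "Input ended inside an unterminated label_a block\n" if @block;
}

# Usage: perl filter.pl input.xml output.xml
if (@ARGV == 2) {
    my ($src, $dst) = @ARGV;
    open my $in,  '<', $src or die "Cannot read $src: $!\n";
    open my $out, '>', $dst or die "Cannot write $dst: $!\n";
    filter_blocks($in, $out);
    close $out or die "Cannot finish writing $dst: $!\n";
}
```

With the two sample blocks above, the first (a Monday, hour 20, id must_match_this) would be kept and the second (hour 21) dropped under the first rule.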