in reply to Processing Two XML Files in Parallel

Does each element have a line to itself, or is the data multiline? That is, if we read line 123 from file A, will line 123 in file B be the matching line to process it with?

If that is the case, then you could just read both files a line at a time, and use a simple regex to get the value out of the <elem> wrapper:

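my ($line_a, $line_b);
while (1) {
    $line_a = <A>;
    $line_b = <B>;
    # Stop as soon as either file runs out of lines.
    last if !defined $line_a || !defined $line_b;
    # m{...} delimiters, because the pattern itself contains a "/".
    my ($value_a) = $line_a =~ m{<elem>(.*?)</elem>};
    my ($value_b) = $line_b =~ m{<elem>(.*?)</elem>};
    # Skip line pairs that do not both contain an <elem>.
    next if !defined $value_a || !defined $value_b;
    print data_transform($value_a, $value_b);
}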

I'm sure better Perl adepts than me could write it better/faster, but I think that would work if the files have a line-for-line correspondence.

Re^2: Processing Two XML Files in Parallel
by tinita (Parson) on Jul 23, 2011 at 12:15 UTC

    So you like catch phrases, uh?
    Let me tell you something:
    About 97% of the time, parsing XML with regexes is the root of all evil. The remaining 3% is left for one-time, quick & dirty scripts and maybe some special cases (where you can be sure the XML will stay exactly like that).
    Let me tell you why:
    The creator of the XML to parse might change it. All elements might be on one line. Maybe there will be some empty lines between the tags. Maybe the elem tags will get attributes in the future. In all these cases your script will suddenly stop working, although the actual content you want didn't change. And somebody has to fix it quickly. In the end it's more work than just doing it right from the beginning, and you have potentially annoyed a customer and your boss.
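
    To make that concrete, here is a minimal sketch (core Perl only) that runs the pattern from the parent post against a few input lines. The extract_value helper and the sample strings are hypothetical, made up purely for illustration; they are not taken from anyone's real files.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: applies the parent post's pattern to one line.
    # m{...} is used so the "/" in the closing tag does not end the regex.
    sub extract_value {
        my ($line) = @_;
        return $line =~ m{<elem>(.*?)</elem>} ? $1 : undef;
    }

    # Made-up sample lines, one per format change described above.
    my @samples = (
        [ 'original format'     => '<elem>42</elem>' ],
        [ 'attribute added'     => '<elem id="1">42</elem>' ],
        [ 'empty line'          => '' ],
        [ 'two elems, one line' => '<elem>42</elem><elem>43</elem>' ],
    );

    for my $sample (@samples) {
        my ($name, $line) = @$sample;
        my $value = extract_value($line);
        printf "%-22s %s\n", $name, defined $value ? "got '$value'" : 'no match';
    }
    ```

    Only the first sample behaves as intended. The attribute and empty-line samples yield no match at all, and the one-line sample yields only the first value and silently drops the second, even though the data inside the elements is the same in every case.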

    That's how experienced programmers think. Because they know that things like that happen.
    You not only posted a quick & dirty solution, you even bashed someone for posting a clean and correct solution. A quick & dirty solution is OK (although it would be nice to mention that it depends on the exact XML format), and you actually got some ++ for it, but then bashing someone else's correct solution is just infantile.
