in reply to Splitting XML file on Processing Instructions

You would need to add file I/O error handling, and perhaps handle missing tags. I assumed an input file named data. I also assumed that the file name should use the value inside <h1>...</h1>: your example mentions 'test', which does not appear anywhere in the file, so perhaps you meant 'text'. Caveats aside, this does roughly what you wanted:

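#!/usr/bin/perl -w
use strict;

# Slurp the whole input file (named "data") into $text.
my $text = '';
if (open my $in, '<', 'data') {
    local $/;            # undefine the record separator: read the file in one go
    $text = <$in>;
    close $in;
}

# Each fragment is whatever lies between one <?split ?> and the next.
# The lookahead leaves the closing <?split ?> unconsumed, so it can also
# open the following fragment.
while ($text =~ /<\?split \?>(.*?)(?=<\?split \?>)/sg) {
    my $fragment = $1;
    my ($h1) = $fragment =~ /<h1>(.*?)<\/h1>/is;
    # The //g match returns every <no> value; only the first two are kept here.
    my ($from, $to) = $fragment =~ /<no>(.*?)<\/no>/isg;
    if (open my $out, '>', "${h1}-nr${from}to${to}.xml") {
        print $out $fragment;
        close $out;
    }
}
exit 0;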
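To check the splitting on its own, here is a small sketch that runs the same regex over a string instead of a file; split_fragments is a hypothetical helper name, not part of the code above. It also makes one behaviour visible: because every fragment must be followed by another <?split ?> (that is what the lookahead requires), any text after the last marker is never written out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text between each pair of <?split ?>
# markers, in order. The lookahead (?=...) does not consume the closing
# marker, so it can also serve as the opening marker of the next fragment.
sub split_fragments {
    my ($text) = @_;
    my @fragments;
    while ($text =~ /<\?split \?>(.*?)(?=<\?split \?>)/sg) {
        push @fragments, $1;
    }
    return @fragments;
}

my @parts = split_fragments('<?split ?>A<?split ?>B<?split ?>C');
print join(',', @parts), "\n";   # prints "A,B": C has no closing marker
```

If the trailing text matters, one option is to append a final <?split ?> to $text before the loop.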

Re^2: Splitting XML file on Processing Instructions
by Anonymous Monk on Jun 24, 2004 at 08:19 UTC
    Thanks for your advice! Your code works fine for the splitting itself, but it only picks up the first two <no> elements for the file name. If I add a third (or further) <no> element, $from contains the content of the first <no> element and $to the content of the second, but I need the first and the last <no> element for the file name. In the following example that would be text-nr4to20.xml:
    <?split ?>
    <h1>text</h1>
    <text> Textinhalt <no>4</no> </text>
    text ... text
    <no>18</no>
    <no>19</no>
    <no>20</no>
    <h6>text</h6>
    <?split ?>
    Maybe it's possible to put the contents of the <no> elements into an array and take the first with $a[0] and the last with $a[$#a], but I didn't manage to change the code accordingly. Could you help me once more?
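The reason only two values come through is list assignment: in list context a //g match returns every capture, and assigning that list to ($from, $to) keeps the first two and silently discards the rest. A minimal sketch with a made-up, hard-coded fragment:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fragment = '<no>4</no> ... <no>18</no> <no>19</no> <no>20</no>';

# List-context //g returns all captures: (4, 18, 19, 20).
# A two-element list on the left keeps only the first two.
my ($from, $to) = $fragment =~ /<no>(.*?)<\/no>/g;

# An array on the left keeps every capture.
my @numbers = $fragment =~ /<no>(.*?)<\/no>/g;

print "$from $to\n";                                  # prints "4 18"
# $#numbers is the last index, so $numbers[$#numbers] and
# $numbers[-1] refer to the same element.
print $numbers[0], ' ', $numbers[$#numbers], "\n";    # prints "4 20"
print $numbers[-1], "\n";                             # prints "20"
```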

      No problem. This just requires putting the contents of the <no>...</no> tags into an array. The first value is then $numbers[0]; for the last one we use $numbers[-1], which means the same as $numbers[$#numbers] but is shorter.

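      #!/usr/bin/perl -w
      use strict;

      # Slurp the whole input file (named "data") into $text.
      my $text = '';
      if (open my $in, '<', 'data') {
          local $/;            # read the whole file in one go
          $text = <$in>;
          close $in;
      }

      while ($text =~ /<\?split \?>(.*?)(?=<\?split \?>)/sg) {
          my $fragment = $1;
          my ($h1) = $fragment =~ /<h1>(.*?)<\/h1>/is;
          # Collect the contents of every <no>...</no> element, in order.
          my @numbers = $fragment =~ /<no>(.*?)<\/no>/isg;
          # First and last <no> values, e.g. text-nr4to20.xml
          if (open my $out, '>', "${h1}-nr$numbers[0]to$numbers[-1].xml") {
              print $out $fragment;
              close $out;
          }
      }
      exit 0;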
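As mentioned at the start, missing tags are still not handled: if a fragment has no <h1> or no <no> element, the file name is built from undefined values (with warnings under -w). Below is a minimal sketch of one way to guard against that, assuming it is acceptable simply to skip such fragments; fragment_filename is a hypothetical helper, not part of the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build "<h1>-nr<first no>to<last no>.xml" for one
# fragment, or return undef if the <h1> or the <no> tags are missing.
sub fragment_filename {
    my ($fragment) = @_;
    my ($h1)    = $fragment =~ /<h1>(.*?)<\/h1>/is;
    my @numbers = $fragment =~ /<no>(.*?)<\/no>/isg;
    return undef unless defined $h1 && @numbers;
    return "$h1-nr$numbers[0]to$numbers[-1].xml";
}

my $sample = ' <h1>text</h1> <text> Textinhalt <no>4</no> </text> text ... text'
           . ' <no>18</no> <no>19</no> <no>20</no> <h6>text</h6> ';
print fragment_filename($sample), "\n";    # prints "text-nr4to20.xml"
```

Inside the while loop you could then write my $name = fragment_filename($fragment); next unless defined $name; before opening the output file.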