file split

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl
use strict;
use warnings;

my $tag;
my $output;
my $fh;
my $flag ='';
my $output_text;

while (<DATA>) {
    chomp;
    s/[\cA-\cZ]//g;
    s/\^[A-Z]//g;

    if (/^\{(.*)\}$/) {    # match {METATAG} line
        $fh = xml_output($output, $tag, $fh);
        $output = "";
        $tag = $1;
    }
    else {                 # not a {TAG} line
        next unless($tag);
        next if(/^\s*$/);
        $output .= ($output) ? " $_" : "<$tag>$_";
    }
} # End of While Loop


$fh = xml_output($output, $tag, $fh);


if($fh) {
    print $fh "</ROOT>\n";
    close($fh);
}
exit(0);
# Subroutine to open the file with the filename as {TAG}


sub xml_output {
    my ($output, $tag, $fh) = @_;
    if($output) {
        if($output =~ m/<TAG>(.*)/) {
            if($fh) {
                print $fh "</ROOT>\n";
                close($fh);
            }
            open($fh, '>', "$1.xml") or die "$1.xml: $!";
            print $fh "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<ROOT>\n";
        }


        $output =~ s/\s*(?=(<.+>|<.+\/>|<\/.+>|<\/.+><.+>))//g;
        $output =~ s/\s*(?=<)//g;
        $output =~ s/(?<=>)\s*//g;
        $output =~ s/<br\/>/&lt;br\/&gt;/g;
        $output =~ s/\s*(?=(&lt;br\/&gt;)\s*)//g;
        $output =~ s/&lt;br\/&gt;\s*/&lt;br\/&gt;/g;
        $output =~ s/&lt;br\/&gt;$//g;
        print $fh "$output</$tag>\n";
    }
    return($fh);
} # End of subroutine
__DATA__
020200001109VUTVS01 00004407B3^V^V^A2^C^D^V^V^A 0000 0001 050102 N S0000000000 00001444^B{IT}R
{DATE}
050102
{TDATE}
Sunday, January 02, 2005
{EDITION}
6
{TAG}
0412270403
{BODY}
Certified Financial Planner for DiStefano Finacial  Group in Westfield,
MA.
^C^D^V^V^A 0000 0002 050102 N S0000000000 00002158^B{IT}R
{DATE}
050102
{TDATE}
Sunday, January 02, 2005
{EDITION}
6
{PAGE}
H5
{TAG}
0412270405
{BODY}
Amdur - Rosenberg < Gabriela Rosenberg, the daughter of Anita and Samuel Rosenberg of Buenos

{IT} marks the start of each record.
I have to split the input into many files, one per {IT} record, with each file named after that record's {TAG} value,
and all of the record's contents written to that file. For example, the file 0412270403.xml should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<IT>R</IT>
<DATE>050102</DATE>
<TDATE>Sunday, January 02, 2005</TDATE>
<EDITION>6</EDITION>
<TAG>0412270403</TAG>
<BODY>Certified Financial Planner for DiStefano Finacial  Group in Westfield,MA.</BODY>
</ROOT>


Replies are listed 'Best First'.
Re: file split by BioLion (Curate) on Nov 26, 2009 at 09:49 UTC
How is this question unlike the others? Just a something something...
Re^2: file split by Anonymous Monk on Nov 26, 2009 at 10:59 UTC
It is a large file that I have to split into smaller files. {IT} marks the start of each file, and the filename should be the {TAG} value. I am not able to split the file. How can I do it? Please help me.
Re^3: file split by Anonymous Monk on Nov 26, 2009 at 11:11 UTC
Hire someone :/
Re^2: file split by Anonymous Monk on Nov 26, 2009 at 12:34 UTC
Please tell me how to split the file into smaller pieces, with each file named after the {TAG} value. All the contents should go into one file until the next {IT} tag is found. Right now the script works only if {TAG} is on the first line of each record. If I match <IT>(.*) instead of <TAG>(.*), then whenever the value is the same the file will be overwritten. Please tell me how I can split the large file into smaller pieces.
Re^3: file split by Anonymous Monk on Nov 26, 2009 at 13:22 UTC
What is the name of this format?
Re^4: file split by Anonymous Monk on Nov 26, 2009 at 16:53 UTC
Re^5: file split by Anonymous Monk on Nov 28, 2009 at 15:03 UTC
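One way to approach this is to buffer each record in memory and only open the output file once the whole record, including its {TAG} value, has been seen. The following is a minimal sketch, not a tested drop-in replacement. The names parse_records, record_tag, xml_escape and record_to_xml are made up for illustration. It assumes that {IT} always appears at the end of a header line with its value after it (as in the posted sample), that every other {NAME} sits on a line of its own, that the names are valid XML element names, and that a record fits in memory. Unlike the original it escapes &, < and > in the text instead of special-casing <br/>, which is a swapped-in simplification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the input lines into records. Each record is an array ref of
# [NAME, text] pairs, in input order. A new record starts at every {IT}.
sub parse_records {
    my (@lines) = @_;
    my (@records, $rec, $field);

    for my $line (@lines) {
        $line =~ s/[\r\n]+\z//;      # strip the line ending
        $line =~ s/[\cA-\cZ]//g;     # real control characters (as in the original)
        $line =~ s/\^[A-Z]//g;       # caret-notation codes (as in the original)

        if ($line =~ /\{IT\}(.*)\z/) {
            # Assumption: {IT} follows a header on the same line.
            push @records, $rec if $rec;
            $field = ['IT', $1];
            $rec   = [$field];
        }
        elsif (!$rec) {
            next;                    # ignore anything before the first {IT}
        }
        elsif ($line =~ /^\{([^{}]+)\}\z/) {
            $field = [$1, ''];       # a {NAME} line opens a new field
            push @$rec, $field;
        }
        elsif ($line =~ /\S/) {
            # Value text: join continuation lines with one space.
            $field->[1] .= length($field->[1]) ? " $line" : $line;
        }
    }
    push @records, $rec if $rec;
    return @records;
}

# Return the text of the record's TAG field, or undef if it has none.
sub record_tag {
    my ($rec) = @_;
    for my $f (@$rec) {
        return $f->[1] if $f->[0] eq 'TAG';
    }
    return;
}

# Escape the three characters that are unsafe in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Render one record as a complete XML document string.
sub record_to_xml {
    my ($rec) = @_;
    my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n<ROOT>\n};
    for my $f (@$rec) {
        my ($name, $text) = @$f;
        $xml .= "<$name>" . xml_escape($text) . "</$name>\n";
    }
    return $xml . "</ROOT>\n";
}

# Only runs when input files are named on the command line.
if (!caller && @ARGV) {
    my $n = 0;
    for my $rec (parse_records(<>)) {
        $n++;
        my $name = record_tag($rec);
        $name = "untagged_$n" unless defined $name && length $name;
        $name =~ s/[^\w.-]/_/g;      # keep the filename harmless
        open my $out, '>', "$name.xml" or die "$name.xml: $!";
        print {$out} record_to_xml($rec);
        close $out or die "$name.xml: $!";
    }
}
```

Run it as perl split_records.pl bigfile.txt (the script name is hypothetical). Value lines are joined with a single space, as in the original script, so the sample body comes out as "Westfield, MA." rather than "Westfield,MA.". Two records with the same {TAG} value would still overwrite each other; appending a counter to the filename is one way around that.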