in reply to Split file based on tag

I could simply tell you how to quickfix your script. Instead, I will show you how to separate your problem into smaller reusable chunks.

First we are going to define a class which holds the data of a single record in the input file:
package MyDataRecord; # Represents a record of MyData use Moose; use Moose::Util::TypeConstraints; # -DateParse enables additional Date formats use Class::Date qw(-DateParse); # load carp for better warnings use Carp qw(cluck confess); # create a subtype for Dates subtype 'MyDate' => as class_type('Class::Date'); # coerce string input, depending on the format coerce 'MyDate' => from 'Str' => via { /^\d{6}$/ ? Class::Date->new([unpack "A4A2A2","20$_"]) : Class::Date->new($_) }; # data-members has it => ( isa => 'Str', is => 'rw', required => 1 ); has edition => ( isa => 'Str', is => 'rw', required => 1 ); has tag => ( isa => 'Str', is => 'rw', required => 1 ); has body => ( isa => 'Str', is => 'rw', required => 1 ); has date => ( isa => 'MyDate', is => 'rw', required => 1, coerce => + 1 ); has tdate => ( isa => 'MyDate', is => 'rw', required => 1, coerce => + 1 ); # check date integrity on construction sub BUILD { my ( $self, $params ) = @_; cluck "date and tdate should be identical\n" unless $self->tdate->ymd eq $self->date->ymd; } 1;
Next we write a Parser class which can read our file and return an array of hashes.
package MyDataParser; # parses a file or handle of MyData use Moose; use Moose::Util::TypeConstraints; use IO::File; subtype 'MyFile' => as class_type('IO::File'); coerce 'MyFile' => from 'Str' => via { IO::File->new($_) }; has file => ( isa => 'MyFile', is => 'rw', coerce => 1 ); sub parse { my $self = shift; my $file = $self->file; my $re_tag = qr/ ^ \{ ([A-Z]+) \} $ /x; my @records = (); my %record; my $tag; # iterate the file while ( <$file> ) { # We found a tag if ( /$re_tag/ ) { # save the new tag name because we use the value often my $ntag = lc $1; unless ( keys %record ) { # create the first record (only happens on the first l +ine) %record = ( $ntag => '' ); } else { # otherwise cut off the last newline of input chomp $record{$tag} if $tag; # initialize an empty value for the tag $record{$ntag} = ''; # we found a new record if ( $ntag eq 'it' ) { # save the data as a record object push @records, MyDataRecord->new(%record); } } # remember current tag for the next iterations $tag = $ntag; next; } # not a tag, so add the data to the current tags value $record{$tag} .= $_; } # save the last record, since there is no final {IT} push @records, MyDataRecord->new(%record); return \@records; }
Now we created a piece of code that whe can use in every skript we want to, not having to think about the internals of the input file anymore. Regarding the original problem such a script would look like.
package main; use warnings; use strict; use Template; use MyDataParser; # get the filename from the console my $file = shift @ARGV || die "Error. You must specify a filename.\n"; # get the data from our file parser print "parsing $file\n"; my $records = MyDataParser->new( file => $file )->parse; # template for XML output # use Template::Plugin::XML::Escape to ensure valid XML my $ixemel = q[ <?xml version="1.0" encoding="UTF-8"?> [% USE XML.Escape %] <record> <it>[% record.it %]</it> <date>[% record.date.strftime('%y%m%d') %]</date> <tdate>[% record.tdate.strftime('%A, %B %d, %Y') %]</tdate> <edition>[% record.edition %]</edition> <tag>[% record.tag %]</tag> <body>[% record.body | xml_escape %]</body> <record> ]; # iterate the records for my $record ( @$records ) { # create a template object my $tt = Template->new(); # and process it( process->(template, data, filename) ); print "writing ", $record->tag, "\n"; die $tt->error(), "\n" unless $tt->process( \$ixemel, {record=>$record}, $record->tag + ); } print "done\n";
Note the use of Templating for the XML output. This does us 2 favours here. First it makes the xml easier to maintain and the Template module does the work of writing our output files for us.


holli

You can lead your users to water, but alas, you cannot drown them.

Replies are listed 'Best First'.
Re^2: Split file based on tag
by Anonymous Monk on Nov 28, 2009 at 03:29 UTC
    sub xml_output { my ($output, $tag, $fh) = @_; if($output) { if($output =~ m/<IT>(.*)/) { if($fh) { print $fh "</ROOT>\n"; close($fh); } open($fh, '>', "$1.xml") or die "$1.xml: $!"; print $fh "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<RO +OT>\n"; } $output =~ s/\s*(?=(<.+>|<.+\/>|<\/.+>|<\/.+><.+>))//g; print $fh "$output</$tag>\n"; } return($fh); } # End of sub sroutine
    Please tell me how can i change in the this subroutine to split the file until it finds {IT} tag and filename as the {TAG} value. Please help me with this
      Go and buy yourself a working brain.


      holli

      You can lead your users to water, but alas, you cannot drown them.