comment on

I could simply tell you how to quickfix your script. Instead, I will show you how to separate your problem into smaller reusable chunks.

First we are going to define a class which holds the data of a single record in the input file:

package MyDataRecord;

# Represents a record of MyData

use Moose;
use Moose::Util::TypeConstraints;

# -DateParse enables additional Date formats
use Class::Date qw(-DateParse);

# load carp for better warnings
use Carp        qw(cluck confess);

# create a subtype for Dates
subtype 'MyDate'  => as class_type('Class::Date');

# coerce string input, depending on the format
coerce  'MyDate'  => from 'Str' => via {
    /^\d{6}$/ ?
    Class::Date->new([unpack "A4A2A2","20$_"]) :
    Class::Date->new($_)
};

# data-members
has it      => ( isa => 'Str',    is => 'rw', required => 1 );
has edition => ( isa => 'Str',    is => 'rw', required => 1 );
has tag     => ( isa => 'Str',    is => 'rw', required => 1 );
has body    => ( isa => 'Str',    is => 'rw', required => 1 );
has date    => ( isa => 'MyDate', is => 'rw', required => 1, coerce =>
+ 1 );
has tdate   => ( isa => 'MyDate', is => 'rw', required => 1, coerce =>
+ 1 );

# check date integrity on construction
sub BUILD {
        my ( $self, $params ) = @_;
        cluck  "date and tdate should be identical\n"
            unless $self->tdate->ymd eq $self->date->ymd;
}

1;
[download]

Next we write a Parser class which can read our file and return an array of hashes.

package MyDataParser;

# parses a file or handle of MyData

use Moose;
use Moose::Util::TypeConstraints;

use IO::File;

subtype 'MyFile' => as class_type('IO::File');
coerce  'MyFile' => from 'Str' => via { IO::File->new($_) };

has file => ( isa => 'MyFile', is => 'rw', coerce => 1 );

sub parse
{
    my $self    = shift;
    my $file    = $self->file;
    my $re_tag  = qr/ ^ \{ ([A-Z]+) \} $ /x;
    my @records = ();
    my %record;
    my $tag;

    # iterate the file
    while ( <$file> )
    {
        # We found a tag
        if ( /$re_tag/ )
        {
            # save the new tag name because we use the value often
            my $ntag = lc $1;

            unless ( keys %record ) {
                # create the first record (only happens on the first l
+ine)
                %record = ( $ntag => '' );
            } else {
                # otherwise cut off the last newline of input
                chomp $record{$tag} if $tag;
                # initialize an empty value for the tag
                $record{$ntag} = '';
                # we found a new record
                if ( $ntag eq 'it' )
                {
                    # save the data as a record object
                    push @records, MyDataRecord->new(%record);
                }
            }
            # remember current tag for the next iterations
            $tag = $ntag;
            next;
        }
        # not a tag, so add the data to the current tags value
        $record{$tag} .= $_;
    }
    # save the last record, since there is no final {IT}
    push @records, MyDataRecord->new(%record);

    return \@records;
}
[download]

Now we created a piece of code that whe can use in every skript we want to, not having to think about the internals of the input file anymore. Regarding the original problem such a script would look like.

package main;

use warnings;
use strict;

use Template;
use MyDataParser;

# get the filename from the console
my $file = shift @ARGV || die "Error. You must specify a filename.\n";

# get the data from our file parser
print "parsing $file\n";
my $records = MyDataParser->new( file => $file )->parse;

# template for XML output
# use Template::Plugin::XML::Escape to ensure valid XML
my $ixemel  = q[
    <?xml version="1.0" encoding="UTF-8"?>
    [% USE XML.Escape %]
    <record>
        <it>[% record.it %]</it>
        <date>[% record.date.strftime('%y%m%d') %]</date>
        <tdate>[% record.tdate.strftime('%A, %B %d, %Y') %]</tdate>
        <edition>[% record.edition %]</edition>
        <tag>[% record.tag %]</tag>
        <body>[% record.body | xml_escape %]</body>
    <record>
];

# iterate the records
for my $record ( @$records )
{
    # create a template object
    my $tt = Template->new();

    # and process it( process->(template, data, filename) );
    print "writing ", $record->tag, "\n";

    die $tt->error(), "\n"
        unless $tt->process( \$ixemel, {record=>$record}, $record->tag
+ );
}

print "done\n";
[download]

Note the use of Templating for the XML output. This does us 2 favours here. First it makes the xml easier to maintain and the Template module does the work of writing our output files for us.

holli

You can lead your users to water, but alas, you cannot drown them.

In reply to Re: Split file based on tag by holli
in thread Split file based on tag by Anonymous Monk

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.