#!/usr/bin/perl
use strict;
use warnings;
my $tag;
my $output;
my $fh;
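while (<DATA>) {
    chomp;
    if (/^\.\.(.*):$/) {    # tag line such as "..DN:"
        # Flush the previous tag's text before starting a new tag.
        $fh = sub_output($output, $tag, $fh);
        $output = "";
        $tag = $1;
        print $tag;    # debug: show each tag as it is read
    }
    else {    # value line, not a "..TAG:" line
        next unless ($tag);
        next if (/^\s*$/);
        # The first value line opens the element; later lines are appended.
        $output .= ($output) ? " $_" : "<$tag>$_";
    }
}    # end of while loop
# Flush the last tag and close the last file.
$fh = sub_output($output, $tag, $fh);
if ($fh) {
    print $fh "</root>\n";
    close($fh);
}
exit(0);

# Write one element to the current file. A DN element starts a new file
# named after the DN value.
sub sub_output {
    my ($output, $tag, $fh) = @_;
    if ($output) {
        if ($output =~ m/<DN>(.*)/) {
            if ($fh) {
                print $fh "</root>\n";
                close($fh);
            }
            open($fh, '>', "$1.xml") or die "$1.xml: $!";
            print $fh "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
            print $fh "<root>\n";    # matches the </root> written when the file is closed
        }
        print $fh "$output</$tag>\n";
    }
    return ($fh);
}    # end of subroutine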
__DATA__
..DN:
000044119
..CB:
..SN:
8046ETK6
..PD:
20091030
..DD:
Friday, October 30, 2009
..IS:
N
..IS:
R
..ID:
..DN:
000044120
..CB:
..SN:
8046ETK6
..PD:
20091030
..DD:
Friday, October 30, 2009
..DD:
Friday, October 31, 2009
..PT:
NE
..PT:
sect
The above code converts the input into XML files, one file per DN record.
How can I rename the tags? For example:
DN as "doc num"
SN as "code"
PD as "date"
and so on.
There is also a special case when the same tag appears twice, one below the other.
For example, with PT: if its value is two letters long, I should name
the tag "categ", and if the value is longer than two letters, I should name it "Head".
Please tell me how to do it.
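One possible way to express the renaming rules above is a lookup hash plus a special case for PT. This is only a sketch under assumptions: the hash %tag_name and the sub rename_tag are hypothetical names, "doc num" is written as "docnum" because XML element names cannot contain spaces, tags without a mapping are assumed to keep their original name, and a PT value shorter than two characters (a case the rules do not cover) is assumed to stay "PT".

```perl
use strict;
use warnings;

# Hypothetical mapping from the short tags to XML element names.
# "doc num" is written as "docnum": XML element names cannot contain spaces.
my %tag_name = (
    DN => 'docnum',
    SN => 'code',
    PD => 'date',
);

# Return the element name to use for a tag and its complete value.
sub rename_tag {
    my ($tag, $value) = @_;
    $value = '' unless defined $value;
    if ($tag eq 'PT') {
        return 'categ' if length($value) == 2;    # two-character value
        return 'Head'  if length($value) > 2;     # longer value
        return $tag;    # assumption: shorter values keep the original tag
    }
    # Assumption: tags without a mapping keep their original name.
    return exists $tag_name{$tag} ? $tag_name{$tag} : $tag;
}

# Small demonstration with values taken from the sample data.
for my $pair (['DN', '000044119'], ['SN', '8046ETK6'], ['PT', 'NE'], ['PT', 'sect']) {
    my ($tag, $value) = @$pair;
    my $name = rename_tag($tag, $value);
    print "<$name>$value</$name>\n";
}
```

In the script above, the opening tag is written as soon as the first value line is read, and the DN check in sub_output matches the literal text "<DN>". The rename would therefore probably have to happen inside sub_output, once the whole value is known and after the DN check; that wiring is not shown here.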