sannag has asked for the wisdom of the Perl Monks concerning the following question:
I am relative new to Perl. Please check to see what I am doing wrong.
I have following input tab delimited file, which need to be built to XML, 'pin'number is the unique identifier. If the 'Pin' matches then
populate show multiple 'CHARGE' under the report.
Reason1 Reason2 Reason3 Pin Name Zip Date Time
data1 data2 data3 Pin 1 data5 data6 data7 data8
data1 data2 data3 Pin 1 data5 data6 data9 data10
data1 data2 data3 Pin 1 data5 data6 data11 data12
I want it to build following XML
<XML_FILE>
<REPORT TYPE="AB">
<REASON1>data1</REASON>
<REASON2>data2</REASON2>
<REASON3>data3</REASON3>
<PERSON>
<PIN>Pin 1</PIN>
<NAME>data5</NAME>
<ZIP>data6<ZIP>
</PERSON>
<CHARGE>
<DATE>data7</DATE>
<TIME>data8</TIME>
</CHARGE>
<CHARGE>
<DATE>data9</DATE>
<TIME>data10<TIME>
</CHARGE>
<CHARGE>
<DATE>data11</DATE>
<TIME>data12</TIME>
</CHARGE>
</REPORT>
</XML_FILE>
Below is the code I have
use strict;
use XML::LibXML;
my $READFILENAME = "SomeDir\\data.txt";
my $WRITEFILENAME = "SomeDir\\test.xml";
my $doc = XML::LibXML::Document->new('1.0');
my $root = $doc->createElement("XML_FILE");
open (FILEWRITE, ">$WRITEFILENAME");
open (READFILE, $READFILENAME);
my $copy_person_pin = "XX";
foreach (<READFILE>) {
my $line = $_; chomp $line;
my @data = split(/\t/,$line);
my $reason1 = $data[0];
my $reason2 = $data[1];
my $reason3 = $data[2];
my $person_pin = $data[3];
my $name = $data[4];
my $zip = $data[5];
my $date = $data[6];
my $time = $data[7];
my $report = $doc->createElement("REPORT");
if ($person_pin ne $copy_person_pin)
{
# Build the Report tags
# I had to put $report out of if loop so that $report is avalibl
+e in else statement
# my $report = $doc->createElement("REPORT");
$report->setAttribute('TYPE'=>'AB');
my @sortedReportTag = qw ( REASON1
REASON2
REASON3
);
my %reportHashTags;
@reportHashTags { @sortedReportTag } = ($reason1,
$reason2,
$reason3
);
+
buildXMLElements(\@sortedReportTag, \%reportHashTags, $report);
$root-> appendChild($report);
# Build the element for Person Tag
my $person = $doc->createElement("PERSON");
my @sortedPersonTag = qw ( PIN
NAME
ZIP);
my %personHashTags;
@personHashTags { @sortedPersonTag } = ($person_pin,
$name,
$zip
);
# Build the elements for Person Tag
+
buildXMLElements(\@sortedPersonTag, \%personHashTags, $person);
$report-> appendChild($person);
# Build the elements for Charge Tag
my $charge = $doc->createElement("CHARGE");
my @sortedChargeTag = qw ( DATE
TIME
);
my %chargeHashTags;
@chargeHashTags { @sortedChargeTag } = ($date,
$time
);
# Build the elements for Charge Tag
+
buildXMLElements(\@sortedChargeTag, \%chargeHashTags, $charge);
$report-> appendChild($charge);
$copy_person_pin = $person_pin;
}
else {
my $charge = $doc->createElement("CHARGE");
my @sortedChargeTag = qw ( DATE
TIME
);
my %chargeHashTags;
@chargeHashTags { @sortedChargeTag } = ($date,
$time
);
# Build the elements for Charge Tag
+
buildXMLElements(\@sortedChargeTag, \%chargeHashTags, $charge);
$report-> appendChild($charge);
}
}
$doc->setDocumentElement($root);
# Write the XML to a file
print FILEWRITE ($doc->toString());
close FILEWRITE;
sub buildXMLElements() {
my($elementTags, $hashTags, $parentElement) = @_;
for my $name (@$elementTags) {
my $reportTag = $doc->createElement($name);
my $reportValue = $hashTags->{$name};
$reportTag->appendTextNode($reportValue);
$parentElement->appendChild($reportTag);
}
}
The output I get is, basically only the first 'CHARGE' is showing up not the other two.
<XML_FILE>
<REPORT TYPE="AB">
<REASON1>data1</REASON>
<REASON2>data2</REASON2>
<REASON3>data3</REASON3>
<PERSON>
<PIN>Pin 1</PIN>
<NAME>data5</NAME>
<ZIP>data6<ZIP>
</PERSON>
<CHARGE>
<DATE>data7</DATE>
<TIME>data8</TIME>
</CHARGE>
</REPORT>
</XML_FILE>
Re: Building XML with tab delimited
by kcott (Archbishop) on Aug 11, 2016 at 08:43 UTC
|
#!/usr/bin/env perl
use strict;
use warnings;
use autodie qw{:all};
use Text::CSV;
use XML::LibXML;
my $tsv_file_in = 'pm_1169537_input.tsv';
my $xml_file_out = 'pm_1169537_output.xml';
my $dom = generate_xml($tsv_file_in);
output_xml($dom, $xml_file_out);
sub generate_xml {
my ($tsv_file_in) = @_;
open my $in_fh, '<', $tsv_file_in;
my $csv = Text::CSV::->new({sep_char => "\t"})
or die 'Text::CSV::->new() FAILED: ', Text::CSV::->error_diag(
+);
my $dom = XML::LibXML::Document::->new(qw{1.0 UTF-8});
my $root = $dom->createElement('XML_FILE');
$dom->setDocumentElement($root);
my @headers = map { uc } @{$csv->getline($in_fh)};
my %header_index_for = map { $headers[$_] => $_ } 0 .. $#headers;
my ($last_pin, $report) = ('', undef);
while (my $row = $csv->getline($in_fh)) {
next if @$row == 1 && ! length $row->[0];
if ($row->[$header_index_for{PIN}] ne $last_pin) {
$last_pin = $row->[$header_index_for{PIN}];
$report = XML::LibXML::Element->new('REPORT');
$report->setAttribute(TYPE => 'AB');
$root->addChild($report);
for (qw{REASON1 REASON2 REASON3}) {
my $reason = XML::LibXML::Element->new($_);
$reason->appendText($row->[$header_index_for{$_}]);
$report->addChild($reason);
}
my $person = XML::LibXML::Element->new('PERSON');
$report->addChild($person);
for (qw{PIN NAME ZIP}) {
my $person_datum = XML::LibXML::Element->new($_);
$person_datum->appendText($row->[$header_index_for{$_}
+]);
$person->addChild($person_datum);
}
}
my $charge = XML::LibXML::Element->new('CHARGE');
$report->addChild($charge);
for (qw{DATE TIME}) {
my $charge_datum = XML::LibXML::Element->new($_);
$charge_datum->appendText($row->[$header_index_for{$_}]);
$charge->addChild($charge_datum);
}
}
return $dom;
}
sub output_xml {
my ($dom, $xml_file_out) = @_;
open my $out_fh, '>', $xml_file_out;
print $out_fh $dom->toString(1);
return;
}
The input, pm_1169537_input.tsv, is exactly what you posted.
The output, pm_1169537_output.xml, looks exactly as you wanted (except for the erroneous end tag, </REASON>, which now appears correctly as </REASON1>):
<?xml version="1.0" encoding="UTF-8"?>
<XML_FILE>
<REPORT TYPE="AB">
<REASON1>data1</REASON1>
<REASON2>data2</REASON2>
<REASON3>data3</REASON3>
<PERSON>
<PIN>Pin 1</PIN>
<NAME>data5</NAME>
<ZIP>data6</ZIP>
</PERSON>
<CHARGE>
<DATE>data7</DATE>
<TIME>data8</TIME>
</CHARGE>
<CHARGE>
<DATE>data9</DATE>
<TIME>data10</TIME>
</CHARGE>
<CHARGE>
<DATE>data11</DATE>
<TIME>data12</TIME>
</CHARGE>
</REPORT>
</XML_FILE>
Actually, you didn't show the XML declaration:
modify to suit if want something different.
Update:
Oops! Spotted a bug in my code. Two instances of
$headers[$header_index_for{PIN}]
now corrected to
$row->[$header_index_for{PIN}].
| [reply] [d/l] [select] |
|
Wow Ken! Thank you so much. This solution works. I have been programming in Perl in no more than 10 days....so there are lot of unknowns for me. This has been a great resource.
Thanks again!
| [reply] |
|
"Wow Ken! Thank you so much. This solution works."
You're welcome.
"I have been programming in Perl in no more than 10 days....so there are lot of unknowns for me. This has been a great resource."
While I did see your "I am relative new to Perl.", I wasn't aware just how new.
Not a bad effort at all for a total beginner.
Here's some additional resources (in no particular order).
-
If you haven't already done so, I recommend you bookmark "http://perldoc.perl.org/perl.html".
This has links to tutorials, FAQs and documentation.
[Hint: If you just want to look up a function,
the "Perl functions A-Z" index
(that's the "Functions" link in the sidebar) is much faster, and easier to navigate, than the
perlfunc page,
which is very large and can take substantially longer to load.]
-
Prefer lexical filehandles and the 3-argument form of open.
The filehandles you've used are package variables, with global scope:
in larger programs, where you've perhaps unwittingly used something with a non-specific name
(like FILEWRITE) in more than one place,
you can easily introduce hard-to-track-down bugs.
Furthermore, when a lexical filehandle goes out of scope, Perl will automatically
close it for you:
one more thing you don't have to worry about and a potential point of error removed.
-
I/O, and other operations, should be checked for failure.
This can be tedious and error-prone:
let Perl handle this for you with the autodie pragma.
-
Use lexical variables in the smallest scope possible.
See "perlsub: Private Variables via my()"
for more on this.
You can find examples (of all of the code-related points above) in my posted solution.
| [reply] [d/l] |
|
|
| [reply] |
|
Re: Building XML with tab delimited
by choroba (Cardinal) on Aug 11, 2016 at 00:21 UTC
|
XML::LibXML is great for XML processing, but not so great for producing it. Some people prefer XML::Writer for that, but for a fixed structure like here, you can also use Template:
#!/usr/bin/perl
use warnings;
use strict;
use Template;
my %data;
<> for 1, 2; # Skip header.
while (<>) {
chomp;
my ($r1, $r2, $r3, $pin, $name, $zip, $date, $time) = split /\t/;
$data{$pin}{reasons} = [ $r1, $r2, $r3 ];
@{ $data{$pin} }{qw{ name zip }} = ($name, $zip);
push @{ $data{$pin}{charge} }, [ $date, $time ];
}
my $template = 'Template'->new;
$template->process(*DATA{IO}, { data => \%data });
__DATA__
<XML_FILE>
[% FOR pin IN data.keys %] <REPORT TYPE="AB">
[%- i = 1 ; FOR r IN data.$pin.reasons %]
<REASON[% i %]>[% r %]</REASON[% i %]>[% i = i + 1; END %]
<PERSON>
<PIN>[% pin %]</PIN>
<NAME>[% data.$pin.name %]</NAME>
<ZIP>[% data.$pin.zip %]</ZIP>
</PERSON>
[% FOR charge IN data.$pin.charge %] <CHARGE>
<DATE>[% charge.0 %]</DATE>
<TIME>[% charge.1 %]</TIME>
</CHARGE>
[% END %] </REPORT>
[% END %]</XML_FILE>
($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
| [reply] [d/l] [select] |
|
Thank you! to be honest I hit a mental block using XML:LibXML. I already invested quit a bit of time building the program in XML::LibXML if I fail to find a solution with in LibXML I will switch to XML:Writer. :(
| [reply] |
|
Thank you, choroba, for your solution.
(I was able to run it on my Strawberry Perl installation.)
I learned about the Template module, the front-end module to the Template Toolkit.
I also learned about *DATA{IO}, which apparently points to the data below the special literal __DATA__.
I used Super Search and found the Using the DATA file handle for ARGV Perl Monk Meditation. This cleared up my questions about the special literal __DATA__ values IO slot, *DATA{IO}, of the *DATA typeglob.
| [reply] |
Re: Building XML with tab delimited
by Marshall (Canon) on Aug 11, 2016 at 04:18 UTC
|
I do not have a solution, but I have some suggestions that may help.
- I reformatted the code to a more consistent indentation style. You can use a different one, but consistency helps me in the understanding.
- I converted the code to use the DATA segment for reading instead of an input file.
- I added "use warnings;" this showed a number of issues. You need to skip these first 2 lines of the input file. Use of prototypes, e.g.sub buildXMLElements() {} should just be sub buildXMLElements {}. However this is not the main problem.
- I added in some print statements that may help somebody else figure out what the problem is. I don't see it at the moment.
My code produces the output to STDERR:
And output file:
reformatted code:
| [reply] [d/l] [select] |
|
Thank you very much for the effort. Once thing I noticed is it matters where I place the line "my $report = $doc->createElement("REPORT");" if I placed it outside the while loop and then remove the "else" to replace it with the statement "if ($person_pin eq $copy_person_pin)"
then all my 'CHARGE' tag start appearing in the XML but then the XML struct is altered.
I am beginning to believe XML:LibXML is very funky.
| [reply] |
|
This is actually a very normal thing. I saw the comment in the code. You cannot "sometimes" and "sometimes not" create a "my" variable in some conditional statement and expect that variable will around for use elsewhere. If you do that, you will only have use of it during that "if" clause only. If you violate that, then I think the results are undefined and unpredictable. Maybe I didn't understand what you meant here. This is a very complex module and I don't see the problem and admit such.
| [reply] |
|
Hello sannag,
Marshall's helpful debugging should have gotten you closer, at least. :-) That said, do note that XML::LibXML has lots of backward compatibility interfaces and it's fairly complex to work with. There's one small change that can help your case:
In your buildXMLElements sub call, you're passing in the parent element directly as it is and appending child elements into it. If you pass in a ref to it instead, you can see your <CHARGE> elements appear.
HTH. Good luck and have fun!
| [reply] [d/l] [select] |
Re: Building XML with tab delimited
by Anonymous Monk on Aug 11, 2016 at 03:34 UTC
|
Get rid of prototypes, instead of sub name() { ... write sub name { ... and report what happens | [reply] [d/l] [select] |
|
Well, tried it but made no difference :(
| [reply] |
|
What AnonyMonk is getting at is that you define sub buildXMLElements() { ... } to take no arguments and then pass it a bunch of arguments. This only works because you call the function in a way that defeats prototype checking. If you had enabled warnings, Perl would have chided you about this situation:
c:\@Work\Perl\monks>perl -le
"use warnings;
use strict;
;;
func('hi there');
;;
sub func () { print @_; }
"
main::func() called too early to check prototype at -e line 1.
hi there
Prototypes can be very useful for what they're useful for. But why use them in a way that defeats their purpose and also means you have to disable warnings so it won't nag you endlessly? Please see Prototypes in perlsub, and also the very enlightening Far More than Everything You've Ever Wanted to Know about Prototypes in Perl -- by Tom Christiansen.
Give a man a fish: <%-{-{-{-<
| [reply] [d/l] [select] |
|
|
|