Hey all, hoping to tap the wisdom of the great Perl Monks. Sadly, I've found the Perl community very sparse and inaccessible for new users, but hopefully this place will prove me wrong. My issue is this: I'm developing a little applet that lets the user select a directory full of XML files (which we assume are similarly formatted but may vary in the exact tag names used). The program then gathers from those XML files an array of all the tag names that appear in them. I need this array to fill a combo box with options, because the user has to associate certain tag names with our set of standardized tag names. For instance, one group of XML documents may use the tag name "<producer>" where our group uses "<production_lead>".
I've tried modules such as HTML::TagReader (whose author has been very helpful in trying to fix my problem, but who I fear may be a bit too slow for me to meet my deadline) and XML::Parser. The first spits out errors of vague and mysterious origin. The second seems light years more complicated than I need for my application.
Below is a sample of my code from the most successful solution I've managed to write using HTML::TagReader. Hopefully it shows the goal I am aiming for, though using HTML::TagReader is not a requirement for any proposed solution. In short, I need a simple method of getting a list of XML tag names from an XML document!
my @xmlfiles = ();
opendir(DIR, $self->{dirtree}->GetSelectedPath()) || die "Cannot open selected path. Make sure a path is selected!";
@xmlfiles = grep(/\.xml$/, readdir(DIR));
closedir(DIR);
my $xmlreader;
my $showerr = 0;
my @taglist = ();
# For every XML file in our list...
for(my $count = 0; $count < @xmlfiles; $count++){
    # Create an XML reader for that file, get all the tag data into an array then add only relevant tag data
    # to the @taglist array.
    # $xmlreader = new HTML::TagReader $self->{dirtree}->GetSelectedPath() . "\\" . $xmlfiles[$count];
    # my @tagarr = $xmlreader->gettag($showerr);
    # for(my $subcount = 0; $subcount < @tagarr; $subcount++){
    #     push(@taglist, $tagarr[$subcount*3]);
    # }
    my $infile = $self->{dirtree}->GetSelectedPath() . "\\" . $xmlfiles[$count];
    my %removedumplicate;
    my @tagarr;
    my $p=new HTML::TagReader $infile;
    while(@tagarr = $p->getbytoken(!my $opt_W)){
        my $origtag =$tagarr[0];
        if($tagarr[1] eq "" || $tagarr[1] eq "!--"){
            next;
        }
        if ($removedumplicate{$tagarr[0]}){
            next;
        }
        push(@taglist, $tagarr[0]);
        $removedumplicate{$tagarr[0]}++;
    }
}
The commented-out section is my previous attempt at an HTML::TagReader solution. The uncommented code is the author's suggestion, which he sent after I contacted him about my problem. Any help is greatly appreciated and will be rewarded with over-the-top praise and adoration.
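To make the goal concrete, here is a minimal sketch of the result I'm after, written with nothing but core Perl and a regular expression. It is not a real XML parser: it assumes well-formed input, and the helper name collect_tag_names and the sample documents are made up purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the unique element names found in the
# given XML strings, in order of first appearance across all of them.
sub collect_tag_names {
    my @documents = @_;
    my (%seen, @names);
    for my $xml (@documents) {
        # Drop comments and CDATA sections first so markup inside them is ignored.
        (my $text = $xml) =~ s/<!--.*?-->|<!\[CDATA\[.*?\]\]>//gs;
        # Match "<" followed directly by a name, i.e. opening or self-closing tags.
        # Closing tags, "<?xml ...?>" and "<!DOCTYPE ...>" do not match.
        while ($text =~ /<([A-Za-z_][\w.:-]*)/g) {
            push @names, $1 unless $seen{$1}++;
        }
    }
    return @names;
}

# Made-up sample documents standing in for the contents of two XML files.
my @names = collect_tag_names(
    '<?xml version="1.0"?><film><producer id="1">A</producer><!-- <hidden/> --></film>',
    '<film><production_lead>B</production_lead><producer/></film>',
);
print join(", ", @names), "\n";   # prints "film, producer, production_lead"
```

Note that the %seen hash lives outside the per-document loop, so a tag name that shows up in several files is only listed once. That flat, de-duplicated list is what I want to feed to the combo box.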