Greetings fellow monks, acolytes, etc. I have an XML-related question for the Monks today. I'm trying to debug a module that parses an XML file and creates a text file (called an imp file here). Its job is to take documents in the ThML format from ccel.org and process them for use in the Sword application (www.crosswire.org/sword). The imp file is the import format for Sword; it contains each section of the document on one line. Here's the lowdown:

When you parse a valid XML file, it works fine under most circumstances. The problem is with the <scripRef> tag: when several appear in sequence, you end up losing the closing tag on every instance. For instance, you might have a string in the XML file like this:

<scripRef...>1 John 1:1</scripRef>, <scripRef...>John 3:16</scripRef>, ...

When it's processed to the imp file, you end up with something like this:

<scripRef...>1 John 1:1, <scripRef...>John 3:16, ...

which breaks things.
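To make the expected behavior concrete, here is a minimal, self-contained sketch. It is not the module's code: the handler names (on_start, on_char, on_end), the $out buffer, and the passage values are made up for illustration, and the parser events are replayed by hand rather than coming from a real XML parser. It only shows the output I'd expect when two references appear in sequence.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's per-section buffer.
my $out = '';

# Start handler: emit the opening tag with its passage attribute.
sub on_start {
    my ($tag, %attr) = @_;
    $out .= qq{<scripRef passage="$attr{passage}">} if $tag eq 'scripRef';
}

# Character-data handler: pass text through unchanged.
sub on_char {
    my ($text) = @_;
    $out .= $text;
}

# End handler: emit the matching closing tag.
sub on_end {
    my ($tag) = @_;
    $out .= "</$tag>" if $tag eq 'scripRef';
}

# Replay the events an expat-style parser would fire for two
# references in sequence (passage values are invented).
on_start('scripRef', passage => '1 John 1:1');
on_char('1 John 1:1');
on_end('scripRef');
on_char(', ');
on_start('scripRef', passage => 'John 3:16');
on_char('John 3:16');
on_end('scripRef');

print "$out\n";
```

In this sketch every opening tag gets its closing tag, which is what the imp file should contain but currently doesn't.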

So, I've looked at the script and the module used here. The relevant code appears to be in the module, but I can't figure out why it's dropping the closing tags. The code is below. If anyone has any ideas on how I could repair this, it would be great. If you need more code, let me know.

NB - I am not the creator or maintainer. I am using this largely for personal stuff.

    sub parseStart {
        my $expat = shift;
        my $tag   = shift;
        my %attr  = @_;
      SWITCH: for ($tag) {
            /^DC.(.*)$/      && do { saveDC($1); last SWITCH; };
            /div(\d+)/       && do { start_section($1, $attr{title}); last SWITCH; };
            /^(p|h\d+)$/     && do { passthrough_start($1); last SWITCH; };
            /^(verse)$/      && do { passthrough_start('p'); last SWITCH; };
            /^(span)$/       && do { passthrough_start('b'); last SWITCH; };
            /^(l)$/          && do { $sectionData{$currentDepth} .= '&nbsp;&nbsp;'; last SWITCH; };
            /^(scripRef)$/   && do { $sectionData{$currentDepth} .= "<scripRef passage=\"$attr{passage}\">"; last SWITCH; };
            /^(note|added)$/ && do { ignore(); last SWITCH; };
        }
    }

    sub parseEnd {
        my ($expat, $tag) = @_;
      SWITCH: for ($tag) {
            /^DC.(.*)$/             && do { end_saveDC($1); last SWITCH; };
            /div(\d+)/              && do { end_section($1); last SWITCH; };
            /^(p|h\d+|scripRef)$/   && do { passthrough_end($1); last SWITCH; };
            /^(verse)$/             && do { passthrough_end('p'); last SWITCH; };
            /^(span)$/              && do { passthrough_end('b'); last SWITCH; };
            /^(br|l)$/              && do { $sectionData{$currentDepth} .= "<br />"; last SWITCH; };
            /^(note|added)$/        && do { unignore(); last SWITCH; };
        }
    }

What is happening here is that these two separate subs handle the two halves of each tag: parseStart gathers the opening tags and strips out some unneeded info, and parseEnd then handles the closing tags. I can't tell where to start here. Thanks,

Monger

Monger +++++++++++++++++++++++++ Munging Perl on the side

In reply to Script Misses Close Closing Tags by monger
