
Greetings fellow monks, acolytes, etc. I have an XML-related question for the Monks today. I'm trying to debug a module that parses an XML file and creates a text file (called an imp file here). Its job is to take documents in the ThML format from ccel.org and process them for use in the Sword application (www.crosswire.org/sword). The imp file is the format used to import a document into Sword; it contains each section of the document on one line. Here's the LD:

When you parse a valid XML file, it works fine under most circumstances. The problem is with the <scripRef> tag: when you have several in sequence, you end up losing the closing tag on every one of them. For instance, you might have a string in the XML file like this:

<scripRef...>1 John 1:1</scripRef>, <scripRef...>John 3:16</scripRef>, ...

When it's processed to the imp file, you end up with something like this:

<scripRef...>1 John 1:1, <scripRef...>John 3:16, ...

which breaks things.

So, I've looked at the script and the module used here. The relevant code appears to be in the module, but I can't figure out why it's dropping the closing tags. The code is below. If anyone has any ideas on how I could repair this, that would be great. If you need more code, let me know.

NB - I am not the creator or maintainer. I am using this largely for personal stuff.

sub parseStart {
    my $expat = shift;
    my $tag = shift;
    my %attr = @_;

   SWITCH:
    for ($tag) {
        /^DC.(.*)$/          && do { saveDC($1); last SWITCH; };
        /div(\d+)/           && do { start_section($1, $attr{title}); last SWITCH; };
        /^(p|h\d+)$/         && do { passthrough_start($1); last SWITCH; };
        /^(verse)$/          && do { passthrough_start('p'); last SWITCH; };
        /^(span)$/           && do { passthrough_start('b'); last SWITCH; };
        /^(l)$/              && do { $sectionData{$currentDepth} .= '&nbsp;&nbsp;'; last SWITCH; };
        /^(scripRef)$/       && do { $sectionData{$currentDepth} .= "<scripRef passage=\"$attr{passage}\">"; last SWITCH; };
        /^(note|added)$/     && do { ignore(); last SWITCH; };
    }
}

sub parseEnd {
    my ($expat, $tag) = @_;

   SWITCH:
    for ($tag) {
        /^DC.(.*)$/            && do { end_saveDC($1); last SWITCH; };
        /div(\d+)/             && do { end_section($1); last SWITCH; };
        /^(p|h\d+|scripRef)$/  && do { passthrough_end($1); last SWITCH; };
        /^(verse)$/            && do { passthrough_end('p'); last SWITCH; };
        /^(span)$/             && do { passthrough_end('b'); last SWITCH; };
        /^(br|l)$/             && do { $sectionData{$currentDepth} .= "<br />"; last SWITCH; };
        /^(note|added)$/       && do { unignore(); last SWITCH; };
    }
}
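
To narrow this down, here is a minimal sketch of what I would expect the scripRef branches of the two handlers to produce. It does not use the real module or a real parser: `run_events`, `start_tag`, `end_tag` and `$buffer` are hypothetical stand-ins, and the one-line `passthrough_end` is only my assumption about what the real helper does (append a closing tag), since I haven't shown its source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $sectionData{$currentDepth} in the real module.
my $buffer;

# ASSUMPTION: the real passthrough_end does nothing more than append a closing tag.
sub passthrough_end {
    my ($tag) = @_;
    $buffer .= "</$tag>";
}

# Same scripRef branch as parseStart, with the other tags left out.
sub start_tag {
    my ($tag, %attr) = @_;
    if ($tag =~ /^(scripRef)$/) {
        $buffer .= "<scripRef passage=\"$attr{passage}\">";
    }
}

# Same branch as the one in parseEnd that handles p, hN and scripRef.
sub end_tag {
    my ($tag) = @_;
    if ($tag =~ /^(p|h\d+|scripRef)$/) {
        passthrough_end($1);
    }
}

# Replay a list of [type, args...] events, roughly as a parser would deliver them.
sub run_events {
    my @events = @_;
    $buffer = '';
    for my $event (@events) {
        my ($type, @args) = @$event;
        if    ($type eq 'start') { start_tag(@args) }
        elsif ($type eq 'end')   { end_tag(@args) }
        elsif ($type eq 'text')  { $buffer .= $args[0] }
    }
    return $buffer;
}

my $out = run_events(
    [ start => 'scripRef', passage => '1 John 1:1' ],
    [ text  => '1 John 1:1' ],
    [ end   => 'scripRef' ],
    [ text  => ', ' ],
    [ start => 'scripRef', passage => 'John 3:16' ],
    [ text  => 'John 3:16' ],
    [ end   => 'scripRef' ],
);
print "$out\n";
```

Under that assumption both closing tags come out, so the dispatch on the tag name by itself does not seem to lose them. If the real output still lacks `</scripRef>`, my guess is that the difference is inside the real `passthrough_end`, or in some state it depends on, rather than in the regexes above.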

As far as I can tell, parseStart and parseEnd are two separate handlers: the first picks up each opening tag and strips out some unneeded info, and the second handles the matching closing tag. I can't tell where to start looking. Thanks,

Monger

Monger +++++++++++++++++++++++++ Munging Perl on the side

In reply to Script Misses Close Closing Tags by monger
