japhy has asked for the wisdom of the Perl Monks concerning the following question:

    <BOOK>
      <CHAPTER>
        <SCENE>
          <PARA>
            <TEXT>The man said,</TEXT>
            <QUOTE>"Hello."</QUOTE>
          </PARA>
          <PARA>
            <TEXT>I was startled. I didn't know there was a man there. </TEXT>
            <QUOTE>"Hello,"</QUOTE>
            <TEXT>I said back to him.</TEXT>
          </PARA>
        </SCENE>
        <SCENE>
          <PARA>...</PARA>
          <PARA>...</PARA>
          <PARA>...</PARA>
        </SCENE>
        ...
      </CHAPTER>
      <CHAPTER>
        ...
      </CHAPTER>
    </BOOK>

That's pretty much what my XML document looks like. I'm trying to create a web-based search tool for this document. You can search for specific words in TEXT or QUOTE, in PARA, or in SCENE. When the results come back, each XML tag becomes an HTML tag; <SCENE> becomes <SPAN CLASS="SCENE">, and so on.
The problem I have is the filtering. If you search on the "SCENE" scope and any text inside a SCENE element matches the words you're looking for, the entire SCENE element is displayed. If you search on the "PARA" scope, only the matching paragraphs in a scene are displayed, but the surrounding <SCENE> ... </SCENE> tags still need to be displayed. And if you search on the "QUOTE" scope, only the matching QUOTE elements should be printed, but each in its proper PARA group, and each set of PARAs in its proper SCENE.
The real problem, I guess, is how to do this efficiently. I don't want to display any empty SCENEs or PARAs AT ALL (that is, any SCENEs or PARAs that contain no matching elements). I tried doing it on the fly, streaming the XML output from XML::Parser, but it's more difficult than I imagined.
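The scope rules above can be pinned down with a small reference implementation before worrying about streaming. The following is a minimal sketch, not the poster's code. It is written in Python with the standard library's `xml.etree.ElementTree` rather than XML::Parser, it builds the whole tree first (exactly what the post would like to avoid), and it assumes that a match means a case-insensitive whole-word match. The names `DOC`, `prune`, and `search_tree` are made up for illustration, and `DOC` is a compact copy of the sample document with the elided parts left out.

```python
import re
import xml.etree.ElementTree as ET

# Compact copy of the sample document (hypothetical name; elided parts omitted).
DOC = (
    '<BOOK><CHAPTER><SCENE>'
    '<PARA><TEXT>The man said,</TEXT><QUOTE>"Hello."</QUOTE></PARA>'
    "<PARA><TEXT>I was startled. I didn't know there was a man there. </TEXT>"
    '<QUOTE>"Hello,"</QUOTE><TEXT>I said back to him.</TEXT></PARA>'
    '</SCENE></CHAPTER></BOOK>'
)


def prune(elem, scope, pattern):
    """Return a filtered version of elem, or None if nothing under it matches."""
    if elem.tag == scope:
        # An element at the search scope is kept whole or dropped whole.
        text = "".join(elem.itertext())
        return elem if pattern.search(text) else None
    # Any other element survives only if at least one child survives.
    kept = []
    for child in elem:
        pruned = prune(child, scope, pattern)
        if pruned is not None:
            kept.append(pruned)
    if not kept:
        return None  # empty container (or out-of-scope leaf): never displayed
    copy = ET.Element(elem.tag, elem.attrib)
    copy.extend(kept)
    return copy


def search_tree(xml_text, scope, word):
    """Filter xml_text down to the scope elements that contain word."""
    # Assumption: whole-word, case-insensitive matching.
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    result = prune(ET.fromstring(xml_text), scope, pattern)
    return "" if result is None else ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    print(search_tree(DOC, "QUOTE", "hello"))
```

One difference from the sample output in the post: when nothing matches at all, this sketch returns an empty string instead of an empty BOOK wrapper.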
If you want to see sample output, here goes. If I was searching on the "PARA" scope for "the", I'd expect back:

    <BOOK>
      <CHAPTER>
        <SCENE>
          <PARA>
            <TEXT>The man said,</TEXT>
            <QUOTE>"Hello."</QUOTE>
          </PARA>
        </SCENE>
      </CHAPTER>
      ...any other matches...
    </BOOK>

If I was searching on the "QUOTE" scope for "hello", I'd expect:

    <BOOK>
      <CHAPTER>
        <SCENE>
          <PARA>
            <QUOTE>"Hello."</QUOTE>
          </PARA>
          <PARA>
            <QUOTE>"Hello,"</QUOTE>
          </PARA>
        </SCENE>
      </CHAPTER>
      ...any other matches...
    </BOOK>

If I was searching on the "SCENE" scope for "apple", I'd expect:

    <BOOK>
      ...any other matches...
    </BOOK>

So there's my problem. I'd really like to be able to do it on-the-fly, instead of building up results and then filtering them. If I have to, I'll buffer it to an entire scene's contents (meaning, after I've parsed an entire SCENE element, I'll display the contents if there are any matches), which I have a feeling is what I'll end up having to do.
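The buffering idea can be narrowed a little: only the element at the search scope has to be buffered, and the open tags of its ancestors can be held back until the first match inside them, so empty SCENEs and PARAs are never written. What follows is a sketch under stated assumptions, not a drop-in solution. It uses Python's standard `xml.parsers.expat` (a binding to the expat library that XML::Parser is also built on), whose `StartElementHandler`, `EndElementHandler`, and `CharacterDataHandler` play the role of XML::Parser's Start, End, and Char handlers. It assumes whole-word, case-insensitive matching, ignores attributes (the sample has none), drops text outside the scope element, and emits XML rather than the SPAN markup. `DOC` and `search_stream` are hypothetical names.

```python
import re
import xml.parsers.expat
from xml.sax.saxutils import escape

# Compact copy of the sample document (hypothetical name; elided parts omitted).
DOC = (
    '<BOOK><CHAPTER><SCENE>'
    '<PARA><TEXT>The man said,</TEXT><QUOTE>"Hello."</QUOTE></PARA>'
    "<PARA><TEXT>I was startled. I didn't know there was a man there. </TEXT>"
    '<QUOTE>"Hello,"</QUOTE><TEXT>I said back to him.</TEXT></PARA>'
    '</SCENE></CHAPTER></BOOK>'
)


def search_stream(xml_text, scope, word):
    """Filter xml_text in one pass, buffering one scope element at a time."""
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    out = []      # markup already committed to the output
    stack = []    # open elements outside the buffer: [name, open_tag_emitted]
    buf = None    # markup of the scope element being buffered, or None
    text = []     # character data seen inside the buffered element
    depth = 0     # nesting depth inside the buffered element

    def start(name, attrs):
        nonlocal buf, depth
        if buf is not None:            # inside a buffered scope element
            buf.append("<%s>" % name)
            depth += 1
        elif name == scope:            # start buffering a new scope element
            buf, depth = ["<%s>" % name], 1
            text.clear()
        else:                          # ancestor or out-of-scope element
            stack.append([name, False])

    def chars(data):
        if buf is not None:
            buf.append(escape(data))
            text.append(data)

    def end(name):
        nonlocal buf, depth
        if buf is not None:
            buf.append("</%s>" % name)
            depth -= 1
            if depth == 0:             # scope element complete: keep or drop
                if pattern.search("".join(text)):
                    for entry in stack:    # emit held-back ancestor open tags
                        if not entry[1]:
                            out.append("<%s>" % entry[0])
                            entry[1] = True
                    out.extend(buf)
                buf = None
        else:
            _, emitted = stack.pop()
            if emitted:                # close only what was actually opened
                out.append("</%s>" % name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return "".join(out)


if __name__ == "__main__":
    print(search_stream(DOC, "PARA", "the"))
```

With the scope set to SCENE this reduces to per-scene buffering; with PARA or QUOTE the buffer never holds more than a single paragraph or quote. Because ancestor tags are emitted lazily, a search with no matches produces no output at all, not even the BOOK wrapper.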
Re: Difficult XML presentation issue
by diotalevi (Canon) on Feb 03, 2004 at 05:38 UTC

Re: Difficult XML presentation issue
by stvn (Monsignor) on Feb 03, 2004 at 02:58 UTC
by inman (Curate) on Feb 03, 2004 at 13:56 UTC
by stvn (Monsignor) on Feb 03, 2004 at 15:07 UTC

Re: Difficult XML presentation issue
by Fletch (Bishop) on Feb 03, 2004 at 01:58 UTC