Re: Need help in xml well formedness checker.
by Zaxo (Archbishop) on Feb 21, 2007 at 08:13 UTC
|
The canonical way to do that is to use what you say is unattainable to you. Hit it with XML::Parser and see if it chokes.
Is this job not important enough to get any institutional support?
Any proper XML parser should detect basic XML errors and refuse to proceed. You should be able to trap that error and report. Your problem seems to be obtaining one your bosses will let you have.
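A minimal sketch of the "trap that error and report" idea, assuming XML::Parser (and therefore expat) is installed, which is exactly what the original poster lacks. The helper name xml_error is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns '' if $xml is well formed, otherwise the
# parser's error message. Assumes XML::Parser and expat are available.
sub xml_error {
    my ($xml) = @_;
    require XML::Parser;
    my $parser = XML::Parser->new( ErrorContext => 2 );

    # parse() dies on the first well-formedness error; eval traps it.
    eval { $parser->parse($xml); 1 } and return '';
    return $@ || 'unknown parse error';
}
```

Calling xml_error('<a><b></a>') should hand back a message about the mismatched tag, while a well-formed string gives an empty string.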
| [reply] |
|
|
Thanks for the reply Zaxo,
Actually, I don't have permission to install the expat library needed by XML::Parser, so for now I have no choice other than writing a standalone checker. Parser functionality is not needed, just a well-formedness checker.
| [reply] |
|
|
XML is not trivial to parse so your options are do a lot of work yourself or get someone else to do a lot of work for you. People have actually done that work and provide it in a variety of modules.
If your boss won't allow you to use code in the form of existing modules, why should he allow you to use equivalent code obtained from somewhere else?
Sure, something that validates a trivial subset of XML could be written in reasonable time. But how small a subset is really useful? What can be done with simple code is very trivial indeed!
DWIM is Perl's answer to Gödel
| [reply] |
|
|
I agree with Zaxo---you need to find a way to cheat. The easiest way that I can think of to "cheat" is to use an online "well-formedness" checker like RUWF.
| [reply] |
Re: Need help in xml well formedness checker.
by mirod (Canon) on Feb 21, 2007 at 09:01 UTC
|
What is the exact reason why you can't have expat (or libxml2 for that matter) installed? Is it a matter of not having root access, or did someone with authority tell you you could not install it?
Not having root access shouldn't stop you: just install it as a regular user, in a place where you have the right, and pass the location when you create the Makefile (I would guess you would run perl Makefile.PL LDFLAGS=-L/my/path_to_lib INC=-I/my/path_to_header, but I have never tried it). I am sure you can also compile xmlwf (or better, xmllint) statically.
| [reply] |
|
|
Sorry monks, I am using HP-UX. I thought that not having root access prevented me from installing expat (since it gave an error at the line "INSTALL_ROOT=$(DESTDIR)"), but it seems to be due to a bug in expat. The link for the bug is http://mail.libexpat.org/pipermail/expat-bugs/2006-February/002343.html.
| [reply] |
Re: Need help in xml well formedness checker.
by graff (Chancellor) on Feb 21, 2007 at 09:34 UTC
|
If you are using a windows system where you don't have admin privileges to install compiled code, that's a shame -- I really would expect the sysadmins to recognize the value of making the expat package available to the software development staff, seeing as it's sort of AN INDISPENSABLE INDUSTRY STANDARD for doing any kind of XML-related programming...
OTOH, if you're using any type of macosx/linux/unix system, you can install expat yourself in your own home or working directory, and install XML::whatever perl modules as well, just making sure that you specify your personal path for expat when installing the modules, and include the module path in @INC in your scripts (e.g. using "-I/path/to/your/modules" on the shebang line).
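As an alternative to the shebang switch, the same effect can be had inside the script with the lib pragma. This is only a sketch; the path is the placeholder from the paragraph above, not a real location:

```perl
use strict;
use warnings;

# Placeholder path: replace with wherever you actually installed the
# modules under your own home or working directory.
use lib '/path/to/your/modules';

# Modules found under that directory can now be loaded as usual, e.g.
# use XML::Parser;

# Show the search path so you can confirm the private directory is on it.
print "Searching: $_\n" for @INC;
```

The private directory is put at the front of @INC, so modules there are found before any system-wide copies.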
As for rolling your own work-around for well-formedness testing... I gather you've been making some changes to the original code from O'Reilly, but maybe you need to make different kinds of changes -- like adding some sort of "warning" output at each of the points where the "is_well_formed" function returns prematurely, so that you'll get some explanation of why a given XML text failed.
It turns out that the OP code has a problem here:
# match character data
} elsif( $text =~ /(^[^&&>]+)/ ) {
print "char data";
my $data = $1;
# make sure the data is inside an element
return if( $data =~ /\S/ and not( @elements ));
$text = $';
Notice that the "return if(..." statement does a regex match, looking for any non-whitespace character. This causes the "$'" variable to be reset to the string that follows that match. So at the first occurrence of text data inside an element ("self" in this case), $text gets set to whatever follows the first non-whitespace character in $data ("elf" in this case). From that point on, of course, it's a failure.
(update: Forgot to mention: there was also a problem with a lot of the regexes, like the first one in the above snippet, where there was an ampersand when there should have been an open angle bracket; e.g. the one above should have been /(^[^<&>]+)/ )
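To see the "$'" problem in isolation, here is a small sketch (the sub names are made up for illustration; this is not the O'Reilly code). The first sub reads $' after a second match has already run, the second saves it first:

```perl
use strict;
use warnings;

# Buggy: a second regex match runs before $' is read.
sub rest_buggy {
    my ($text) = @_;
    if ( $text =~ /(^[^<&>]+)/ ) {
        my $data = $1;
        my $has_text = ( $data =~ /\S/ );   # this match resets $'
        return $';                           # now relative to $data, not $text
    }
    return;
}

# Fixed: copy $' right after the match that set it.
sub rest_fixed {
    my ($text) = @_;
    if ( $text =~ /(^[^<&>]+)/ ) {
        my $rest = $';                       # saved before anything else matches
        my $data = $1;
        my $has_text = ( $data =~ /\S/ );
        return $rest;
    }
    return;
}

print "buggy: ", rest_buggy('self</name>'), "\n";   # buggy: elf
print "fixed: ", rest_fixed('self</name>'), "\n";   # fixed: </name>
```

Saving $' into a lexical right away is the smallest change; avoiding $' altogether is cleaner still.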
Apart from that, though, it looks okay. Here's a fixed version, complete with informative warning messages at each of the return points that result from badly formed data, and a final print-out of the end result (okay or bad). To generalize it further, you would replace the string arg in the initial subroutine call with some variable whose content was slurped from STDIN or a file (and you might want to get rid of the initial print statement that I put into the sub).
| [reply] |
|
|
Thank you very much, graff. It works...
| [reply] |
Re: Need help in xml well formedness checker.
by shmem (Chancellor) on Feb 21, 2007 at 10:25 UTC
|
Just check whether an XML is well formed or not.
If you don't have too many XML documents to check, you could use LWP and let the W3C XML Schema Validator do the job.
If you are automating the task, check if you're allowed to do so.
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
| [reply] |
|
|
A great thanks to all monks. I will try the PurePerl option and will post an update, so that anyone with a need similar to mine can use it. In the meantime I am using graff's modified code, which works for me.
Thanks for the ideas of using online versions of an XML validator. But the problem is that the system doesn't have a connection to the internet. Thank you all again. You guys have really given me a lot of solutions within 5 hours of my post, and I am really proud of the Perl community.
| [reply] |
Re: Need help in xml well formedness checker.
by bart (Canon) on Feb 21, 2007 at 11:45 UTC
|
That /^&/ in all your regexes looks wrong to me. I'm quite certain you meant to match a "<" (almost) every time.
And I'd stay away from the $' variable and friends. Use @+, more particular, $+[0], to get the end of the match, and next, remove it with substr. Or, you could just use s/^$pattern// instead.
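Roughly, the two alternatives look like this. It is a sketch with made-up sub names; $pattern is assumed to be a qr// object such as qr/[^<&>]+/:

```perl
use strict;
use warnings;

# Variant 1: use $+[0] (end offset of the last match) with substr.
sub eat_with_offset {
    my ( $text, $pattern ) = @_;
    return unless $text =~ /^$pattern/;
    my $matched = substr( $text, 0, $+[0] );
    my $rest    = substr( $text, $+[0] );   # substr does not disturb @+
    return ( $matched, $rest );
}

# Variant 2: let s/// strip the match off the front of the string.
sub eat_with_subst {
    my ( $text, $pattern ) = @_;
    return unless $text =~ s/^($pattern)//;
    return ( $1, $text );
}

my ( $data, $rest ) = eat_with_offset( 'self</name>', qr/[^<&>]+/ );
print "matched '$data', remaining '$rest'\n";   # matched 'self', remaining '</name>'
```

Neither variant touches $', so a later match elsewhere in the code cannot change what counts as "the rest of the text".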
p.s. Don't let the other monks get to you, they're trained to yell "use a parser!" as soon as they see a somewhat complicated parsing job. I think you have a correct approach, one that, once debugged, could give correct results. (Caveat: I haven't inspected every detail) After all, the wellformedness check for HTML in user posts on this site, yes, Perlmonks, is somewhat similar in construct.
Update: Oh, I see you got it off a website. That's where it got mangled. And you got it off O'Reilly. No wonder it's pretty decent.
Anyway, I've updated the script, fixing the bugs I mentioned, and modernizing it so it uses qr// instead of bare strings and s/^blah// instead of depending on $'. I also added the /o option to the regexes so they get compiled only once.
And guess what: it works.
update: Apparently it can't handle numerical entities, only named entities. I've added support for the former.
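For illustration, entity matching built from qr// pieces could look something like the sketch below. This is not the updated script itself; the variable and sub names are made up, and the Name pattern is a deliberate simplification of the XML spec:

```perl
use strict;
use warnings;

# Simplified XML Name (assumption: roughly ASCII, not the full spec rule).
my $name    = qr/[A-Za-z_:][\w:.\-]*/;

# Named reference such as &amp; and numeric ones such as &#169; or &#xA9;
my $named   = qr/&$name;/;
my $numeric = qr/&#(?:[0-9]+|x[0-9A-Fa-f]+);/;
my $entity  = qr/(?:$named|$numeric)/;

# Hypothetical helper: true if the text begins with an entity reference.
sub starts_with_entity {
    my ($text) = @_;
    return $text =~ /^$entity/ ? 1 : 0;
}

for my $sample ( '&amp; more', '&#169;', '&#xA9;', '& alone' ) {
    printf "%-12s %s\n", $sample, starts_with_entity($sample) ? 'entity' : 'not an entity';
}
```

A bare "&" that does not start a reference is the case a well-formedness checker should reject, which is why it is kept as a separate test here.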
| [reply] |
|
|
Thanks, bart. I have updated my script with this code. It works great.
| [reply] |