Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Re: Perl with XML
by mirod (Canon) on Oct 25, 2001 at 19:47 UTC
If this is homework, then I would like to -- (downvote) both you (for not running the code under the debugger, not using Data::Dumper, and not searching this site and the web for information on Perl and XML) and your teacher. XML SHOULD NOT BE PARSED USING REGEXPs (unless you seriously know what you're doing, as Paul Kulchenko does in XML::Parser::Lite or Matt Sergeant in his new pure Perl parser). See On XML parsing for a few reasons (and I am not even mentioning non-ASCII encodings in that post).

Now, if you are looking for resources on Perl and XML, have a look at xml.com, which carries a series of really good articles by Kip Hampton on Perl and XML. I have a couple of resources on xmltwig.com, and the mother of all XML resources is of course The XML Cover Pages, which even has a section on Perl and XML.

Update: Oh my! I forgot to mention a web site dedicated to Perl and XML: xmlperl.com!
Re: Perl with XML
by tomhukins (Curate) on Oct 25, 2001 at 18:18 UTC
Is there any reason you're using your own code rather than standard CPAN XML modules? XML::Parser is good for creating flexible XML parsing code, and XML::Simple is great if you want to convert XML into Perl data structures.

To answer your questions:

1. It's always a good idea to pass data to a subroutine rather than referring to it directly, to reduce the number of global variables. This makes your code more scalable and easier to reuse.

2. You're missing a / from the closing divriskgrade tag. Why have you offered a hint, though? Can you already answer this question? If so, why ask it? If you were to use standard XML modules, they would report where the error occurs.

3., 4. & 5. Why can't you answer these questions yourself? Run the code and find out! For question 5, though, you'd be much better off using a CPAN module such as HTML::Parser instead of writing your own HTML parsing code, which is liable to fail.

As for good web sites that discuss Perl and XML, search Google for perl xml or learn how to find information on Perl Monks. Super Search is very useful, and Tutorials and Module Reviews contain information that will help you with XML parsing. If you're not sure why writing your own parsing code is a bad idea, take a look at Re: Parsing HTML and (tye)Re: parsing HTML.

Update: We've been discussing this thread in the CB (chatterbox), and several monks let me know they have downvoted this post because it answers a homework question. I've considered editing my response to question 2, but the questioner has probably read it by now, and the response might help someone else. Overall, I think my answers were vague enough to make the questioner think, and might be useful to others.
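To illustrate point 1 above, here is a minimal sketch contrasting the two styles. The ticker `ACME`, the grade values and the subs `grade_for_global` and `grade_for` are hypothetical, made up for illustration rather than taken from the quiz:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Global style: the sub silently depends on a package variable.
our %grades = (ACME => 42);

sub grade_for_global {
    my ($ticker) = @_;
    return $grades{$ticker};    # reads whatever %grades holds at call time
}

# Argument style: the data is passed in, so the sub can be reused
# with any table and tested in isolation.
sub grade_for {
    my ($table, $ticker) = @_;    # $table is a hash reference
    return $table->{$ticker};
}

my %other = (ACME => 17);
print grade_for_global('ACME'), "\n";      # prints 42: always the global table
print grade_for(\%other, 'ACME'), "\n";    # prints 17: whichever table is passed
```

The second version never needs to know where its data lives, which is what makes it easier to reuse as the program grows.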
Re: Perl with XML
by MZSanford (Curate) on Oct 25, 2001 at 18:14 UTC
I had a memory leak once, and it ruined my favorite shirt.
Re: Perl with XML
by perrin (Chancellor) on Oct 25, 2001 at 21:27 UTC
You should keep in mind that at least one former RiskMetrics employee is regularly on this site: japhy.
by Anonymous Monk on Dec 06, 2001 at 06:58 UTC
Re: Perl with XML
by buckaduck (Chaplain) on Oct 25, 2001 at 18:53 UTC
If you fix this, you might be able to run the program and answer the teacher's questions yourself.

buckaduck
Re: Perl with XML
by tachyon (Chancellor) on Oct 25, 2001 at 18:34 UTC
This looks like homework to me. Did you consider doing this?
As for Q5, this is a bizarre regex that does the following:
In a nutshell, it will substitute all HTML tags with the letter 'x' except for <b>, </b>, <i> and </i> tags. It is very broken.
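As a sketch of the behaviour just described (a hypothetical reconstruction, not the quiz's actual regex), assuming as cbeels explains below that the tags to keep are always lower case, a single substitution with a negative lookahead is enough. The sub name `tags_to_x` is made up for illustration. The second call shows one way this kind of regex breaks:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reconstruction of the described behaviour, not the quiz's code:
# replace every tag with 'x' unless it is exactly <b>, </b>, <i> or </i>.
sub tags_to_x {
    my ($html) = @_;
    (my $out = $html) =~ s{<(?!/?[bi]>)[^>]*>}{x}g;
    return $out;
}

print tags_to_x('<p>Some <b>bold</b>,<br> <i>it</i> and <B>caps</B></p>'), "\n";
# prints: xSome <b>bold</b>,x <i>it</i> and xcapsxx

# Fragile: a '>' inside an attribute value ends the "tag" early.
print tags_to_x('<a title="1 > 0">link</a>'), "\n";
# prints: x 0">linkx
```

Upper-case tags and tags with attributes are treated as "other" tags, and a quoted `>` leaks attribute text into the output, which is exactly why a real parser is the safer choice.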
When all else fails, suck it and see!

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
Re: Perl with XML
by tachyon (Chancellor) on Oct 25, 2001 at 19:02 UTC
Here is how to use HTML::Parser to do the job your regex doesn't:
Before you lay that on your teacher, make sure you understand how the hash slice lookup table and the V2 interface to HTML::Parser work :-)

cheers

tachyon

s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
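For reference, here is a sketch of the HTML::Parser approach tachyon describes (a reconstruction, not his code). It assumes HTML::Parser from CPAN is installed, and note that it swaps in the version 3 handler interface (`api_version => 3`) instead of the V2 subclassing interface mentioned above. The sub name `clean_html` is hypothetical. Comments and declarations have no handler here, so they are simply dropped:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();    # CPAN module, not part of core Perl

# Lookup table built with a hash slice: the keys are the tags to keep.
my %keep;
@keep{qw(b i)} = ();

# Hypothetical helper: keep <b>/<i> tags, turn every other tag into 'x'.
sub clean_html {
    my ($html) = @_;
    my $out = '';

    # Shared start/end handler: tagname arrives lower-cased, text is the raw tag.
    my $on_tag = sub {
        my ($tagname, $text) = @_;
        $out .= exists $keep{$tagname} ? $text : 'x';
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ $on_tag, 'tagname, text' ],
        end_h       => [ $on_tag, 'tagname, text' ],
        text_h      => [ sub { $out .= shift }, 'text' ],
    );
    $p->parse($html);
    $p->eof;    # flush any buffered text
    return $out;
}

print clean_html('<p>Some <B>bold</B> and <font size="2"><i>italic</i></font> text</p>'), "\n";
# prints: xSome <B>bold</B> and x<i>italic</i>x textx
```

Because the parser understands quoted attribute values, `<a title="1 > 0">` becomes a single 'x' here, and because tag names are lower-cased before the lookup, `<B>` is kept as well.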
Re: Perl with XML
by cbeels (Initiate) on Oct 26, 2001 at 13:25 UTC
Re: Perl with XML
by cbeels (Initiate) on Oct 26, 2001 at 13:43 UTC
The point about use strict is valid and has been corrected in the quiz. As for the "broken" regexp that tachyon described most eloquently above, it was actually used to translate a bunch of files that had been mangled by Dreamweaver. The only valid tags were the <b> and <i> tags (which were all lower case), but I needed to keep track of where the other ones were, so I used "x"s. Not terribly elegant, but it made for a good question (one that most people get wrong).
by mirod (Canon) on Oct 26, 2001 at 16:31 UTC
No! Your code does NOT parse XML. It parses a limited subset of XML. That might be OK for the data you handle right now, but it means that you cannot change this data. Are the restrictions you put on the XML clearly documented somewhere? If you have to receive data from a source that you don't control, and you just tell them "it's XML, here is the DTD/schema", I can tell you that you are opening the door to tons of problems. People do use entities, comments, processing instructions, namespaces and the like! And as this regexp-based parsing does no validation whatsoever of the incoming XML, how do you know you can trust it?

In short, you are using an internal format that looks a little bit like XML but is not XML. This is fine, except when you call it XML. I understand that the quiz is for applicants to your company only, so it's not as if you were advocating your method in a public forum, but I still want to warn people (and you!) against thinking that XML is simple to process using regexps.

BTW, if you don't want to use XML::Parser, you can also use XML::Parser::Lite, which is regexp-based, or XML::LibXML, or soon the new XML::SAX::PurePerl. Or you could use a real (and fast) XML processor to generate a version of the data that you know you can handle (expanding entities, discarding comments...).
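To make the list of XML features above concrete, here is a minimal sketch of the kind of regexp-based extraction being criticised. The sub `naive_grade` and the sample documents are hypothetical. All four documents are well-formed XML whose divriskgrade element contains the text 42, yet the regexp gets three of them wrong:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical regexp-based "parser": grab whatever sits between the tags.
sub naive_grade {
    my ($xml) = @_;
    return $xml =~ m{<divriskgrade>(.*?)</divriskgrade>}s ? $1 : undef;
}

# Each document is well-formed XML; a real parser would give "42" for all four.
my %docs = (
    plain   => '<fund><divriskgrade>42</divriskgrade></fund>',
    comment => '<fund><!-- <divriskgrade>old</divriskgrade> --><divriskgrade>42</divriskgrade></fund>',
    space   => '<fund><divriskgrade >42</divriskgrade></fund>',    # space before '>' is legal
    charref => '<fund><divriskgrade>4&#50;</divriskgrade></fund>', # &#50; is the character '2'
);

for my $name (sort keys %docs) {
    my $got = naive_grade($docs{$name});
    printf "%-8s %s\n", $name, defined $got ? $got : '(no match)';
}
```

The commented-out value is picked up, the legal whitespace defeats the match entirely, and the character reference is never expanded, which is the "subset of XML" problem in a nutshell.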
by buckaduck (Chaplain) on Nov 01, 2001 at 01:19 UTC
I hope that RiskGrades counts this in my favor if I should decide to apply! (And I just might; I'll be looking for a new job next year when my worksite closes...)

buckaduck