best way to change xml record using XML::Simple?

by waxmop (Beadle)
on Feb 24, 2003 at 18:31 UTC

waxmop has asked for the wisdom of the Perl Monks concerning the following question:

Greetings wise ones:

I just wrote a clunky application, and I want to get the community's feedback, because I don't like how I wrote it.

Background: the people I work for need a way to search, add, and edit their records of contacts at a bunch of different companies. For example, when somebody gets a new phone number, we need to be able to update our records.

Here's the overview of what I did:

  • Turned the excel spreadsheet that held all the orders into an xml file.
  • Wrote a search.html page that gets information from a form, and then uses perl-mason and XML::Simple to find matching records and print them.
  • For any printed record, a new form appears that allows the user to fill out updated information. Then, the perl-mason code replaces the old record with the new one. This is the part that seems clunky to me.

The xml file looks sort of like this:

<records>
  <record>
    <uid>0001</uid>
    <company_name>Acme Industries</company_name>
    <contact_name>Arthur Dent</contact_name>
    <contact_phone_number>867-5309</contact_phone_number>
  </record>
  <record>
    <uid>0002</uid>
    <company_name>Zeta Industries</company_name>
    <contact_name>Sam Lowry</contact_name>
    <contact_phone_number>555-5555</contact_phone_number>
  </record>
</records>

I wrote some perl-mason pages to allow users to search records after parsing the data with XML::Simple. They can edit a record by filling out a form on the post_edits.html page. Here's what happens on the post_edits.html page:

<%init>
use XML::Simple;

my $xref = XMLin("path_to_xmlfile.xml");

my $newnode = {};
foreach my $kee ( "uid", "company_name", "contact_name", "contact_phone_number" ) {
    $newnode->{$kee} = $ARGS{$kee};
}

foreach my $rec ( @{$xref->{record}} ) {
    if ( $rec->{'uid'} eq $newnode->{'uid'} ) {
        $rec = $newnode;
    }
}
</%init>

I pass the new edited record in the %ARGS hash. Then I create the $newnode structure which will replace the old record. Then I loop through my in-memory structure until I find the record that I want to replace, and then I replace it, with the $rec = $newnode line. I've used Data::Dumper to check that it works, and it looks good, but I can't help but wonder if there is a better way.
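
The `$rec = $newnode` assignment described above works because a Perl foreach loop aliases its loop variable to each array element, so assigning to it overwrites the slot in the array itself. Here is a minimal sketch of just that step. The hard-coded `$xref` and `$newnode` are stand-ins (assumptions) for the XMLin() result and the %ARGS form data, so that it runs without Mason or XML::Simple; the `$replaced` flag and the early `last` are additions, not part of the original page.

```perl
use strict;
use warnings;

# Stand-in for the structure XMLin() returns for the sample file
# (hard-coded here so the sketch runs without XML::Simple).
my $xref = {
    record => [
        { uid => '0001', company_name => 'Acme Industries',
          contact_name => 'Arthur Dent', contact_phone_number => '867-5309' },
        { uid => '0002', company_name => 'Zeta Industries',
          contact_name => 'Sam Lowry', contact_phone_number => '555-5555' },
    ],
};

# Stand-in for the edited record that would arrive in %ARGS.
my $newnode = {
    uid => '0002', company_name => 'Zeta Industries',
    contact_name => 'Sam Lowry', contact_phone_number => '555-0100',
};

my $replaced = 0;
foreach my $rec ( @{ $xref->{record} } ) {
    next unless $rec->{uid} eq $newnode->{uid};
    $rec = $newnode;    # $rec is an alias, so this overwrites the array slot
    $replaced = 1;
    last;               # uids are assumed unique, so stop at the first match
}

print "phone is now $xref->{record}[1]{contact_phone_number}\n";
warn "no record with uid $newnode->{uid}\n" unless $replaced;
```

Note that with XML::Simple's default options a file holding only one record would probably not come back as an array at all, so the `@{ ... }` dereference is worth guarding in real code.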

All comments and criticism are welcome. Let the didactic abuse flow!

update (broquaint): added missing </ul>

Replies are listed 'Best First'.
Re: best way to change xml record using XML::Simple?
by RiotTown (Scribe) on Feb 24, 2003 at 18:48 UTC
    Why not use a simple database? The schema is already defined within the excel spreadsheet, and you'd get away from data integrity errors (your solution is not multi-user safe). And you would still be able to create the spreadsheet whenever necessary using something like Spreadsheet::WriteExcel. You already seem to be doing much of the work already; using a db would allow SQL to more efficiently update just the row that needed to be instead of chugging through the entire XML file each and every time.

      I agree that this is a DB problem. However, that is not possible right now.

Re: best way to change xml record using XML::Simple?
by CountZero (Bishop) on Feb 24, 2003 at 19:40 UTC

    XML is of course very modern, hip and funky to boot, but I wonder if it is not a bit of an overkill here.

    As your data seems to be very regular, I would go either for a simple CSV-file (easy to extract from an excel-file) and DBD::CSV or go direct to the excel-file itself (if it is in an acceptable format, i.e. worksheet = TABLE and the first row contains the columnheadings) and use DBD::Excel.

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      While I agree with some of your thoughts, I personally try to avoid CSV files because the definition of "CSV" is so elastic in practice. How do you quote or escape fields containing commas and quotes? How do you handle trailing empty fields? I've seen a lot of variation in stuff that is ostensibly "CSV", and that always makes me nervous from a long-term maintainability and interoperability standpoint.
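
To make the quoting question concrete, here is a small sketch of how a naive split on commas miscounts fields once a value contains the delimiter. The sample line is made up for illustration. Text::ParseWords ships with Perl, but it follows shell-like quoting rules rather than any particular CSV dialect, so treat it as a demonstration of the problem and not as a complete CSV parser.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# A made-up line whose first field contains the delimiter.
my $line = '"Acme, Inc.",Arthur Dent,867-5309';

# Naive approach: split on every comma, quoted or not.
my @naive = split /,/, $line;

# quotewords() treats a quoted section as part of one field and,
# with a false second argument, strips the quotes.
my @parsed = quotewords( ',', 0, $line );

printf "naive split: %d fields\n", scalar @naive;
printf "quotewords:  %d fields\n", scalar @parsed;
print  "first field: $parsed[0]\n";
```

Data containing apostrophes or embedded newlines would still need more care, which is the kind of variation the paragraph above is worried about.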

      That said, I do agree delimited files can make a lot of sense for lightweight data storage. For information that normally wouldn't contain internal whitespace other than "regular" (ASCII 32) spaces -- and this application might qualify -- I often choose tab-delimited.

      One potential benefit I do see to using XML in this application is that you can easily store data that isn't quite so regular. For example, if you wanted to support multiple contact phone numbers, it would be fairly easy to expand the data structure like this:

      <record>
        <!--existing stuff-->
        <contact_phone_number note="business">123-4567</contact_phone_number>
        <contact_phone_number note="pager">555-6789</contact_phone_number>
        <contact_phone_number note="vacation home in Bermuda">+1-99-20-55-6789</contact_phone_number>
      </record>

      Dealing with irregularities like that would be a bit more work in a rigid db-like or non-hierarchical file format.

              $perlmonks{seattlejohn} = 'John Clyman';

        I entirely agree with your comments and that is why DBD::CSV has all the usual parameters for setting the "elastic" properties of CSV.

        Of course XML cannot be beaten for storage of irregular records. Anyone would be hard pressed to equal such a flexible system for irregular data and still maintain a sense of order.

        CountZero

        "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

        There's an anti-buzzword backlash against XML floating around, which is a natural reaction to something that's been hyped so much by marketers.

        But after all the smoke clears, it's pretty hard to beat the simplicity of using XML::Simple's XMLin() and XMLout() functions.

        Combining CSV files with while (<IN>) {...} logic has old-school appeal, but in the end it's irritating and tedious. There are so many exceptions to handle, like data that splits over lines or data that contains the delimiter, and it's not as flexible as XML when you have to add new variables.

Re: best way to change xml record using XML::Simple?
by seattlejohn (Deacon) on Feb 24, 2003 at 20:21 UTC
    As others have mentioned, XML might not be the most straightforward or efficient solution to this problem -- but since you've already got working code, here's a thought as to how you could evolve it. Check out some of the options in XML::Simple that control the way XMLin folds the generated data structure. Using keyattr, you ought to be able to set things up so that the data structure ends up with UID as a hash key. That would let you remove the loop that searches for the matching UID and replace it with something simple like this (untested, and dependent on your exact data structure):
    $xref->{record}->{$newnode->{uid}} = $newnode;
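
A fuller sketch of that idea, under some assumptions: the sample XML from the question is inlined as a string, `$newnode` is hard-coded in place of the %ARGS form data, and `ForceArray => ['record']` is an addition meant to keep the folding consistent when the file holds a single record. It needs XML::Simple installed and has not been run against the real data file.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# The sample file from the question, inlined so the sketch is self-contained.
my $xml = <<'END_XML';
<records>
  <record>
    <uid>0001</uid>
    <company_name>Acme Industries</company_name>
    <contact_name>Arthur Dent</contact_name>
    <contact_phone_number>867-5309</contact_phone_number>
  </record>
  <record>
    <uid>0002</uid>
    <company_name>Zeta Industries</company_name>
    <contact_name>Sam Lowry</contact_name>
    <contact_phone_number>555-5555</contact_phone_number>
  </record>
</records>
END_XML

my $xref = XMLin(
    $xml,
    KeyAttr    => 'uid',         # fold the <record> list into a hash keyed on <uid>
    ForceArray => ['record'],    # assumption: fold the same way with one record
);

# Stand-in for the edited record that would arrive in %ARGS.
my $newnode = {
    uid => '0002', company_name => 'Zeta Industries',
    contact_name => 'Sam Lowry', contact_phone_number => '555-0100',
};

# No search loop: the uid addresses the record directly.
$xref->{record}->{ $newnode->{uid} } = $newnode;

print join( ', ', sort keys %{ $xref->{record} } ), "\n";
print $xref->{record}{'0002'}{contact_phone_number}, "\n";
```

After folding, the uid lives in the hash key rather than inside each record, so code that reads `$rec->{uid}` elsewhere would need adjusting.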

    Also, I would tend to avoid the four-digit zero-padded UIDs in favor of simple integers, which are less work to generate and also don't cause potential problems when you add your 10,000th contact -- though I hope that if you do have 10,000 contacts, you'd have begun using a real database rather than an XML file at that point :-)
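
A quick illustration of the padding issue, using only core Perl. The `@uids` list is hypothetical, and "highest uid plus one" is just one simple way to pick the next integer.

```perl
use strict;
use warnings;

# String comparison treats the padding as significant; numeric comparison does not.
my $same_as_string = ( '0001' eq '1' ) ? 1 : 0;
my $same_as_number = ( '0001' == 1 )   ? 1 : 0;

# A four-digit format grows to five characters at the 10,000th contact.
my $last_padded = sprintf '%04d', 9999;
my $overflow    = sprintf '%04d', 10_000;

# Next plain-integer uid from a hypothetical list of existing ones.
my @uids  = ( 1, 2, 7 );
my ($max) = sort { $b <=> $a } @uids;
my $next  = $max + 1;

print "eq: $same_as_string, ==: $same_as_number\n";
print "widths: ", length($last_padded), " then ", length($overflow), "\n";
print "next uid: $next\n";
```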

            $perlmonks{seattlejohn} = 'John Clyman';

      your comment was exactly what I was hoping for. I'll play with the keyattr to avoid the loop.

Re: best way to change xml record using XML::Simple?
by BrowserUk (Patriarch) on Feb 24, 2003 at 21:24 UTC

    If you play with the attributes on XMLin() and XMLout(), you can avoid needing to search by forcing it to use hashes instead of arrays.

    Using keeproot=>1, keyattr=>'uid' on both the XMLin() and XMLout() calls, and adding noattr=>1 on the XMLout(), you can persuade XML::Simple to write the data back in the same form as it received it.
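
A sketch of that round trip, assuming XML::Simple is installed. The XML is a shortened copy of the sample from the question, and `ForceArray => ['record']` on the XMLin() calls is an addition not mentioned above. Element order inside each record may differ from the input, so the sketch checks the result by re-parsing the output instead of comparing strings.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Shortened copy of the sample data, inlined so the sketch is self-contained.
my $xml = <<'END_XML';
<records>
  <record>
    <uid>0001</uid>
    <contact_name>Arthur Dent</contact_name>
    <contact_phone_number>867-5309</contact_phone_number>
  </record>
  <record>
    <uid>0002</uid>
    <contact_name>Sam Lowry</contact_name>
    <contact_phone_number>555-5555</contact_phone_number>
  </record>
</records>
END_XML

# Options shared by both directions, as suggested above.
my %opts = ( KeepRoot => 1, KeyAttr => 'uid' );

# Parse: records end up in a hash keyed on uid, under the kept root element.
my $data = XMLin( $xml, %opts, ForceArray => ['record'] );

# Update one record directly by its uid.
$data->{records}{record}{'0001'}{contact_phone_number} = '555-0199';

# Serialise: NoAttr asks for nested elements instead of attributes.
my $out = XMLout( $data, %opts, NoAttr => 1 );
print $out;

# Re-parse the output to confirm it has the same shape as the input.
my $again = XMLin( $out, %opts, ForceArray => ['record'] );
print "round trip phone: $again->{records}{record}{'0001'}{contact_phone_number}\n";
```

In the real page, the XMLout() result would be written back to the data file, which is where the multi-user concern raised elsewhere in this thread comes in.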

    If you can't use a DB of some form, and you expect to have multiple concurrent users, then you could move access to the data into a separate process that keeps a copy in memory, serves details to the CGI process, and receives updates from it via a socket or pipe. This would probably need to be multi-threaded or forked, but if traffic volumes are low you might get away with having the CGI try to connect, then back off for a short period and retry if it doesn't get access the first time.
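
The "try, back off, retry" part could look roughly like the sketch below. `retry_with_backoff` is a hypothetical helper, the doubling delay is one choice among many, and the code reference stands in for the real connection attempt, for example a call to IO::Socket::INET->new(...), which returns undef on failure.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # allows fractional-second sleeps

# Call $attempt->() up to $tries times, sleeping between failures and
# doubling the delay each time. Returns the first true result, or undef.
sub retry_with_backoff {
    my ( $attempt, $tries, $delay ) = @_;
    for my $n ( 1 .. $tries ) {
        my $result = $attempt->();
        return $result if $result;
        last if $n == $tries;    # no point sleeping after the final attempt
        sleep $delay;
        $delay *= 2;
    }
    return;
}

# Stand-in for "connect to the data process": fails twice, then succeeds.
my $calls = 0;
my $conn  = retry_with_backoff( sub { ++$calls >= 3 ? 'connected' : undef }, 5, 0.05 );

print defined $conn
    ? "$conn after $calls attempts\n"
    : "gave up after $calls attempts\n";
```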

    the output


    ..and remember there are a lot of things monks are supposed to be but lazy is not one of them

    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
      How does the __DATA__ block work? Is that some sort of handle? Any links to explanatory material would be much appreciated.

        From perldata:

        The two control characters ^D and ^Z, and the tokens __END__ and __DATA__ may be used to indicate the logical end of the script before the actual end of file. Any following text is ignored.

        Text after __DATA__ may be read via the filehandle "PACKNAME::DATA", where "PACKNAME" is the package that was current when the __DATA__ token was encountered. The filehandle is left open pointing to the contents after __DATA__. It is the program's responsibility to "close DATA" when it is done reading from it. For compatibility with older scripts written before __DATA__ was introduced, __END__ behaves like __DATA__ in the toplevel script (but not in files loaded with "require" or "do") and leaves the remaining contents of the file accessible via "main::DATA".

        See SelfLoader for more description of __DATA__, and an example of its use. Note that you cannot read from the DATA filehandle in a BEGIN block: the BEGIN block is executed as soon as it is seen (during compilation), at which point the corresponding __DATA__ (or __END__) token has not yet been seen.

        After Compline,
        Zaxo

        I can't find a reference to __DATA__ in the docs. I know it's there somewhere. Maybe someone else will post one.

        Essentially, you can use <DATA> as a filehandle that you don't need to open, giving access to anything after the __DATA__ marker at the end of your source file. __END__ works too, but I was informed recently that it has caveats when used with modules.

        It's very useful for testing and demo purposes. You can even have multiple embedded, and even writable, files using Damian Conway's Inline::Files, though I don't think they would be useful for your purposes here.
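
A minimal example of reading from the DATA filehandle. The "uid,phone" line format is made up for the demonstration. Everything after the __DATA__ marker is treated as data, so the marker has to come after all the code.

```perl
use strict;
use warnings;

# Read "uid,phone" lines embedded after the __DATA__ marker below.
my %phone_for;
while ( my $line = <DATA> ) {
    chomp $line;
    next unless length $line;    # skip blank lines
    my ( $uid, $phone ) = split /,/, $line, 2;
    $phone_for{$uid} = $phone;
}
close DATA;

print "$_ => $phone_for{$_}\n" for sort keys %phone_for;

__DATA__
0001,867-5309
0002,555-5555
```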


        ..and remember there are a lot of things monks are supposed to be but lazy is not one of them

        Examine what is said, not who speaks.
        1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
        2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
        3) Any sufficiently advanced technology is indistinguishable from magic.
        Arthur C. Clarke.
