Re: quickest way to parse pidgeon XML?

Originally this data was in CSV format

Why did it change? If you have a flat data-structure then CSV is obviously a better choice than XML. It's more compact, probably faster to parse and probably faster to generate. I'd say go back to CSV and check out the excellent Text::CSV_XS.

-sam

Re: Re: quickest way to parse pidgeon XML? by amazotron (Novice) on May 31, 2004 at 21:23 UTC
I appreciate all the sentiments. The large data sets are still CSV-based, but I need to start storing more information, such that the data is no longer flat. XML seems like an ideal mechanism (or possibly a set of database tables), but I'm very concerned with speed. The current implementation is not the speediest, and any real slowdown is going to be noticed. I control both the reading and writing of the files, and I thought it would be ideal to use a subset of XML (for speed). Allan
Re: Re: Re: quickest way to parse pidgeon XML? by samtregar (Abbot) on Jun 01, 2004 at 03:28 UTC
You need to drop XML like a hot rock if speed is a primary concern and you control both sides of the transaction. XML isn't optimized for speed; it's optimized for readability and extensibility. Even if you cheat by writing your own regexes, you'll still have to contend with the overhead added by all the "<foo></foo>" repetition. Have you considered Storable? As far as serializing Perl data-structures goes, it's the undisputed speed king. Depending on your access pattern you might also consider DB_File. When used correctly it can be quite fast. -sam
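For what it's worth, here is a minimal sketch of the Storable route for data that is no longer flat. The record layout (ids, names, tags) and the temporary file are made up for illustration, not taken from the original poster's data; only nstore and retrieve are the module's own functions.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Hypothetical nested records: the kind of structure that no longer fits CSV.
my $data = {
    records => [
        { id => 1, name => 'alpha', tags => [ 'x', 'y' ] },
        { id => 2, name => 'beta',  tags => [] },
    ],
};

# Temporary file for the sketch; a real program would use its own path.
my (undef, $path) = tempfile(SUFFIX => '.sto', UNLINK => 1);

# nstore writes in network byte order, so the file can be read on other machines.
nstore($data, $path);

# retrieve reads the file back and returns a reference to a fresh copy.
my $copy = retrieve($path);

print scalar @{ $copy->{records} },
      " records, first tag: $copy->{records}[0]{tags}[0]\n";
```

The writer and reader only need to agree on the Perl structure, so there is no markup to generate or parse. The trade-off is that the file is binary and only convenient to read from Perl.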
Re: Re: Re: Re: quickest way to parse pidgeon XML? by amazotron (Novice) on Jun 01, 2004 at 14:24 UTC
Thanks for the pointer, Sam. I will check Storable out, in particular. Allan