in reply to Converting text to XML; Millions of records.
(Note: I realize that you may have no control over the project requirements. I also realize that the example you gave may be a toy example. But I feel I must comment anyway...)
As I see it, XML is bloated and ugly. However, it's useful because it lets you make your data self-describing, and easier to parse and use in new ways. So I suggest that you change your schema, if possible. I don't really see how
    <datafield tag="702" ind1="" ind2="">
      <subfield code="a">Thomson, Bryden</subfield>
      <subfield code="b">1928-1991</subfield>
      <subfield code="c">Conductor</subfield>
    </datafield>
is any more descriptive than the original file. I feel you would be better served by giving descriptive tags to your data. Perhaps something like:
    <conductor>
      <Name>
        <Last>Thomson</Last>
        <First>Bryden</First>
      </Name>
      <Born>1928</Born>
      <Died>1991</Died>
    </conductor>
In my job, I *frequently* have to reverse engineer file formats, and unless the tags were meaningful, I would much rather reverse engineer the original file format than the XML version. Without meaningful field names, the extra visual clutter just makes it harder to spot patterns in the data.
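If the schema can be changed, mapping the generic `datafield` form onto descriptive tags is mostly mechanical. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes (based only on the single example above) that subfield `a` holds "Last, First", subfield `b` holds "born-died" years, and subfield `c` holds the role, which becomes the element name. The function name `to_descriptive` is hypothetical.

```python
import xml.etree.ElementTree as ET

def to_descriptive(datafield_xml: str) -> str:
    """Hypothetical converter: generic <datafield> -> descriptive tags.

    Assumes subfield a = "Last, First", b = "born-died", c = role,
    as in the single example record; real data will need more cases.
    """
    field = ET.fromstring(datafield_xml)
    # Collect subfields into a dict keyed by their code attribute.
    sub = {s.get("code"): (s.text or "").strip() for s in field.findall("subfield")}

    last, _, first = sub.get("a", "").partition(",")
    born, _, died = sub.get("b", "").partition("-")
    # Use the role as the element name, e.g. "Conductor" -> <conductor>.
    # Spaces are replaced because they are not allowed in XML element names.
    role = sub.get("c", "person").lower().replace(" ", "_")

    root = ET.Element(role)
    name = ET.SubElement(root, "Name")
    ET.SubElement(name, "Last").text = last.strip()
    ET.SubElement(name, "First").text = first.strip()
    ET.SubElement(root, "Born").text = born.strip()
    ET.SubElement(root, "Died").text = died.strip()
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    record = ('<datafield tag="702" ind1="" ind2="">'
              '<subfield code="a">Thomson, Bryden</subfield>'
              '<subfield code="b">1928-1991</subfield>'
              '<subfield code="c">Conductor</subfield>'
              '</datafield>')
    print(to_descriptive(record))
```

For millions of records you would not want to load everything into memory at once; `ET.iterparse` can stream the input record by record instead.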
Just my $0.02.
...roboticus
Re^2: Converting text to XML; Millions of records.
by superfrink (Curate) on Jul 07, 2009 at 18:35 UTC