Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

This (long-ish) question is mainly about design. I have a simple "database". Basically it is only one table, with about 30 fields and a few hundred records. I would like to extract this data and post to the web via CGI.

One limitation of the server I am using is that I cannot install a real database -- I am stuck with plain-text. That shouldn't be a problem. Parsing this file on a per-request basis will be easy, especially given the low-traffic nature of the site.

What is the problem? Two things, actually. Let me specify first that I am now, and likely will be always be the only maintainer for this site. I set both the file-format and the CGI parsing code. Now, my two concerns:

  1. What format do I use for the text-file?
  2. How do I ensure that meta-characters in the text-file are properly inserted into the HTML?

I had initially thought that XML was the solution, but I have run into problems with both of the above points (I'm not posting code because I want to acertain if I should stick with XML or instead try something else).

Basically, my problems thus far have been: XML is slow, XML introduced bloat, as best as I could see I needed to escape everything to insert it into the XML, and I have *no* desire to start entering records using &xx; format for all my /\"'& and whatever other characters.

So, I am looking for suggestions on how to proceed. Am I struggling with the XML approach because I'm a novice with it, or choosing poor tools (XML::Parser and XML::Writer)? I have a lot of DBI, so I was surprised at having to handle my own escaping. Are there other ways around this?

One final thing: my current alternative is just a text-file with each line containing the field-name followed by value, and a record delimiter like '-' x 50, e.g.:

Field1: DATA Field2: MOREDATA #other fields# ------------------------------- Field1: DATA Field2: MOREDATA #other fields#

This wouldn't allow easy stream-based parsing (e.g. no while (my @row = $dbh->fetchrow_array()) nice-ness), but otherwise seems refreshingly simple compared to XML. Any comments on that?

Replies are listed 'Best First'.
Re: XML "Database" --> HTML
by sri (Vicar) on Aug 31, 2003 at 02:02 UTC
•Re: XML "Database" --> HTML
by merlyn (Sage) on Aug 31, 2003 at 12:13 UTC
    Consider YAML for your data storage. YAML can represent nearly any Perl data structure, and yet is mostly editable by us hewwwmans too, unlike XML which I am beginning to be more disgusted by whenever someone says "the config file is in XML" - yuck.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

Re: XML "Database" --> HTML
by allolex (Curate) on Aug 31, 2003 at 06:57 UTC

    Like mirod said at YAPC::Europe this year, using XML for a database is essentially like using plain text, only slower. I'd say sri's suggestion of DBD::SQLite is the way to go, but you could also parse a tab-delimited (or whatever-delimited) file in the same way, one line per record like my @row = split( /\t/, $_ ) while (<>).

    I hope that helps.

    --
    Allolex

Re: XML "Database" --> HTML
by Hutta (Scribe) on Aug 31, 2003 at 03:46 UTC
    If you're looking for a text-based alternative to XML, I'd recommend CSV. The Text::CSV module handles proper quoting so you don't have to worry about weird characters in your data messing up your flat file's structure. This also gives you one-record-per-line, which would allow "stream-based parsing" like you were looking for.

    However, take sri up on the suggestion to use DBD::SQLite and see if that works out first. Looks like a great fit for your application to me.
Re: XML "Database" --> HTML
by TomDLux (Vicar) on Aug 31, 2003 at 02:24 UTC

    The 'simplest' file record structure is the fixed size field .... think IBM, think COBOL; but it IS wasteful if only a few records approach maximum size.

    The second simplest alternative is delimited fields, such as CSV ( comma separated values ). It is simplest is there is an ASCII value you can guarantee does not appear in a field, only as a delimiter. Even in XML, you have to guarantee the absence of > & <.

    The third solution is to accept that your temporary solution will still be in use 37 years from now, and spec an SQL DB for your data storage.

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: XML "Database" --> HTML
by CombatSquirrel (Hermit) on Aug 31, 2003 at 01:43 UTC
    XML::Parser is pretty much standard. If you want to use plain text files, be sure that your data will never include unpriteable characters and such. Also, your delimiter should not be a possible data charachter. If you want to choose the (long-term) probably better alternative, go for XML.
    Oh, and never assume that you will be the only person ever using or maintaining your code ;-).
    Hope this helped.
    CombatSquirrel.
    Entropy is the tendency of everything going to hell.
Re: XML "Database" --> HTML
by drfrog (Deacon) on Sep 01, 2003 at 00:27 UTC
    ive done this before using a database with XML::LibXML,
    not much difference using a plain text

    I'd love to see something like this in perl :

    http://technet.oracle.com/tech/xml/xdk_java/content.html

    then something like this could happen:
    <?xml version="1.0" ?> <XSQLConfig> : <connectiondefs> <connection name="demo"> <username>scott</username> <password>tiger</password> <dburl>jdbc:oracle:thin:@localhost:1521:ORCL</dburl> <driver>oracle.jdbc.driver.OracleDriver</driver> </connection> <connection name="lite"> <username>system</username> <password>manager</password> <dburl>jdbc:Polite:POlite</dburl> <driver>oracle.lite.poljdbc.POLJDBCDriver</driver> </connection> </connectiondefs> : </XSQLConfig>
Re: XML "Database" --> HTML
by Cody Pendant (Prior) on Sep 01, 2003 at 04:08 UTC
    One limitation of the server I am using is that I cannot install a real database -- I am stuck with plain-text.

    This is a situation where you can't use even DBM, with the dbmopen() command?

    I use that as a quick and dirty database on practically a daily basis.



    ($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print