draper7 has asked for the wisdom of the Perl Monks concerning the following question:

  Fellow monks, I come in search of knowledge again. Everything was great until my online banking website switched from QIF (Quicken) account information downloads to OFX / SGML files. Why can't people leave well enough alone? I've searched for an editor/viewer for OFX / SGML files but haven't had any luck, so I thought I might be able to write a small script to get the needed information out of the file.
  Anyway, my dilemma is that I don't know whether I should use a module like XML::Simple or just throw something together myself. Or maybe someone knows of an existing script I could use. Below is a sample of what the file looks like. Thanks in advance!
OFXHEADER:100
DATA:OFXSGML
VERSION:102
SECURITY:NONE
ENCODING:USASCII
CHARSET:1252
COMPRESSION:NONE
OLDFILEUID:NONE
NEWFILEUID:NONE
<OFX>
<SIGNONMSGSRSV1>
<SONRS>
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<DTSERVER>###########
<LANGUAGE>ENG
<FI>
<ORG>S1
<FID>########
</FI>
<INTU.BID>###
</SONRS>
</SIGNONMSGSRSV1>
<BANKMSGSRSV1>
<STMTTRNRS>
<TRNUID>0
<STATUS>
<CODE>0
<SEVERITY>INFO
</STATUS>
<STMTRS>
<CURDEF>USD
<BANKACCTFROM>
<BANKID>########
<ACCTID>DDA##-#########
<ACCTTYPE>CHECKING
</BANKACCTFROM>
<BANKTRANLIST>
<DTSTART>##########
<DTEND>###########
<STMTTRN>
<TRNTYPE>POS
<DTPOSTED>###########
<TRNAMT>-106.00
<FITID>#############
<NAME>H &amp; R Block
</STMTTRN>
<STMTTRN>
<TRNTYPE>POS
<DTPOSTED>##########
<TRNAMT>-3.19
<FITID>#########
<NAME>Chevron
</STMTTRN>
</BANKTRANLIST>
<LEDGERBAL>
<BALAMT>##.##
<DTASOF>##############
</LEDGERBAL>
</STMTRS>
</STMTTRNRS>
</BANKMSGSRSV1>
</OFX>
   --JD

Replies are listed 'Best First'.
Re: OFX/SGML Parse
by mirod (Canon) on Mar 16, 2002 at 05:33 UTC

    I would try converting the SGML part of the message to XML so you can use XML tools on it.

    Converting SGML to XML can be very easy... or a real PITA, depending on which SGML-specific features the document uses. And I am not even talking about obscure stuff, just things like &amp foo, which is valid SGML (the space after the entity allows an SGML parser to infer the missing ';').

    So here are your options:

    • the easy and dangerous way: use regexps to add the missing end tags at the end of lines. Easy because you can probably hack something that works in a couple of minutes, dangerous because there is no guarantee that the output is indeed XML, and if the bank starts using different (legal) features of SGML (ah, the guilty pleasure of using a CONREF!) you'll be stuck,
    • the easy and right way: use James Clark's sx, which oddly enough does SGML to XML conversion. Considering that James' nsgmls is the reference parser for SGML and that he wrote half of the XML spec, I would think it does a good job,
    • the hard but right (and hubristic!) way: write a SAX parser for SGML, using a wrapper around the SP library, so you can then use any XML tool that supports a SAX input, such as the latest version of XML::Simple.
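
    To make the first (easy and dangerous) option concrete, here is a minimal sketch, assuming the file looks like the sample in the question: one element per line, leaf values never containing '<', and no other SGML shortcuts in use. The helper name ofx_sgml_to_xml is made up for illustration, and the regexp is exactly the kind of quick hack warned about above; it will not cope with things like &amp foo.

```perl
use strict;
use warnings;

# Hypothetical helper (not an existing module): split an OFX 1.x file into
# its "KEY:VALUE" header and its SGML body, then close every leaf element.
sub ofx_sgml_to_xml {
    my ($text) = @_;

    # Everything before the first <OFX> is treated as the header block.
    my ($header, $body) = $text =~ /\A(.*?)(<OFX>.*)\z/s
        or die "no <OFX> element found\n";

    # A leaf looks like <TAG>value with no end tag. Aggregates such as
    # <STATUS> are followed directly by another tag, so they do not match.
    # The negative lookahead skips leaves that are already closed.
    $body =~ s{<([A-Za-z0-9_.]+)>\s*([^<\s][^<]*?)(?=\s*<)(?!\s*</\1>)}
              {<$1>$2</$1>}g;

    return ($header, $body);
}

# Small demonstration on a cut-down version of the sample file.
my $sample = join "\n",
    'OFXHEADER:100', 'DATA:OFXSGML', '<OFX>', '<STMTTRN>', '<TRNTYPE>POS',
    '<TRNAMT>-106.00', '<NAME>H &amp; R Block', '</STMTTRN>', '</OFX>', '';

my ($header, $xml) = ofx_sgml_to_xml($sample);
print $xml;
```

    If the result really is well-formed, the body string can then be handed to an XML tool (XML::Simple's XMLin accepts a string of XML, for instance). Treat a parse failure as a sign that the bank is using SGML features this hack does not know about.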
Re: OFX/SGML Parse
by erikharrison (Deacon) on Mar 16, 2002 at 02:50 UTC
    Well, I don't think an XML parser is going to do what you want - SGML can do things that XML can't (XML being designed as a subset of SGML). There are several modules on the CPAN dealing with SGML - but I've not used them, and none of them seems to be clearly what you need. If the SGML used turns out to match the subset which is XML, then try it out - and be careful. Otherwise you could try to pull something of your own together - but it would take a while.
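
    One quick way to be careful is to check whether the file relies on omitted end tags before feeding it to an XML parser. The following is only a rough sketch: the function name unclosed_elements is made up, it looks at tag names alone, and an empty result does not prove the file is well-formed XML.

```perl
use strict;
use warnings;

# Hypothetical helper: list the elements that are opened but never
# explicitly closed. Only <NAME> and </NAME> style tags are recognised.
sub unclosed_elements {
    my ($text) = @_;
    my (@stack, @unclosed);

    while ($text =~ m{<(/?)([A-Za-z0-9_.]+)>}g) {
        my ($is_close, $name) = ($1, $2);

        if (!$is_close) {
            push @stack, $name;
        }
        elsif (grep { $_ eq $name } @stack) {
            # Anything opened after the matching start tag was left open.
            while ((my $open = pop @stack) ne $name) {
                push @unclosed, $open;
            }
        }
        # A stray end tag with no matching start tag is ignored here.
    }

    # Whatever is still on the stack at the end was never closed either.
    return (@unclosed, @stack);
}

my @missing = unclosed_elements('<STATUS> <CODE>0 <SEVERITY>INFO </STATUS>');
print "elements without end tags: @missing\n" if @missing;
```

    On the sample in the question this reports every leaf element (CODE, SEVERITY and so on), which is why a plain XML parser will reject the file as it stands.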

    Good Luck!

    Cheers,
    Erik