tomhukins has asked for the wisdom of the Perl Monks concerning the following question:

I've got some data that are supplied to me in flat file format. I'd like to use them to populate several related tables in a database.

The data are roughly of the form:

TopicA/TopicB Title1 Description1 KeywordA/KeywordB/KeywordC
TopicB Title2 Description2 KeywordB/KeywordD
TopicA/TopicC Title3 Description3 KeywordA/KeywordC

Each item has a title and a description. Each item is associated with one or more topics (separated by a '/') and one or more keywords (also separated by a '/'). Fields are tab separated.

I'm using a Microsoft SQL Server 7 database consisting of tables representing the resources (each with a title and a description), the topics, and the keywords.

I've written a quick script that parses the flat file contents to produce a Perl data structure, using Text::CSV_XS to split the tab-delimited data into fields and then calling perlfunc:split with the '/' delimiter to create arrays of topics and keywords.
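
Roughly, the parsing script looks like this (a minimal sketch; the field order and the file name 'data.txt' are assumptions based on the sample above):

use strict;
use Text::CSV_XS;

# assumed field order: topics, title, description, keywords
my $csv = Text::CSV_XS->new({ sep_char => "\t", binary => 1 });

my @records;
open my $fh, '<', 'data.txt' or die "Can't open data.txt: $!";
while (my $row = $csv->getline($fh)) {
    my ($topics, $title, $desc, $keywords) = @$row;
    push @records, {
        title    => $title,
        desc     => $desc,
        topics   => [ split '/', $topics ],
        keywords => [ split '/', $keywords ],
    };
}
close $fh;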

To populate the database, I could use a series of SQL statements with DBI, but I'd have to look up ID fields for the resources after creating them. I'm wondering if there's a more elegant, succinct way to do this, as it's likely I'll have to do this again in the future with different data structures.

DBIx::Renderer looks like it might help me, but at present the only database supported is PostgreSQL. Alzabo and Tangram both seem to be overkill for a small problem like this - my first impression is that they are more powerful and complex than what I want.

My question is this: Have you solved a similar problem and how did you go about solving it? Also, if there's anything I've missed or if any of my assumptions above seem wrong, let me know.


Re: Populating a complex SQL database from a flat file
by VSarkiss (Monsignor) on Apr 25, 2002 at 16:13 UTC

    Yes, this is a very common problem when doing system conversions. There's no real magic way to do it: once you have a record in memory, you have to populate several tables with it, which involves looking up the ID and inserting the row. With SQL Server, you can use stored procedures to handle the multiple lookups and inserts.
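
    In plain DBI, the lookup-or-insert step usually ends up in a small helper, something like this (just a sketch; the topics table and its column names are invented for illustration, and SELECT @@IDENTITY is one way to get back the key SQL Server just generated):

    sub topic_id {
        my ($dbh, $name) = @_;
        # return the existing ID if this topic is already in the table...
        my ($id) = $dbh->selectrow_array(
            'SELECT topic_id FROM topics WHERE name = ?', undef, $name);
        return $id if defined $id;
        # ...otherwise insert it and fetch the identity value just generated
        $dbh->do('INSERT INTO topics (name) VALUES (?)', undef, $name);
        ($id) = $dbh->selectrow_array('SELECT @@IDENTITY');
        return $id;
    }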

    If your database has RI (referential integrity) enabled, you have to be careful about the order in which you populate things. If the entire flat file fits into memory, you can make sure all your reference data is in place before you start inserting the main data. It's a small simplification, but it can help.
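
    For instance, a single pass like this collects the distinct topics and keywords before any inserts (assuming a @records structure like the one the original poster's parsing script produces):

    my (%topics, %keywords);
    for my $rec (@records) {
        $topics{$_}++   for @{ $rec->{topics} };
        $keywords{$_}++ for @{ $rec->{keywords} };
    }
    # insert keys(%topics) and keys(%keywords) into the reference tables
    # first; the resources and their link rows can then follow safely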

    In other words, use Perl to break up the flat file, then use SQL to populate the target tables. Since you're only populating three tables, it doesn't sound too complicated. With SQL Server 7 you can use DTS (Data Transformation Services) to control the process, although a single pass through the Perl script may be sufficient.

    With something this size, you have to balance the effort of customizing a package versus writing custom code. It may just be a wash.

    If you run into trouble, my consulting rates are very reasonable. ;-)

Re: Populating a complex SQL database from a flat file
by perlplexer (Hermit) on Apr 25, 2002 at 16:12 UTC
    To populate the database, I could use a series of SQL statements with DBI but I'd have to lookup ID fields for the resources after creating them.

    If you are using sequences to generate your primary keys, then you could fetch a new sequence number first and then do your insert using that number; e.g.,
    # ...
    my $sth = $dbh->prepare('INSERT INTO foo (pkey, bar, baz) VALUES (?, ?, ?)')
        or die "Error: $DBI::errstr";

    # assuming 'bar' and 'baz' are in @data, which is a 2D array
    my @ids = ();  # you will have all primary keys here
    for (@data) {
        my $id = getID($dbh, 'foo_sequence');
        redo unless $id;  # or just die() instead
        $sth->execute($id, @$_);
        push @ids, $id;
    }
    # ...

    sub getID {
        my ($dbh, $seqID) = @_;
        # this is for Oracle, you may need to tweak it for MS SQL
        my $sth = $dbh->prepare_cached(qq/SELECT $seqID.NEXTVAL FROM dual/)
            or return undef;
        $sth->execute() or return undef;
        my @row = $sth->fetchrow_array();
        return @row ? $row[0] : undef;
    }
    This approach is faster than getting the primary key after you create a new row.

    Hope this helps.

    --perlplexer
Re: Populating a complex SQL database from a flat file
by johannz (Hermit) on Apr 25, 2002 at 18:58 UTC

    I don't have my SQL Server books with me, but you probably want to look at the @@IDENTITY value. You can search Google for more information, but it would go something like this:

    my $insert = $dbi->prepare('insert into table values( ?, ?, ?)');
    my $ident_sth = $dbi->prepare('select @@identity');

    $insert->execute($val1, $val2, $val3);
    $ident_sth->execute();
    my ($identity) = $ident_sth->fetchrow_array();

    Update: This is faster than querying the table itself to see what the ID was; instead of an index scan, it's just a variable lookup.

Re: Populating a complex SQL database from a flat file
by silent11 (Vicar) on Apr 25, 2002 at 15:49 UTC
    MySQL can read in 'x'-delimited files via something like this:  LOAD DATA INFILE 'csv.txt' INTO TABLE some_table;
    MS SQL may have a similar utility.
    Good luck.
    -Silent11
      It's almost a year since I last used MySQL, and you're right that it can populate a table with the contents of a flat file. But when I used it, it wasn't able to populate multiple tables with complex data structures from one file. As I mentioned in my original post, the file I'm reading is not only tab delimited: some fields also contain multiple values delimited with the '/' character.