M15U has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!!! I have a question for you.

I'm using the DBI module in Perl and I have a problem that I cannot wrap my mind around. Let's say I have a table called 'article', which I fill with some data collected from different files. The code looks like this:

my $id_article = 0;
for ($i_article = 0; $i_article < @output_concord_files_prepare; $i_article++){
    $dbh->do("
        INSERT INTO `article`(`id_article`, `url`, `html_extr_text`,`concord_file`, `sys_time`)
        VALUES ('$id_article', '$url_prepare[$i_article]', '$html_pages_files_extended[$i_article]', '$output_concord_files_prepare[$i_article]', '$sys_time_prepare[$i_article]')
    ") || die $dbh->errstr;
}
$id_article++;

The code works. Each array contains strings which are inserted into the 'article' table.

Now I have another table called 'event' :

my $id_event = 0;
for ($i_event = 0; $i_event < @event_prepare; $i_event++){
    $dbh->do("
        INSERT INTO `event`(`id_event`, `event`)
        VALUES ('$id_event', '$event_prepare[$i_event]')
    ") || die $dbh->errstr;
}
$id_event++;

The thing is that one article can contain multiple events, so I create a third table, 'article_event_index', which looks like this:

$create_query = qq{
    create table article_event_index(
        id_article int(10) NOT NULL,
        id_event int(10) NOT NULL,
        primary key (id_article, id_event),
        foreign key (id_article) references article (id_article),
        foreign key (id_event) references event (id_event)
    )
};
$dbh->do($create_query);

In the data collection part of my code I have all the references that I need:

#!/usr/bin/perl -w
use strict;
use locale;
use warnings;
#use diagnostics;
use utf8;

binmode(STDIN,  "encoding(utf8)");
binmode(STDOUT, "encoding(utf8)");
binmode(STDERR, "encoding(utf8)");

#Directory with Unitex output files
my @output_concord_files = glob("output_concord/*.txt");

#Using 'glob' implies random order of files => sort
@output_concord_files = map{$_->[1]}
                        sort{$a->[0] <=> $b->[0]}
                        map{/output_concord\/concord\.(.*)\.txt/; [$1, $_]} @output_concord_files;

my $index_file = "index.txt";
open (INDEX, '>:utf8', $index_file) || die "Couldn't open $index_file : $!\n";

my $event;
foreach my $output_concord_file(@output_concord_files){
    open (my $fh, '<:utf8', $output_concord_file) || die "Couldn't open $output_concord_file : $!\n";
    while (<$fh>){
        if ($_ =~ /=E-(.*)=event/){
            $event = $1;
            print "$output_concord_file -> $event\n";
            print INDEX "$output_concord_file -> $event\n";
        }
    }
}

The output would be:

outputcondord.0.txt -> rockfall
outputcondord.0.txt -> avalanche
outputcondord.1.txt -> rockfall
outputcondord.2.txt -> rockfall

And so on...

Now, I don't know how to write the Perl statement which will fill the 'article_event_index' table. I use a 'for' loop to populate the two other tables and I increment the id for each one of them. Is this good practice? What is the "good practice" for this kind of operation? I searched the web for days for an example of what I want here, but I didn't find anything. I'm also open to the 'prepare - execute' DBI method instead of 'do'; it's all the same to me because computing time is not important in this task. Hope that I was clear enough. Thank you Monks !!!

Replies are listed 'Best First'.
Re: Relational table with perl DBI
by Neighbour (Friar) on Mar 13, 2013 at 08:49 UTC
    If you're using numeric IDs, why not use the database's autonumber feature?
    You can query the autonumber ID after performing the insert using SELECT LAST_INSERT_ID() (assuming MySQL)
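    A minimal sketch of that approach, assuming MySQL with DBD::mysql (the table is reduced to a single url column here just to keep the example short):

    # Sketch only: an AUTO_INCREMENT key means you never hand out ids yourself
    $dbh->do(q{
        CREATE TABLE article (
            id_article INT(10) NOT NULL AUTO_INCREMENT,
            url        TEXT,
            PRIMARY KEY (id_article)
        )
    });

    $dbh->do(q{INSERT INTO article (url) VALUES (?)}, undef, $url);

    # Either call returns the id generated for the row just inserted on this connection
    my $id_article = $dbh->last_insert_id(undef, undef, 'article', 'id_article');
    # my ($id_article) = $dbh->selectrow_array('SELECT LAST_INSERT_ID()');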

    What I'm missing in your question is how the events relate to the articles (which is quite important in determining how to fill the article_event_index-table).
    Where does the content of the variables @url_prepare, @html_pages_files_extended, @output_concord_files_prepare and @sys_time_prepare come from?

    You will probably need to insert everything in one go:
    Loop through files-to-be-processed {
        insert article
        fetch article_ID
        loop through events {
            insert event
            fetch event_ID
            insert (article_ID, event_ID)
        }
    }
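    As a rough Perl sketch of that outline (assuming AUTO_INCREMENT ids on both tables, a hypothetical %events_per_file hash mapping each concord file to the events found in it, and only the concord_file column of article for brevity):

    my $sth_article = $dbh->prepare(q{INSERT INTO article (concord_file) VALUES (?)});
    my $sth_event   = $dbh->prepare(q{INSERT INTO event (event) VALUES (?)});
    my $sth_index   = $dbh->prepare(q{INSERT INTO article_event_index (id_article, id_event) VALUES (?, ?)});

    foreach my $file (sort keys %events_per_file) {
        # insert article, fetch article_ID
        $sth_article->execute($file) or die $dbh->errstr;
        my $id_article = $dbh->last_insert_id(undef, undef, 'article', 'id_article');

        foreach my $event (@{ $events_per_file{$file} }) {
            # insert event, fetch event_ID
            $sth_event->execute($event) or die $dbh->errstr;
            my $id_event = $dbh->last_insert_id(undef, undef, 'event', 'id_event');

            # insert (article_ID, event_ID)
            $sth_index->execute($id_article, $id_event) or die $dbh->errstr;
        }
    }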
    Also, to avoid SQL injection and as good practice, try this to insert:
    my $sth_insert_article = $dbh->prepare(qq(
        INSERT INTO article (url, html_extr_text, concord_file, sys_time)
        VALUES (?, ?, ?, ?)
    )) or die "Unable to prepare insert statement: " . $dbh->errstr;

    foreach my $article_index (0 .. $#output_concord_files_prepare) {
        my $records_inserted = $sth_insert_article->execute(
            $url_prepare[$article_index],
            $html_pages_files_extended[$article_index],
            $output_concord_files_prepare[$article_index],
            $sys_time_prepare[$article_index],
        );
        if ($records_inserted != 1) {
            die "Error inserting records, only [$records_inserted] got inserted: " . $sth_insert_article->errstr;
        }
    }

      You are right, all the arrays are inserted in one go. It looks like this :

      sub fill_Tables{
          #Connecting to the database
          #############################################################
          $dsn = "DBI:mysql:database=$database;host=$hostname";
          $dbh = DBI->connect($dsn, $login, $password) || die "Couldn't connect to $database : $!\n";

          #Filling tables
          #############################################################

          #Filling table 'article'
          #############################################################

          #Gathering data from daily output for insertion
          #############################################################

          #Directory with Unitex output files
          my @output_concord_files = glob("output_concord/*.txt");

          #Using 'glob' implies random order of files => sort
          @output_concord_files = map{$_->[1]}
                                  sort{$a->[0] <=> $b->[0]}
                                  map{/output_concord\/concord\.(.*)\.txt/; [$1, $_]} @output_concord_files;

          #Declaring variables for data extraction
          my ($output_concord_file, $url, $sys_time, $event);
          my (@output_concord_files_prepare, @url_prepare, @sys_time_prepare, @event_prepare, @index);

          #Opening, reading, and extracting column content of each concord.n.txt file
          foreach $output_concord_file(@output_concord_files){
              #Note : for the concord file, no processing implied -> stored by default in $output_concord_file
              open (my $fh, '<:utf8', "$output_concord_file") || die "Couldn't open $output_concord_file : $!\n";

              #Populating @output_concord_files_prepare array for column 'concord_file' insertion
              push @output_concord_files_prepare, $output_concord_file;

              while (<$fh>){
                  if ($_ =~ /=\[=(.*)=\]=/){
                      $url = $1;
                      #Populating @url_prepare array for column 'url' from 'article' table insertion
                      push @url_prepare, $url;
                  }
                  if ($_ =~ /=\[\+(.*)\+\]=/){
                      $sys_time = $1;
                      #Populating @sys_time_prepare array for column 'sys_time' from 'article' table insertion
                      push @sys_time_prepare, $sys_time;
                  }
                  if ($_ =~ /=E-(.*)=event/){
                      $event = $1;
                      #Populating @event_prepare array for column 'event' from 'event' table insertion
                      push @event_prepare, $event;
                      #print "$output_concord_file -> $event\n";
                      push @index, $output_concord_file, $event;
                  }
              }
          }

          #Input files for extraction of column : html_extr_text
          my $dir_html_pages = 'html_pages';

          #Opening directory for readdir, extracting only '.txt' files -> cohabitation with '.html' files
          opendir (DIRHTMLPAGES, $dir_html_pages) || die "Couldn't open $dir_html_pages : $!";

          #Extracting only the '.txt' format files
          my @html_pages_files = map{s/\.[^.]+$//;$_} grep{/\.txt$/} readdir DIRHTMLPAGES;
          closedir (DIRHTMLPAGES);

          #Sorting the readdir output by the "html.n.txt" file number, where n = 1 .. n
          @html_pages_files = map{$_->[1]} sort{$a->[0] <=> $b->[0]} map{/html\.(.*)/; [$1, $_]} @html_pages_files;

          #Adding path 'html_pages/' and extension '.txt' to each file
          my @html_pages_files_extended;
          my $path = "html_pages/";
          my $extension = ".txt";
          my ($html_page_file, $extended);
          foreach $html_page_file(@html_pages_files){
              $extended = $path . $html_page_file . $extension;
              #Populating @html_pages_files_extended for column html_extr_text insertion
              push @html_pages_files_extended, $extended;
          }

          #print "@output_concord_files_prepare\n";
          #print "@url_prepare\n";
          #print "@sys_time_prepare\n";
          #print "@event_prepare\n";
          #print "@html_pages_files_extended\n";

          #Insertion of extracted and synchronized data in 'article' table of e_slide database
          my $i_article;
          my $id_article = 0;
          my @article_index;
          for ($i_article = 0; $i_article < @output_concord_files_prepare; $i_article++){
              #$dbh->do("
              #    INSERT INTO `article`(`id_article`, `url`, `html_extr_text`,`concord_file`, `sys_time`)
              #    VALUES ('$id_article', '$url_prepare[$i_article]', '$html_pages_files_extended[$i_article]', '$output_concord_files_prepare[$i_article]', '$sys_time_prepare[$i_article]')
              #") || die $dbh->errstr;
              push @article_index, $i_article;
          }
          #print "@article_event_index\n";
          #print "@article_index\n";
          $id_article++;

          #Insertion of extracted and synchronized data in 'event' table of e_slide database
          my $i_event;
          my $id_event = 0;
          my @event_index;
          for ($i_event = 0; $i_event < @event_prepare; $i_event++){
              #$dbh->do("
              #    INSERT INTO `event`(`id_event`, `event`)
              #    VALUES ('$id_event', '$event_prepare[$i_event]')
              #") || die $dbh->errstr;
              push @event_index, $i_event;
          }
          #print "@event_index\n";
          $id_event++;

          my $index_concord;
          my @index_concord;
          foreach my $index(@index){
              if ($index =~ /output_concord\/concord\.(.*)\.txt(.*)/){
                  $index_concord = $1;
                  push @index_concord, $index_concord;
              }
          }
          #print "@index_concord\n";

          #Insertion of extracted and synchronized indexes in 'article_event_index'
          my $i_article_event_index;
          for ($i_article_event_index = 0; $i_article_event_index < @event_index; $i_article_event_index++){
              $dbh->do("
                  INSERT INTO `article_event_index`(`id_article`, `id_event`)
                  VALUES ('$index_concord[$i_article_event_index]', '$event_index[$i_article_event_index]')
              ") || die $dbh->errstr;
          }
      }

      As you can see, it's pretty complicated to get the necessary data for each array that I use to populate each column of each table.

      The last table that I fill is the index one, and I get an error like this: "DBD::mysql::db do failed: Cannot add or update a child row: a foreign key constraint fails (`e_slide`.`article_event_index`, CONSTRAINT `article_event_index_ibfk_1` FOREIGN KEY (`id_article`) REFERENCES `article` (`id_article`)) at db2.pl line 262. Cannot add or update a child row: a foreign key constraint fails (`e_slide`.`article_event_index`, CONSTRAINT `article_event_index_ibfk_1` FOREIGN KEY (`id_article`) REFERENCES `article` (`id_article`)) at db2.pl line 262."

      By using "LAST_ID" and the autoincrement option of MySQL will I get all the id of the two tables or only the last one of each ?

      Thanks again!

        I tried using 'last_insert_id' and I get the ids from my tables. The code looks like this:

        #Insertion of extracted and synchronized data in 'article' table of e_slide database
        my $i_article;
        my $id_article;
        for ($i_article = 0; $i_article < @output_concord_files_prepare; $i_article++){
            $dbh->do("
                INSERT INTO `article`(`url`, `html_extr_text`,`concord_file`, `sys_time`)
                VALUES ('$url_prepare[$i_article]', '$html_pages_files_extended[$i_article]', '$output_concord_files_prepare[$i_article]', '$sys_time_prepare[$i_article]')
            ") || die $dbh->errstr;
            $id_article = $dbh->last_insert_id(undef, undef, 'article', 'id_article');
        }

        #Insertion of extracted and synchronized data in 'event' table of e_slide database
        my $i_event;
        my $id_event;
        for ($i_event = 0; $i_event < @event_prepare; $i_event++){
            $dbh->do("
                INSERT INTO `event`(`event`)
                VALUES ('$event_prepare[$i_event]')
            ") || die $dbh->errstr;
            $id_event = $dbh->last_insert_id(undef, undef, 'event', 'id_event');
        }

        So now, how do I get the one-to-many relationship in the third table? In this case one article contains multiple events, so it would look like: 1 - 1, 1 - 2, 2 - 3, 2 - 4, 2 - 5 and so on.

        I also managed to write "good practice" code for the insertion using the code that you provided:

        my @fields = qw(url html_extr_text concord_file sys_time);
        my $fieldlist = join ", ", @fields;
        my $field_placeholders = join ", ", map {'?'} @fields;
        my $insert_query = qq{
            INSERT INTO article($fieldlist)
            VALUES ($field_placeholders)
        };
        my $sth = $dbh->prepare($insert_query);
        foreach my $article_index (0 .. $#output_concord_files_prepare){
            my $records_inserted = $sth->execute(
                $url_prepare[$article_index],
                $html_pages_files_extended[$article_index],
                $output_concord_files_prepare[$article_index],
                $sys_time_prepare[$article_index],
            );
            if ($records_inserted != 1){
                die "Error inserting records, only [$records_inserted] got inserted: " . $sth->errstr;
            }
        }

        But I still don't know how to make the one-to-many relationship in Perl.
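        One possibility (only a sketch, reusing the parsing loop from earlier; %events_per_file is a new, hypothetical hash) is to keep the events grouped per concord file while reading them, so that each article can later be inserted together with exactly its own events:

        # Sketch: group events by concord file instead of keeping flat parallel arrays,
        # so the article-to-event relationship is never lost.
        my %events_per_file;
        foreach my $output_concord_file (@output_concord_files) {
            open (my $fh, '<:utf8', $output_concord_file)
                || die "Couldn't open $output_concord_file : $!\n";
            while (<$fh>) {
                if (/=E-(.*)=event/) {
                    push @{ $events_per_file{$output_concord_file} }, $1;
                }
            }
            close $fh;
        }
        # Then, per file: insert the article, take last_insert_id, insert that file's
        # events, and add one (id_article, id_event) row per event.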

        It looks to me like this code has the potential to give you the same event more than once in the event table, each with a different id.
        if ($_ =~ /=E-(.*)=event/){
            $event = $1;
            push @event_prepare, $event;
        }
        It would be much simpler to forget using numerical keys and just use the $event itself as the primary key. Use a hash to eliminate duplicates like this:
        if ($_ =~ /=E-(.*)=event/){
            $event_prepare{$1} = 1;
        }
        With regard to the article table, I would use the n value from the filename outputcondord.n.txt as the primary key thus avoiding sorting and synchronising problems as well as making the data in the table more human readable. Your article_event_index would then just need to contain the n value from the filename and the text from the events in that file.
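        A rough sketch of that layout (assuming article_event_index is redefined to hold the n value plus the event text, and reusing the %event_prepare hash from above to dedupe the event table):

        my $sth_index = $dbh->prepare(q{INSERT INTO article_event_index (id_article, event) VALUES (?, ?)});
        foreach my $output_concord_file (@output_concord_files) {
            # n from output_concord/concord.n.txt becomes the article key
            my ($n) = $output_concord_file =~ /concord\.(\d+)\.txt$/;
            open (my $fh, '<:utf8', $output_concord_file)
                || die "Couldn't open $output_concord_file : $!\n";
            while (<$fh>) {
                if (/=E-(.*)=event/) {
                    $event_prepare{$1} = 1;   # dedupe for the event table
                    # assumes an event appears at most once per file, so the
                    # (id_article, event) pair is unique
                    $sth_index->execute($n, $1) or die $dbh->errstr;
                }
            }
            close $fh;
        }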

        poj