M15U has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!!! I have a question for you.

I'm using the DBI module in Perl and I have a problem that I cannot wrap my mind around. Let's say I have a table called 'article', which I fill with some data collected from different files. The code looks like this:

my $id_article = 0;
for ($i_article = 0; $i_article < @output_concord_files_prepare; $i_article++){
    $dbh->do("
        INSERT INTO `article`(`id_article`, `url`, `html_extr_text`,`concord_file`, `sys_time`)
        VALUES ('$id_article', '$url_prepare[$i_article]', '$html_pages_files_extended[$i_article]', '$output_concord_files_prepare[$i_article]', '$sys_time_prepare[$i_article]')
    ") || die $dbh->errstr;
}
$id_article++;

The code works. Each array contains strings which are inserted into the 'article' table.

Now I have another table called 'event' :

my $id_event = 0;
for ($i_event = 0; $i_event < @event_prepare; $i_event++){
    $dbh->do("
        INSERT INTO `event`(`id_event`, `event`)
        VALUES ('$id_event', '$event_prepare[$i_event]')
    ") || die $dbh->errstr;
}
$id_event++;

The thing is that one article can contain multiple events, so I create a third table, 'article_event_index', which looks like this:

$create_query = qq{
    create table article_event_index(
        id_article int(10) NOT NULL,
        id_event int(10) NOT NULL,
        primary key (id_article, id_event),
        foreign key (id_article) references article (id_article),
        foreign key (id_event) references event (id_event)
    )
};
$dbh->do($create_query);

In the data collection part of my code I have all the references that I need:

#!/usr/bin/perl -w
use strict;
use locale;
use warnings;
#use diagnostics;
use utf8;

binmode(STDIN,  "encoding(utf8)");
binmode(STDOUT, "encoding(utf8)");
binmode(STDERR, "encoding(utf8)");

#Directory with Unitex output files
my @output_concord_files = glob("output_concord/*.txt");

#Using 'glob' implies random order of files => sort
@output_concord_files = map{$_->[1]}
                        sort{$a->[0] <=> $b->[0]}
                        map{/output_concord\/concord\.(.*)\.txt/; [$1, $_]} @output_concord_files;

my $index_file = "index.txt";
open (INDEX, '>:utf8', $index_file) || die "Couldn't open $index_file : $!\n";

my $event;
foreach my $output_concord_file(@output_concord_files){
    open (my $fh, '<:utf8', $output_concord_file) || die "Couldn't open $output_concord_file : $!\n";
    while (<$fh>){
        if ($_ =~ /=E-(.*)=event/){
            $event = $1;
            print "$output_concord_file -> $event\n";
            print INDEX "$output_concord_file -> $event\n";
        }
    }
}

The output would be:

outputcondord.0.txt -> rockfall
outputcondord.0.txt -> avalanche
outputcondord.1.txt -> rockfall
outputcondord.2.txt -> rockfall

And so on...

Now, I don't know how to write the Perl statement which will fill the 'article_event_index' table. I use a 'for' loop to populate the two other tables and I increment the id for each one of them. Is this good practice? What is the "good practice" for this kind of operation? I searched the web for days for an example of what I want here, but I didn't find anything. I'm also open to the 'prepare - execute' DBI method instead of 'do'; it's all the same to me because computing time is not important in this task. Hope that I was clear enough. Thank you Monks !!!

Replies are listed 'Best First'.
Re: Relational table with perl DBI
by Neighbour (Friar) on Mar 13, 2013 at 08:49 UTC
    If you're using numeric IDs, why not use the database's autonumber feature?
    You can query the autonumber ID after performing the insert using SELECT LAST_INSERT_ID() (assuming MySQL)
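    A minimal sketch of that approach, assuming MySQL with DBD::mysql (the table is reduced to a single url column here just to keep the example short):

    # Sketch only: an AUTO_INCREMENT key means you never hand out ids yourself
    $dbh->do(q{
        CREATE TABLE article (
            id_article INT(10) NOT NULL AUTO_INCREMENT,
            url        TEXT,
            PRIMARY KEY (id_article)
        )
    });

    $dbh->do(q{INSERT INTO article (url) VALUES (?)}, undef, $url);

    # Either call returns the id generated for the row just inserted on this connection
    my $id_article = $dbh->last_insert_id(undef, undef, 'article', 'id_article');
    # my ($id_article) = $dbh->selectrow_array('SELECT LAST_INSERT_ID()');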

    What I'm missing in your question is how the events relate to the articles (which is quite important in determining how to fill the article_event_index-table).
    Where does the content of the variables @url_prepare, @html_pages_files_extended, @output_concord_files_prepare and @sys_time_prepare come from?

    You will probably need to insert everything in one go:
    Loop through files-to-be-processed {
        insert article
        fetch article_ID
        loop through events {
            insert event
            fetch event_ID
            insert (article_ID, event_ID)
        }
    }
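    As a rough Perl sketch of that outline (assuming AUTO_INCREMENT ids on both tables, a hypothetical %events_per_file hash mapping each concord file to the events found in it, and only the concord_file column of article for brevity):

    my $sth_article = $dbh->prepare(q{INSERT INTO article (concord_file) VALUES (?)});
    my $sth_event   = $dbh->prepare(q{INSERT INTO event (event) VALUES (?)});
    my $sth_index   = $dbh->prepare(q{INSERT INTO article_event_index (id_article, id_event) VALUES (?, ?)});

    foreach my $file (sort keys %events_per_file) {
        # insert article, fetch article_ID
        $sth_article->execute($file) or die $dbh->errstr;
        my $id_article = $dbh->last_insert_id(undef, undef, 'article', 'id_article');

        foreach my $event (@{ $events_per_file{$file} }) {
            # insert event, fetch event_ID
            $sth_event->execute($event) or die $dbh->errstr;
            my $id_event = $dbh->last_insert_id(undef, undef, 'event', 'id_event');

            # insert (article_ID, event_ID)
            $sth_index->execute($id_article, $id_event) or die $dbh->errstr;
        }
    }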
    Also, to avoid SQL injection and as good practice, try this to insert:
    my $sth_insert_article = $dbh->prepare(qq(
        INSERT INTO article (url, html_extr_text, concord_file, sys_time)
        VALUES (?, ?, ?, ?)
    )) or die "Unable to prepare insert statement: " . $dbh->errstr;

    foreach my $article_index (0 .. $#output_concord_files_prepare) {
        my $records_inserted = $sth_insert_article->execute(
            $url_prepare[$article_index],
            $html_pages_files_extended[$article_index],
            $output_concord_files_prepare[$article_index],
            $sys_time_prepare[$article_index],
        );
        if ($records_inserted != 1) {
            die "Error inserting records, only [$records_inserted] got inserted: " . $sth_insert_article->errstr;
        }
    }

      You are right, all the arrays are inserted in one go. It looks like this :

      sub fill_Tables{
          #Connecting to the database
          #############################################################
          $dsn = "DBI:mysql:database=$database;host=$hostname";
          $dbh = DBI->connect($dsn, $login, $password) || die "Couldn't connect to $database : $!\n";

          #Filling tables
          #############################################################

          #Filling table 'article'
          #############################################################

          #Gathering data from daily output for insertion
          #############################################################

          #Directory with Unitex output files
          my @output_concord_files = glob("output_concord/*.txt");

          #Using 'glob' implies random order of files => sort
          @output_concord_files = map{$_->[1]}
                                  sort{$a->[0] <=> $b->[0]}
                                  map{/output_concord\/concord\.(.*)\.txt/; [$1, $_]} @output_concord_files;

          #Declaring variables for data extraction
          my ($output_concord_file, $url, $sys_time, $event);
          my (@output_concord_files_prepare, @url_prepare, @sys_time_prepare, @event_prepare, @index);

          #Opening, reading, and extracting column content of each concord.n.txt file
          foreach $output_concord_file(@output_concord_files){
              #Note : for the concord file, no processing implied -> stored by default in $output_concord_file
              open (my $fh, '<:utf8', "$output_concord_file") || die "Couldn't open $output_concord_file : $!\n";

              #Populating @output_concord_files_prepare array for column 'concord_file' insertion
              push @output_concord_files_prepare, $output_concord_file;

              while (<$fh>){
                  if ($_ =~ /=\[=(.*)=\]=/){
                      $url = $1;
                      #Populating @url_prepare array for column 'url' from 'article' table insertion
                      push @url_prepare, $url;
                  }
                  if ($_ =~ /=\[\+(.*)\+\]=/){
                      $sys_time = $1;
                      #Populating @sys_time_prepare array for column 'sys_time' from 'article' table insertion
                      push @sys_time_prepare, $sys_time;
                  }
                  if ($_ =~ /=E-(.*)=event/){
                      $event = $1;
                      #Populating @event_prepare array for column 'event' from 'event' table insertion
                      push @event_prepare, $event;
                      #print "$output_concord_file -> $event\n";
                      push @index, $output_concord_file, $event;
                  }
              }
          }

          #Input files for extraction of column : html_extr_text
          my $dir_html_pages = 'html_pages';

          #Opening directory for readdir, extracting only '.txt' files -> cohabitation with '.html' files
          opendir (DIRHTMLPAGES, $dir_html_pages) || die "Couldn't open $dir_html_pages : $!";

          #Extracting only the '.txt' format files
          my @html_pages_files = map{s/\.[^.]+$//;$_} grep{/\.txt$/} readdir DIRHTMLPAGES;
          closedir (DIRHTMLPAGES);

          #Sorting the readdir output by the "html.n.txt" file number, where n = 1 .. n
          @html_pages_files = map{$_->[1]} sort{$a->[0] <=> $b->[0]} map{/html\.(.*)/; [$1, $_]} @html_pages_files;

          #Adding path 'html_pages/' and extension '.txt' to each file
          my @html_pages_files_extended;
          my $path = "html_pages/";
          my $extension = ".txt";
          my ($html_page_file, $extended);
          foreach $html_page_file(@html_pages_files){
              $extended = $path . $html_page_file . $extension;
              #Populating @html_pages_files_extended for column html_extr_text insertion
              push @html_pages_files_extended, $extended;
          }

          #print "@output_concord_files_prepare\n";
          #print "@url_prepare\n";
          #print "@sys_time_prepare\n";
          #print "@event_prepare\n";
          #print "@html_pages_files_extended\n";

          #Insertion of extracted and synchronized data in 'article' table of e_slide database
          my $i_article;
          my $id_article = 0;
          my @article_index;
          for ($i_article = 0; $i_article < @output_concord_files_prepare; $i_article++){
              #$dbh->do("
              #    INSERT INTO `article`(`id_article`, `url`, `html_extr_text`,`concord_file`, `sys_time`)
              #    VALUES ('$id_article', '$url_prepare[$i_article]', '$html_pages_files_extended[$i_article]', '$output_concord_files_prepare[$i_article]', '$sys_time_prepare[$i_article]')
              #") || die $dbh->errstr;
              push @article_index, $i_article;
          }
          #print "@article_event_index\n";
          #print "@article_index\n";
          $id_article++;

          #Insertion of extracted and synchronized data in 'event' table of e_slide database
          my $i_event;
          my $id_event = 0;
          my @event_index;
          for ($i_event = 0; $i_event < @event_prepare; $i_event++){
              #$dbh->do("
              #    INSERT INTO `event`(`id_event`, `event`)
              #    VALUES ('$id_event', '$event_prepare[$i_event]')
              #") || die $dbh->errstr;
              push @event_index, $i_event;
          }
          #print "@event_index\n";
          $id_event++;

          my $index_concord;
          my @index_concord;
          foreach my $index(@index){
              if ($index =~ /output_concord\/concord\.(.*)\.txt(.*)/){
                  $index_concord = $1;
                  push @index_concord, $index_concord;
              }
          }
          #print "@index_concord\n";

          #Insertion of extracted and synchronized indexes in 'article_event_index'
          my $i_article_event_index;
          for ($i_article_event_index = 0; $i_article_event_index < @event_index; $i_article_event_index++){
              $dbh->do("
                  INSERT INTO `article_event_index`(`id_article`, `id_event`)
                  VALUES ('$index_concord[$i_article_event_index]', '$event_index[$i_article_event_index]')
              ") || die $dbh->errstr;
          }
      }

      As you can see, it's pretty complicated to get the necessary data for each array that I use to populate each column of each table.

      The last table that I fill is the index one, and I get an error like this: "DBD::mysql::db do failed: Cannot add or update a child row: a foreign key constraint fails (`e_slide`.`article_event_index`, CONSTRAINT `article_event_index_ibfk_1` FOREIGN KEY (`id_article`) REFERENCES `article` (`id_article`)) at db2.pl line 262. Cannot add or update a child row: a foreign key constraint fails (`e_slide`.`article_event_index`, CONSTRAINT `article_event_index_ibfk_1` FOREIGN KEY (`id_article`) REFERENCES `article` (`id_article`)) at db2.pl line 262."

      By using "LAST_ID" and the autoincrement option of MySQL will I get all the id of the two tables or only the last one of each ?

      Thanks again!

        I tried using 'last_insert_id' and I get the ids from my tables. The code looks like this:

        #Insertion of extracted and synchronized data in 'article' table of e_slide database
        my $i_article;
        my $id_article;
        for ($i_article = 0; $i_article < @output_concord_files_prepare; $i_article++){
            $dbh->do("
                INSERT INTO `article`(`url`, `html_extr_text`,`concord_file`, `sys_time`)
                VALUES ('$url_prepare[$i_article]', '$html_pages_files_extended[$i_article]', '$output_concord_files_prepare[$i_article]', '$sys_time_prepare[$i_article]')
            ") || die $dbh->errstr;
            $id_article = $dbh->last_insert_id(undef, undef, 'article', 'id_article');
        }

        #Insertion of extracted and synchronized data in 'event' table of e_slide database
        my $i_event;
        my $id_event;
        for ($i_event = 0; $i_event < @event_prepare; $i_event++){
            $dbh->do("
                INSERT INTO `event`(`event`)
                VALUES ('$event_prepare[$i_event]')
            ") || die $dbh->errstr;
            $id_event = $dbh->last_insert_id(undef, undef, 'event', 'id_event');
        }

        So now, how do I get the one-to-many relationship in the third table? In this case one article contains multiple events, so it would look like: 1 - 1, 1 - 2, 2 - 3, 2 - 4, 2 - 5 and so on.

        I also managed to write "good practice" code for the insertion using the code that you provided:

        my @fields = qw(url html_extr_text concord_file sys_time);
        my $fieldlist = join ", ", @fields;
        my $field_placeholders = join ", ", map {'?'} @fields;
        my $insert_query = qq{
            INSERT INTO article($fieldlist)
            VALUES ($field_placeholders)
        };
        my $sth = $dbh->prepare($insert_query);
        foreach my $article_index (0 .. $#output_concord_files_prepare){
            my $records_inserted = $sth->execute(
                $url_prepare[$article_index],
                $html_pages_files_extended[$article_index],
                $output_concord_files_prepare[$article_index],
                $sys_time_prepare[$article_index],
            );
            if ($records_inserted != 1){
                die "Error inserting records, only [$records_inserted] got inserted: " . $sth->errstr;
            }
        }

        But I still don't know how to make the one-to-many relationship in Perl.
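        One possibility (only a sketch, reusing the parsing loop from earlier; %events_per_file is a new, hypothetical hash) is to keep the events grouped per concord file while reading them, so that each article can later be inserted together with exactly its own events:

        # Sketch: group events by concord file instead of keeping flat parallel arrays,
        # so the article-to-event relationship is never lost.
        my %events_per_file;
        foreach my $output_concord_file (@output_concord_files) {
            open (my $fh, '<:utf8', $output_concord_file)
                || die "Couldn't open $output_concord_file : $!\n";
            while (<$fh>) {
                if (/=E-(.*)=event/) {
                    push @{ $events_per_file{$output_concord_file} }, $1;
                }
            }
            close $fh;
        }
        # Then, per file: insert the article, take last_insert_id, insert that file's
        # events, and add one (id_article, id_event) row per event.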

        It looks to me like this code has the potential to give you the same event more than once in the event table, each with a different id.
        if ($_ =~ /=E-(.*)=event/){
            $event = $1;
            push @event_prepare, $event;
        }
        It would be much simpler to forget using numerical keys and just use the $event itself as the primary key. Use a hash to eliminate duplicates like this:
        if ($_ =~ /=E-(.*)=event/){
            $event_prepare{$1} = 1;
        }
        With regard to the article table, I would use the n value from the filename outputcondord.n.txt as the primary key thus avoiding sorting and synchronising problems as well as making the data in the table more human readable. Your article_event_index would then just need to contain the n value from the filename and the text from the events in that file.
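        A rough sketch of that layout (assuming article_event_index is redefined to hold the n value plus the event text, and reusing the %event_prepare hash from above to dedupe the event table):

        my $sth_index = $dbh->prepare(q{INSERT INTO article_event_index (id_article, event) VALUES (?, ?)});
        foreach my $output_concord_file (@output_concord_files) {
            # n from output_concord/concord.n.txt becomes the article key
            my ($n) = $output_concord_file =~ /concord\.(\d+)\.txt$/;
            open (my $fh, '<:utf8', $output_concord_file)
                || die "Couldn't open $output_concord_file : $!\n";
            while (<$fh>) {
                if (/=E-(.*)=event/) {
                    $event_prepare{$1} = 1;   # dedupe for the event table
                    # assumes an event appears at most once per file, so the
                    # (id_article, event) pair is unique
                    $sth_index->execute($n, $1) or die $dbh->errstr;
                }
            }
            close $fh;
        }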

        poj