in reply to Nested loops?

Alright, let's see if I can come up with a better explanation... The working code was written by someone else, so there are a couple of things (a lot of things) I don't quite understand, such as the foreach statement that declares my $sql1 while looping over @{$sql1}. Earlier in the script, $sql1 is defined as my $sql1 = $lib_dbh->selectall_arrayref($pull1), where $pull1 is the SELECT statement for SQLite. The second SELECT statement ($pull2, whose result is stored in $sql2) uses two other tables to match elements and create a list of "target" sequences.

The data stored in $sql1 ($lib_dbh->selectall_arrayref($pull1)) looks like this:

table 1

55436, atcgtggtcgtgt
56875, agtcgtagtctaa
56789, tgatgcgtctatc
23698, atcgtgctcgtgt
75699, tgatgcttctatc
87226, atcgtgatcgtgt
12214, agtcgttgtctaa
etc.

The data in the second table would be the same, except with only the filtered target sequences.

table2

55436, atcgtggtcgtgt
56875, agtcgtagtctaa
56789, tgatgcgtctatc
etc.

The foreach loops containing "$table1{$sql1->[1]}{$sql1->[0]}=undef;" then rearrange the tables so that the sequence comes first and the ids second. (I don't know why, but that's the way it is set up. I have to work within the constraints of the original programmer so as not to break any of the follow-on scripts.)
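To make that rearrangement concrete, here is a minimal sketch of what the assignment builds. The rows are made up to look like selectall_arrayref output, the id 99999 is a hypothetical duplicate added only for illustration, and %by_seq is a name I picked. Keying by sequence is at least what makes the later "exists" lookups by sequence possible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up rows shaped like selectall_arrayref output: [ id, seq ].
# 99999 is a hypothetical second id sharing a sequence.
my $rows = [
    [ 55436, 'atcgtggtcgtgt' ],
    [ 56875, 'agtcgtagtctaa' ],
    [ 99999, 'atcgtggtcgtgt' ],
];

# Same pattern as the original loops: seq => { id => undef }.
my %by_seq;
$by_seq{ $_->[1] }{ $_->[0] } = undef for @$rows;

# Each sequence now maps to the set of ids that carry it.
for my $seq ( sort keys %by_seq ) {
    print "$seq => ", join( ',', sort keys %{ $by_seq{$seq} } ), "\n";
}
```

A sequence can then be tested with exists $by_seq{$seq}, and all ids sharing it come back from keys %{ $by_seq{$seq} }.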

my $pull1 = "SELECT id, seq from table";
my $pull2 = "SELECT id, seq from table where table.id in(long string of nested "in" criteria);

my $sql1 = $lib_dbh->selectall_arrayref($pull1);
my $sql2 = $lib_dbh->selectall_arrayref($pull2);

foreach my $sql1 (@{$sql1}) {
    $table1{$sql1->[1]}{$sql1->[0]}=undef;
}
foreach my $sql2 (@{$sql2}) {
    $table2{$sql2->[1]}{$sql2->[0]}=undef;
}

my @bases = ('A','C','G','T');

Label: foreach my $x (keys %table1){
    if (exists $table2 ({$x})) {
        my $found_alt = 0;
        my @storage_array = ();
        @{$storage_array[1]} = keys %{$table1{$x}};
        foreach my $bases (@bases) {
            my $alt = $x;
            substr($alt, 6, 1) = $opt;
            next if ($alt eq $x);
            if (exists $table1{$alt}) {
                $found_alt = 1;
                push @{$storage_array[2]}, keys %{$table1{$alt}};
            }
        }
        next Label unless ($found_alt );
        #continues to follow on script.

The output ends up being an array (with many more columns from the rest of the script) containing ids in [1] and [2].

Using the example data, I would want it to start with the sequence "atcgtggtcgtgt" from table 2 (id 55436). It would then look through table 1 to see if it exists. (It does, and always will, since table 2 is a subset.) It then takes the sequence from table 1 and substitutes each of the 4 bases (my @bases = ('A','C','G','T');) at the 7th position (substr($alt, 6, 1) = $opt;).

It skips the augmented sequence that matches the original sequence, and then iterates over table 1 for the remaining three sequences. (atcgtgAtcgtgt, atcgtgCtcgtgt, atcgtgTtcgtgt). (I capitalized them for emphasis only.) Each time it finds a match, it stores the id to [2] in the array. [1] holds the original id.

For the example dataset, the array output would be something like:
[0]1 [1]55436 [2] 23698, 87226
[0]2 [1]56875 [2] 12214
[0]3 [1]56789 [2] 75699
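The steps above can be sketched as a self-contained script. This is only my reading of the intended logic, not the original programmer's code: the two row lists are hard-coded stand-ins for the SQLite results, the bases are lowercase to match the sample data (an assumption, since the script uses uppercase), and $rows1, $rows2, @output and the SEQ label are names made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hard-coded stand-ins for the two selectall_arrayref results: [ id, seq ].
my $rows1 = [
    [ 55436, 'atcgtggtcgtgt' ], [ 56875, 'agtcgtagtctaa' ],
    [ 56789, 'tgatgcgtctatc' ], [ 23698, 'atcgtgctcgtgt' ],
    [ 75699, 'tgatgcttctatc' ], [ 87226, 'atcgtgatcgtgt' ],
    [ 12214, 'agtcgttgtctaa' ],
];
my $rows2 = [ @{$rows1}[ 0 .. 2 ] ];    # the filtered "target" subset

# Invert both lists to seq => { id => undef }, as the original loops do.
my ( %table1, %table2 );
$table1{ $_->[1] }{ $_->[0] } = undef for @$rows1;
$table2{ $_->[1] }{ $_->[0] } = undef for @$rows2;

my @bases = qw(a c g t);    # assumption: lowercase, to match the sample data
my @output;

SEQ: for my $seq ( sort keys %table2 ) {
    next SEQ unless exists $table1{$seq};

    my @alt_ids;
    for my $base (@bases) {
        my $alt = $seq;
        substr( $alt, 6, 1 ) = $base;    # replace the 7th position
        next if $alt eq $seq;            # skip the unchanged sequence
        if ( exists $table1{$alt} ) {
            push @alt_ids, keys %{ $table1{$alt} };
        }
    }
    next SEQ unless @alt_ids;            # nothing differs only at position 7

    # [ original ids, ids of the single-base alternatives ]
    push @output,
      [ [ sort keys %{ $table2{$seq} } ], [ sort { $a <=> $b } @alt_ids ] ];
}

printf "%s => %s\n", join( ',', @{ $_->[0] } ), join( ',', @{ $_->[1] } )
  for @output;
```

Looping over the keys of the smaller table and probing the larger one keeps the work proportional to the number of targets times four, rather than scanning table 1 for every target.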

Re^2: Nested loops?
by poj (Abbot) on Aug 18, 2017 at 19:39 UTC

    To avoid reading the whole table into a hash, consider using substrings within the SQL to match the sequences. For example

    #!/usr/bin/perl
    use strict;
    use DBI;
    use Data::Dumper;

    my $n = 6;

    unlink 'mytestdb.sqlite' if -e 'mytestdb.sqlite';
    my $dbh = DBI->connect("dbi:SQLite:dbname=mytestdb.sqlite","","");
    test_setup();

    my $sql2 = "
      SELECT id,seq,substr(seq,1,$n),substr(seq,-$n)
      FROM testtable
      WHERE id IN ('55436','56875','56789')";
    my $ar = $dbh->selectall_arrayref($sql2);

    my $sql3 = "
      SELECT id FROM testtable
      WHERE substr(seq,1,$n) = ?
      AND substr(seq,-$n) = ?
      AND id != ?";
    my $sth3 = $dbh->prepare($sql3);

    my @output = ();
    my $i=0;
    for my $rec (@$ar){
      $sth3->execute($rec->[2],$rec->[3],$rec->[0]);
      my $others = join ',', map { $_->[0] } @{ $sth3->fetchall_arrayref() };
      push @output,[++$i,$rec->[0],$others];
    }
    print Dumper \@output;

    sub test_setup {
      $dbh->do('CREATE TABLE testtable (id,seq)');
      my $sth = $dbh->prepare('INSERT INTO testtable VALUES (?,?)');
      while (<DATA>){
        chomp;
        my @f = split ", ",$_;
        $sth->execute(@f);
      }
    }
    __DATA__
    55436, atcgtggtcgtgt
    56875, agtcgtagtctaa
    56789, tgatgcgtctatc
    23698, atcgtgctcgtgt
    75699, tgatgcttctatc
    87226, atcgtgatcgtgt
    12214, agtcgttgtctaa
    poj
Re^2: Nested loops?
by shmem (Chancellor) on Aug 18, 2017 at 18:58 UTC

    Again, you post a snippet which doesn't compile, and which, even if it did, wouldn't help me to help you, since it depends on a data source unavailable to me.

    The data initialization stuff isn't interesting, so you could just skip that, and provide a representative subset of the anonymous arrays $sql1 and $sql2 (since that is what is relevant here), ideally in a format that Data::Dumper or related modules provide. Then, the foreach loop labeled with Label isn't finished, and there's no code which does the transformation of @storage_array into the desired output you post.

    So, again, I have to guess. Why do you provide the information needed to help you only piecemeal? See I know what I mean. Why don't you?

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'