Re: error joining hashref of an arrayref

Replies are listed 'Best First'.
Re^2: error joining hashref of an arrayref by ag4ve (Monk) on Oct 30, 2010 at 16:37 UTC
yes, i really do need to do something about this code one way or the other. it looks quite bad which means that i'm never going to want to maintain it. here's what i have for input `the name of something something else cool the title given does not exactly match` [download] and for output, i want something like this `something,something very nice,the name of something,something else coo +l` [download] where the cells are:`search word,column returned from db,line(s) of input the word came from` a search word may return more than one line in sql and that's fine, i'd like my csv to have the same date data for col 1, 3, and 4 and have the different result. i don't need duplicate data however, which was the point of the loops in question. below is my code. i've left in my original comments and added some: #!/usr/bin/perl ##### WHAT # # What names return per search ########## use strict; #use warnings; use DBI; my $searcher = "owner.manager, owner.owner, owner.manown"; my $dbh = DBI->connect('DBI:mysql:db;host=localhost', 'user', 'pass') or die "Database connection: $!"; open( FILE, "< $ARGV[0]" ); my %uword = (); my %seen = (); my $count = 0; my @data; my $key; my $i = 0; while ( <FILE> ) { my $line = $_; chomp ($line); my @word = split / /, $line; $count = 0; while ( $word[ $count ] ) { $word[ $count ] =~ tr/^[\-a-zA-Z]//; $word[ $count ] =~ s/\'/\\\'/g; $count++; } # deduplicate each word, but if it is a duplicate, # i still want to know what lines the word was on. foreach my $string ( @word ) { if ( $uword{ $string }[ 0 ] == 1 ) { push @{ $uword{ $string } }, $line; next; } $uword{ $string }[ 0 ] = 1; push @{ $uword{ $string } }, $line; } } # for every unique word, do a search... for my $key ( keys %uword ) { my ( $imo, $owner, $manown, $manager ); my $select = qq/SELECT $searcher /; my $from = qq/FROM owner, spd /; my $where = qq/WHERE MATCH( $searcher ) AGAINST('+ +$key' IN BOOLEAN MODE) /; my $join = qq/AND owner.number = spd.number/; my $query = $select . $from . $where . $join; print "SQL: $query\n"; my $sth = $dbh->prepare( $query ); $sth->execute; $sth->bind_columns( \$manager, \$owner, \$manown ); # since i don't know or care what field my matches came # from, take the results and put them into an array. while ( $sth->fetch ) { if ( defined( $owner ) ) { $data[$count] = $owner; $count++; } if ( defined( $manown ) ) { $data[$count] = $manown; $count++; } if ( defined( $manager ) ) { $data[$count] = $manager; $count++ } } # same general deduplication algorithm as before. # @data holds full names of data. foreach my $string ( @data ) { # dedupe data and sanity chec +k. next if !defined ($string); # should never be true. next if $seen{ $string }[ 0 ] == 1; # check %seen hash + / array for dupe $seen{ $string }[ 0 ] = 1; # define hash of array a +nd assign check to it. $seen{ $string }[ 1 ] = $key; # add word from line to +hash for reference. } } $dbh->disconnect; # this is here because i was up real late thinking # "wtf does what here".... # $key - (below) is the deduped sql string # $seen { $key }[ 1 ] - is the word searched in sql # $uword { '$seen{ $key }[ 1 ]' } - should be the line(s) of test the + search string came from for my $key ( keys %seen ) { print "$seen{ $key }[ 1 ], $key,"; print join(',', @{ $uword{ $seen{ $key }[ 1 ] }[ 1 .. $#{ @uwo +rd{ $seen{ $key } } } ] } ); print "\n"; } [download]	[reply] [d/l] [select]
Re^3: error joining hashref of an arrayref by afoken (Chancellor) on Oct 31, 2010 at 07:12 UTC
I see other problems in your code: You don't use placeholders in the SQL statement. You don't use `prepare_cached()`. You try to implement your own quoting. The quoting code can be circumvented (just make `$word[$count]` false, e.g. by inserting " `0`" into a line in the input file), which leads to a SQL injection vulnerability (all words following the 0 are not modified at all). Don't loop over the first few elements that are true, loop over all elements (`foreach (@word)` instead of `while ($word[$count])`). You don't reset `$count` to 0 before the `for` loop, it still contains the number of non-false "words" at the beginning of the last line read. So you start filling `@data` at some unknown offset, and all elements of `@data` up to that offset are `undef`. `next if !defined ($string); # should never be true.` [download] even shows that you are partly aware of that problem. Change `next` to `die "should not happen"` and see your script die. You mess with `$count` and arrays where a simple `push` would be sufficient. Don't make your life harder then necessary by trying to write C in Perl. Get rid of `$count` in the for loop, use push @data instead. See also Re^2: Massive Memory Leak and Re: Counting rows Sqlite for the first three points. Alexander -- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)	[reply] [d/l] [select]
Re^4: error joining hashref of an arrayref by ag4ve (Monk) on Nov 01, 2010 at 04:37 UTC
your first three points on security are absolutely correct. however, after reading on prepare_cached(), i don't think that's my solution. maybe i'm wrong, but it just doesn't seem much more secure to me than prepare(). and i think that doing further checking on my input and a sanity check on the final query, that i be much better. in fact, since you brought it up, i found this great node here that has tons of pointers: http://www.perlmonks.org/?node_id=756058 and this discussion on sql injection http://unixwiz.net/techtips/sql-injection.html however, i think the first thing i'm going to be reading is the 'perlsec' doc mentioned at the top of that pm node. as for my code, i finally figured out how to make it more readable and usable. since i'm wanting items based on each word (which lines a word is in and which strings from sql a word returns) i use %data{ $word } and put all associated data on that. it's late, so this has issues that i need fixing, but my general idea is: while ( my $fields = $sth->fetchrow_arrayref ) { foreach my $field (@fields) { definition( $line, $word, $field ); } } } $dbh->disconnect; for my $word ( keys %data ) { while( my ($field, $type) = each %{ $data }{ $word } ) { print "$word,$field" if( $type eq 'field' ); while( my ($line, $type) = each %{ $data }{ $word } ) +{ print ",$line" if( $type eq 'line' ); } } } sub definition { my ($line, $word, $field) = @_; if( defined( $field ) { if( !defined( $data{ $word }{ $field } ) ) { $data{ $word }{ $field} = field; } $data{ $word }{ $line } = line; } } [download]	[reply] [d/l]
Re^3: error joining hashref of an arrayref by Marshall (Canon) on Nov 01, 2010 at 19:04 UTC
Still not sure what you are doing here. But I tried to make some simplifications in the parsing part. I didn't really look in any detail at the DB part. I think you just need a HoA (Hash of Array). Each word would have a list of line numbers that it appeared on. A "unique" word would have just one line number in its array. Past that I'm not sure what this app is trying to do. #!/usr/bin/perl use strict; use warnings; use DBI; my $searcher = "owner.manager, owner.owner, owner.manown"; my $dbh = DBI->connect('DBI:mysql:db;host=localhost', 'user', 'pass') or die "Database connection: $!"; open( FILE, '<', $ARGV[0] ) \|\| die "unable to open $ARGV[0]!"; my $line_num =0; my %words = (); #hash of array, list of line numbers per #each word while ( my $line = <FILE> ) { $line_num++; foreach my $word (split (/\s+/, $line)) { $word =~ tr/^[a-zA-Z-]//; $word =~ s\|(\'.*)\b\|\|g; #maybe ?? not sure: what's->what? push (@{$words{$word}}, $line_num); } } # for every word, do a search... # a "unique word" would be one who's array of line numbers has # exactly 1 number # I don't think that is what you wanted? # I think what was meant is do one search per each word? foreach my $word (%words) { # @{words{$word}} is array of line numbers that word appeared # upon # line number appears twice if word appeared twice on that # line ... do something in DB with $word.... } # $uword { '$seen{ $word }[ 1 ]' } - should be the line(s) of test th +e search string came from #no, that would be @{words{$word}} [download]	[reply] [d/l]