umeboshi has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am using PDL to do an element-wise comparison of (huge) strings. As suggested elsewhere, I am doing the following:

```perl
use PDL;
use PDL::Char;
$PDL::SHARE = $PDL::SHARE;    # keep stray warning quiet

my $source = PDL::Char->new("ATTCCGGG");
for my $str ("ATTGCGGG", "ATACCGGC") {
    my $match = PDL::Char->new($str);
    my @diff  = ($match != $source)->list;
    print "@diff\n";
}
```

However, I would like @diff to be a piddle itself, i.e. a binary vector. Later I want to do some operations on this binary vector, and having it as a piddle would make things much easier.

How can this be done? Thanks, Hadassa

Re: PDL: string comparison to binary piddle
by wind (Priest) on Apr 27, 2011 at 07:58 UTC

    You already have the solution; you just need to save the diff in a variable instead of immediately converting it to a list.

    ```perl
    use PDL;
    use PDL::Char;
    $PDL::SHARE = $PDL::SHARE; # keep stray warning quiet

    use strict;
    use warnings;

    my $source = PDL::Char->new('ATTCCGGG');

    for my $str (qw(ATTGCGGG ATACCGGC)) {
        my $match = PDL::Char->new($str);

        # Is a PDL::Char
        my $diff = $match != $source;

        # Show diff
        print join(' ', $diff->list), "\n";
    }
    ```
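    If you would rather end up with an ordinary numeric piddle than a PDL::Char object, one option (a sketch, not the only way) is to build the piddles from the byte codes yourself with Perl's built-in unpack. The helper name str2pdl and the @diffs array are made up for illustration, and the sketch assumes single-byte (ASCII) strings. Each diff stays a piddle of 0/1 values, so sum and which can be applied to it directly:

    ```perl
    use strict;
    use warnings;
    use PDL;

    # Hypothetical helper: turn a string into a 1-D piddle of its byte codes.
    # Assumes single-byte (ASCII) input such as DNA bases.
    sub str2pdl {
        my ($str) = @_;
        return pdl([ unpack 'C*', $str ]);
    }

    my $source = str2pdl('ATTCCGGG');
    my @diffs;    # one 0/1 piddle per compared string, 1 where the base differs

    for my $str (qw(ATTGCGGG ATACCGGC)) {
        my $diff = str2pdl($str) != $source;    # element-wise, result is still a piddle
        push @diffs, $diff;
        print "$str: diff = $diff, count = ", $diff->sum,
              ", positions = ", which($diff), "\n";
    }
    ```

    For very large strings the temporary Perl list produced by unpack costs memory, so PDL::Char may still be the better way to create the piddles; the point is only that the != result can be kept and reused as a piddle.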

    Also, what warning is the following line supposed to keep quiet? It changes nothing on my system.

    $PDL::SHARE = $PDL::SHARE; # keep stray warning quiet

      Thanks a lot! I guess that means I do not really understand how PDL::Char works. So it is possible to take two of them, such as '010' and '001', and just add them up to obtain '011'? I just tried it and it does work, even though these are "strings" and not "vectors". I learned something.

      Thanks again!
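      A small sketch of combining two mismatch vectors, assuming they hold the numeric 0/1 flags produced by != (the two vectors below are typed in by hand for illustration). The last lines show why that assumption matters: adding the raw character codes of '010' and '001' ('0' is byte 48, '1' is byte 49) gives values around 96, not 0/1.

      ```perl
      use strict;
      use warnings;
      use PDL;

      # Two 0/1 mismatch vectors, written out by hand for illustration;
      # in practice they would come from comparisons such as $match != $source.
      my $d1 = pdl([0, 1, 0]);
      my $d2 = pdl([0, 0, 1]);

      my $count = $d1 + $d2;          # how many of the two strings differ at each position
      my $any   = ($d1 + $d2) > 0;    # 1 where at least one string differs
      my $both  = $d1 * $d2;          # 1 only where both strings differ

      print "count = $count\n";
      print "any   = $any\n";
      print "both  = $both\n";

      # The same element-wise + applied to character codes instead of 0/1 flags.
      my $chars = pdl([ unpack 'C*', '010' ]) + pdl([ unpack 'C*', '001' ]);
      print "char codes summed = $chars\n";
      ```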

        Yes. PDL::Core (the base class of PDL::Char) overloads almost all, if not all, of the basic Perl operators, so that operations between pdl objects return a new pdl object. This includes operators like != and +.

        Just look at this excerpt from the PDL::Core source to see the complete list:

        ```perl
        use overload (
            "+"     => \&PDL::plus,       # in1, in2
            "*"     => \&PDL::mult,       # in1, in2
            "-"     => \&PDL::minus,      # in1, in2, swap if true
            "/"     => \&PDL::divide,     # in1, in2, swap if true

            "+="    => sub { PDL::plus      ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            "*="    => sub { PDL::mult      ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            "-="    => sub { PDL::minus     ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            "/="    => sub { PDL::divide    ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true

            ">"     => \&PDL::gt,         # in1, in2, swap if true
            "<"     => \&PDL::lt,         # in1, in2, swap if true
            "<="    => \&PDL::le,         # in1, in2, swap if true
            ">="    => \&PDL::ge,         # in1, in2, swap if true
            "=="    => \&PDL::eq,         # in1, in2
            "eq"    => \&PDL::eq,         # in1, in2
            "!="    => \&PDL::ne,         # in1, in2

            "<<"    => \&PDL::shiftleft,  # in1, in2, swap if true
            ">>"    => \&PDL::shiftright, # in1, in2, swap if true
            "|"     => \&PDL::or2,        # in1, in2
            "&"     => \&PDL::and2,       # in1, in2
            "^"     => \&PDL::xor,        # in1, in2

            "<<="   => sub { PDL::shiftleft ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            ">>="   => sub { PDL::shiftright($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            "|="    => sub { PDL::or2       ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            "&="    => sub { PDL::and2      ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            "^="    => sub { PDL::xor       ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            "**="   => sub { PDL::power     ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true
            "%="    => sub { PDL::modulo    ($_[0], $_[1], $_[0], 0); $_[0]; },  # in1, in2, out, swap if true

            "sqrt"  => sub { PDL::sqrt   ($_[0]); },
            "abs"   => sub { PDL::abs    ($_[0]); },
            "sin"   => sub { PDL::sin    ($_[0]); },
            "cos"   => sub { PDL::cos    ($_[0]); },
            "!"     => sub { PDL::not    ($_[0]); },
            "~"     => sub { PDL::bitnot ($_[0]); },
            "log"   => sub { PDL::log    ($_[0]); },
            "exp"   => sub { PDL::exp    ($_[0]); },

            "**"    => \&PDL::power,      # in1, in2, swap if true
            "atan2" => \&PDL::atan2,      # in1, in2, swap if true
            "%"     => \&PDL::modulo,     # in1, in2, swap if true
            "<=>"   => \&PDL::spaceship,  # in1, in2, swap if true

            "="     => sub { $_[0] },     # Don't deep copy, just copy reference

            ".="    => sub {
                my @args = reverse &PDL::Core::rswap;
                eval { PDL::Ops::assgn(@args); };
                if ($@) {
                    # Remove references to this (and deeper)
                    # code before rebarfing:
                    $@ =~ s/\s*at .* line \d+\s*\.\n*/./;
                    $@ =~ s/PDL:\s+//g;
                    $@ =~ s/\s*Caught at .* pkg .+\n+//;
                    PDL::Core::barf("Problem with assignment: $@");
                }
                return $args[1];
            },

            'x'     => sub {
                my $foo = $_[0]->null();
                PDL::Primitive::matmult(@_[0,1], $foo);
                $foo;
            },

            'bool'  => sub {
                return 0 if $_[0]->isnull;
                croak("multielement piddle in conditional expression")
                    unless $_[0]->nelem == 1;
                $_[0]->clump(-1)->at(0);
            },

            "\"\""  => \&PDL::Core::string
        );
        }
        ```
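        As an illustration of two entries in that list, the sketch below checks that != (PDL::ne) and ! (PDL::not) agree element-wise, and shows the 'bool' overload refusing a multi-element piddle in a conditional. The reductions all and any (from PDL::Ufunc, exported by use PDL) are one way to get a single true/false value instead. The byte-code piddles are made up for illustration, and depending on your PDL version the error text may say "ndarray" rather than "piddle".

        ```perl
        use strict;
        use warnings;
        use PDL;

        my $x = pdl([65, 84, 84, 67]);    # byte codes for 'ATTC' (illustrative data)
        my $y = pdl([65, 84, 65, 67]);    # byte codes for 'ATAC'

        # '!=' dispatches to PDL::ne and '!' to PDL::not, so both give the same 0/1 piddle.
        my $ne     = $x != $y;
        my $not_eq = !($x == $y);
        print "ne = $ne, not eq = $not_eq\n";

        # The 'bool' overload croaks on a multi-element piddle in a conditional ...
        my $ok = eval { my $t = ($x == $y) ? 1 : 0; 1 };
        print "direct boolean test died: $@" unless $ok;

        # ... so reduce to a single value first.
        print "identical\n"             if all($x == $y);
        print "at least one mismatch\n" if any($x != $y);
        ```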
Re: PDL: string comparison to binary piddle
by Ea (Chaplain) on Apr 27, 2011 at 09:26 UTC

    Just curious, but why aren't you using BioPerl for this? Is PDL faster or easier in your case? It's certainly not my area, but I would have guessed that BioPerl was set up for this.

    perl -e 'print qq(Just another Perl Hacker\n)' # where's the irony switch?

      It's a bit embarrassing, but I didn't manage to install BioPerl on my computer. Also, I need to do some atypical things for which there isn't a ready-made BioPerl function anyway. I hope PDL is not slower than BioPerl, and yes, eventually I shall switch to BioPerl.