PetreAdi has asked for the wisdom of the Perl Monks concerning the following question:

I have a huge array (90,000 elements) and I want to create a lot of other arrays from it.

Here is a quick example:

my @unique = (1, 20, 3, 4, 44, 55, 66, 77, 5, 10, 2, 11, 20, 42, 30,
              31, 32, 33, 34, 35, 36, 37, 40);
my @in = (4, 3, 2, 2, 42, 40);
my %first_index = map { $unique[$_] => $_ } reverse 0 .. @unique-1;
my @idxs = map { $first_index{$_} // -1 } @in;
@idxs = sort { $a <=> $b } @idxs;
$first = @idxs[0];
$cont = 1;
for (1 .. $#idxs) {
    $diff = @idxs[$_] - $first;
    if (($diff > 5) && ($diff < 10)) {
        @temp = @unique[$first+1..@idxs[$_]-1];
        print join(", ", @temp);
        $diff = $diff-1;
        print " Hit $cont Length $diff \n";
        $cont += 1;
    }
    $first = @idxs[$_];
}

Replies are listed 'Best First'.
Re: Better way to do this
by hdb (Monsignor) on Feb 20, 2016 at 18:05 UTC

    It would really be nice to explain what you want to achieve. Reverse engineering your example is tedious...

      I want to extract subarrays between array indexes (@idxs).

      The subroutine takes an hour to run.

      Any ideas for optimizing it?
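      (A minimal sketch of that extraction, for readers following along: the data and the @idxs positions here are made up, but the 5/10 gap bounds are the ones from the code above.)

```perl
use strict;
use warnings;

my @unique = (1, 20, 3, 4, 44, 55, 66, 77, 5, 10, 2, 11);
my @idxs   = (0, 7, 9);    # sorted positions of the values of interest

my @sub_arrays;
for my $i (1 .. $#idxs) {
    my $diff = $idxs[$i] - $idxs[ $i - 1 ];
    # keep only gaps strictly between 5 and 10 positions wide
    if ($diff > 5 && $diff < 10) {
        push @sub_arrays, [ @unique[ $idxs[$i - 1] + 1 .. $idxs[$i] - 1 ] ];
    }
}
# here @sub_arrays holds one subarray: the elements strictly between
# positions 0 and 7, i.e. (20, 3, 4, 44, 55, 66)
```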

Re: Better way to do this
by BrowserUk (Patriarch) on Feb 20, 2016 at 19:12 UTC
    1. Add -w & use strict and fix the ~20 errors and warnings produced.

      It won't change anything, but people might take your code more seriously.

    2. Remove the reverse.

      It serves no purpose beyond obfuscation.

    3. Then you could try explaining what the output means.

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Works fine for me but it is very slow

      I need a faster way

        I need a faster way

        Well, the easiest way to make code run faster is to avoid doing things that don't need to be done.

        Reversing a 90,000-element array doesn't take very long, but since it makes no difference to the program, why bother?

        I think that there are two or three things that could be changed to speed things up, one by a substantial amount; if I've correctly guessed the purpose of your code.

        But since you seem reluctant to explain the logic of your code, I've no way to assess the possibilities one way or another.


Re: Better way to do this
by Cristoforo (Curate) on Feb 21, 2016 at 00:58 UTC
    This script ran in less than 1 second. It doesn't print out the subarrays, which helps make it faster, but even so I can't see why your script takes over an hour to run. Unless you are repeatedly constructing the 90_000-element array and the other arrays and hash?
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @unique = map int rand(45000), 1 .. 90_000;
    my %first_index = map { $unique[$_] => $_ } reverse 0 .. $#unique;
    my @in = map $unique[rand @unique], 1 .. 10000;
    my @idxs = sort { $a <=> $b } map $first_index{$_} // (), @in;

    my $first = $idxs[0];
    my @sub_arrays;
    my $time = time;
    for (1 .. $#idxs) {
        my $diff = $idxs[$_] - $first;
        if (($diff > 5) && ($diff < 10)) {
            push @sub_arrays, [ @unique[$first+1..$idxs[$_]-1] ];
        }
        $first = $idxs[$_];
    }
    #use Data::Dumper; print Dumper \@sub_arrays;
    print "Number of sub arrays: ", scalar @sub_arrays, "\n";
    print "Time = ", time - $time, "\n";
    This printed results for 1 run:
    Number of sub arrays: 1330
    Time = 0
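    If the original script really is rebuilding %first_index on every call, hoisting that out of the loop would account for most of the hour. A sketch of the idea (the indexes_of subroutine and its caller are hypothetical, standing in for however the OP's code is being invoked repeatedly; note that //= keeps the first index seen, so no reverse is needed):

```perl
use strict;
use warnings;

my @unique = map { int rand 45_000 } 1 .. 90_000;

# Build the value-to-first-index hash ONCE, up front.
my %first_index;
$first_index{ $unique[$_] } //= $_ for 0 .. $#unique;

# Hypothetical per-call work: each call now only pays for hash lookups.
sub indexes_of {
    my @in = @_;
    return sort { $a <=> $b } map { $first_index{$_} // () } @in;
}

my @idxs = indexes_of( @unique[ 0 .. 9 ] );
</imports>
```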
Re: Better way to do this
by BillKSmith (Monsignor) on Feb 22, 2016 at 20:38 UTC
    I have created a test case with 90000 elements which produces about 2000 "Hits". Without printing, your original code processes it in less than 1 second. I tried adding the substring recognition to Laurent's approach. I find it much easier to understand, but it is slightly slower than yours. Note: your original code does not find any subarrays longer than 8 (not the ten in your spec).
    Bill
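    (The length limit Bill observes follows directly from the slice bounds: a gap of $diff between two indexes yields $diff - 1 elements, so the condition $diff > 5 && $diff < 10 can only ever admit lengths 5 through 8. A quick standalone check of that arithmetic, using the same condition as the original code:)

```perl
use strict;
use warnings;

my %lengths;
for my $diff (1 .. 12) {
    next unless $diff > 5 && $diff < 10;     # the original gap condition
    # a slice from $first+1 to $first+$diff-1 has $diff - 1 elements
    my @slice = (1) x ($diff - 1);
    $lengths{ scalar @slice } = 1;
}
# accepted subarray lengths are 5, 6, 7, 8 -- never 9 or 10
```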