brancusi has asked for the wisdom of the Perl Monks concerning the following question:

I have a string of co-ordinates that are command and space separated.

lon,lat,alt lon,lat,alt lon,lat,alt

What I want to end up with is an array of points where a point is [lon,lat] for passing into a specific perl module. I can of course do this with a slow loop as follows. I have left this relatively verbose to help explain what I'm doing

my @list = split(/[, ]/,$coords); my $size = @list; my @poly = (); for (my $i=0; $i<=($size-3); $i+=3) { my $lon = $list[$i+0]; my $lat = $list[$i+1]; my $point = [$lon,$lat]; push (@poly, $point); }

Now this works fine, but I have coordinate lists of tens of thousands of points and performance is an issue as this needs to run as a web service.

I have a feeling that using map and grep I could make this much more efficient, but I can't get my head round how.

Do any wise monks have a neat and fast solution for this problem?

Thanks!

Replies are listed 'Best First'.
Re: Fastest way to turn coordinate list into array
by kennethk (Abbot) on Apr 09, 2010 at 18:29 UTC
    I've got a couple of possibilities here for speeding up your code. Best performance thus far: a capturing regular expression:

    #!/usr/bin/perl use strict; use warnings; use Benchmark qw(cmpthese :hireswallclock); my $coords = 'lon,lat,alt ' x 20000; cmpthese(100, { OP => sub { my @list = split(/[, ]/,$coords); my $size = @list; my @poly = (); for (my $i=0; $i<=($size-3); $i+=3) { my $lon = $list[$i+0]; my $lat = $list[$i+1]; my $point = [$lon,$lat]; push (@poly, $point); } }, double_split => sub { my @list = split(' ',$coords); my @poly; $poly[@list-1] = 0; for (@list) { push (@poly, [ (split /,/, $_)[0, +1] ]); } }, while_shift => sub { my @list = split(/[, ]/,$coords); my @poly = (); while (@list) { push @poly, [shift @list, shift @l +ist]; shift @list; } }, regex => sub { my @poly = (); while ($coords =~ /([^,]+),([^,]+),[^\ +s]+ /g) { push @poly, [$1,$2]; } }, chromatic => sub { my @poly = map { [ (split /,/, $_)[ +0, 1] ] } split / /, $coords;}, jwkrahn => sub { my @poly = map [ ( split /,/ )[ 0, 1 +] ], split ' ', $coords;}, ike1 => sub { my @poly = map [ (split /,/, $_)[0, 1] ] +, split ' ', $coords;}, ike2 => sub { my @poly = map [ /([^,]*),([^,]*)/ ], sp +lit ' ', $coords; }, }); __END__ Rate OP while_shift double_split ike1 jwkrahn chromat +ic ike2 regex OP 6.14/s -- -17% -49% -50% -51% -5 +2% -54% -59% while_shift 7.42/s 21% -- -38% -40% -40% -4 +2% -45% -51% double_split 11.9/s 95% 61% -- -3% -4% - +7% -11% -21% ike1 12.3/s 101% 66% 3% -- -1% - +4% -8% -18% jwkrahn 12.4/s 103% 67% 4% 1% -- - +3% -8% -18% chromatic 12.9/s 109% 73% 8% 4% 3% +-- -4% -15% ike2 13.4/s 119% 81% 13% 9% 8% +5% -- -11% regex 15.1/s 147% 104% 27% 23% 22% 1 +8% 13% --

    Update: Following ikegami's post below, I reran with more tests, closing some apps I had open and changing the spec to match the OP (originally ran for x 1000, whereas spec says "tens of thousands") and got the same ordering.

    This is perl, v5.8.9 built for MSWin32-x86-multi-thread (with 9 registered patches, see perl -V for more detail)

      Those results are different than what I would expect.

      On perl, v5.10.0 built for i486-linux-gnu-thread-multi, I get what I'd expect:

      Rate OP while_shift regex ike2 chromatic + ike1 OP 245/s -- -17% -28% -38% -48% + -50% while_shift 297/s 21% -- -13% -26% -36% + -39% regex 340/s 39% 15% -- -15% -27% + -31% ike2 398/s 63% 34% 17% -- -15% + -19% chromatic 467/s 91% 57% 37% 17% -- + -5% ike1 490/s 100% 65% 44% 23% 5% + --

      On perl 5, version 12, subversion 0 (v5.12.0-RC1) built for i686-linux, I the same results.

      Rate OP while_shift regex ike2 chromatic + ike1 OP 117/s -- -22% -28% -42% -52% + -54% while_shift 151/s 29% -- -7% -25% -38% + -41% regex 163/s 39% 8% -- -19% -33% + -36% ike2 202/s 72% 33% 24% -- -18% + -21% chromatic 244/s 109% 62% 50% 21% -- + -4% ike1 254/s 117% 68% 56% 26% 4% + --

      The two extra tests are:

      ike1 => sub { my @poly = map [ (split /,/, $_)[0, 1] ], split ' ', $coords; }, ike2 => sub { my @poly = map [ /([^,]*),([^,]*)/ ], split ' ', $coords; },

        Brilliant, thank you all for your speedy replies. FWIW the regex was simply hanging apache with real data at my end (some lists have 300K points) but the chromatic and ikegami solutions are definitely helping.

        It has highlighted that actually the real problem is algorithmic rather than performance of this section of code as the speed up, whilst beneficial, isn't the main problem. I can think of a way of whittling down the data to test in another part of the code so I'll try that out next. Again, many thanks - and ikegami's code will remain in as the winner so far. Roger
        As I've pointed out in the update above, I reran and maintained ordering. I thought perhaps it was an 5.8/10 issue, but ran on my Linux box and got the following:

        Rate while_shift OP regex ike2 double_split chromatic + jwkrahn ike1 while_shift 15.0/s -- -11% -30% -40% -46% -51% + -54% -54% OP 16.8/s 12% -- -22% -33% -40% -45% + -48% -48% regex 21.5/s 43% 28% -- -14% -23% -29% + -34% -34% ike2 25.0/s 67% 49% 16% -- -10% -18% + -23% -23% double_split 27.9/s 86% 66% 30% 11% -- -8% + -14% -14% chromatic 30.4/s 102% 81% 41% 22% 9% -- + -6% -6% jwkrahn 32.4/s 116% 93% 50% 29% 16% 6% + -- -0% ike1 32.5/s 116% 93% 51% 30% 17% 7% + 0% -- This is perl, v5.8.8 built for x86_64-linux-gnu-thread-multi

        Anyone have an idea why my regex engine is so fast for ActiveState Perl 5.8.9?

Re: Fastest way to turn coordinate list into array
by chromatic (Archbishop) on Apr 09, 2010 at 18:22 UTC

    Untested, but does this do what you want?

    my @poly = map { [ (split /,/, $_)[0, 1] ] } split / /, $coords;
Re: Fastest way to turn coordinate list into array
by jwkrahn (Abbot) on Apr 09, 2010 at 18:27 UTC
    my @poly = map [ ( split /,/ )[ 0, 1 ] ], split ' ', $coords;
Re: Fastest way to turn coordinate list into array
by GrandFather (Saint) on Apr 09, 2010 at 23:14 UTC
    performance is an issue

    Really? So what where the results of your benchmark? How much faster does it need to be? Did you profile the code and determine that this area is the bottleneck?

    Although those questions look somewhat snide, in fact they are important. Unless you know what you are aiming for any why when trying to "make code efficient" you are most likely wasting time. If you need a 10 thousand times speed up then the place to look for a solution is likely to be vastly different than if you only need a 10 times speed up. But chances are there is a better solution (like cashing results in a database) that don't need this specific code to be any faster at all.

    Dramatic improvements in code execution time don't generally come from fiddling with implementation. They come from reconsidering the problem and developing or finding a better algorithm. Almost always that requires an understanding of the bigger picture.

    True laziness is hard work
      Dramatic improvements in code execution time don't generally come from fiddling with implementation.

      Oh yeah?

        "don't generally" ne "never"

        True laziness is hard work
Re: Fastest way to turn coordinate list into array
by JavaFan (Canon) on Apr 09, 2010 at 22:08 UTC
    Fastest way? Write it in C, and then use XS at the end to return a Perl datastructure to the program. You can speed up the rest of the program as well by not using Perl at all.

    But you have to ask yourself, is it really worth your time and the increased maintainance burden to have the "fastest way"?

      You can speed up the rest of the program as well by not using Perl at all.

      True, but given my environment is an optimised mod_perl back end with a reverse proxy front end, abandoning perl is not really the solution I was looking for ;-)