Sort this data

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Sort this data (alternate method)
by extremely (Priest) on Nov 19, 2000 at 11:42 UTC

my @LoH;
{
  local $/ = "\n\n"; # records end at a blank line; may be "\r\n\r\n" under Windows...
  while (<>) {
    my ($t, $a, $l) = split /\n/;
    push @LoH, { Title => $t, Author => $a, Link => $l };
  }
}

I should probably have added this to the other post but I had just edited it like 5 times. =P
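
For anyone who wants to try the record-separator approach without a data file, here is a minimal sketch. The original question's data is not shown in this thread, so the layout is an assumption: three lines per record (title, author, link) with a blank line between records. The sample text, the `read_records` name, and the use of paragraph mode (`$/ = ""`, which treats any run of blank lines as one separator) in place of `"\n\n"` are illustrative choices, not extremely's code.

```perl
use strict;
use warnings;

# Hypothetical sample input: three fields per record, blank line between records.
my $sample = "Title One\nAuthor One\nhttp://example.com/1\n\n"
           . "Title Two\nAuthor Two\nhttp://example.com/2\n";

sub read_records {
    my ($text) = @_;
    # Read from an in-memory filehandle so the sketch needs no file on disk.
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;    # in paragraph mode this strips all trailing newlines
        my ($title, $author, $link) = split /\n/, $chunk;
        push @records, { Title => $title, Author => $author, Link => $link };
    }
    return \@records;
}

my $records = read_records($sample);
print "$_->{Title} by $_->{Author}\n" for @$records;
```

Paragraph mode is a little more forgiving than a literal `"\n\n"` when the file has extra blank lines between records, which is the only reason it is used here.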

--
$you = new YOU;
honk() if $you->love(perl)


Re: Sort this data
by extremely (Priest) on Nov 19, 2000 at 11:31 UTC

perlfunc:splice

tested now =)

@bigarray = ... ; # your data
my @LoH;
# Take four elements at a time from the front; $j soaks up the unused fourth field.
while (my ($t, $a, $l, $j) = splice(@bigarray, 0, 4)) {
   push @LoH, { Title => $t, Author => $a, Link => $l };
}


Re: Re: Sort this data

by japhy (Canon) on Nov 19, 2000 at 19:35 UTC

splice()

while (@data and my ($a,$b,$c) = splice(@data, -3)) {  # guard: splice with a negative offset dies on an empty array
  push @hashrefs, { a => $a, b => $b, c => $c };
  pop @data;  # null field
}

@hashrefs

reverse()

unshift()

splice()

my @hashrefs;
$#hashrefs = int(@data / 4);
my $i = $#hashrefs;
while (@data and my ($a,$b,$c) = splice(@data, -3)) {
  $hashrefs[$i--] = { a => $a, b => $b, c => $c };
  pop @data;
}

Update

jcwren

Update

splice()

japhy

Perl and Regex Hacker


Re (tilly) 3: Sort this data

by tilly (Archbishop) on Nov 19, 2000 at 20:42 UTC

while (@big_array) {
  my $href;
  @$href{'title', 'author', 'link'} = map shift(@big_array), 1..4;
  push @structs, $href;
}

Also the cost of reverse is overstated. You have just walked through a list of n things in Perl. You then want to reverse a list of n/4 things. What is the relative cost of those two operations? Right.

Pick up good material on optimization, such as this sample chapter from Code Complete or RE: Efficient Perl Programming. You will find that experienced people understand that maintainable code with good algorithms can yield better overall speed wins than trying to optimize every line.

Now, noticing the splice: that matters. If it isn't optimized, then that is an order n operation performed n times, which is n^2 and therefore likely to be slow. But one reverse at the end is an order n operation done once. Should the body of the loop be slightly more efficient from doing the slice rather than repeated manipulation of indices (something I would have to benchmark to have a feeling for either way), then your attempt to optimize would actually lose.

To summarize, don't worry about slow operations, worry about bad algorithms. A slow operation inside a loop may matter. A slow operation outside a loop which speeds up the loop can go either way. An order n (or worse) operation inside a loop - that is the only one which should cause you to want to care up front about optimizing the structure of the code!

EDIT
I had messed up the final paragraph.


(jcwren) Re: (3) Sort this data

by jcwren (Prior) on Nov 19, 2000 at 19:55 UTC

$hashrefs

$i

@data


Re: (jcwren) Re: (3) Sort this data

by japhy (Canon) on Nov 19, 2000 at 20:04 UTC

Re: Re: Re: Sort this data

by extremely (Priest) on Nov 20, 2000 at 01:47 UTC

Also, what if fields are allowed to be null? If so, you HAVE to read from the front...


Re: Re: Re: Re: Sort this data

by japhy (Canon) on Nov 20, 2000 at 01:52 UTC

Re: Sort this data
by japhy (Canon) on Nov 19, 2000 at 23:04 UTC

UPDATE

There was a huge error in this test (and I'm stupid for not using strict in it). I was testing @c where I should have been testing @a. I am now going to replace the bad results with the GOOD results.

in_fr_pu: slices the array in chunks of 4, push()es the hash ref to the new array
140.9 Hz @ 100, 13.9 Hz @ 1000, 1.3 Hz @ 10000
sh_fr_pu: shift()s the array, push()es the hash ref to the new array
156.7 Hz @ 100, 15.7 Hz @ 1000, 1.4 Hz @ 10000
sp_bk_in: presizes the new array, splice()s from the back, inserts the hash ref via index into the new array
149.4 Hz @ 100, 14.8 Hz @ 1000, 1.4 Hz @ 10000
sp_bk_rv: splice()s from the back, push()es the hash ref to the new array, then reverse()s
151.5 Hz @ 100, 14.7 Hz @ 1000, 1.3 Hz @ 10000
sp_bk_un: splice()s from the back, unshift()s the hash ref to the new array
151.6 Hz @ 100, 11.9 Hz @ 1000, 0.4 Hz @ 10000
sp_fr_pu: splice()s from the front, push()es the hash ref to the new array
160.0 Hz @ 100, 15.6 Hz @ 1000, 1.4 Hz @ 10000

sp_bk_un

unshift()

sp_fr_pu

splice()

sh_fr_pu

shift()

splice()

use strict;
use warnings;
use Benchmark;

for my $SIZE (100, 1000, 10000) {

timethese(-5, {
  sp_fr_pu => sub {
    my @a = (1..$SIZE);
    my @b;
    while (my @c = splice(@a, 0, 4)) { push @b, { @c } }
  },
  sp_bk_un => sub {
    my @a = (1..$SIZE);
    my @b;
    while (@a and my @c = splice(@a, -4)) { unshift @b, { @c } }
  },
  sp_bk_in => sub {
    my @a = (1..$SIZE);
    my @b;
    $#b = int(@a / 4) - 1;  # last index for @a/4 records (sizes here are multiples of 4)
    my $i = $#b;
    while (@a and my @c = splice(@a, -4)) { $b[$i--] = { @c } }
  },
  in_fr_pu => sub {
    my @a = (1..$SIZE);
    my @b;
    my $i = 0;
    while ($i < @a) { push @b, { @a[$i .. $i + 3] }; $i += 4; }
  },
  sh_fr_pu => sub {
    my @a = (1..$SIZE);
    my @b;
    while (@a) { push @b, { map shift(@a), 1..4 } }
  },
  sp_bk_rv => sub {
    my @a = (1..$SIZE);
    my @b;
    while (@a and my @c = splice(@a, -4)) { push @b, { @c } }
    @b = reverse @b;
  },
});

}
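
Since a benchmark can easily end up timing the wrong thing, it may be worth checking that the variants build identical structures before timing them. The following is only a sketch: `%variant`, `flatten`, and `mismatches` are hypothetical names, and only three of the six strategies are reproduced, rewritten to take the flat list as arguments and return the result.

```perl
use strict;
use warnings;

# Three of the strategies from the benchmark, changed to take the flat
# list as arguments and return a reference to the array of hash refs.
my %variant = (
    sp_fr_pu => sub {
        my @a = @_;
        my @b;
        while (my @c = splice(@a, 0, 4)) { push @b, { @c } }
        return \@b;
    },
    sp_bk_rv => sub {
        my @a = @_;
        my @b;
        while (@a and my @c = splice(@a, -4)) { push @b, { @c } }
        @b = reverse @b;
        return \@b;
    },
    sh_fr_pu => sub {
        my @a = @_;
        my @b;
        while (@a) { push @b, { map shift(@a), 1 .. 4 } }
        return \@b;
    },
);

# Turn an array of hash refs into one string so results can be compared.
sub flatten {
    my ($list) = @_;
    return join ';', map {
        my $h = $_;
        join ',', map { $_ . '=' . $h->{$_} } sort { $a <=> $b } keys %$h;
    } @$list;
}

# Return the names of variants whose output differs from sp_fr_pu.
# Assumes the input length is a multiple of 4, as in the benchmark.
sub mismatches {
    my @input = @_;
    my $expected = flatten($variant{sp_fr_pu}->(@input));
    return grep { flatten($variant{$_}->(@input)) ne $expected }
           sort keys %variant;
}

my @bad = mismatches(1 .. 100);
print @bad ? "differs: @bad\n" : "all variants agree\n";
```

Running a check like this once per variant, outside `timethese`, keeps the timed subs unchanged while still catching a variant that quietly builds something different.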


Re: Re: Sort this data

by japhy (Canon) on Nov 19, 2000 at 23:59 UTC

UPDATE

These results were wrong. I'll get correct ones later.

Name	Hz @ 100	Hz @ 1000	Hz @ 2000
`in_fr_pu`	466.1	45.2	22.5
`sh_fr_pu`	509.7	49.5	24.6
`sp_bk_in`	???	???	???
`sp_bk_rv`	???	???	???
`sp_bk_un`	???	???	???
`sp_fr_pu`	489.3	47.2	23.4


HUGE ERROR in results (Re: Re: Sort this data)

by japhy (Canon) on Nov 20, 2000 at 01:35 UTC

strict


japhy looks at av.c (not av.h)
by japhy (Canon) on Nov 20, 2000 at 00:46 UTC

UPDATE

As per my revelations in sort this data, here is a bit of an adjusted report on the av.c source.

av.c

/* this is Perl_av_unshift()
   it unshifts 'num' undef values to an array */

/* determine how much unused space is left
   that's been allocated for this array */
i = AvARRAY(av) - AvALLOC(av);

/* if there's room left... */
if (i) {
  /* if there's more room than we need, just use 'num' */
  if (i > num) i = num;

  /* this will set 'num' to 0 if we had enough room */
  /* 'num' is now how many new undef values we need added */
  num -= i;
  
  AvMAX(av) += i;  /* set the highest subscript??? */
  AvFILLp(av) += i;  /* add to highest subscript */
  SvPVX(av) = (char*)(AvARRAY(av) - i);  /* where Perl's array starts */
}

/* if there wasn't enough room already... */
if (num) {
  i = AvFILLp(av);  /* highest subscript */
  av_extend(av, i + num);  /* extend array to i+num elements */
  AvFILLp(av) += num;  /* add to highest subscript */
  ary = AvARRAY(av); /* get at the array */
  Move(ary, ary + num, i + 1, SV*);  /* slide elements up */
  do {
    ary[--num] = &PL_sv_undef;  /* set new elements to undef */
  } while (num);
}

unshift()

shift()

unshift()

*AvARRAY(av)

retval

*AvARRAY(av)

undef

AvARRAY(av) + 1

retval

<revelation>

</revelation>


Re (tilly) 1: japhy looks at av.c (not av.h)

by tilly (Archbishop) on Nov 20, 2000 at 02:12 UTC

For those who are not following the code, the logic here is based on first trying to use the slots there is already room for (this is the "if (i)" bit). If it cannot get them all in, it then makes sure it has enough space, does some accounting, copies everything over, then inserts the new elements. Note that japhy's comment about unused space is misleading; he means unused at the beginning of the array. There is also unused space at the other end, but we cannot directly use that.

For full details you have to also look at av.h. AvMAX is just a macro that accesses xav_max, which is the largest index there is space for (adding to the front increases that), and AvFILLp accesses xav_fill, which is the index of the last element you actually have (adding obviously increases that as well).

The call to av_extend is where the array size increases. What it does is extend the definition of what the array is occupying. In fact the array lives in a buffer whose size is allocated in powers of two. If the extend takes the array beyond the size of the buffer, then a new buffer is allocated and the array is moved. If it does not, then the array is left where it is.

Now it is clear how to make repeated calls to unshift fairly efficient. Right now it moves stuff by the minimum necessary. What we need is to have a new variable for how much to move it. That variable should be the maximum of num and the length of the array. This will cause space wastage, but it will also mean that when you hit an unshift and it has to move the array, it will not hit that copying logic again right away.

Even so, building up an array using n calls to push will still be faster than unshift because there is less accounting to do. But both will be order n, rather than n calls to unshift being order n^2 as it is today.
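
The difference between the two policies can be seen with a toy cost model. The sketch below does not touch Perl's real allocator; it only counts how many element moves n single-element unshifts would cost if the array is slid up by the minimum each time ("minimal") versus by the maximum of num and the array length as proposed above ("grow"). The function name and the policy labels are made up for illustration.

```perl
use strict;
use warnings;

# Toy model: count element moves for $n one-element unshifts.
sub moves_for_unshifts {
    my ($n, $policy) = @_;
    my $length      = 0;   # elements currently in the array
    my $front_slack = 0;   # free slots before the first element
    my $moves       = 0;   # total elements copied so far
    for (1 .. $n) {
        if ($front_slack == 0) {
            # No room in front: slide every existing element up.
            $moves += $length;
            # 'minimal' opens one slot; 'grow' opens max(1, length) slots.
            $front_slack = ($policy eq 'grow' && $length > 1) ? $length : 1;
        }
        $front_slack--;    # the new element takes one free slot
        $length++;
    }
    return $moves;
}

printf "%-8s %d moves\n", $_, moves_for_unshifts(1000, $_) for qw(minimal grow);
```

For 1000 unshifts this model counts 499500 moves under "minimal" and 1023 under "grow", which is the order n^2 versus order n difference described above, at the price of the wasted space already mentioned.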


Re: japhy looks at av.c (not av.h)

by extremely (Priest) on Nov 20, 2000 at 02:13 UTC

$#=-500;


Re: Sort this data
by princepawn (Parson) on Nov 20, 2000 at 04:48 UTC

Boulder
