james28909 has asked for the wisdom of the Perl Monks concerning the following question:

I have a few lines of code that I use for byte reversal of some 16 MB files. I have got it to swap the byte order in 0.6-0.8 seconds. I was wondering if there is a faster way than the two methods I have below. Please feel free to crush my record of 0.68640184402465 seconds.

Here is a file to test with, of course: Test File

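use strict;
use warnings;
use Time::HiRes qw( time );

my $start = time();

open(my $file, '<', $ARGV[0]) or die "cannot open $ARGV[0]: $!";
binmode($file);
open(my $reversedFile, '>', "$ARGV[0].swap") or die "cannot open $ARGV[0].swap: $!";
binmode($reversedFile);

my ($data, $n);
# Read 4 KB at a time; unpack each 16-bit word as big-endian ("n")
# and pack it back as little-endian ("v"), which swaps its two bytes.
while (($n = read $file, $data, 4096) != 0) {
    print $reversedFile pack("v*", unpack("n*", $data));
}
close($reversedFile) or die "cannot close $ARGV[0].swap: $!";

my $end = time();
my $runtime = sprintf("%.16s", $end - $start);
print $runtime;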
And another method which is a little slower it seems:
use strict;
use warnings;
use Time::HiRes qw( time );

my $start = time();
my $input_file = $ARGV[0];

# Slurp the whole file into memory with a single read
my $data = do {
    local $/ = undef;
    open(my $fh, "<", $input_file) or die "could not open $input_file: $!";
    binmode($fh);
    <$fh>;
};

# Swap every 16-bit word in one pack/unpack pass
my $reversed_data = pack("v*", unpack("n*", $data));

open(my $output, '>', "bkpps3.swap.bin") or die "could not open bkpps3.swap.bin: $!";
binmode($output);
print $output $reversed_data;
close($output) or die "could not close bkpps3.swap.bin: $!";

my $end = time();
my $runtime = sprintf("%.16s", $end - $start);
print $runtime;
Like I said, I'm really not sure how to make it faster than it is. Any input is appreciated :)
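For comparison, here is a minimal C sketch of what pack("v*", unpack("n*", $data)) does to each 16-bit word: read it as big-endian, then write it back as little-endian, which amounts to swapping its two bytes. The function name n_to_v is hypothetical, and the sketch simply skips a trailing odd byte rather than trying to reproduce Perl's exact behavior for that case (the 16 MB test files have an even length anyway).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode each 16-bit word of `in` as big-endian
 * (like unpack "n") and re-encode it as little-endian (like pack "v").
 * A trailing odd byte is skipped. Returns the number of bytes written. */
static size_t n_to_v(const unsigned char *in, unsigned char *out, size_t len)
{
    size_t i;
    for (i = 0; i + 2 <= len; i += 2) {
        uint16_t w = (uint16_t)((in[i] << 8) | in[i + 1]); /* big-endian read */
        out[i]     = (unsigned char)(w & 0xff);             /* little-endian write */
        out[i + 1] = (unsigned char)(w >> 8);
    }
    return i;
}
```

Calling n_to_v on the bytes 34 12 78 56 yields 12 34 56 78, matching the example in the thread title.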

Re: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678)
by dave_the_m (Monsignor) on Apr 14, 2015 at 09:41 UTC
    This runs in 0.02s on my machine, compared with your original which runs in 0.3s. So 15x faster :-)

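    my ($data, $n);
    my $bufsize = 4096;
    # Masks selecting the even-offset and odd-offset bytes
    my $mask0 = "\xff\0" x ($bufsize / 2 + 1);
    my $mask1 = "\0\xff" x ($bufsize / 2 + 1);
    $data = "\0";
    # read() stores the chunk at offset 1, so $data is "\0" followed by the
    # chunk: substr($data, 2) is the chunk shifted left by one byte and
    # $data itself is the chunk shifted right by one byte. $mask0 keeps the
    # even offsets of the left-shifted copy, $mask1 keeps the odd offsets of
    # the right-shifted copy, and OR-ing them yields the byte-swapped chunk.
    while (($n = read $file, $data, $bufsize, 1) != 0) {
        print $reversedFile substr(((substr($data, 2) & $mask0) | ($data & $mask1)), 0, $n);
    }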

    Dave.

      Shouldn't that be substr($data,1) (skip the NUL)?
        I see you've already corrected your post, but for anyone else wondering: even bytes have to be shifted left one slot; odd ones right one slot; the difference between -1 and +1 is 2, not 1.

        Dave.
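Dave's trick (mask the even and odd bytes, shift them one slot in opposite directions, OR the halves together) also works inside a single machine word. Below is a minimal C sketch assuming 64-bit words; memcpy is used for the loads and stores so the buffer need not be aligned, and since swapping adjacent bytes is symmetric, the result does not depend on the host's endianness. The name swap_pairs_u64 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: swap the two bytes of every 16-bit word in buf,
 * 8 bytes at a time using mask-and-shift, then finish any leftover
 * bytes one pair at a time. */
static void swap_pairs_u64(unsigned char *buf, size_t len)
{
    const uint64_t mask = 0x00ff00ff00ff00ffULL; /* low byte of each 16-bit lane */
    size_t i = 0;

    for (; i + 8 <= len; i += 8) {
        uint64_t x;
        memcpy(&x, buf + i, sizeof x);             /* alignment-safe load */
        x = ((x & mask) << 8) | ((x >> 8) & mask); /* swap bytes in each lane */
        memcpy(buf + i, &x, sizeof x);             /* alignment-safe store */
    }
    for (; i + 2 <= len; i += 2) {                 /* tail, one pair at a time */
        unsigned char t = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = t;
    }
}
```

This processes four byte pairs per shift/mask step instead of one, which is the same reason the string-wide version above beats a per-word loop.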

Re: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678)
by Eily (Monsignor) on Apr 14, 2015 at 07:54 UTC

    Before v5.20, the copy-on-write feature for strings was not enabled by default. This means that you may gain a little time by avoiding copying the string into multiple scalars:

    print $output pack("v*", unpack("n*", scalar <$fh>));

      What copying? No string is copied in that code. COW would not help.

        There is in the second method posted by james28909:

        my $data = do {
            local $/ = undef;
            open (my $fh, "<", $input_file) or die "could not open $input_file: $!";
            binmode($fh);
            <$fh>;
        };
        my $reversed_data = pack( "v*", unpack( "n*", $data ) );
        But since it's not copied from one variable to another, it probably makes little difference indeed; I didn't think it through.

Re: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678)
by pme (Monsignor) on Apr 14, 2015 at 08:53 UTC
    Write in C. ;)

      That has already been done, long before this question was posted, because this problem is ages old. And dd is significantly faster (97 ms), even on my old server:

      $ time dd if=tmp/TestFile.bin conv=swab of=tmp/TestFile.out
      32768+0 records in
      32768+0 records out
      16777216 bytes (17 MB) copied, 0.0970995 s, 173 MB/s

      real    0m0.130s
      user    0m0.020s
      sys     0m0.080s
      $ od -h tmp/TestFile.bin | head
      0000000 0000 0000 0000 0000 0000 0000 0000 0000
      0000020 0000 0000 0fac e0ff 0000 0000 dead beef
      0000040 0000 0000 0000 0000 0000 0000 0000 7800
      0000060 0000 0000 0000 0000 0000 0000 0000 0000
      *
      0001000 4946 4900 0000 0001 0000 0002 0000 0000
      0001020 ffff ffff ffff ffff ffff ffff ffff ffff
      *
      0002000 0000 0001 0000 000b 0000 0000 00ef fc00
      0002020 0000 0000 0000 0400 0000 0000 0002 e800
      $ od -h tmp/TestFile.out | head
      0000000 0000 0000 0000 0000 0000 0000 0000 0000
      0000020 0000 0000 ac0f ffe0 0000 0000 adde efbe
      0000040 0000 0000 0000 0000 0000 0000 0000 0078
      0000060 0000 0000 0000 0000 0000 0000 0000 0000
      *
      0001000 4649 0049 0000 0100 0000 0200 0000 0000
      0001020 ffff ffff ffff ffff ffff ffff ffff ffff
      *
      0002000 0000 0100 0000 0b00 0000 0000 ef00 00fc
      0002020 0000 0000 0000 0004 0000 0000 0200 00e8
      $

      Just for fun, doing the same thing on a cdrom image file:

      $ time dd if=slackware-13.0-install-d2.iso conv=swab of=tmp/delete.me
      1315400+0 records in
      1315400+0 records out
      673484800 bytes (673 MB) copied, 4.05028 s, 166 MB/s

      real    0m4.173s
      user    0m0.640s
      sys     0m3.370s
      $

      The speed of dd on my server is clearly limited by the disks, as working on a tmpfs in RAM runs about twice as fast as working with files on disk:

      $ time dd if=/tmp/TestFile.bin conv=swab of=/tmp/TestFile.out
      32768+0 records in
      32768+0 records out
      16777216 bytes (17 MB) copied, 0.0506392 s, 331 MB/s

      real    0m0.056s
      user    0m0.020s
      sys     0m0.030s
      $

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Hi afoken

        I can make dd even faster by adding 'bs=20M' (the default is ibs=obs=512, according to the man page).
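The bs=20M speedup presumably comes from doing fewer, larger reads and writes. Here is a minimal C sketch of the same swap filter with a caller-chosen buffer size. swap_stream is a hypothetical name; the buffer size must be even so that no byte pair straddles two reads, and an odd final byte is simply copied through unchanged.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy in -> out, swapping each byte pair, using one
 * buffer of bufsize bytes so the per-call overhead is paid rarely.
 * bufsize must be even and nonzero. An odd trailing byte at EOF is copied
 * unchanged. Returns 0 on success, -1 on error. */
static int swap_stream(FILE *in, FILE *out, size_t bufsize)
{
    if (bufsize == 0 || bufsize % 2 != 0)
        return -1;
    unsigned char *buf = malloc(bufsize);
    if (buf == NULL)
        return -1;

    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, bufsize, in)) > 0) {
        for (size_t i = 0; i + 1 < n; i += 2) {  /* swap complete pairs only */
            unsigned char t = buf[i];
            buf[i] = buf[i + 1];
            buf[i + 1] = t;
        }
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    free(buf);
    return rc;
}
```

Passing 20 * 1024 * 1024 as bufsize would roughly mirror bs=20M; whether that helps on a given machine is something to measure rather than assume.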

Re: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678)
by FreeBeerReekingMonk (Deacon) on Apr 15, 2015 at 22:14 UTC

    I am sure that dd is faster than XS, but here is an XS snippet (I am not sure about signed vs. unsigned, or whether I need to detect Unicode strings; it assumes plain 8-bit bytes):

    char *
    swapstring(str)
        SV *str
      INIT:
        STRLEN len;
        char *buf = SvPVbyte(str, len);
      CODE:
        /* rotate each byte right by 4 bits (swaps its two nibbles) */
        while (len--) {
            asm("ror $4,%0" : "+r" (buf[len]));
        }
        RETVAL = buf;
      OUTPUT:
        RETVAL
    The ror idea comes from Stack Overflow.
    Edit: Doh! Never mind, this produces 56781234 -> 12345678, which is not what was asked.
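The snippet above goes wrong because of the rotation width. Rotating a single byte by 4 bits swaps its nibbles, while rotating a 16-bit word by 8 bits swaps its bytes. Here is a portable C sketch of both; the function names are hypothetical. Compilers often turn the 16-bit form into a single rotate instruction, but that depends on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical helper: rotating an 8-bit value by 4 swaps its two
 * nibbles (0x34 -> 0x43). This is what "ror $4" on one byte does. */
static uint8_t rot8_by4(uint8_t b)
{
    return (uint8_t)((b >> 4) | (b << 4));
}

/* Hypothetical helper: rotating a 16-bit value by 8 swaps its two
 * bytes (0x3412 -> 0x1234). This rotation is the actual byteswap. */
static uint16_t rot16_by8(uint16_t w)
{
    return (uint16_t)((w >> 8) | (w << 8));
}
```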

      Ror can do byteswaps all right, but it's not very remarkable at that.

      #! /usr/bin/perl
      use Inline C => Config => CC => 'gcc', OPTIMIZE => '-O3 -mssse3 -funroll-all-loops';
      use Inline C => <<'__CUT__', NAME => 'swab';
      #include <x86intrin.h>

      void swab_ror(SV *v) {
          STRLEN slen;
          char *s = SvPV(v, slen);
          uint16_t *w = (uint16_t*) s;
          size_t n = slen >> 1;
          for (; n; n--) {
              asm("rorw $8, %0" : "+r,m" (w[n-1]) : : "cc");
          }
      }

      void swab_sse(SV *v) {
          STRLEN slen;
          char *s = SvPV(v, slen);
          __m128i x, t;
          size_t n = slen & ~(size_t)1;
          for (; (n & 0xe); n -= 2) {
              uint16_t *w = (uint16_t*) &s[n-2];
              *w = __rorw(*w, 8);
          }
          t = _mm_set_epi8(14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1);
          for (; n; n -= 16) {
              x = _mm_lddqu_si128((__m128i*)&s[n-16]);
              x = _mm_shuffle_epi8(x, t);
              _mm_storeu_si128((__m128i*)&s[n-16], x);
          }
      }
      __CUT__

      our $str = pack "C*", map rand(256), 1..34567;

      use Benchmark 'cmpthese';
      cmpthese -5, {
          swab_ror => q( swab_ror $str ),
          swab_sse => q( swab_sse $str ),
      };

        That is a big performance difference! Thanks for showing that.
                     Rate swab_ror swab_sse
        swab_ror 155077/s       --     -82%
        swab_sse 885905/s     471%       --
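To see what the shuffle in swab_sse does, here is a scalar C model of one 16-byte SSSE3 byte shuffle (pshufb, exposed as _mm_shuffle_epi8). Each output byte i takes input byte ctl[i] & 15, or becomes 0 if the control byte's high bit is set. Note that _mm_set_epi8 lists its arguments from byte 15 down to byte 0, so the benchmark's control vector, written out from byte 0 upward, is 1, 0, 3, 2, ..., 15, 14. The names shuffle16 and swap_ctl are hypothetical; this models the instruction's semantics and is not a fast implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a 16-byte pshufb: out[i] = in[ctl[i] & 15],
 * or 0 when bit 7 of the control byte is set. */
static void shuffle16(const uint8_t in[16], const uint8_t ctl[16], uint8_t out[16])
{
    for (size_t i = 0; i < 16; i++)
        out[i] = (ctl[i] & 0x80) ? 0 : in[ctl[i] & 0x0f];
}

/* _mm_set_epi8(14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1) stored from byte 0
 * upward: every output byte pair takes its input pair in reverse order. */
static const uint8_t swap_ctl[16] = {
    1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14
};
```

With swap_ctl, one shuffle swaps eight byte pairs at once, compared with one pair per rorw in swab_ror. That is consistent with the gap in the benchmark above, though the exact ratio will vary by CPU.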