Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678)

james28909 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by dave_the_m (Monsignor) on Apr 14, 2015 at 09:41 UTC
This runs in 0.02s on my machine, compared with your original, which runs in 0.3s. So 15x faster :-) `my($data, $n); my $bufsize = 4096; my $mask0 = "\xff\0" x ($bufsize / 2 + 1); my $mask1 = "\0\xff" x ($bufsize / 2 + 1); $data = "\0"; while (($n = read $file, $data, $bufsize, 1) != 0) { print $reversedFile substr(((substr($data,2) & $mask0) | ($data & $mask1)), 0, $n); }` Dave.
Re^2: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by ikegami (Patriarch) on Apr 14, 2015 at 12:50 UTC
~~Shouldn't that be `substr($data,1)` (skip the NUL)?~~
Re^3: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by dave_the_m (Monsignor) on Apr 14, 2015 at 13:03 UTC
I see you've already corrected your post, but for anyone else wondering: even bytes have to be shifted left one slot; odd ones right one slot; the difference between -1 and +1 is 2, not 1. Dave.
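For anyone who wants to watch the shift-and-mask idea work on a small string, here is a minimal sketch. The helper name `swab_mask` is made up for illustration, and it assumes an even-length byte string (a trailing odd byte would come out as NUL).

```perl
use strict;
use warnings;

# Hypothetical helper: swap adjacent bytes using string-wise & and |.
# Assumes $chunk is a byte string of even length.
sub swab_mask {
    my ($chunk) = @_;
    my $n = length $chunk;
    return $chunk unless $n;                # nothing to swap

    my $pairs     = int($n / 2) + 1;        # one spare pair so the masks are long enough
    my $mask_even = "\xff\0" x $pairs;      # keeps bytes at offsets 0, 2, 4, ...
    my $mask_odd  = "\0\xff" x $pairs;      # keeps bytes at offsets 1, 3, 5, ...

    my $padded = "\0" . $chunk;             # input shifted right by one slot

    # substr($padded, 2) is the input shifted left by one slot, so the
    # original odd-offset bytes land on even offsets.
    my $from_odd  = substr($padded, 2) & $mask_even;
    # In $padded the original even-offset bytes already sit on odd offsets.
    my $from_even = $padded & $mask_odd;

    return substr($from_odd | $from_even, 0, $n);
}

my $in  = pack("H*", "34127856");
my $out = swab_mask($in);
print unpack("H*", $out), "\n";             # prints 12345678
```

The two shifted copies are exactly two slots apart, which is why the code takes `substr(..., 2)` from the padded buffer rather than `substr(..., 1)`.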
Re: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by Eily (Monsignor) on Apr 14, 2015 at 07:54 UTC
Before v5.20, the copy-on-write feature for strings was not enabled by default. This means you may save a little time by not copying the strings into multiple scalars: `print $output pack("v*", unpack("n*", scalar <$fh>));`
Re^2: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by ikegami (Patriarch) on Apr 14, 2015 at 12:48 UTC
What copying? No string is copied in that code. COW would not help.
Re^3: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by Eily (Monsignor) on Apr 14, 2015 at 14:38 UTC
There is in the second method posted by james28909: `my $data = do { local $/ = undef; open (my $fh, "<", $input_file) or die "could not open $input_file: $!"; binmode($fh); <$fh>; }; my $reversed_data = pack( "v*", unpack( "n*", $data ) );` But since it's not copied from one variable to another, it probably makes little difference indeed; I didn't think it through.
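As a rough sketch of the pack/unpack approach done in chunks instead of slurping the whole file: note that the templates need the `*` repeat count, otherwise only the first 16-bit word is converted. The in-memory filehandles, the sample bytes, and the name `swab_pack` are stand-ins for illustration; with real files you would open them by name and `binmode` them the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: read 16-bit big-endian words, write them little-endian.
# A trailing odd byte is silently dropped by unpack "n*".
sub swab_pack {
    my ($bytes) = @_;
    return pack("v*", unpack("n*", $bytes));
}

# In-memory filehandles stand in for the real input and output files.
my $input  = pack("H*", "34127856dead");
my $output = "";
open(my $in,  "<", \$input)  or die "cannot open in-memory input: $!";
open(my $out, ">", \$output) or die "cannot open in-memory output: $!";
binmode($in);
binmode($out);

my $bufsize = 4096;                         # keep this even so no pair straddles two chunks
while (read($in, my $chunk, $bufsize)) {
    print {$out} swab_pack($chunk);
}
close($out) or die "close failed: $!";

print unpack("H*", $output), "\n";          # prints 12345678adde
```

Only one chunk is alive at a time, so memory use stays flat regardless of file size.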
Re^4: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by ikegami (Patriarch) on Apr 14, 2015 at 18:19 UTC
Re^5: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by ikegami (Patriarch) on Apr 14, 2015 at 18:42 UTC
Some notes below your chosen depth have not been shown here
Re: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by pme (Monsignor) on Apr 14, 2015 at 08:53 UTC
Write in C. ;)
Re^2: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by afoken (Chancellor) on Apr 14, 2015 at 15:54 UTC
"Write in C. ;)"

Has already been done, ages before this question was posted, because this problem is ages old. And dd is significantly faster (97 msec) even on my old server:

    $ time dd if=tmp/TestFile.bin conv=swab of=tmp/TestFile.out
    32768+0 records in
    32768+0 records out
    16777216 bytes (17 MB) copied, 0.0970995 s, 173 MB/s

    real 0m0.130s
    user 0m0.020s
    sys 0m0.080s
    $ od -h tmp/TestFile.bin | head
    0000000 0000 0000 0000 0000 0000 0000 0000 0000
    0000020 0000 0000 0fac e0ff 0000 0000 dead beef
    0000040 0000 0000 0000 0000 0000 0000 0000 7800
    0000060 0000 0000 0000 0000 0000 0000 0000 0000
    *
    0001000 4946 4900 0000 0001 0000 0002 0000 0000
    0001020 ffff ffff ffff ffff ffff ffff ffff ffff
    *
    0002000 0000 0001 0000 000b 0000 0000 00ef fc00
    0002020 0000 0000 0000 0400 0000 0000 0002 e800
    $ od -h tmp/TestFile.out | head
    0000000 0000 0000 0000 0000 0000 0000 0000 0000
    0000020 0000 0000 ac0f ffe0 0000 0000 adde efbe
    0000040 0000 0000 0000 0000 0000 0000 0000 0078
    0000060 0000 0000 0000 0000 0000 0000 0000 0000
    *
    0001000 4649 0049 0000 0100 0000 0200 0000 0000
    0001020 ffff ffff ffff ffff ffff ffff ffff ffff
    *
    0002000 0000 0100 0000 0b00 0000 0000 ef00 00fc
    0002020 0000 0000 0000 0004 0000 0000 0200 00e8
    $

Just for fun, doing the same thing on a CD-ROM image file:

    $ time dd if=slackware-13.0-install-d2.iso conv=swab of=tmp/delete.me
    1315400+0 records in
    1315400+0 records out
    673484800 bytes (673 MB) copied, 4.05028 s, 166 MB/s

    real 0m4.173s
    user 0m0.640s
    sys 0m3.370s
    $

The speed of dd on my server is clearly limited by the disks, as working on a tmpfs in RAM runs at about double the speed compared to running with files on disk:

    $ time dd if=/tmp/TestFile.bin conv=swab of=/tmp/TestFile.out
    32768+0 records in
    32768+0 records out
    16777216 bytes (17 MB) copied, 0.0506392 s, 331 MB/s

    real 0m0.056s
    user 0m0.020s
    sys 0m0.030s
    $

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re^3: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by pme (Monsignor) on Apr 14, 2015 at 19:05 UTC
Hi afoken, I can make dd even faster by adding 'bs=20M' (the default is ibs=obs=512 according to the man page).
Re: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by FreeBeerReekingMonk (Deacon) on Apr 15, 2015 at 22:14 UTC
I am sure that dd is faster than XS, but here is an XS snippet (although I am not sure about the signed/unsigned, or whether I need to actually detect Unicode bytes; I am assuming normal 8-bit bytes): `char * swapstring(str) SV *str INIT: STRLEN len; char *buf = SvPVbyte(str, len); CODE: while(len--){ asm("ror $4,%1" : "+r" (buf[len])); } RETVAL = buf; OUTPUT: RETVAL` The ror idea comes from Stack Overflow. Edit: Doh! Never mind, this produces 56781234 -> 12345678, which is not what was asked.
Re^2: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by oiskuu (Hermit) on Apr 16, 2015 at 09:19 UTC
Ror can do byteswaps all right, but it's not very remarkable at that.

    #! /usr/bin/perl
    use Inline C => Config => CC => 'gcc', OPTIMIZE => '-O3 -mssse3 -funroll-all-loops';
    use Inline C => <<'__CUT__', NAME => 'swab';
    #include <x86intrin.h>
    void swab_ror(SV *v)
    {
        STRLEN slen;
        char *s = SvPV(v, slen);
        uint16_t *w = (uint16_t *) s;
        size_t n = slen >> 1;
        for (; n; n--) {
            asm("rorw $8, %0" : "+r,m" (w[n-1]) : : "cc");
        }
    }
    void swab_sse(SV *v)
    {
        STRLEN slen;
        char *s = SvPV(v, slen);
        __m128i x, t;
        size_t n = slen & ~(size_t)1;
        for (; (n & 0xe); n -= 2) {
            uint16_t *w = (uint16_t *) &s[n-2];
            *w = __rorw(*w, 8);
        }
        t = _mm_set_epi8(14,15,12,13,10,11,8,9,6,7,4,5,2,3,0,1);
        for (; n; n -= 16) {
            x = _mm_lddqu_si128((__m128i *)&s[n-16]);
            x = _mm_shuffle_epi8(x, t);
            _mm_storeu_si128((__m128i *)&s[n-16], x);
        }
    }
    __CUT__
    our $str = pack "C*", map rand(256), 1..34567;
    use Benchmark 'cmpthese';
    cmpthese -5, {
        swab_ror => q( swab_ror $str ),
        swab_sse => q( swab_sse $str ),
    };
Re^3: Fastest byteswap (little endian to big endian (eg. 34127856 -> 12345678) by FreeBeerReekingMonk (Deacon) on Apr 16, 2015 at 18:24 UTC
That is a big performance difference! Thanks for showing that.

               Rate swab_ror swab_sse
    swab_ror 155077/s       --     -82%
    swab_sse 885905/s     471%       --
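If you want to compare the pure-Perl approaches from this thread in the same way, something along these lines should do. This is only a sketch: the helper names `swab_pack` and `swab_mask` are made up here, the test string has an even length so both methods return identical results, and no timings are claimed since they depend on the machine and the Perl version.

```perl
use strict;
use warnings;
use Benchmark 'cmpthese';

# Hypothetical helper: pack/unpack with repeat counts.
sub swab_pack { return pack("v*", unpack("n*", $_[0])) }

# Hypothetical helper: string-wise mask and shift (assumes even length).
sub swab_mask {
    my $n = length $_[0];
    return "" unless $n;
    my $pairs  = ($n >> 1) + 1;
    my $padded = "\0" . $_[0];
    return substr(
        (substr($padded, 2) & ("\xff\0" x $pairs)) | ($padded & ("\0\xff" x $pairs)),
        0, $n,
    );
}

# Random test data; the even length (34568) is an assumption of this sketch.
my $str = pack("C*", map { int rand 256 } 1 .. 34568);

# Check that both methods agree before timing them.
die "results differ" unless swab_pack($str) eq swab_mask($str);

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    swab_pack => sub { swab_pack($str) },
    swab_mask => sub { swab_mask($str) },
});
```

Unlike the Inline C versions, these return a new string instead of swapping in place, so the numbers are not directly comparable with the table above.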