Re^3: Rosetta Code: Long List is Long (faster - vec)

by marioroy (Prior)
on Jan 09, 2023 at 10:31 UTC


in reply to Re^2: Rosetta Code: Long List is Long (faster - vec)
in thread Rosetta Code: Long List is Long

For the fixed string length case (I ran it several times), the clang++ build runs about 0.3 seconds faster than the g++ build. See also the J script result with AVX2 enabled (total time 3.48103 secs).

# gcc 12.2.1
$ g++ -o llil2vec -std=c++11 -Wall -O3 -march=native -mtune=skylake -falign-functions=32 -fno-semantic-interposition -mno-vzeroupper -mprefer-vector-width=256 llil2vec.cpp
$ ./llil2vec big1.txt big2.txt big3.txt >out.txt
llil2vec (fixed string length=6) start
get_properties CPU time : 1.89609 secs
emplace set sort CPU time : 0.544972 secs
write stdout CPU time : 0.842451 secs
total CPU time : 3.28355 secs
total wall clock time : 4 secs

# clang version 15.0.4
$ clang++ -o llil2vec -std=c++11 -Wall -O3 -march=native -mtune=skylake -falign-functions=32 -fno-semantic-interposition -mno-vzeroupper -mprefer-vector-width=256 llil2vec.cpp
$ ./llil2vec big1.txt big2.txt big3.txt >out.txt
llil2vec (fixed string length=6) start
get_properties CPU time : 1.67073 secs
emplace set sort CPU time : 0.474207 secs
write stdout CPU time : 0.828457 secs
total CPU time : 2.97343 secs
total wall clock time : 3 secs
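Since the comparison rests on several runs, here is a minimal sketch of one way to collect the "total CPU time" figure from repeated runs and report the median rather than a single reading. The helper names (total_cpu_time, median, bench_median) are made up for illustration, and the sketch assumes the timing lines go to stderr, which the transcript suggests because they still appear while stdout is redirected to out.txt.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of llil2vec.

# Extract the seconds value from a line such as
#   "total CPU time : 2.97343 secs"
total_cpu_time() {
    awk '/^total CPU time/ { print $(NF-1) }'
}

# Print the median of the numbers read from stdin (one per line).
median() {
    sort -n | awk '{ v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}

# Usage: bench_median RUNS COMMAND [ARGS...]
# Assumes COMMAND prints its timing report on stderr; stdout is discarded.
bench_median() {
    runs=$1; shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" 2>&1 >/dev/null | total_cpu_time
        i=$((i + 1))
    done | median
}
```

It could be used as `bench_median 5 ./llil2vec big1.txt big2.txt big3.txt` for each build. The median is less sensitive to a single slow run than the mean, though it does not remove run-to-run noise.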

Replies are listed 'Best First'.
Re^4: Rosetta Code: Long List is Long (faster - vec)
by eyepopslikeamosquito (Archbishop) on Jan 09, 2023 at 11:55 UTC

    Thanks! Yes, I'm seeing clang++ slightly faster too, but only for the limited length fixed string case.

    Funny, I'd earlier tried clang++ for the non-fixed string case, but it seemed to be very slightly slower than g++. That result, combined with Google suggesting that g++ usually produces slightly faster executables, caused me to give up on clang++.

    I also fiddled with some of the many compiler parameters but felt overwhelmed by the sheer number and complexity of them, so just stuck to the basic ones for now. The natural variations in timing of each run also make it hard to be certain.

    After googling, I was thinking of something like:

    #!/bin/sh
    # Clear the caches (this needs root permissions)
    sync; echo 3 > /proc/sys/vm/drop_caches
    # Use taskset for cpu affinity (but which cpu to choose?)
    taskset 0x00000002 ./llil2vec big1.txt big2.txt big3.txt >f.tmp
    sleep 5
    taskset 0x00000002 ./llil2vec big1.txt big2.txt big3.txt >f.tmp
    sleep 5
    taskset 0x00000002 ./llil2vec big1.txt big2.txt big3.txt >f.tmp
    but was too gutless, given it needs root permissions and I didn't really know what I was doing.
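On the "which cpu to choose?" comment: the taskset argument is a bit mask, so 0x00000002 has only bit 1 set and pins the process to CPU 1 (CPUs are numbered from 0). A minimal sketch of building such a mask from a CPU index follows; cpu_mask is a hypothetical helper name, not something taskset provides.

```shell
# Sketch: build a taskset affinity mask for a single CPU index.
# cpu_mask is a hypothetical helper, not part of taskset itself.
cpu_mask() {
    # Shift 1 left by the CPU index and print it as an 8-digit hex mask.
    printf '0x%08x\n' "$((1 << $1))"
}
```

For example, `taskset "$(cpu_mask 1)" ./llil2vec big1.txt big2.txt big3.txt >f.tmp` would pin to the same CPU as the script above, and `taskset -c 1 ...` expresses the same thing with a CPU list instead of a mask. Which CPU gives the steadiest timings depends on the machine and is not something this sketch decides.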

      Yes, I'm seeing clang++ slightly faster too, but only for the limited length fixed string case.

      Same here.

      I also fiddled with some of the many compiler parameters but felt overwhelmed by the sheer number and complexity of them, so just stuck to the basic ones for now.

      I tried various parameters without realizing any improvement. The following performs similarly, without the extra CFLAGS.

      $ clang++ -o llil2vec -std=c++11 -Wall -O3 llil2vec.cpp
      $ ./llil2vec big1.txt big2.txt big3.txt >out.txt
      llil2vec (fixed string length=6) start
      get_properties CPU time : 1.65488 secs
      emplace set sort CPU time : 0.470765 secs
      write stdout CPU time : 0.850356 secs
      total CPU time : 2.97605 secs
      total wall clock time : 3 secs
