in reply to Re^4: Risque Romantic Rosetta Roman Race
in thread Risque Romantic Rosetta Roman Race
... move the needle back towards 32?
Ah, I forgot to mention that Perl no longer needs the full CPU (64 threads) to run as fast as C++. Below, I specify t1.txt four times to increase the compute time. It takes 17 physical CPU cores for Perl to run faster than C++ :).
Update: Now using the faster MCE variant. See tybalt89's enhancement.
$ time ./rtoa-pgatram-fixed t1.txt t1.txt t1.txt t1.txt >f.tmp
read_input_files : 15996000 items
read file time : 0.356 secs
roman_to_dec time : 0.460 secs
output time : 0.124 secs
total time : 0.941 secs

real 0m0.947s
user 0m0.875s
sys 0m0.072s

# https://perlmonks.org/?node_id=11152168 max_workers => 16
$ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
rtoa pgatram start
time 0.980 secs

real 0m1.008s
user 0m14.836s
sys 0m0.075s

# https://perlmonks.org/?node_id=11152168 max_workers => 17
$ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
rtoa pgatram start
time 0.912 secs

real 0m0.940s
user 0m14.802s
sys 0m0.123s

# https://perlmonks.org/?node_id=11152168 max_workers => 32
$ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
rtoa pgatram start
time 0.548 secs

real 0m0.577s
user 0m15.889s
sys 0m0.231s

$ cksum f.tmp p.tmp
737201628 75552000 f.tmp
737201628 75552000 p.tmp
I modified rtoa-pgatram-fixed.cpp, removing the last vector along with cstart3 and cend3, so each result is written to standard output immediately. Perl now needs 4 more CPU cores (21 in total) to run faster. Crazy :)
// Convert roman to decimal
cstart2 = high_resolution_clock::now();
for ( auto const& r : roman_list ) {
    // std::cout << roman_to_dec(r) << '\n';
    fast_io::io::println(roman_to_dec(r));
}
cend2 = high_resolution_clock::now();
double ctaken2 = elaspe_time(cend2, cstart2);
std::cerr << "roman_to_dec time : " << std::setw(8) << ctaken2 << " secs\n";

double ctaken = elaspe_time(cend2, cstart1);
std::cerr << "total time : " << std::setw(8) << ctaken << " secs\n";
$ time ./rtoa-pgatram-fixed2 t1.txt t1.txt t1.txt t1.txt >f.tmp
read_input_files : 15996000 items
read file time : 0.349 secs
roman_to_dec time : 0.468 secs
total time : 0.818 secs

real 0m0.824s
user 0m0.768s
sys 0m0.056s

# https://perlmonks.org/?node_id=11152168 max_workers => 21
$ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
rtoa pgatram start
time 0.770 secs

real 0m0.799s
user 0m15.147s
sys 0m0.131s
The above results were captured on Fedora Linux 38. I also tried the Perl binary on Clear Linux for better performance :)
# https://perlmonks.org/?node_id=11152168 max_workers => 21
$ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
rtoa pgatram start
time 0.662 secs

real 0m0.689s
user 0m13.129s
sys 0m0.132s

# https://perlmonks.org/?node_id=11152168 max_workers => 32
$ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
rtoa pgatram start
time 0.475 secs

real 0m0.502s
user 0m13.732s
sys 0m0.246s
About the Perl MCE demonstration: I made it simply to showcase running in parallel in Perl. It was a fun exercise to check how many CPU cores Perl needs to match C++ using fast_io.
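For readers more at home in C++, the general idea of splitting the input across a fixed number of workers while keeping the output in input order can be sketched with std::thread. This is only an illustration under stated assumptions: it is not how MCE works internally (MCE hands chunks to worker processes), and parallel_convert is a hypothetical helper invented here, with max_workers borrowed from the runs above as the parameter name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: apply `convert` to every item using up to
// `max_workers` threads. Each thread owns one contiguous slice and writes
// only to its own result slots, so no locking is needed and the results
// stay in input order.
template <typename Convert>
std::vector<int> parallel_convert(const std::vector<std::string>& items,
                                  std::size_t max_workers, Convert convert) {
    std::vector<int> results(items.size());
    if (items.empty()) return results;

    // Clamp the worker count to the range [1, items.size()].
    max_workers = std::max<std::size_t>(1, std::min(max_workers, items.size()));
    // Slice size, rounded up so every item is covered.
    const std::size_t chunk = (items.size() + max_workers - 1) / max_workers;

    std::vector<std::thread> workers;
    for (std::size_t begin = 0; begin < items.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, items.size());
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                results[i] = convert(items[i]);
            }
        });
    }
    for (auto& w : workers) w.join();  // wait for all slices to finish
    return results;
}
```

Passing a conversion function such as a roman-to-decimal routine as `convert`, and varying max_workers, gives a rough way to repeat the "how many cores does it take" experiment on the C++ side. On some toolchains the program may need to be built with -pthread.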