
> It now takes the entire CPU (64 logical threads) for Perl MCE 1.0 to run faster. :)

Now that's a challenge! Can I push the dial further? ... or will the ingenious tybalt89's unorthodox assistance from the side allow you to move the needle back towards 32? :)

Since I know how much you enjoyed my (anonymonk-provoked) MAX_STR_LEN_L hack in the long Long List is Long series, I've tried a similar stunt here in a desperate attempt to improve data locality and cache performance. I also added a (cheating) vector reserve and the total time at the end (thanks for pointing out this oversight).
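The MAX_STR_LEN_L trick depends on two facts that can each be checked on their own. First, a struct containing only chars has no padding and no pointers, so a std::vector of them is one flat array. Second, fgets() needs room for the characters plus the '\n' plus the terminating '\0'. Below is a minimal sketch that assumes the same str_type layout as rtoa-pgatram-fixed.cpp below. `fgets_lengths` is a hypothetical test helper that reads a string back through a temporary file:

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <vector>

// Same layout idea as str_type in rtoa-pgatram-fixed.cpp.
constexpr int MAX_STR_LEN_L = 17;   // 15 chars + '\n' + '\0'

// Only char members: alignment 1, no padding, no pointers, so a
// std::vector<str_type> is one contiguous array of 18-byte records.
struct str_type {
    char slen;
    char str[MAX_STR_LEN_L];
};
static_assert(sizeof(str_type) == 1 + MAX_STR_LEN_L, "unexpected padding");

// Hypothetical helper: write `text` to a temporary file, read it back with
// fgets() using a buffer of `bufsize` bytes, and return one length per
// fgets() call (after stripping a trailing '\n', if any).
std::vector<int> fgets_lengths(const char* text, int bufsize) {
    std::vector<int> lens;
    std::FILE* fh = std::tmpfile();
    if (fh == nullptr) return lens;
    std::fputs(text, fh);
    std::rewind(fh);

    str_type line;
    assert(bufsize <= MAX_STR_LEN_L);          // never overflow line.str
    while (std::fgets(line.str, bufsize, fh) != nullptr) {
        std::size_t n = std::strlen(line.str);
        if (n > 0 && line.str[n - 1] == '\n') --n;
        line.slen = static_cast<char>(n);
        lens.push_back(line.slen);
    }
    std::fclose(fh);
    return lens;
}
```

With a 17-byte buffer, the two lines MMMDCCCLXXXVIII and IV come back as lengths {15, 2}. With 16 they come back as {15, 0, 2}, because a separate call returns the '\n' of the 15-character line. So 17 is the size fgets() genuinely needs, not an off-by-one.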

Anyway, here are the timings of my latest version, rtoa-pgatram-fixed.cpp, using the fast_io library:

$ ./rtoa-pgatram-fixed t1.txt >f.tmp
read_input_files : 3999000 items
read file time : 0.164 secs
roman_to_dec time : 0.088 secs
output time : 0.100 secs
total time : 0.353 secs
$ diff f.tmp roman.tmp
$ ./rtoa-pgatram-fixed t1.txt >f.tmp
read_input_files : 3999000 items
read file time : 0.167 secs
roman_to_dec time : 0.086 secs
output time : 0.098 secs
total time : 0.353 secs
$ diff f.tmp roman.tmp
$ ./rtoa-pgatram-fixed t1.txt >f.tmp
read_input_files : 3999000 items
read file time : 0.173 secs
roman_to_dec time : 0.086 secs
output time : 0.093 secs
total time : 0.353 secs
$ diff f.tmp roman.tmp
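These figures come from elaspe_time() in the code below. That function truncates each interval to whole milliseconds before converting to seconds. In addition, std::chrono::high_resolution_clock is not guaranteed to be steady; libstdc++ aliases it to system_clock. Below is a minimal alternative that assumes only the standard library. `elapsed_secs` is a hypothetical name:

```cpp
#include <cassert>
#include <chrono>

using steady_clock = std::chrono::steady_clock;

// Hypothetical replacement for elaspe_time(): seconds as a double, computed
// from the full-resolution duration instead of whole milliseconds.
inline double elapsed_secs(steady_clock::time_point cend,
                           steady_clock::time_point cstart) {
    assert(cend >= cstart);   // later time point first; steady_clock never goes backwards
    return std::chrono::duration<double>(cend - cstart).count();
}
```

A phase would then be timed as `auto t0 = steady_clock::now(); ...; double secs = elapsed_secs(steady_clock::now(), t0);`. At the tenth-of-a-second scale of these runs, the millisecond truncation hardly matters. The main benefit is a clock that cannot jump backwards.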

Update: with marioroy's rtoa-pgatram-fixed2 (see below), without fast_io:

$ ./rtoa-pgatram-fixed2 t1.txt >f.tmp
read_input_files : 3999000 items
read file time : 0.189 secs
roman_to_dec time : 0.371 secs
total time : 0.560 secs
$ diff f.tmp roman.tmp

... with fast_io:

$ ./rtoa-pgatram-fixed2 t1.txt >f.tmp
read_input_files : 3999000 items
read file time : 0.178 secs
roman_to_dec time : 0.143 secs
total time : 0.322 secs
$ diff f.tmp roman.tmp
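The two fixed2 builds differ only in the output call, since marioroy's change further down prints inside the conversion loop. So the drop in roman_to_dec time from 0.371 to 0.143 secs is output cost: `std::cout << i << '\n'` goes through the iostream formatting machinery once per number. If you want to stay with the standard library, one option is to format with std::to_chars into a large buffer and write it with fwrite(). Below is a minimal sketch. `buffered_int_writer` is a hypothetical class, not fast_io's API, and it has not been benchmarked against fast_io:

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdio>
#include <system_error>

// Hypothetical helper (not fast_io): formats ints with std::to_chars into a
// 64 KiB buffer and hands the buffer to fwrite() only when it is nearly full.
class buffered_int_writer {
public:
    explicit buffered_int_writer(std::FILE* out) : out_(out) {}
    ~buffered_int_writer() { flush(); }
    buffered_int_writer(const buffered_int_writer&) = delete;
    buffered_int_writer& operator=(const buffered_int_writer&) = delete;

    // Append value followed by '\n'.
    void println(int value) {
        if (sizeof(buf_) - len_ < 16) flush();   // "-2147483648\n" needs 12
        auto [ptr, ec] = std::to_chars(buf_ + len_, buf_ + sizeof(buf_) - 1, value);
        assert(ec == std::errc());
        *ptr++ = '\n';
        len_ = static_cast<std::size_t>(ptr - buf_);
    }

    // Hand whatever is buffered to fwrite().
    void flush() {
        if (len_ > 0) std::fwrite(buf_, 1, len_, out_);
        len_ = 0;
    }

private:
    std::FILE*  out_;
    char        buf_[1 << 16];
    std::size_t len_ = 0;
};
```

In the fixed2 loop, this would replace the println call with `w.println(roman_to_dec(r));` on a writer constructed as `buffered_int_writer w(stdout);`. The destructor flushes the remainder, so the writer must be destroyed (or flush() called) before the final timing.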

rtoa-pgatram-fixed.cpp

// rtoa-pgatram-fixed.cpp. Crude fixed length string version.
// Compile with:
//   g++ -o rtoa-pgatram-fixed -std=c++20 -Wall -O3 rtoa-pgatram-fixed.cpp
// or:
//   clang++ -o rtoa-pgatram-fixed -std=c++20 -Wall -O3 rtoa-pgatram-fixed.cpp
// or:
//   g++ -o rtoa-pgatram-fixed -std=c++20 -Wall -O3 -I "$HOME/local-fast_io/fast_io/include" rtoa-pgatram-fixed.cpp
// to use the locally installed fast_io header-only library

#include <cctype>
#include <cerrno>     // errno
#include <cstdio>     // FILE, fopen, fgets, fclose
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
#include <numeric>
#include <chrono>
#include <iomanip>

// See [id://11149504] for more info on the fast_io C++ library
// #include <fast_io.h>

// ----------------------------------------------------------------------------

typedef std::chrono::high_resolution_clock             high_resolution_clock;
typedef std::chrono::high_resolution_clock::time_point time_point;
typedef std::chrono::milliseconds                      milliseconds;

double elaspe_time(
   time_point cend,
   time_point cstart)
{
   return double (
      std::chrono::duration_cast<milliseconds>(cend - cstart).count()
   ) * 1e-3;
}

// ----------------------------------------------------------------------------

// Longest roman numeral is MMMDCCCLXXXVIII (3888) of length 15.
// fgets() needs room for those 15 characters plus the trailing '\n' and
// the terminating '\0', hence 17. With 16, fgets() stops before the '\n'
// of 3888 and the next call returns that '\n' as a bogus empty line.
#define MAX_STR_LEN_L 17

// The basic idea is to keep this struct small and without pointers to
// improve data locality/cache performance when traversing the vector
struct str_type {
   char slen;
   char str[MAX_STR_LEN_L];
};

using vec_str_type = std::vector<str_type>;
using vec_int_type = std::vector<int>;

// Read an input file of Roman Numerals and append them to a list
// Return the number of Roman Numerals appended
static int read_input_file(
   const char*   fname,   // in: file name containing a list of Roman Numerals
   vec_str_type& vec_ret) // out: a vector of Roman Numeral strings
{
   FILE*    fh;
   str_type line;
   int      cnt = 0;

   fh = ::fopen(fname, "r");
   if ( fh == NULL ) {
      std::cerr << "Error opening '" << fname << "' : " << strerror(errno) << "\n";
      return 0;
   }

   while ( ::fgets( line.str, MAX_STR_LEN_L, fh ) != NULL ) {
      size_t len = ::strlen(line.str);
      if ( len > 0 && line.str[len - 1] == '\n' )
         --len;               // strip trailing newline (last line may lack one)
      line.slen = len;
      vec_ret.emplace_back(line);
      ++cnt;
   }

   ::fclose(fh);
   return cnt;
}

// ---------------------------------------------------------------

// Though there are less than 256 initializers in this ascii table,
// the others are guaranteed by ANSI C to be initialized to zero.
static const int romtab[256] = {
   0,0,0,0,0,0, 0, 0, 0, 0,      //  00- 09
   0,0,0,0,0,0, 0, 0, 0, 0,      //  10- 19
   0,0,0,0,0,0, 0, 0, 0, 0,      //  20- 29
   0,0,0,0,0,0, 0, 0, 0, 0,      //  30- 39
   0,0,0,0,0,0, 0, 0, 0, 0,      //  40- 49
   0,0,0,0,0,0, 0, 0, 0, 0,      //  50- 59
   0,0,0,0,0,0, 0, 100, 500, 0,  //  60- 69
   0,0,0,1,0,0, 50,1000, 0, 0,   //  70- 79
   0,0,0,0,0,0, 5, 0, 10, 0,     //  80- 89
   0,0,0,0,0,0, 0, 0, 0, 100,    //  90- 99
   500,0,0,0,0,1, 0, 0, 50,1000, // 100-109
   0,0,0,0,0,0, 0, 0, 5, 0,      // 110-119
   10,0,0,0,0,0, 0, 0, 0, 0      // 120-129
};

// Return the arabic number for a roman letter c.
// Return zero if the roman letter c is invalid.
inline int urtoa(int c) { return romtab[c]; }

// Fold one letter into the running total t: add its value, and subtract
// twice a smaller preceding letter (t % value), e.g. IV = 1 + 5 - 2.
// Input must be valid: an invalid letter makes urtoa(c) zero and t % 0 undefined.
inline int accfn(int t, char c)
{
   return t + urtoa(c) - t % urtoa(c) * 2;
}

inline int roman_to_dec(const str_type& st)
{
   return std::accumulate( st.str, st.str + st.slen, 0, accfn );
}

int main(int argc, char* argv[])
{
   if (argc < 2) {
      std::cerr << "usage: rtoa-pgatram-fixed file...\n";
      return 1;
   }

   // Get the list of input files from the command line
   int    nfiles = argc - 1;
   char** fname  = &argv[1];

   std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fixed);

   time_point cstart1, cend1, cstart2, cend2, cstart3, cend3;

   // Read the input files into roman_list
   vec_str_type roman_list;
   roman_list.reserve(3999000);
   cstart1 = high_resolution_clock::now();
   int cnt = 0;
   for (int i = 0; i < nfiles; ++i) {
      cnt += read_input_file( fname[i], roman_list );
   }
   cend1 = high_resolution_clock::now();
   double ctaken1 = elaspe_time(cend1, cstart1);
   std::cerr << "read_input_files : " << cnt << " items\n";
   std::cerr << "read file time : " << std::setw(8) << ctaken1 << " secs\n";

   // Convert roman to decimal
   vec_int_type arabic_list;
   arabic_list.reserve(3999000);
   cstart2 = high_resolution_clock::now();
   for ( auto const& r : roman_list ) {
      arabic_list.emplace_back( roman_to_dec(r) );
   }
   cend2 = high_resolution_clock::now();
   double ctaken2 = elaspe_time(cend2, cstart2);
   std::cerr << "roman_to_dec time : " << std::setw(8) << ctaken2 << " secs\n";

   // Output to stdout
   cstart3 = high_resolution_clock::now();
   for ( auto const& i : arabic_list ) {
      std::cout << i << '\n';
      // fast_io::io::println(i);
   }
   cend3 = high_resolution_clock::now();
   double ctaken3 = elaspe_time(cend3, cstart3);
   std::cerr << "output time : " << std::setw(8) << ctaken3 << " secs\n";

   double ctaken = elaspe_time(cend3, cstart1);
   std::cerr << "total time : " << std::setw(8) << ctaken << " secs\n";

   return 0;
}
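Why accfn() works: it reads the numeral left to right, adds each letter's value v, and subtracts 2 * (t % v). Consider a valid numeral with a subtractive letter (the C of CM, the X of XC, the I of IV). Everything before that letter is a multiple of the following larger letter's value, so at the larger letter t % v is exactly the smaller letter's value. That value was added once but should have been subtracted, so it is removed twice. In every other position t % v is 0.

Below is a self-contained sketch of the same fold with two hypothetical changes. First, the table (`romtab2`, built by `make_romtab`) is generated at compile time, relying, like romtab, on unlisted entries being zero. Second, each char is cast to unsigned char so a byte above 127 cannot produce a negative index:

```cpp
#include <array>
#include <cassert>
#include <numeric>
#include <string_view>

// Hypothetical compile-time version of romtab: roman letters (upper and
// lower case) map to their value, every other byte stays 0.
constexpr std::array<int, 256> make_romtab() {
    std::array<int, 256> t{};                  // value-initialised: all zero
    const char letters[] = "IVXLCDM";
    const int  values[]  = {1, 5, 10, 50, 100, 500, 1000};
    for (int k = 0; k < 7; ++k) {
        const unsigned char up = static_cast<unsigned char>(letters[k]);
        t[up]      = values[k];
        t[up + 32] = values[k];                // ASCII lower case is upper + 32
    }
    return t;
}
constexpr std::array<int, 256> romtab2 = make_romtab();
static_assert(romtab2['M'] == 1000 && romtab2['d'] == 500 && romtab2['A'] == 0);

// Same fold as accfn(): add v, then subtract twice a smaller preceding
// letter, whose value is t % v for a valid numeral (0 if there is none).
int roman_fold(std::string_view s) {
    return std::accumulate(s.begin(), s.end(), 0, [](int t, char c) {
        const int v = romtab2[static_cast<unsigned char>(c)];
        assert(v != 0);                        // t % 0 would be undefined
        return t + v - t % v * 2;
    });
}

// Worked trace for "MCMXCIV" (1994):
//   M: 0    + 1000 - 2 * (0    % 1000) = 1000
//   C: 1000 + 100  - 2 * (1000 % 100)  = 1100
//   M: 1100 + 1000 - 2 * (1100 % 1000) = 1900
//   X: 1900 + 10   - 2 * (1900 % 10)   = 1910
//   C: 1910 + 100  - 2 * (1910 % 100)  = 1990
//   I: 1990 + 1    - 2 * (1990 % 1)    = 1991
//   V: 1991 + 5    - 2 * (1991 % 5)    = 1994
```

In the trace, t % v is nonzero only at the letters that follow C, X and I, the three letters that belong to subtractive pairs.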

Re^5: Risque Romantic Rosetta Roman Race - MCE Results on AMD Box
by marioroy (Prior) on May 13, 2023 at 10:32 UTC
    ... move the needle back towards 32?

    Ah, I forgot to mention that it no longer takes the full CPU (64 threads) to run as fast as C++. Below, I pass t1.txt four times to increase the compute time. Perl needs 17 physical CPU cores to run faster than C++ :).

    Update: now using the faster MCE variant; see tybalt89's enhancement.

    $ time ./rtoa-pgatram-fixed t1.txt t1.txt t1.txt t1.txt >f.tmp
    read_input_files : 15996000 items
    read file time : 0.356 secs
    roman_to_dec time : 0.460 secs
    output time : 0.124 secs
    total time : 0.941 secs

    real    0m0.947s
    user    0m0.875s
    sys     0m0.072s

    # https://perlmonks.org/?node_id=11152168 max_workers => 16
    $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
    rtoa pgatram start
    time 0.980 secs

    real    0m1.008s
    user    0m14.836s
    sys     0m0.075s

    # https://perlmonks.org/?node_id=11152168 max_workers => 17
    $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
    rtoa pgatram start
    time 0.912 secs

    real    0m0.940s
    user    0m14.802s
    sys     0m0.123s

    # https://perlmonks.org/?node_id=11152168 max_workers => 32
    $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
    rtoa pgatram start
    time 0.548 secs

    real    0m0.577s
    user    0m15.889s
    sys     0m0.231s

    $ cksum f.tmp p.tmp
    737201628 75552000 f.tmp
    737201628 75552000 p.tmp

    I modified rtoa-pgatram-fixed.cpp to remove the last vector (along with cstart3 and cend3), so each result is written to standard output immediately. Perl now needs 4 more CPU cores (21) to run faster. Crazy :)

    // Convert roman to decimal
    cstart2 = high_resolution_clock::now();
    for ( auto const& r : roman_list ) {
       // std::cout << roman_to_dec(r) << '\n';
       fast_io::io::println(roman_to_dec(r));
    }
    cend2 = high_resolution_clock::now();
    double ctaken2 = elaspe_time(cend2, cstart2);
    std::cerr << "roman_to_dec time : " << std::setw(8) << ctaken2 << " secs\n";

    double ctaken = elaspe_time(cend2, cstart1);
    std::cerr << "total time : " << std::setw(8) << ctaken << " secs\n";

    $ time ./rtoa-pgatram-fixed2 t1.txt t1.txt t1.txt t1.txt >f.tmp
    read_input_files : 15996000 items
    read file time : 0.349 secs
    roman_to_dec time : 0.468 secs
    total time : 0.818 secs

    real    0m0.824s
    user    0m0.768s
    sys     0m0.056s

    # https://perlmonks.org/?node_id=11152168 max_workers => 21
    $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
    rtoa pgatram start
    time 0.770 secs

    real    0m0.799s
    user    0m15.147s
    sys     0m0.131s

    The above results were captured on Fedora Linux 38. I also ran Perl on Clear Linux, which performs better :)

    # https://perlmonks.org/?node_id=11152168 max_workers => 21
    $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
    rtoa pgatram start
    time 0.662 secs

    real    0m0.689s
    user    0m13.129s
    sys     0m0.132s

    # https://perlmonks.org/?node_id=11152168 max_workers => 32
    $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
    rtoa pgatram start
    time 0.475 secs

    real    0m0.502s
    user    0m13.732s
    sys     0m0.246s

    About the Perl MCE demonstration: I made it simply to showcase running in parallel with Perl. It was a fun exercise to see how many CPU cores Perl needs to catch up with C++ using fast_io.

      For fun, I combined your change to eliminate the last vector with some old OpenMP code I'm sure you'll recognize. :)

      // rtoa-pgatram-openmp.cpp. Crude first attempt at an OpenMP version.
      // Compile with:
      //   g++ -o rtoa-pgatram-openmp -std=c++20 -fopenmp -Wall -O3 rtoa-pgatram-openmp.cpp
      // or:
      //   clang++ -o rtoa-pgatram-openmp -std=c++20 -fopenmp -Wall -O3 rtoa-pgatram-openmp.cpp
      // or:
      //   g++ -o rtoa-pgatram-openmp -std=c++20 -fopenmp -Wall -O3 -I "$HOME/local-fast_io/fast_io/include" rtoa-pgatram-openmp.cpp
      // to use the locally installed fast_io header-only library

      #include <cctype>
      #include <cerrno>     // errno
      #include <cstdio>     // FILE, fopen, fgets, fclose
      #include <cstdlib>    // getenv, atoi
      #include <cstring>
      #include <string>
      #include <vector>
      #include <numeric>
      #ifdef _OPENMP
      #include <omp.h>
      #endif
      #include <chrono>
      #include <thread>
      #include <iostream>
      #include <iomanip>

      // See [id://11149504] for more info on the fast_io C++ library
      #include <fast_io.h>

      // ----------------------------------------------------------------------------

      typedef std::chrono::high_resolution_clock             high_resolution_clock;
      typedef std::chrono::high_resolution_clock::time_point time_point;
      typedef std::chrono::milliseconds                      milliseconds;

      double elaspe_time(
         time_point cend,
         time_point cstart)
      {
         return double (
            std::chrono::duration_cast<milliseconds>(cend - cstart).count()
         ) * 1e-3;
      }

      // ----------------------------------------------------------------------------

      // Longest roman numeral is MMMDCCCLXXXVIII (3888) of length 15.
      // fgets() needs room for those 15 characters plus the trailing '\n' and
      // the terminating '\0', hence 17. With 16, fgets() stops before the '\n'
      // of 3888 and the next call returns that '\n' as a bogus empty line.
      #define MAX_STR_LEN_L 17

      // The basic idea is to keep this struct small and without pointers to
      // improve data locality/cache performance when traversing the vector
      struct str_type {
         char slen;
         char str[MAX_STR_LEN_L];
      };

      using vec_str_type = std::vector<str_type>;
      using vec_int_type = std::vector<int>;

      // Read an input file of Roman Numerals and append them to a list
      static void read_input_file(
         const char*   fname,   // in: file name containing a list of Roman Numerals
         vec_str_type& vec_ret) // out: a vector of Roman Numeral strings
      {
         FILE*    fh;
         str_type line;

         fh = ::fopen(fname, "r");
         if ( fh == NULL ) {
            std::cerr << "Error opening '" << fname << "' : " << strerror(errno) << "\n";
            return;
         }

         while ( ::fgets( line.str, MAX_STR_LEN_L, fh ) != NULL ) {
            size_t len = ::strlen(line.str);
            if ( len > 0 && line.str[len - 1] == '\n' )
               --len;             // strip trailing newline (last line may lack one)
            line.slen = len;
            vec_ret.emplace_back(line);
         }

         ::fclose(fh);
      }

      // ---------------------------------------------------------------

      // Though there are less than 256 initializers in this ascii table,
      // the others are guaranteed by ANSI C to be initialized to zero.
      static const int romtab[256] = {
         0,0,0,0,0,0, 0, 0, 0, 0,      //  00- 09
         0,0,0,0,0,0, 0, 0, 0, 0,      //  10- 19
         0,0,0,0,0,0, 0, 0, 0, 0,      //  20- 29
         0,0,0,0,0,0, 0, 0, 0, 0,      //  30- 39
         0,0,0,0,0,0, 0, 0, 0, 0,      //  40- 49
         0,0,0,0,0,0, 0, 0, 0, 0,      //  50- 59
         0,0,0,0,0,0, 0, 100, 500, 0,  //  60- 69
         0,0,0,1,0,0, 50,1000, 0, 0,   //  70- 79
         0,0,0,0,0,0, 5, 0, 10, 0,     //  80- 89
         0,0,0,0,0,0, 0, 0, 0, 100,    //  90- 99
         500,0,0,0,0,1, 0, 0, 50,1000, // 100-109
         0,0,0,0,0,0, 0, 0, 5, 0,      // 110-119
         10,0,0,0,0,0, 0, 0, 0, 0      // 120-129
      };

      // Return the arabic number for a roman letter c.
      // Return zero if the roman letter c is invalid.
      inline int urtoa(int c) { return romtab[c]; }

      // Fold one letter into the running total t: add its value, and subtract
      // twice a smaller preceding letter (t % value), e.g. IV = 1 + 5 - 2.
      // Input must be valid: an invalid letter makes urtoa(c) zero and t % 0 undefined.
      inline int accfn(int t, char c)
      {
         return t + urtoa(c) - t % urtoa(c) * 2;
      }

      inline int roman_to_dec(const str_type& st)
      {
         return std::accumulate( st.str, st.str + st.slen, 0, accfn );
      }

      int main(int argc, char* argv[])
      {
         if (argc < 2) {
            std::cerr << "usage: rtoa-pgatram-openmp file...\n";
            return 1;
         }

      #ifdef _OPENMP
         std::cerr << "use OpenMP\n";
      #else
         std::cerr << "don't use OpenMP\n";
      #endif

         // Get the list of input files from the command line
         int    nfiles = argc - 1;
         char** fname  = &argv[1];

         std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fixed);

         time_point cstart1, cend1, cstart2, cend2;

      #ifdef _OPENMP
         // Determine the number of threads.
         const char* env_nthrs = std::getenv("NUM_THREADS");
         int nthrs = (env_nthrs && strlen(env_nthrs))
            ? ::atoi(env_nthrs)
            : std::thread::hardware_concurrency();
         omp_set_dynamic(false);
         omp_set_num_threads(nthrs);
      #else
         int nthrs = 1;
      #endif

         // Read the input files into roman_list
         vec_str_type roman_list;
         roman_list.reserve(3999000 * 4);
         cstart1 = high_resolution_clock::now();

         // Run parallel, depending on the number of threads
         if ( nthrs == 1 || nfiles == 1 ) {
            for (int i = 0; i < nfiles; ++i)
               read_input_file( fname[i], roman_list );
         }
      #ifdef _OPENMP
         else {
            // Note: threads enter the critical section in the order they
            // finish, so with different input files roman_list need not
            // follow the command-line order.
            #pragma omp parallel for schedule(static, 1)
            for (int i = 0; i < nfiles; ++i) {
               vec_str_type locvec;
               read_input_file( fname[i], locvec );
               #pragma omp critical
               {
                  // Append local vector
                  roman_list.insert( roman_list.end(), locvec.begin(), locvec.end() );
               }
            }
         }
      #endif

         cend1 = high_resolution_clock::now();
         double ctaken1 = elaspe_time(cend1, cstart1);
         std::cerr << "read_input_files : " << roman_list.size() << " items\n";
         std::cerr << "read file time : " << std::setw(8) << ctaken1 << " secs\n";

         // Convert roman to decimal
         cstart2 = high_resolution_clock::now();
         for ( auto const& r : roman_list ) {
            // std::cout << roman_to_dec(r) << '\n';
            fast_io::io::println( roman_to_dec(r) );
         }
         cend2 = high_resolution_clock::now();
         double ctaken2 = elaspe_time(cend2, cstart2);
         std::cerr << "roman_to_dec time : " << std::setw(8) << ctaken2 << " secs\n";

         double ctaken = elaspe_time(cend2, cstart1);
         std::cerr << "total time : " << std::setw(8) << ctaken << " secs\n";

         return 0;
      }

      As expected, it's a little bit faster:

      $ time NUM_THREADS=1 ./rtoa-pgatram-openmp t1.txt t1.txt t1.txt t1.txt >s.tmp
      use OpenMP
      read_input_files : 15996000 items
      read file time : 0.700 secs
      roman_to_dec time : 0.556 secs
      total time : 1.256 secs

      real    0m1.278s
      user    0m0.928s
      sys     0m0.350s

      $ time NUM_THREADS=4 ./rtoa-pgatram-openmp t1.txt t1.txt t1.txt t1.txt >s.tmp
      use OpenMP
      read_input_files : 15996000 items
      read file time : 0.405 secs
      roman_to_dec time : 0.568 secs
      total time : 0.974 secs

      real    0m0.995s
      user    0m1.439s
      sys     0m0.539s
      $ cmp f.tmp s.tmp
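      The cmp above passes, but with four copies of the same t1.txt it cannot detect an ordering problem. In the OpenMP branch, each thread appends its local vector inside the critical section in whatever order the threads finish. With different input files, the output could therefore come out in a different file order. Below is a minimal sketch of an order-preserving alternative, assuming a hypothetical `read_one(i)` callable that returns file i's items: each iteration writes only its own slot, and the slots are then concatenated serially. Without -fopenmp the pragma is ignored (with -Wall you may get an unknown-pragma warning) and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: run read_one(i) for every i in [0, nfiles), in
// parallel when compiled with -fopenmp, and return the results concatenated
// in index (command-line) order, whatever order the threads finish in.
template <class T, class Reader>
std::vector<T> gather_in_order(int nfiles, Reader read_one) {
    assert(nfiles >= 0);
    std::vector<std::vector<T>> slots(nfiles);   // one slot per file

    // Each iteration writes only slots[i], so no critical section is needed.
    #pragma omp parallel for schedule(static, 1)
    for (int i = 0; i < nfiles; ++i)
        slots[i] = read_one(i);

    std::size_t total = 0;
    for (const auto& s : slots) total += s.size();

    std::vector<T> out;
    out.reserve(total);
    for (const auto& s : slots)                  // serial, in file order
        out.insert(out.end(), s.begin(), s.end());
    return out;
}
```

      In rtoa-pgatram-openmp this might be called as `gather_in_order<str_type>(nfiles, [&](int i) { vec_str_type v; read_input_file(fname[i], v); return v; })`. Like the existing insert under the critical section, it copies every item once more.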


      I may have been overthinking this. :) Here's a simple all-in-one version with no interim storage in vectors at all.

      // rtoa-pgatram-allinone.cpp. Crude allinone version.
      // Compile with:
      //   g++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 rtoa-pgatram-allinone.cpp
      // or:
      //   clang++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 rtoa-pgatram-allinone.cpp
      // or:
      //   g++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 -I "$HOME/local-fast_io/fast_io/include" rtoa-pgatram-allinone.cpp
      // to use the locally installed fast_io header-only library

      #include <cctype>
      #include <cerrno>     // errno
      #include <cstdio>     // FILE, fopen, fgets, fclose
      #include <cstring>
      #include <string>
      // #include <vector>
      #include <numeric>
      #include <chrono>
      #include <thread>
      #include <iostream>
      #include <iomanip>

      // See [id://11149504] for more info on the fast_io C++ library
      #include <fast_io.h>

      // ---------------------------------------------------------------

      typedef std::chrono::high_resolution_clock             high_resolution_clock;
      typedef std::chrono::high_resolution_clock::time_point time_point;
      typedef std::chrono::milliseconds                      milliseconds;

      double elaspe_time(
         time_point cend,
         time_point cstart)
      {
         return double (
            std::chrono::duration_cast<milliseconds>(cend - cstart).count()
         ) * 1e-3;
      }

      // ---------------------------------------------------------------

      // Longest roman numeral is MMMDCCCLXXXVIII (3888) of length 15.
      // fgets() needs room for those 15 characters plus the trailing '\n' and
      // the terminating '\0', hence 17. With 16, fgets() stops before the '\n'
      // of 3888 and the next call returns that '\n' as a bogus empty line.
      #define MAX_STR_LEN_L 17

      // The basic idea is to keep this struct small and without pointers to
      // improve data locality/cache performance when traversing the vector
      struct str_type {
         char slen;
         char str[MAX_STR_LEN_L];
      };

      // using vec_str_type = std::vector<str_type>;
      // using vec_int_type = std::vector<int>;

      // ---------------------------------------------------------------

      // Though there are less than 256 initializers in this ascii table,
      // the others are guaranteed by ANSI C to be initialized to zero.
      static const int romtab[256] = {
         0,0,0,0,0,0, 0, 0, 0, 0,      //  00- 09
         0,0,0,0,0,0, 0, 0, 0, 0,      //  10- 19
         0,0,0,0,0,0, 0, 0, 0, 0,      //  20- 29
         0,0,0,0,0,0, 0, 0, 0, 0,      //  30- 39
         0,0,0,0,0,0, 0, 0, 0, 0,      //  40- 49
         0,0,0,0,0,0, 0, 0, 0, 0,      //  50- 59
         0,0,0,0,0,0, 0, 100, 500, 0,  //  60- 69
         0,0,0,1,0,0, 50,1000, 0, 0,   //  70- 79
         0,0,0,0,0,0, 5, 0, 10, 0,     //  80- 89
         0,0,0,0,0,0, 0, 0, 0, 100,    //  90- 99
         500,0,0,0,0,1, 0, 0, 50,1000, // 100-109
         0,0,0,0,0,0, 0, 0, 5, 0,      // 110-119
         10,0,0,0,0,0, 0, 0, 0, 0      // 120-129
      };

      // Return the arabic number for a roman letter c.
      // Return zero if the roman letter c is invalid.
      inline int urtoa(int c) { return romtab[c]; }

      // Fold one letter into the running total t: add its value, and subtract
      // twice a smaller preceding letter (t % value), e.g. IV = 1 + 5 - 2.
      // Input must be valid: an invalid letter makes urtoa(c) zero and t % 0 undefined.
      inline int accfn(int t, char c)
      {
         return t + urtoa(c) - t % urtoa(c) * 2;
      }

      inline int roman_to_dec(const str_type& st)
      {
         return std::accumulate( st.str, st.str + st.slen, 0, accfn );
      }

      // Read an input file of Roman Numerals and do it all
      static void do_it_all(
         const char* fname   // in: file name containing a list of Roman Numerals
      )
      {
         FILE*    fh;
         str_type line;

         fh = ::fopen(fname, "r");
         if ( fh == NULL ) {
            std::cerr << "Error opening '" << fname << "' : " << strerror(errno) << "\n";
            return;
         }

         while ( ::fgets( line.str, MAX_STR_LEN_L, fh ) != NULL ) {
            size_t len = ::strlen(line.str);
            if ( len > 0 && line.str[len - 1] == '\n' )
               --len;             // strip trailing newline (last line may lack one)
            line.slen = len;
            // std::cout << roman_to_dec(line) << '\n';
            fast_io::io::println( roman_to_dec(line) );
         }

         ::fclose(fh);
      }

      int main(int argc, char* argv[])
      {
         if (argc < 2) {
            std::cerr << "usage: rtoa-pgatram-allinone file...\n";
            return 1;
         }

         // Get the list of input files from the command line
         int    nfiles = argc - 1;
         char** fname  = &argv[1];

         std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fixed);

         time_point cstartall, cendall;
         cstartall = high_resolution_clock::now();

         for (int i = 0; i < nfiles; ++i)
            do_it_all( fname[i] );

         cendall = high_resolution_clock::now();
         double ctakenall = elaspe_time(cendall, cstartall);
         std::cerr << "do_it_all time : " << std::setw(8) << ctakenall << " secs\n";

         return 0;
      }

      $ time ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt >f4.tmp
      do_it_all time : 1.034 secs

      real    0m1.049s
      user    0m0.988s
      sys     0m0.061s
      $ cmp f4.tmp fixed4.tmp
      $ time ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt >f4.tmp
      do_it_all time : 1.047 secs

      real    0m1.070s
      user    0m0.989s
      sys     0m0.081s
      $ cmp f4.tmp fixed4.tmp

      As you can see, this is twice as fast as rtoa-pgatram-fixed.

      $ time ./rtoa-pgatram-fixed t1.txt t1.txt t1.txt t1.txt >f4.tmp
      read_input_files : 15996000 items
      read file time : 0.759 secs
      roman_to_dec time : 0.367 secs
      output time : 1.032 secs
      total time : 2.160 secs

      real    0m2.179s
      user    0m1.908s
      sys     0m0.270s
      $ cmp f4.tmp fixed4.tmp

      Update: Oops, the above rtoa-pgatram-fixed timing figures came from a build without fast_io. The timings with fast_io on my machine are:

      read_input_files : 15996000 items
      read file time : 0.750 secs
      roman_to_dec time : 0.370 secs
      output time : 0.389 secs
      total time : 1.510 secs

      real    0m1.529s
      user    0m1.348s
      sys     0m0.181s

      ... not twice as fast, but it's faster when you don't store anything in a vector ... though rtoa-pgatram-openmp might be faster with many files ... so I probably need to find a way to make rtoa-pgatram-allinone concurrent somehow (e.g. via chunking).
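      Below is one way the chunking idea could look. It is a minimal sketch under assumptions not in the original: the whole input is already in memory (read or memory-mapped), lines are plain upper-case numerals, and all names (`roman_value`, `split_chunks`, `convert_chunk`, `convert_all`) are hypothetical. Each chunk boundary is pushed forward to the next '\n' so no line is split. Each chunk is formatted into its own string, and the strings are joined in chunk order, which keeps the output in input order. Without -fopenmp the pragma is ignored and everything runs serially. No timings have been measured for it.

```cpp
#include <algorithm>
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical per-line converter: same fold as accfn(), upper case only,
// valid numerals assumed.
inline int roman_value(std::string_view s) {
    int t = 0;
    for (char c : s) {
        int v = 0;
        switch (c) {
            case 'I': v = 1;    break;  case 'V': v = 5;    break;
            case 'X': v = 10;   break;  case 'L': v = 50;   break;
            case 'C': v = 100;  break;  case 'D': v = 500;  break;
            case 'M': v = 1000; break;
        }
        assert(v != 0);                      // t % 0 would be undefined
        t = t + v - t % v * 2;
    }
    return t;
}

// Split text into roughly nchunks pieces; each boundary is pushed forward to
// just after a '\n' so no line is cut in half.
std::vector<std::string_view> split_chunks(std::string_view text, int nchunks) {
    assert(nchunks > 0);
    std::vector<std::string_view> chunks;
    const std::size_t step = text.size() / nchunks + 1;
    std::size_t start = 0;
    while (start < text.size()) {
        std::size_t end = std::min(start + step, text.size());
        const std::size_t nl = text.find('\n', end - 1);
        end = (nl == std::string_view::npos) ? text.size() : nl + 1;
        chunks.push_back(text.substr(start, end - start));
        start = end;
    }
    return chunks;
}

// Convert every line of one chunk, appending "value\n" to out.
void convert_chunk(std::string_view chunk, std::string& out) {
    char num[16];
    while (!chunk.empty()) {
        const std::size_t nl = chunk.find('\n');
        const std::string_view line = chunk.substr(0, nl);   // npos: to the end
        auto [ptr, ec] = std::to_chars(num, num + sizeof(num), roman_value(line));
        assert(ec == std::errc());
        out.append(num, static_cast<std::size_t>(ptr - num));
        out.push_back('\n');
        chunk.remove_prefix(nl == std::string_view::npos ? chunk.size() : nl + 1);
    }
}

// Convert all chunks (in parallel with -fopenmp) and join the per-chunk
// results in chunk order, so the output order matches the input order.
std::string convert_all(std::string_view text, int nchunks) {
    const std::vector<std::string_view> chunks = split_chunks(text, nchunks);
    std::vector<std::string> outs(chunks.size());
    #pragma omp parallel for schedule(dynamic, 1)
    for (int i = 0; i < static_cast<int>(chunks.size()); ++i)
        convert_chunk(chunks[i], outs[i]);
    std::string result;
    for (const std::string& o : outs) result += o;
    return result;
}
```

      For rtoa-pgatram-allinone, `text` would hold a whole input file and `result` would be written to stdout in one call. Memory use therefore grows with the input size.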

      Will this all-in-one version, rtoa-pgatram-allinone, be deemed acceptable by marioroy?

        > be deemed acceptable by marioroy

        Yikes! I'm a rookie when it comes to C++ and simply here for the fun and learning.

        > Am I missing something?

        There's no reason to conform to a fixed length, IMO. I gutted the fixed-length code, and it runs faster, completing in 0.490 seconds.

        C++ Results:

        # https://perlmonks.org/?node_id=11152156
        $ ./rtoa-pgatram-fixed t1.txt t1.txt t1.txt t1.txt | cksum
        read_input_files : 15996000 items
        read file time : 0.356 secs
        roman_to_dec time : 0.460 secs
        output time : 0.124 secs
        total time : 0.941 secs
        737201628 75552000

        # https://perlmonks.org/?node_id=11152177
        $ NUM_THREADS=4 ./rtoa-pgatram-openmp t1.txt t1.txt t1.txt t1.txt | cksum
        use OpenMP
        read_input_files : 15996000 items
        read file time : 0.159 secs
        roman_to_dec time : 0.469 secs
        total time : 0.628 secs
        737201628 75552000

        # https://perlmonks.org/?node_id=11152182
        $ ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt | cksum
        do_it_all time : 0.637 secs
        737201628 75552000

        # https://perlmonks.org/?node_id=11152186
        $ ./rtoa-pgatram-allinone2 t1.txt t1.txt t1.txt t1.txt | cksum
        do_it_all time : 0.515 secs   fast_io scan, line_get
        do_it_all time : 0.490 secs   fast_io memory mapping
        737201628 75552000

        Perl Results:

        # https://perlmonks.org/?node_id=11152168 max_workers => 26
        $ perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt | cksum
        rtoa pgatram start
        time 0.658 secs   Perl on Fedora Linux 38
        time 0.574 secs   Perl on Clear Linux
        737201628 75552000

        # https://perlmonks.org/?node_id=11152168 max_workers => 32
        $ perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt | cksum
        rtoa pgatram start
        time 0.548 secs   Perl on Fedora Linux 38
        time 0.480 secs   Perl on Clear Linux
        737201628 75552000
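        The point that nothing forces a fixed length can be shown with only the standard library. This is not what rtoa-pgatram-allinone2 does; per the annotations above, that version uses fast_io scanning (line_get) or memory mapping. Below is a minimal sketch with hypothetical names (`roman_line_value`, `convert_stream`, `convert_text`). std::getline into one reused std::string handles lines of any length, and each result is written immediately. It is not expected to match fast_io's speed; it only illustrates the variable-length structure.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical converter: same fold as accfn(), upper case only, valid input.
inline int roman_line_value(const std::string& s) {
    int t = 0;
    for (char c : s) {
        const int v = c == 'I' ? 1   : c == 'V' ? 5   : c == 'X' ? 10  :
                      c == 'L' ? 50  : c == 'C' ? 100 : c == 'D' ? 500 :
                      c == 'M' ? 1000 : 0;
        assert(v != 0);                       // t % 0 would be undefined
        t = t + v - t % v * 2;
    }
    return t;
}

// Variable-length all-in-one loop: each line goes into one reused
// std::string (no MAX_STR_LEN_L), is converted, and is printed at once.
// Returns the number of lines converted.
std::size_t convert_stream(std::istream& in, std::ostream& out) {
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF input
        out << roman_line_value(line) << '\n';
        ++count;
    }
    return count;
}

// Convenience wrapper for in-memory text.
std::string convert_text(const std::string& text) {
    std::istringstream in(text);
    std::ostringstream out;
    convert_stream(in, out);
    return out.str();
}
```

        With files, `std::ifstream in(fname); convert_stream(in, std::cout);` would replace the fgets loop in do_it_all.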

        rtoa-pgatram-allinone2.cpp

        Updated on May 19, 2023