in reply to Re^2: Risque Romantic Rosetta Roman Race
in thread Risque Romantic Rosetta Roman Race

Is this the tybalt89 optimization? ... or is there another optimization I missed?

Yes.

It takes 32 logical cores for your Perl/MCE version to catch up to my C++ version 1.0. Is that right?

That was done using 16 physical and 16 logical CPU cores via taskset -c 0-15,32-47. BTW, I captured the UNIX time to include any global cleanup. It now takes the entire CPU (64 logical threads) for Perl MCE 1.0 to run faster. :) The Perl time includes launching Perl, loading modules, spawning and reaping workers (~ 0.06 secs).

# captured UNIX time
C++ 1.0           : 0.450s
C++ fast_io       : 0.291s
Perl MCE 64 thds  : 0.252s

I also tried an ARRAY for index-based lookups, but that ran slower. Edit: I tried unpack, a tip from tybalt89. The ARRAY lookup is now faster.

# HASH
my %rtoa = ( M=>1000, D=>500, C=>100, L=>50, X=>10, V=>5, I=>1 );

# ARRAY, characters M D C L X V I
my @rtoa; @rtoa[qw( 77 68 67 76 88 86 73 )] = qw( 1000 500 100 50 10 5 1 );

Perl MCE 64 thds : 0.252s  @rtoa{ split //, uc($_) };
Perl MCE 64 thds : 0.297s  @rtoa[ map ord, split //, uc($_) ];
Perl MCE 64 thds : 0.192s  @rtoa[ unpack 'c*', uc($_) ];

Re^4: Risque Romantic Rosetta Roman Race - MCE Array Reduce
by marioroy (Prior) on May 12, 2023 at 10:40 UTC

    The Perl MCE code can be made faster by applying CPU affinity and enabling slurp IO.

    Updated on May 12, 2023 with tip from tybalt89 using unpack.
    Updated on May 13, 2023: fixed typo.

    The UNIX real time includes ~ 0.06 seconds for launching Perl, loading modules, spawning and reaping workers.

    # captured UNIX time
    C++ 1.0           : 0.450s
    C++ fast_io       : 0.291s
    Perl MCE 64 thds  : 0.252s

    # after tweaks: CPU affinity and slurp IO
    Perl MCE 64 thds  : 0.218s

    # more tweaks: using an ARRAY and unpack
    Perl MCE 64 thds  : 0.192s

    $ time perl rtoa-pgatram-mce.pl t1.txt >mce.tmp
    rtoa pgatram start
    time 0.165 secs

    real    0m0.192s
    user    0m7.596s
    sys     0m0.420s
Re^4: Risque Romantic Rosetta Roman Race
by tybalt89 (Monsignor) on May 12, 2023 at 16:11 UTC

      Thanks, tybalt89. Yes, it runs faster :) completing in less than 0.2 seconds. I updated the MCE demonstration.

      # captured UNIX time
      C++ 1.0           : 0.450s
      C++ fast_io       : 0.291s
      Perl MCE 64 thds  : 0.252s

      # after tweaks: CPU affinity and slurp IO
      Perl MCE 64 thds  : 0.218s

      # more tweaks: using an ARRAY and unpack
      Perl MCE 64 thds  : 0.192s

      $ time perl rtoa-pgatram-mce.pl t1.txt >mce.tmp
      rtoa pgatram start
      time 0.165 secs

      real    0m0.192s
      user    0m7.596s
      sys     0m0.420s
Re^4: Risque Romantic Rosetta Roman Race
by eyepopslikeamosquito (Archbishop) on May 13, 2023 at 09:03 UTC

    > It now takes the entire CPU (64 logical threads) for Perl MCE 1.0 to run faster. :)

    Now that's a challenge! Can I push the dial further? ... or will the ingenious tybalt89's unorthodox assistance from the side allow you to move the needle back towards 32? :)

    Since I know how much you enjoyed my (anonymonk-provoked) MAX_STR_LEN_L hack in the long Long List is Long series, I've tried a similar stunt here in a desperate attempt to improve data locality and cache performance. I also added a (cheating) vector reserve and the total time at the end (thanks for pointing out this oversight).
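A minimal sketch of what the fixed-length record buys, assuming the same layout as str_type in rtoa-pgatram-fixed.cpp. The names fixed_str, kMaxStrLen, make_fixed and chars_inside_buffer are hypothetical and used only for illustration. It checks that each record is 18 bytes with no pointers, so all character data sits inside the vector's single buffer. A std::string element is typically larger and may refer to separate heap storage for longer strings, though short-string optimisation can narrow that gap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kMaxStrLen = 17;  // mirrors MAX_STR_LEN_L in the thread's code

// Same shape as str_type: a length byte followed by inline character storage.
struct fixed_str {
    char slen;
    char str[kMaxStrLen];
};

// Every member is a char, so no padding is expected: 1 + 17 = 18 bytes per record.
static_assert(sizeof(fixed_str) == 1 + kMaxStrLen, "unexpected padding in fixed_str");

// Hypothetical helper: copy a short C string into a record.
inline fixed_str make_fixed(const char* s) {
    const std::size_t n = std::strlen(s);
    assert(n < kMaxStrLen);             // must fit together with its terminating NUL
    fixed_str f{};                      // zero-initialised, so str stays NUL-terminated
    f.slen = static_cast<char>(n);
    std::memcpy(f.str, s, n);
    return f;
}

// True when every record's characters lie inside the vector's own buffer,
// i.e. traversing the vector never follows a pointer to another allocation.
inline bool chars_inside_buffer(const std::vector<fixed_str>& v) {
    const char* first = reinterpret_cast<const char*>(v.data());
    const char* last  = first + v.size() * sizeof(fixed_str);
    for (const fixed_str& e : v) {
        if (e.str < first || e.str + kMaxStrLen > last) return false;
    }
    return true;
}
```

At 18 bytes per record, the 3999000 items in t1.txt occupy about 72 MB in one contiguous block.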

    Anyways, here are the timings of my latest version, rtoa-pgatram-fixed.cpp, using the fast_io library:

    $ ./rtoa-pgatram-fixed t1.txt >f.tmp
    read_input_files : 3999000 items
    read file time : 0.164 secs
    roman_to_dec time : 0.088 secs
    output time : 0.100 secs
    total time : 0.353 secs
    $ diff f.tmp roman.tmp

    $ ./rtoa-pgatram-fixed t1.txt >f.tmp
    read_input_files : 3999000 items
    read file time : 0.167 secs
    roman_to_dec time : 0.086 secs
    output time : 0.098 secs
    total time : 0.353 secs
    $ diff f.tmp roman.tmp

    $ ./rtoa-pgatram-fixed t1.txt >f.tmp
    read_input_files : 3999000 items
    read file time : 0.173 secs
    roman_to_dec time : 0.086 secs
    output time : 0.093 secs
    total time : 0.353 secs
    $ diff f.tmp roman.tmp

    Update: with marioroy's rtoa-pgatram-fixed2 below (without fast_io):

    $ ./rtoa-pgatram-fixed2 t1.txt >f.tmp
    read_input_files : 3999000 items
    read file time : 0.189 secs
    roman_to_dec time : 0.371 secs
    total time : 0.560 secs
    $ diff f.tmp roman.tmp

    ... with fast_io:

    $ ./rtoa-pgatram-fixed2 t1.txt >f.tmp
    read_input_files : 3999000 items
    read file time : 0.178 secs
    roman_to_dec time : 0.143 secs
    total time : 0.322 secs
    $ diff f.tmp roman.tmp

    rtoa-pgatram-fixed.cpp

    // rtoa-pgatram-fixed.cpp. Crude fixed length string version.
    // Compile with:
    //   g++ -o rtoa-pgatram-fixed -std=c++20 -Wall -O3 rtoa-pgatram-fixed.cpp
    // or:
    //   clang++ -o rtoa-pgatram-fixed -std=c++20 -Wall -O3 rtoa-pgatram-fixed.cpp
    // or:
    //   g++ -o rtoa-pgatram-fixed -std=c++20 -Wall -O3 -I "$HOME/local-fast_io/fast_io/include" rtoa-pgatram-fixed.cpp
    // to use the locally installed fast_io header-only library

    #include <cctype>
    #include <cstring>

    #include <iostream>
    #include <string>
    #include <vector>
    #include <numeric>
    #include <chrono>
    #include <iomanip>

    // See [id://11149504] for more info on the fast_io C++ library
    // #include <fast_io.h>

    // ----------------------------------------------------------------------------

    typedef std::chrono::high_resolution_clock high_resolution_clock;
    typedef std::chrono::high_resolution_clock::time_point time_point;
    typedef std::chrono::milliseconds milliseconds;

    double elaspe_time(
       time_point cend,
       time_point cstart)
    {
       return double (
          std::chrono::duration_cast<milliseconds>(cend - cstart).count()
       ) * 1e-3;
    }

    // ----------------------------------------------------------------------------

    // Longest roman numeral is MMMDCCCLXXXVIII (3888) of length 15
    // XXX: I'm off by one somewhere because 3888 fails with
    // MAX_STR_LEN_L of 16 but works with 17
    #define MAX_STR_LEN_L 17

    // The basic idea is to keep this struct small and without pointers to
    // improve data locality/cache performance when traversing the vector
    struct str_type {
       char slen;
       char str[MAX_STR_LEN_L];
    };

    using vec_str_type = std::vector<str_type>;
    using vec_int_type = std::vector<int>;

    // Read an input file of Roman Numerals and append them to a list
    // Return the number of Roman Numerals appended
    static int read_input_file(
       const char*   fname,     // in:  file name containing a list of Roman Numerals
       vec_str_type& vec_ret)   // out: a vector of Roman Numeral strings
    {
       FILE* fh;
       str_type line;
       int cnt = 0;

       fh = ::fopen(fname, "r");
       if ( fh == NULL ) {
          std::cerr << "Error opening '" << fname << "' : " << strerror(errno) << "\n";
          return 0;
       }
       while ( ::fgets( line.str, MAX_STR_LEN_L, fh ) != NULL ) {
          line.slen = ::strlen(line.str) - 1;   // -1 to strip trailing newline
          vec_ret.emplace_back(line);
          ++cnt;
       }
       ::fclose(fh);
       return cnt;
    }

    // ---------------------------------------------------------------

    // Though there are less than 256 initializers in this ascii table,
    // the others are guaranteed by ANSI C to be initialized to zero.
    static const int romtab[256] = {
         0,0,0,0,0,0,   0,   0,   0,   0,   //  00- 09
         0,0,0,0,0,0,   0,   0,   0,   0,   //  10- 19
         0,0,0,0,0,0,   0,   0,   0,   0,   //  20- 29
         0,0,0,0,0,0,   0,   0,   0,   0,   //  30- 39
         0,0,0,0,0,0,   0,   0,   0,   0,   //  40- 49
         0,0,0,0,0,0,   0,   0,   0,   0,   //  50- 59
         0,0,0,0,0,0,   0, 100, 500,   0,   //  60- 69
         0,0,0,1,0,0,  50,1000,   0,   0,   //  70- 79
         0,0,0,0,0,0,   5,   0,  10,   0,   //  80- 89
         0,0,0,0,0,0,   0,   0,   0, 100,   //  90- 99
       500,0,0,0,0,1,   0,   0,  50,1000,   // 100-109
         0,0,0,0,0,0,   0,   0,   5,   0,   // 110-119
        10,0,0,0,0,0,   0,   0,   0,   0    // 120-129
    };

    // Return the arabic number for a roman letter c.
    // Return zero if the roman letter c is invalid.
    inline int urtoa(int c) { return romtab[c]; }

    inline int accfn(int t, char c) {
       return t + urtoa(c) - t % urtoa(c) * 2;
    }

    inline int roman_to_dec(const str_type& st) {
       return std::accumulate( st.str, st.str + st.slen, 0, accfn );
    }

    int main(int argc, char* argv[])
    {
       if (argc < 2) {
          std::cerr << "usage: rtoa-pgatram-fixed file...\n";
          return 1;
       }

       // Get the list of input files from the command line
       int nfiles = argc - 1;
       char** fname = &argv[1];

       std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fixed);
       time_point cstart1, cend1, cstart2, cend2, cstart3, cend3;

       // Read the input files into roman_list
       vec_str_type roman_list;
       roman_list.reserve(3999000);

       cstart1 = high_resolution_clock::now();

       int cnt = 0;
       for (int i = 0; i < nfiles; ++i) {
          cnt += read_input_file( fname[i], roman_list );
       }

       cend1 = high_resolution_clock::now();
       double ctaken1 = elaspe_time(cend1, cstart1);
       std::cerr << "read_input_files : " << cnt << " items\n";
       std::cerr << "read file time : " << std::setw(8) << ctaken1 << " secs\n";

       // Convert roman to decimal
       vec_int_type arabic_list;
       arabic_list.reserve(3999000);

       cstart2 = high_resolution_clock::now();

       for ( auto const& r : roman_list ) {
          arabic_list.emplace_back( roman_to_dec(r) );
       }

       cend2 = high_resolution_clock::now();
       double ctaken2 = elaspe_time(cend2, cstart2);
       std::cerr << "roman_to_dec time : " << std::setw(8) << ctaken2 << " secs\n";

       // Output to stdout
       cstart3 = high_resolution_clock::now();

       for ( auto const& i : arabic_list ) {
          std::cout << i << '\n';
          // fast_io::io::println(i);
       }

       cend3 = high_resolution_clock::now();
       double ctaken3 = elaspe_time(cend3, cstart3);
       std::cerr << "output time : " << std::setw(8) << ctaken3 << " secs\n";

       double ctaken = elaspe_time(cend3, cstart1);
       std::cerr << "total time : " << std::setw(8) << ctaken << " secs\n";

       return 0;
    }
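The accfn update rule in the listing above, t + v - t % v * 2, is compact enough that a worked example may help. The sketch below restates the rule. The names letter_value, roman_to_int and usage_example are hypothetical, and the switch replaces the romtab lookup only to keep the example self-contained. The idea is that t % v is the part of the running total smaller than the current letter, i.e. a letter that was added earlier but should have been subtracted, so it is removed twice. One assumption differs from the original: unknown letters return -1 here, which also avoids evaluating t % 0.

```cpp
#include <cassert>
#include <string_view>

// Value of one Roman letter (either case); 0 for anything else.
constexpr int letter_value(char c) {
    switch (c) {
        case 'I': case 'i': return 1;
        case 'V': case 'v': return 5;
        case 'X': case 'x': return 10;
        case 'L': case 'l': return 50;
        case 'C': case 'c': return 100;
        case 'D': case 'd': return 500;
        case 'M': case 'm': return 1000;
        default:            return 0;
    }
}

// Left-to-right conversion using the same update rule as accfn.
// Assumption for this sketch: an unknown letter yields -1 instead of
// reaching the modulo with a zero divisor.
constexpr int roman_to_int(std::string_view s) {
    int t = 0;
    for (const char c : s) {
        const int v = letter_value(c);
        if (v == 0) return -1;
        // t % v isolates an earlier, smaller letter (e.g. the I in IV);
        // it was added once already, so subtract it twice.
        t += v - t % v * 2;
    }
    return t;
}

// Usage example: XIV steps through 10, 11, then 11 + 5 - (11 % 5) * 2 = 14.
inline void usage_example() {
    assert(roman_to_int("X") == 10);
    assert(roman_to_int("XI") == 11);
    assert(roman_to_int("XIV") == 14);
    assert(roman_to_int("MCMXCIV") == 1994);
}
```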

      ... move the needle back towards 32?

      Ah, I missed sharing that it no longer takes the full CPU (64-threads) to run as fast as C++. Below, I specify t1.txt four times to increase the compute time. It takes 17 physical CPU cores for Perl to run faster than C++ :).

      Update: Using faster MCE variant. See tybalt89's enhancement.

      $ time ./rtoa-pgatram-fixed t1.txt t1.txt t1.txt t1.txt >f.tmp
      read_input_files : 15996000 items
      read file time : 0.356 secs
      roman_to_dec time : 0.460 secs
      output time : 0.124 secs
      total time : 0.941 secs

      real    0m0.947s
      user    0m0.875s
      sys     0m0.072s

      # https://perlmonks.org/?node_id=11152168 max_workers => 16
      $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
      rtoa pgatram start
      time 0.980 secs

      real    0m1.008s
      user    0m14.836s
      sys     0m0.075s

      # https://perlmonks.org/?node_id=11152168 max_workers => 17
      $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
      rtoa pgatram start
      time 0.912 secs

      real    0m0.940s
      user    0m14.802s
      sys     0m0.123s

      # https://perlmonks.org/?node_id=11152168 max_workers => 32
      $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
      rtoa pgatram start
      time 0.548 secs

      real    0m0.577s
      user    0m15.889s
      sys     0m0.231s

      $ cksum f.tmp p.tmp
      737201628 75552000 f.tmp
      737201628 75552000 p.tmp

      I modified rtoa-pgatram-fixed.cpp and removed the last vector, cstart3, and cend3, so it now writes to standard output immediately. Perl now needs 4 more CPU cores to run faster. Crazy :)

      // Convert roman to decimal
      cstart2 = high_resolution_clock::now();

      for ( auto const& r : roman_list ) {
         // std::cout << roman_to_dec(r) << '\n';
         fast_io::io::println(roman_to_dec(r));
      }

      cend2 = high_resolution_clock::now();
      double ctaken2 = elaspe_time(cend2, cstart2);
      std::cerr << "roman_to_dec time : " << std::setw(8) << ctaken2 << " secs\n";

      double ctaken = elaspe_time(cend2, cstart1);
      std::cerr << "total time : " << std::setw(8) << ctaken << " secs\n";

      $ time ./rtoa-pgatram-fixed2 t1.txt t1.txt t1.txt t1.txt >f.tmp
      read_input_files : 15996000 items
      read file time : 0.349 secs
      roman_to_dec time : 0.468 secs
      total time : 0.818 secs

      real    0m0.824s
      user    0m0.768s
      sys     0m0.056s

      # https://perlmonks.org/?node_id=11152168 max_workers => 21
      $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
      rtoa pgatram start
      time 0.770 secs

      real    0m0.799s
      user    0m15.147s
      sys     0m0.131s

      The above results were captured on Fedora Linux 38. I also tried the Perl binary on Clear Linux for better performance :)

      # https://perlmonks.org/?node_id=11152168 max_workers => 21
      $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
      rtoa pgatram start
      time 0.662 secs

      real    0m0.689s
      user    0m13.129s
      sys     0m0.132s

      # https://perlmonks.org/?node_id=11152168 max_workers => 32
      $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt >p.tmp
      rtoa pgatram start
      time 0.475 secs

      real    0m0.502s
      user    0m13.732s
      sys     0m0.246s

      About the Perl MCE demonstration: I made it simply to showcase running in parallel in Perl. It was a fun exercise to check how many CPU cores Perl needs to match C++ using fast_io.

        For fun, I combined your change to eliminate the last vector with some old OpenMP code I'm sure you'll recognize. :)

        // rtoa-pgatram-openmp.cpp. Crude first attempt at an OpenMp version.
        // Compile with:
        //   g++ -o rtoa-pgatram-openmp -std=c++20 -fopenmp -Wall -O3 rtoa-pgatram-openmp.cpp
        // or:
        //   clang++ -o rtoa-pgatram-openmp -std=c++20 -fopenmp -Wall -O3 rtoa-pgatram-openmp.cpp
        // or:
        //   g++ -o rtoa-pgatram-openmp -std=c++20 -fopenmp -Wall -O3 -I "$HOME/local-fast_io/fast_io/include" rtoa-pgatram-openmp.cpp
        // to use the locally installed fast_io header-only library

        #include <cctype>
        #include <cstring>

        #include <string>
        #include <vector>
        #include <numeric>

        #ifdef _OPENMP
        #include <omp.h>
        #endif

        #include <chrono>
        #include <thread>

        #include <iostream>
        #include <iomanip>

        // See [id://11149504] for more info on the fast_io C++ library
        #include <fast_io.h>

        // ----------------------------------------------------------------------------

        typedef std::chrono::high_resolution_clock high_resolution_clock;
        typedef std::chrono::high_resolution_clock::time_point time_point;
        typedef std::chrono::milliseconds milliseconds;

        double elaspe_time(
           time_point cend,
           time_point cstart)
        {
           return double (
              std::chrono::duration_cast<milliseconds>(cend - cstart).count()
           ) * 1e-3;
        }

        // ----------------------------------------------------------------------------

        // Longest roman numeral is MMMDCCCLXXXVIII (3888) of length 15
        // XXX: I'm off by one somewhere because 3888 fails with
        // MAX_STR_LEN_L of 16 but works with 17
        #define MAX_STR_LEN_L 17

        // The basic idea is to keep this struct small and without pointers to
        // improve data locality/cache performance when traversing the vector
        struct str_type {
           char slen;
           char str[MAX_STR_LEN_L];
        };

        using vec_str_type = std::vector<str_type>;
        using vec_int_type = std::vector<int>;

        // Read an input file of Roman Numerals and append them to a list
        static void read_input_file(
           const char*   fname,     // in:  file name containing a list of Roman Numerals
           vec_str_type& vec_ret)   // out: a vector of Roman Numeral strings
        {
           FILE* fh;
           str_type line;

           fh = ::fopen(fname, "r");
           if ( fh == NULL ) {
              std::cerr << "Error opening '" << fname << "' : " << strerror(errno) << "\n";
              return;
           }
           while ( ::fgets( line.str, MAX_STR_LEN_L, fh ) != NULL ) {
              line.slen = ::strlen(line.str) - 1;   // -1 to strip trailing newline
              vec_ret.emplace_back(line);
           }
           ::fclose(fh);
        }

        // ---------------------------------------------------------------

        // Though there are less than 256 initializers in this ascii table,
        // the others are guaranteed by ANSI C to be initialized to zero.
        static const int romtab[256] = {
             0,0,0,0,0,0,   0,   0,   0,   0,   //  00- 09
             0,0,0,0,0,0,   0,   0,   0,   0,   //  10- 19
             0,0,0,0,0,0,   0,   0,   0,   0,   //  20- 29
             0,0,0,0,0,0,   0,   0,   0,   0,   //  30- 39
             0,0,0,0,0,0,   0,   0,   0,   0,   //  40- 49
             0,0,0,0,0,0,   0,   0,   0,   0,   //  50- 59
             0,0,0,0,0,0,   0, 100, 500,   0,   //  60- 69
             0,0,0,1,0,0,  50,1000,   0,   0,   //  70- 79
             0,0,0,0,0,0,   5,   0,  10,   0,   //  80- 89
             0,0,0,0,0,0,   0,   0,   0, 100,   //  90- 99
           500,0,0,0,0,1,   0,   0,  50,1000,   // 100-109
             0,0,0,0,0,0,   0,   0,   5,   0,   // 110-119
            10,0,0,0,0,0,   0,   0,   0,   0    // 120-129
        };

        // Return the arabic number for a roman letter c.
        // Return zero if the roman letter c is invalid.
        inline int urtoa(int c) { return romtab[c]; }

        inline int accfn(int t, char c) {
           return t + urtoa(c) - t % urtoa(c) * 2;
        }

        inline int roman_to_dec(const str_type& st) {
           return std::accumulate( st.str, st.str + st.slen, 0, accfn );
        }

        int main(int argc, char* argv[])
        {
           if (argc < 2) {
              std::cerr << "usage: rtoa-pgatram-openmp file...\n";
              return 1;
           }

        #ifdef _OPENMP
           std::cerr << "use OpenMP\n";
        #else
           std::cerr << "don't use OpenMP\n";
        #endif

           // Get the list of input files from the command line
           int nfiles = argc - 1;
           char** fname = &argv[1];

           std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fixed);
           time_point cstart1, cend1, cstart2, cend2;

        #ifdef _OPENMP
           // Determine the number of threads.
           const char* env_nthrs = std::getenv("NUM_THREADS");
           int nthrs = (env_nthrs && strlen(env_nthrs))
              ? ::atoi(env_nthrs)
              : std::thread::hardware_concurrency();
           omp_set_dynamic(false);
           omp_set_num_threads(nthrs);
        #else
           int nthrs = 1;
        #endif

           // Read the input files into roman_list
           vec_str_type roman_list;
           roman_list.reserve(3999000 * 4);

           cstart1 = high_resolution_clock::now();

           // Run parallel, depending on the number of threads
           if ( nthrs == 1 || nfiles == 1 ) {
              for (int i = 0; i < nfiles; ++i)
                 read_input_file( fname[i], roman_list );
           }
        #ifdef _OPENMP
           else {
              #pragma omp parallel for schedule(static, 1)
              for (int i = 0; i < nfiles; ++i) {
                 vec_str_type locvec;
                 read_input_file( fname[i], locvec );
                 #pragma omp critical
                 {
                    // Append local vector
                    roman_list.insert( roman_list.end(), locvec.begin(), locvec.end() );
                 }
              }
           }
        #endif

           cend1 = high_resolution_clock::now();
           double ctaken1 = elaspe_time(cend1, cstart1);
           std::cerr << "read_input_files : " << roman_list.size() << " items\n";
           std::cerr << "read file time : " << std::setw(8) << ctaken1 << " secs\n";

           // Convert roman to decimal
           cstart2 = high_resolution_clock::now();

           for ( auto const& r : roman_list ) {
              // std::cout << roman_to_dec(r) << '\n';
              fast_io::io::println( roman_to_dec(r) );
           }

           cend2 = high_resolution_clock::now();
           double ctaken2 = elaspe_time(cend2, cstart2);
           std::cerr << "roman_to_dec time : " << std::setw(8) << ctaken2 << " secs\n";

           double ctaken = elaspe_time(cend2, cstart1);
           std::cerr << "total time : " << std::setw(8) << ctaken << " secs\n";

           return 0;
        }

        As expected, it's a little bit faster:

        $ time NUM_THREADS=1 ./rtoa-pgatram-openmp t1.txt t1.txt t1.txt t1.txt >s.tmp
        use OpenMP
        read_input_files : 15996000 items
        read file time : 0.700 secs
        roman_to_dec time : 0.556 secs
        total time : 1.256 secs

        real    0m1.278s
        user    0m0.928s
        sys     0m0.350s

        $ time NUM_THREADS=4 ./rtoa-pgatram-openmp t1.txt t1.txt t1.txt t1.txt >s.tmp
        use OpenMP
        read_input_files : 15996000 items
        read file time : 0.405 secs
        roman_to_dec time : 0.568 secs
        total time : 0.974 secs

        real    0m0.995s
        user    0m1.439s
        sys     0m0.539s

        $ cmp f.tmp s.tmp


        I may have been overthinking this. :) Here's a simple all-in-one version with no interim storage in vectors at all.

        // rtoa-pgatram-allinone.cpp. Crude allinone version.
        // Compile with:
        //   g++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 rtoa-pgatram-allinone.cpp
        // or:
        //   clang++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 rtoa-pgatram-allinone.cpp
        // or:
        //   g++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 -I "$HOME/local-fast_io/fast_io/include" rtoa-pgatram-allinone.cpp
        // to use the locally installed fast_io header-only library

        #include <cctype>
        #include <cstring>

        #include <string>
        // #include <vector>
        #include <numeric>

        #include <chrono>
        #include <thread>

        #include <iostream>
        #include <iomanip>

        // See [id://11149504] for more info on the fast_io C++ library
        #include <fast_io.h>

        // ---------------------------------------------------------------

        typedef std::chrono::high_resolution_clock high_resolution_clock;
        typedef std::chrono::high_resolution_clock::time_point time_point;
        typedef std::chrono::milliseconds milliseconds;

        double elaspe_time(
           time_point cend,
           time_point cstart)
        {
           return double (
              std::chrono::duration_cast<milliseconds>(cend - cstart).count()
           ) * 1e-3;
        }

        // ---------------------------------------------------------------

        // Longest roman numeral is MMMDCCCLXXXVIII (3888) of length 15
        // XXX: I'm off by one somewhere because 3888 fails with
        // MAX_STR_LEN_L of 16 but works with 17
        #define MAX_STR_LEN_L 17

        // The basic idea is to keep this struct small and without pointers to
        // improve data locality/cache performance when traversing the vector
        struct str_type {
           char slen;
           char str[MAX_STR_LEN_L];
        };

        // using vec_str_type = std::vector<str_type>;
        // using vec_int_type = std::vector<int>;

        // ---------------------------------------------------------------

        // Though there are less than 256 initializers in this ascii table,
        // the others are guaranteed by ANSI C to be initialized to zero.
        static const int romtab[256] = {
             0,0,0,0,0,0,   0,   0,   0,   0,   //  00- 09
             0,0,0,0,0,0,   0,   0,   0,   0,   //  10- 19
             0,0,0,0,0,0,   0,   0,   0,   0,   //  20- 29
             0,0,0,0,0,0,   0,   0,   0,   0,   //  30- 39
             0,0,0,0,0,0,   0,   0,   0,   0,   //  40- 49
             0,0,0,0,0,0,   0,   0,   0,   0,   //  50- 59
             0,0,0,0,0,0,   0, 100, 500,   0,   //  60- 69
             0,0,0,1,0,0,  50,1000,   0,   0,   //  70- 79
             0,0,0,0,0,0,   5,   0,  10,   0,   //  80- 89
             0,0,0,0,0,0,   0,   0,   0, 100,   //  90- 99
           500,0,0,0,0,1,   0,   0,  50,1000,   // 100-109
             0,0,0,0,0,0,   0,   0,   5,   0,   // 110-119
            10,0,0,0,0,0,   0,   0,   0,   0    // 120-129
        };

        // Return the arabic number for a roman letter c.
        // Return zero if the roman letter c is invalid.
        inline int urtoa(int c) { return romtab[c]; }

        inline int accfn(int t, char c) {
           return t + urtoa(c) - t % urtoa(c) * 2;
        }

        inline int roman_to_dec(const str_type& st) {
           return std::accumulate( st.str, st.str + st.slen, 0, accfn );
        }

        // Read an input file of Roman Numerals and do it all
        static void do_it_all(
           const char* fname   // in: file name containing a list of Roman Numerals
        )
        {
           FILE* fh;
           str_type line;

           fh = ::fopen(fname, "r");
           if ( fh == NULL ) {
              std::cerr << "Error opening '" << fname << "' : " << strerror(errno) << "\n";
              return;
           }
           while ( ::fgets( line.str, MAX_STR_LEN_L, fh ) != NULL ) {
              line.slen = ::strlen(line.str) - 1;   // -1 to strip trailing newline
              // std::cout << roman_to_dec(line) << '\n';
              fast_io::io::println( roman_to_dec(line) );
           }
           ::fclose(fh);
        }

        int main(int argc, char* argv[])
        {
           if (argc < 2) {
              std::cerr << "usage: rtoa-pgatram-allinone file...\n";
              return 1;
           }

           // Get the list of input files from the command line
           int nfiles = argc - 1;
           char** fname = &argv[1];

           std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fixed);
           time_point cstartall, cendall;

           cstartall = high_resolution_clock::now();

           for (int i = 0; i < nfiles; ++i)
              do_it_all( fname[i] );

           cendall = high_resolution_clock::now();
           double ctakenall = elaspe_time(cendall, cstartall);
           std::cerr << "do_it_all time : " << std::setw(8) << ctakenall << " secs\n";

           return 0;
        }
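On the XXX comment about MAX_STR_LEN_L in the listing above: this looks less like an off-by-one than a consequence of how fgets counts. fgets(buf, n, fh) stores at most n - 1 characters and then a terminating NUL, so the longest line (15 letters plus the newline, 16 characters) needs a 17-byte buffer. The sketch below is a hedged demonstration, not part of the original code. fgets_pieces is a hypothetical helper that writes a line to a temporary file and reads it back with a chosen buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: write `text` to a temporary file, read it back with
// fgets using a buffer of `bufsize` bytes, and return each piece delivered.
inline std::vector<std::string> fgets_pieces(const std::string& text, int bufsize) {
    assert(bufsize >= 2);  // fgets needs room for one character plus the NUL
    std::vector<std::string> pieces;
    std::FILE* fh = std::tmpfile();
    if (fh == nullptr) return pieces;  // no temporary file available
    std::fputs(text.c_str(), fh);
    std::rewind(fh);
    std::vector<char> buf(static_cast<std::size_t>(bufsize));
    // Each call stores at most bufsize - 1 characters, then a NUL.
    while (std::fgets(buf.data(), bufsize, fh) != nullptr) {
        pieces.emplace_back(buf.data());
    }
    std::fclose(fh);
    return pieces;
}
```

With a buffer of 17, "MMMDCCCLXXXVIII\n" comes back as one piece including its newline. With 16, fgets returns the 15 letters without the newline and then a second piece holding only "\n". In the listing, strlen - 1 would then drop the final I (giving 3887), and the leftover newline would produce an extra line with value 0.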

        $ time ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt >f4.tmp
        do_it_all time : 1.034 secs

        real    0m1.049s
        user    0m0.988s
        sys     0m0.061s
        $ cmp f4.tmp fixed4.tmp

        $ time ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt >f4.tmp
        do_it_all time : 1.047 secs

        real    0m1.070s
        user    0m0.989s
        sys     0m0.081s
        $ cmp f4.tmp fixed4.tmp

        As you can see, this is twice as fast as rtoa-pgatram-fixed.

        $ time ./rtoa-pgatram-fixed t1.txt t1.txt t1.txt t1.txt >f4.tmp
        read_input_files : 15996000 items
        read file time : 0.759 secs
        roman_to_dec time : 0.367 secs
        output time : 1.032 secs
        total time : 2.160 secs

        real    0m2.179s
        user    0m1.908s
        sys     0m0.270s
        $ cmp f4.tmp fixed4.tmp

        Update: Oops, the above rtoa-pgatram-fixed timings came from a build without fast_io. The timings with fast_io on my machine are:

        read_input_files : 15996000 items
        read file time : 0.750 secs
        roman_to_dec time : 0.370 secs
        output time : 0.389 secs
        total time : 1.510 secs

        real    0m1.529s
        user    0m1.348s
        sys     0m0.181s

        ... not twice as fast, but it's faster when you don't store anything in a vector ... though rtoa-pgatram-openmp might be faster with many files ... so I probably need to find a way to make rtoa-pgatram-allinone concurrent somehow (e.g. via chunking).
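One way the chunking idea might look, as a rough sketch rather than a tuned implementation. It assumes the input is already in memory as one string and splits it into chunks that end on newline boundaries. Each chunk is converted into its own output string in an OpenMP loop, and the results are joined in order. All names here (split_chunks, convert_chunk, convert_parallel) are hypothetical, and unlike the listings above this version skips unknown characters instead of looking them up in romtab.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

constexpr int letter_value(char c) {
    switch (c) {
        case 'I': case 'i': return 1;    case 'V': case 'v': return 5;
        case 'X': case 'x': return 10;   case 'L': case 'l': return 50;
        case 'C': case 'c': return 100;  case 'D': case 'd': return 500;
        case 'M': case 'm': return 1000; default:            return 0;
    }
}

// Same update rule as accfn; assumption: unknown characters are skipped.
inline int roman_to_int(std::string_view s) {
    int t = 0;
    for (const char c : s) {
        const int v = letter_value(c);
        if (v != 0) t += v - t % v * 2;
    }
    return t;
}

// Split text into at most nchunks pieces, each ending right after a newline
// (or at the end of the text), so no numeral is cut in half.
inline std::vector<std::string_view> split_chunks(std::string_view text, std::size_t nchunks) {
    std::vector<std::string_view> chunks;
    if (nchunks == 0) nchunks = 1;
    const std::size_t target = text.size() / nchunks + 1;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t end = pos + target;
        if (end >= text.size()) {
            end = text.size();
        } else {
            const std::size_t nl = text.find('\n', end);
            end = (nl == std::string_view::npos) ? text.size() : nl + 1;
        }
        chunks.push_back(text.substr(pos, end - pos));
        pos = end;
    }
    return chunks;
}

// Convert every line of one chunk, writing one decimal number per line.
inline std::string convert_chunk(std::string_view chunk) {
    std::string out;
    std::size_t pos = 0;
    while (pos < chunk.size()) {
        std::size_t nl = chunk.find('\n', pos);
        if (nl == std::string_view::npos) nl = chunk.size();
        out += std::to_string(roman_to_int(chunk.substr(pos, nl - pos)));
        out += '\n';
        pos = nl + 1;
    }
    return out;
}

// Convert the chunks in parallel; each iteration writes only its own slot,
// and the slots are concatenated afterwards to keep the input order.
inline std::string convert_parallel(std::string_view text, std::size_t nchunks) {
    const std::vector<std::string_view> chunks = split_chunks(text, nchunks);
    std::vector<std::string> results(chunks.size());
    const long n = static_cast<long>(chunks.size());
    #pragma omp parallel for schedule(static, 1)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        results[k] = convert_chunk(chunks[k]);
    }
    std::string out;
    for (const std::string& r : results) out += r;
    return out;
}
```

Built with -fopenmp the loop runs across threads. Without that flag the pragma is ignored and the same code runs serially, which makes it easy to compare the two. Whether this beats the plain all-in-one loop would need measuring, since the final concatenation and the in-memory input both cost something.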

        Will this all-in-one version, rtoa-pgatram-allinone, be deemed acceptable by marioroy?