in reply to Re^5: Risque Romantic Rosetta Roman Race - MCE Results on AMD Box
in thread Risque Romantic Rosetta Roman Race

I may have been overthinking this. :) Here's a simple all-in-one version with no interim storage in vectors at all.

// rtoa-pgatram-allinone.cpp. Crude allinone version. // Compile with: // g++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 rtoa-pgatram-al +linone.cpp // or: // clang++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 rtoa-pgatra +m-allinone.cpp // or: // g++ -o rtoa-pgatram-allinone -std=c++20 -Wall -O3 -I "$HOME/local +-fast_io/fast_io/include" rtoa-pgatram-allinone.cpp // to use the locally installed fast_io header-only library #include <cctype> #include <cstring> #include <string> // #include <vector> #include <numeric> #include <chrono> #include <thread> #include <iostream> #include <iomanip> // See [id://11149504] for more info on the fast_io C++ library #include <fast_io.h> // --------------------------------------------------------------- typedef std::chrono::high_resolution_clock high_resolution_clock; typedef std::chrono::high_resolution_clock::time_point time_point; typedef std::chrono::milliseconds milliseconds; double elaspe_time( time_point cend, time_point cstart) { return double ( std::chrono::duration_cast<milliseconds>(cend - cstart).count() ) * 1e-3; } // --------------------------------------------------------------- // Longest roman numeral is MMMDCCCLXXXVIII (3888) of length 15 // XXX: I'm off by one somewhere because 3888 fails with // MAX_STR_LEN_L of 16 but works with 17 #define MAX_STR_LEN_L 17 // The basic idea is to keep this struct small and without pointers to // improve data locality/cache performance when traversing the vector struct str_type { char slen; char str[MAX_STR_LEN_L]; }; // using vec_str_type = std::vector<str_type>; // using vec_int_type = std::vector<int>; // --------------------------------------------------------------- // Though there are less than 256 initializers in this ascii table, // the others are guaranteed by ANSI C to be initialized to zero. static const int romtab[256] = { 0,0,0,0,0,0, 0, 0, 0, 0, // 00- 09 0,0,0,0,0,0, 0, 0, 0, 0, // 10- 19 0,0,0,0,0,0, 0, 0, 0, 0, // 20- 29 0,0,0,0,0,0, 0, 0, 0, 0, // 30- 39 0,0,0,0,0,0, 0, 0, 0, 0, // 40- 49 0,0,0,0,0,0, 0, 0, 0, 0, // 50- 59 0,0,0,0,0,0, 0, 100, 500, 0, // 60- 69 0,0,0,1,0,0, 50,1000, 0, 0, // 70- 79 0,0,0,0,0,0, 5, 0, 10, 0, // 80- 89 0,0,0,0,0,0, 0, 0, 0, 100, // 90- 99 500,0,0,0,0,1, 0, 0, 50,1000, // 100-109 0,0,0,0,0,0, 0, 0, 5, 0, // 110-119 10,0,0,0,0,0, 0, 0, 0, 0 // 120-129 }; // Return the arabic number for a roman letter c. // Return zero if the roman letter c is invalid. inline int urtoa(int c) { return romtab[c]; } inline int accfn(int t, char c) { return t + urtoa(c) - t % urtoa(c) * 2; } inline int roman_to_dec(const str_type& st) { return std::accumulate( st.str, st.str + st.slen, 0, accfn ); } // Read an input file of Roman Numerals and do it all static void do_it_all( const char* fname // in: file name containing a list of Roma +n Numerals ) { FILE* fh; str_type line; fh = ::fopen(fname, "r"); if ( fh == NULL ) { std::cerr << "Error opening '" << fname << "' : " << strerror(er +rno) << "\n"; return; } while ( ::fgets( line.str, MAX_STR_LEN_L, fh ) != NULL ) { line.slen = ::strlen(line.str) - 1; // -1 to strip trailing n +ewline // std::cout << roman_to_dec(line) << '\n'; fast_io::io::println( roman_to_dec(line) ); } ::fclose(fh); } int main(int argc, char* argv[]) { if (argc < 2) { std::cerr << "usage: rtoa-pgatram-allinone file...\n"; return 1; } // Get the list of input files from the command line int nfiles = argc - 1; char** fname = &argv[1]; std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fix +ed); time_point cstartall, cendall; cstartall = high_resolution_clock::now(); for (int i = 0; i < nfiles; ++i) do_it_all( fname[i] ); cendall = high_resolution_clock::now(); double ctakenall = elaspe_time(cendall, cstartall); std::cerr << "do_it_all time : " << std::setw(8) << ctakenall << + " secs\n"; return 0; }

$ time ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt >f4.tmp do_it_all time : 1.034 secs real 0m1.049s user 0m0.988s sys 0m0.061s $ cmp f4.tmp fixed4.tmp $ time ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt >f4.tmp do_it_all time : 1.047 secs real 0m1.070s user 0m0.989s sys 0m0.081s $ cmp f4.tmp fixed4.tmp

As you can see, this is twice as fast as rtoa-pgatram-fixed.

$ time ./rtoa-pgatram-fixed t1.txt t1.txt t1.txt t1.txt >f4.tmp read_input_files : 15996000 items read file time : 0.759 secs roman_to_dec time : 0.367 secs output time : 1.032 secs total time : 2.160 secs real 0m2.179s user 0m1.908s sys 0m0.270s $ cmp f4.tmp fixed4.tmp

Update: Oops, the above rtoa-pgatram-fixed timing figures were built without using fast_io. The timings with fastio on my machine are:

read_input_files : 15996000 items read file time : 0.750 secs roman_to_dec time : 0.370 secs output time : 0.389 secs total time : 1.510 secs real 0m1.529s user 0m1.348s sys 0m0.181s
... not twice as fast, but it's faster when you don't store anything in a vector ... though rtoa-pgatram-openmp might be faster with many files ... so I probably need to find a way to make rtoa-pgatram-allinone concurrent somehow (e.g. via chunking).

Will this all in one version rtoa-pgatram-allinone be deemed acceptable by marioroy?

Replies are listed 'Best First'.
Re^7: Risque Romantic Rosetta Roman Race - All in One
by marioroy (Prior) on May 15, 2023 at 12:23 UTC
    > be deemed acceptable by marioroy

    Yikes! I'm a rookie when it comes to C++ and simply here for the fun and learning.

    > Am I missing something?

    There's no reason to confirm to fixed length, IMO. I gutted the fixed-length code. It runs faster, completing in 0.490 seconds.

    C++ Results:

    # https://perlmonks.org/?node_id=11152156 $ ./rtoa-pgatram-fixed t1.txt t1.txt t1.txt t1.txt | cksum read_input_files : 15996000 items read file time : 0.356 secs roman_to_dec time : 0.460 secs output time : 0.124 secs total time : 0.941 secs 737201628 75552000 # https://perlmonks.org/?node_id=11152177 $ NUM_THREADS=4 ./rtoa-pgatram-openmp t1.txt t1.txt t1.txt t1.txt | ck +sum use OpenMP read_input_files : 15996000 items read file time : 0.159 secs roman_to_dec time : 0.469 secs total time : 0.628 secs 737201628 75552000 # https://perlmonks.org/?node_id=11152182 $ ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt | cksum do_it_all time : 0.637 secs 737201628 75552000 # https://perlmonks.org/?node_id=11152186 $ ./rtoa-pgatram-allinone2 t1.txt t1.txt t1.txt t1.txt | cksum do_it_all time : 0.515 secs fast_io scan, line_get do_it_all time : 0.490 secs fast_io memory mapping 737201628 75552000

    Perl Results:

    # https://perlmonks.org/?node_id=11152168 max_workers => 26 $ perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt | cksum rtoa pgatram start time 0.658 secs Perl on Fedora Linux 38 time 0.574 secs Perl on Clear Linux 737201628 75552000 # https://perlmonks.org/?node_id=11152168 max_workers => 32 $ perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt | cksum rtoa pgatram start time 0.548 secs Perl on Fedora Linux 38 time 0.480 secs Perl on Clear Linux 737201628 75552000

    rtoa-pgatram-allinone2.cpp

    Updated on May 19, 2023

      Thanks so much for that! And for confirming it is not "cheating" in this performance race to remove all vectors from a solution and write to stdout directly.

      Sadly, the documentation for fast_io seems to be non-existent and I always struggle to figure out how to use it. Hope someone writes a nice fast_io user manual.

      I also get confused as to whether I've built with fast_io or not, so I've created a new version, rtoa-pgatram-allinone3.cpp below, based on your excellent rtoa-pgatram-allinone2.cpp, that prints whether it's using fastio or not (via a USE_FAST_IO_L define). When built to not use fastio this new version uses std C++11 I/O (which in my experience is significantly slower that std C I/O).

      Here are the timings on my laptop.

      $ time ./rtoa-pgatram-allinone3 t1.txt t1.txt t1.txt t1.txt >f4.tmp not using fast_io do_it_all time : 1.936 secs real 0m1.953s user 0m1.832s sys 0m0.121s $ cmp f4.tmp fixed4.tmp

      $ time ./rtoa-pgatram-allinone3 t1.txt t1.txt t1.txt t1.txt >f4.tmp using fast_io do_it_all time : 0.875 secs real 0m0.894s user 0m0.803s sys 0m0.091s $ cmp f4.tmp fixed4.tmp

      Original rtoa-pgatram-allinone from this node using C std I/O for reading files and fast_io for writing to stdout:

      $ time ./rtoa-pgatram-allinone t1.txt t1.txt t1.txt t1.txt >f4.tmp do_it_all time : 1.034 secs real 0m1.049s user 0m0.988s sys 0m0.061s

      // rtoa-pgatram-allinone3.cpp. Crude allinone version. // based on rtoa-pgatram-allinone2.cpp https://perlmonks.org/?node_i +d=11152186 // // Obtain the fast_io library (required dependency): // git clone --depth=1 https://github.com/cppfastio/fast_io // // Compile with g++ or clang++: // g++ -o rtoa-pgatram-allinone3 -std=c++20 -Wall -O3 -I fast_io/inc +lude rtoa-pgatram-allinone3.cpp // Uncomment to use the fast_io C++ library #define USE_FAST_IO_L 1 #include <cctype> #include <cstring> #include <string> #include <numeric> #include <chrono> #include <thread> #include <iostream> #include <iomanip> #ifdef USE_FAST_IO_L // See [id://11149504] for more info on the fast_io C++ library #include <fast_io.h> #include <fast_io_legacy.h> #else #include <fstream> #include <sstream> #endif // --------------------------------------------------------------- typedef std::chrono::high_resolution_clock high_resolution_clock; typedef std::chrono::high_resolution_clock::time_point time_point; typedef std::chrono::milliseconds milliseconds; double elaspe_time( time_point cend, time_point cstart) { return double ( std::chrono::duration_cast<milliseconds>(cend - cstart).count() ) * 1e-3; } // --------------------------------------------------------------- // Though there are less than 256 initializers in this ascii table, // the others are guaranteed by ANSI C to be initialized to zero. static const int romtab[256] = { 0,0,0,0,0,0, 0, 0, 0, 0, // 00- 09 0,0,0,0,0,0, 0, 0, 0, 0, // 10- 19 0,0,0,0,0,0, 0, 0, 0, 0, // 20- 29 0,0,0,0,0,0, 0, 0, 0, 0, // 30- 39 0,0,0,0,0,0, 0, 0, 0, 0, // 40- 49 0,0,0,0,0,0, 0, 0, 0, 0, // 50- 59 0,0,0,0,0,0, 0, 100, 500, 0, // 60- 69 0,0,0,1,0,0, 50,1000, 0, 0, // 70- 79 0,0,0,0,0,0, 5, 0, 10, 0, // 80- 89 0,0,0,0,0,0, 0, 0, 0, 100, // 90- 99 500,0,0,0,0,1, 0, 0, 50,1000, // 100-109 0,0,0,0,0,0, 0, 0, 5, 0, // 110-119 10,0,0,0,0,0, 0, 0, 0, 0 // 120-129 }; // Return the arabic number for a roman letter c. // Return zero if the roman letter c is invalid. inline int urtoa(int c) { return romtab[c]; } inline int accfn(int t, char c) { return t + urtoa(c) - t % urtoa(c) * 2; } inline int roman_to_dec(std::string_view st) { return std::accumulate(st.begin(), st.end(), 0, accfn); } #ifdef USE_FAST_IO_L // Read an input file of Roman Numerals and do it all (fast_io version +) static void do_it_all( std::string_view fname // in: file name containing a list of + Roman Numerals ) { try { fast_io::filebuf_file fbf(fname, fast_io::open_mode::in); for (std::string line; fast_io::io::scan<true>(fbf, fast_io::mnp +::line_get(line)); ) { fast_io::io::println(roman_to_dec(line)); } } catch (fast_io::error e) { fast_io::io::perrln("Error opening '", fname, "' : ", e); }; } #else // Read an input file of Roman Numerals and do it all (std C++ version +) static void do_it_all( const char* fname // in: file name containing a list of Roma +n Numerals ) { std::ifstream infile(fname); if (!infile.is_open()) { std::cerr << "Error opening '" << fname << "'\n"; return; } for (std::string line; std::getline(infile, line); ) { std::cout << roman_to_dec(line) << "\n"; } infile.close(); } #endif int main(int argc, char* argv[]) { if (argc < 2) { if (argc > 0) std::cerr << "usage: rtoa-pgatram-allinone3 file... >out.txt\ +n"; return 1; } #ifdef USE_FAST_IO_L std::cerr << "using fast_io\n"; #else std::cerr << "not using fast_io\n"; #endif // Get the list of input files from the command line int nfiles = argc - 1; char** fname = &argv[1]; std::cerr << std::setprecision(3) << std::setiosflags(std::ios::fix +ed); time_point cstartall, cendall; cstartall = high_resolution_clock::now(); for (int i = 0; i < nfiles; ++i) do_it_all( fname[i] ); cendall = high_resolution_clock::now(); double ctakenall = elaspe_time(cendall, cstartall); std::cerr << "do_it_all time : " << std::setw(8) << ctakenall << + " secs\n"; return 0; }

        > Sadly, the documentation for fast_io seems to be non-existent and I always struggle to figure out how to use it.

        Same here. This took a team. I've been grepping for clues inside the fast_io benchmark and examples folders. To break 0.5 seconds, I updated the allinone2 demonstration with fast_io memory mapping.

        # https://perlmonks.org/?node_id=11152186 $ ./rtoa-pgatram-allinone2 t1.txt t1.txt t1.txt t1.txt | cksum do_it_all time : 0.515 secs fast_io scan, line_get do_it_all time : 0.490 secs fast_io memory mapping 737201628 75552000

        Memory mapping update:

        // Read an input file of Roman Numerals and do it all static void do_it_all( std::string_view fname // in: file name containing a list of + Roman Numerals ) { try { #if 1 // Load entire file to memory through memory mapping. using file_loader_type = fast_io::native_file_loader; file_loader_type loader(fname, fast_io::open_mode::in | fast_io: +:open_mode::follow); // Loop through contiguous container of the file. for (char const *first{loader.data()}, *last{loader.data()+loade +r.size()}; first!=last; ) { auto start_ptr{first}; first = fast_io::find_lf(first, last); + auto end_ptr{first}; if (start_ptr == end_ptr) continue; int dec = roman_to_dec(std::string_view(start_ptr, end_ptr - +start_ptr)); fast_io::io::println(dec); ++first; } #else fast_io::filebuf_file fbf(fname, fast_io::open_mode::in | fast_i +o::open_mode::follow); for (std::string line; fast_io::io::scan<true>(fbf, fast_io::mnp +::line_get(line)); ) { fast_io::io::println(roman_to_dec(line)); } #endif } catch (fast_io::error e) { fast_io::io::perrln("Error opening '", fname, "' : ", e); }; }

        It requires running Perl on Clear Linux to keep up :)

        # https://perlmonks.org/?node_id=11152186 $ time ./rtoa-pgatram-allinone2 t1.txt t1.txt t1.txt t1.txt | cksum do_it_all time : 0.490 secs fast_io memory mapping 737201628 75552000 real 0m0.492s user 0m0.471s sys 0m0.037s # https://perlmonks.org/?node_id=11152168 max_workers => 32 $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt | cksum rtoa pgatram start time 0.480 secs 737201628 75552000 real 0m0.504s user 0m13.887s sys 0m0.231s # https://perlmonks.org/?node_id=11152168 max_workers => 64 $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt | cksum rtoa pgatram start time 0.425 secs 737201628 75552000 real 0m0.449s user 0m21.884s sys 0m0.592s

        Perl MCE is mind-boggling to me. Because, the UNIX time includes all of it; the time to launch Perl, load modules, spawn workers, notify workers per each input file / run, and finally reap workers. The overhead is 0.075 and 0.093 seconds for 32 and 64 workers, respectively. The overhead also includes MCE::Relay where workers must wait their turn orderly by chunk_id value, behind the scene.

        I commented out a few lines of code to measure the overhead time running MCE including MCE::Relay.

        user_func => sub { my ( $mce, $slurp_ref, $chunk_id, $output ) = ( @_, '' ); # open my $fh, '<', $slurp_ref; # while ( <$fh> ) { # chomp; # my $n = 0; # $n += $_ - $n % $_ * 2 for @rtoa[ unpack 'c*', $_ ]; # $output .= "$n\n"; # } # close $fh; # output orderly MCE::relay { # print $output; }; }

        How much time does Perl MCE chunking four input files including MCE::Relay take, factoring out compute and output? Notice how doubling up on workers only adds less than 2 hundreds of a second to the UNIX real time. But notice the user and sys times. Chunking an input file and involving relay occur simultaneously behind the scene. The relay block runs serially.

        # max_workers => 32 $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt rtoa pgatram start time 0.052 secs real 0m0.075s user 0m0.122s sys 0m0.170s # max_workers => 64 $ time perl rtoa-pgatram-mce.pl t1.txt t1.txt t1.txt t1.txt rtoa pgatram start time 0.070 secs real 0m0.093s user 0m0.195s sys 0m0.441s