in reply to Re^2: Rosetta Code: Long List is Long (faster - vec)
in thread Rosetta Code: Long List is Long
Thanks to the excellent feedback from marioroy I've been able to improve the code as follows:
New Timings for Long String Version
> ./llil2vec big1.txt big2.txt big3.txt >f.tmp llil2vec start get_properties CPU time : 3.06708 secs emplace set sort CPU time : 0.903617 secs write stdout CPU time : 1.23459 secs total CPU time : 5.20544 secs total wall clock time : 5 secs > ./llil2vec big1.txt big2.txt big3.txt >f.tmp llil2vec start get_properties CPU time : 3.4832 secs emplace set sort CPU time : 1.10209 secs write stdout CPU time : 0.939805 secs total CPU time : 5.52519 secs total wall clock time : 6 secs > ./llil2vec big1.txt big2.txt big3.txt >f.tmp llil2vec start get_properties CPU time : 3.42605 secs emplace set sort CPU time : 0.916222 secs write stdout CPU time : 1.00049 secs total CPU time : 5.343 secs total wall clock time : 5 secs > diff f.tmp vec.tmp
This is an average time of 5.35 secs (compared to 5.5 secs in the original post).
New Timings for Short String (MAX_STR_LEN_L=6) Version
> For some reason, short string version is faster when compiled with c +lang++ > (while long string version is not) > clang++ -o llil2vec -std=c++11 -Wall -O3 llil2vec.cpp > ./llil2vec big1.txt big2.txt big3.txt >f.tmp llil2vec (fixed string length=6) start get_properties CPU time : 1.87217 secs emplace set sort CPU time : 0.589238 secs write stdout CPU time : 0.842179 secs total CPU time : 3.30369 secs total wall clock time : 3 secs > ./llil2vec big1.txt big2.txt big3.txt >f.tmp llil2vec (fixed string length=6) start get_properties CPU time : 1.95909 secs emplace set sort CPU time : 0.610479 secs write stdout CPU time : 0.959859 secs total CPU time : 3.52965 secs total wall clock time : 4 secs > ./llil2vec big1.txt big2.txt big3.txt >f.tmp llil2vec (fixed string length=6) start get_properties CPU time : 1.86549 secs emplace set sort CPU time : 0.608097 secs write stdout CPU time : 0.857176 secs total CPU time : 3.33087 secs total wall clock time : 3 secs > diff f.tmp vec.tmp
This is an average time of 3.4 secs (compared to 4.3 secs in the original post).
New version of llil2vec.cpp
// llil2vec.cpp. // Vector version of llil2grt.pl. // g++ compile on Linux: // g++ -o llil2vec -std=c++11 -Wall -O3 llil2vec.cpp // This g++ command also works with mingw C++ compiler (https://source +forge.net/projects/mingw-w64) // that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.ex +e). // Example run: llil2vec big1.txt big2.txt big3.txt >vec.tmp // Experiment with fast_io.h (see [id://11149504] by [marioroy]) #if 0 #include <fast_io.h> #else #include <cstdio> #endif #include <cstddef> #include <cstdint> #include <cstdlib> #include <cstring> #include <ctime> #include <string> #include <array> #include <vector> #include <set> #include <algorithm> #include <utility> #include <iterator> #include <iostream> #include <fstream> #include <sstream> static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, ne +ed a 64-bit compile"); // ------------------------------------------------------------------- +--------- // Crude hack to see Windows Private Bytes in Task Manager by sleeping + at // program end (see also sleep hack at end of main) // #include <chrono> // #include <thread> // ------------------------------------------------------------------- +--------- typedef long long llil_int_type; // Note: all words in big1.txt, big2.txt, big3.txt are <= 6 chars in l +ength // To use (limited length) fixed length strings uncomment the next lin +e // big.txt max word length is 6 // long.txt max word length is 208 // Based on rough benchmarking, this short fixed string hack // is only worth trying for MAX_STR_LEN_L up to about 64 // #define MAX_STR_LEN_L 6 #ifdef MAX_STR_LEN_L using str_type = std::array<char, MAX_STR_LEN_L+1>; #else using str_type = std::string; #endif using str_int_type = std::pair<str_type, llil_int_type>; using int_str_type = std::pair<llil_int_type, str_type>; using vec_str_int_type = std::vector<str_int_type>; using vec_int_str_type = std::vector<int_str_type>; using set_int_str_type = std::set<int_str_type>; // Mimic the Perl get_properties subroutine -------------------------- +-- // Limit line length and use ANSI C functions to try to boost performa +nce #define MAX_LINE_LEN_L 255 static void get_properties( int nfiles, // in: the number of input files char* fname[], // in: the input file names vec_int_str_type& vec_ret) // out: a vector of properties { FILE* fh; char line[MAX_LINE_LEN_L+1]; char* word; llil_int_type count; for (int i = 0; i < nfiles; ++i) { fh = ::fopen(fname[i], "r"); if (fh == NULL) { std::cerr << "Error opening '" << fname[i] << "' : errno=" << + errno << "\n"; continue; } while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) { word = ::strtok(line, "\t"); count = ::atoll( ::strtok(NULL, "\n") ); #ifdef MAX_STR_LEN_L // str_type fixword { { '\0', '\0', '\0', '\0', '\0', '\0', ' +\0' } }; str_type fixword; ::memset( fixword.data(), '\0', MAX_STR_LEN_L+1 ); ::memcpy( fixword.data(), word, strlen(word) ); vec_ret.emplace_back( -count, fixword ); #else vec_ret.emplace_back( -count, word ); #endif } ::fclose(fh); } // Needs to be sorted by word for later sum of adjacent count field +s to work std::sort( vec_ret.begin(), vec_ret.end(), [](const int_str_type& left, const int_str_type& right) { return + left.second < right.second; } ); } // ------------------------------------------------------------------- +-- int main(int argc, char* argv[]) { if (argc < 2) { std::cerr << "usage: llil2vec file1 file2 ... >out.txt\n"; return 1; } #ifdef MAX_STR_LEN_L std::cerr << "llil2vec (fixed string length=" << MAX_STR_LEN_L << " +) start\n"; #else std::cerr << "llil2vec start\n"; #endif time_t tstart1 = ::time(NULL); clock_t cstart1 = ::clock(); // Create the vector of properties vec_int_str_type vec_ret; get_properties(argc - 1, &argv[1], vec_ret); clock_t cend1 = ::clock(); double ctaken1 = (double) (cend1 - cstart1) / (double)CLOCKS_PER_SE +C; std::cerr << "get_properties CPU time : " << ctaken1 << " secs +\n"; clock_t cstart2 = ::clock(); // To avoid calling sort(), create an inverted std::set container // Note: negative count gives desired ordering set_int_str_type myset; auto it = vec_ret.cbegin(); str_type name_last = it->second; llil_int_type count = it->first; for (++it; it != vec_ret.cend(); ++it) { if ( it->second == name_last ) { count += it->first; } else { myset.emplace_hint( myset.end(), count, name_last ); name_last = it->second; count = it->first; } } myset.emplace_hint( myset.end(), count, name_last ); clock_t cend2s = ::clock(); // Output the (already sorted) std::set - no sort() function requir +ed // Note: fix up negative count via -n.first #ifdef MAX_STR_LEN_L // If name is not NULL-terminated: // for ( auto const& n : myset ) ::printf( "%.*s\t%lld\n", MAX_ST +R_LEN_L, n.second.data(), -n.first ); // If name is NULL-terminated: // for ( auto const& n : myset ) ::printf( "%s\t%lld\n", n.second +.data(), -n.first ); for ( auto const& n : myset ) std::cout << n.second.data() << '\t' +<< -n.first << '\n'; #else // for ( auto const& n : myset ) ::printf( "%s\t%lld\n", n.second +.c_str(), -n.first ); for ( auto const& n : myset ) std::cout << n.second << '\t' << -n.f +irst << '\n'; #endif clock_t cend2 = ::clock(); time_t tend2 = ::time(NULL); long ttaken = static_cast<long>(::difftime(tend2, tstart1) + 0 +.5); double ctaken = (double) (cend2 - cstart1) / (double)CLOCKS_PER_ +SEC; double ctaken2s = (double) (cend2s - cstart2) / (double)CLOCKS_PER +_SEC; double ctaken2o = (double) (cend2 - cend2s) / (double)CLOCKS_PER +_SEC; std::cerr << "emplace set sort CPU time : " << ctaken2s << " sec +s\n"; std::cerr << "write stdout CPU time : " << ctaken2o << " sec +s\n"; std::cerr << "total CPU time : " << ctaken << " sec +s\n"; std::cerr << "total wall clock time : " << ttaken << " sec +s\n"; // Hack to see Private Bytes in Windows Task Manager (uncomment nex +t line so process doesn't exit too quickly) // std::this_thread::sleep_for(std::chrono::milliseconds(90000000 +)); return 0; }
Interim OpenMP Versions
Note: The interim versions below are just early (conservative) openmp-safe versions made thread-safe by expunging the dreaded strtok function.
The one with the most potential for multi-threading seems to be llil3vec.cpp because it uses vectors which can be safely used lock-free in thread-local vec_loc vectors, as shown by marioroy in llil4vec.cpp.
While replacing std::sort and std::stable_sort with __gnu_parallel versions to sort std::vectors is a no-brainer, it seems problematic to parallelise access to a global std::map (as used in llil3grt.cpp and llil3m.cpp below) because I think you'd need some sort of locking to safely allow multiple threads update a shared global std::map. Ideas welcome.
Interim version of llil3vec.cpp
// llil3vec.cpp. // Vector version of llil2grt.pl. // g++ compile on Linux: // g++ -o llil3vec -std=c++20 -Wall -O3 llil3vec.cpp // g++ -o llil3vec -std=c++20 -fopenmp -Wall -O3 llil3vec.cpp // This g++ command also works with mingw C++ compiler (https://source +forge.net/projects/mingw-w64) // that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.ex +e). // Example run: llil3vec big1.txt big2.txt big3.txt >vec.tmp #include <cstdio> #include <cstddef> #include <cstdint> #include <cstdlib> #include <cstring> #include <ctime> #include <string> #include <array> #include <vector> #include <set> #include <utility> #include <iterator> #if defined(_OPENMP) #include <omp.h> #include <parallel/algorithm> #else #include <algorithm> #include <execution> #endif #include <iostream> #include <fstream> #include <sstream> static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, ne +ed a 64-bit compile"); // ------------------------------------------------------------------- +--------- // Crude hack to see Windows Private Bytes in Task Manager by sleeping + at // program end (see also sleep hack at end of main) // #include <chrono> // #include <thread> // ------------------------------------------------------------------- +--------- typedef long long llil_int_type; // Note: all words in big1.txt, big2.txt, big3.txt are <= 6 chars in l +ength // To use (limited length) fixed length strings uncomment the next lin +e // big.txt max word length is 6 // long.txt max word length is 208 // Based on rough benchmarking, this short fixed string hack // is only worth trying for MAX_STR_LEN_L up to about 64 #define MAX_STR_LEN_L 6 #ifdef MAX_STR_LEN_L using str_type = std::array<char, MAX_STR_LEN_L>; #else using str_type = std::string; #endif using str_int_type = std::pair<str_type, llil_int_type>; using int_str_type = std::pair<llil_int_type, str_type>; using vec_str_int_type = std::vector<str_int_type>; using vec_int_str_type = std::vector<int_str_type>; using set_int_str_type = std::set<int_str_type>; // Mimic the Perl get_properties subroutine -------------------------- +-- // Limit line length and use ANSI C functions to try to boost performa +nce #define MAX_LINE_LEN_L 255 static void get_properties( int nfiles, // in: the number of input files char* fname[], // in: the input file names vec_int_str_type& vec_ret) // out: a vector of properties { FILE* fh; char line[MAX_LINE_LEN_L+1]; char* found; llil_int_type count; for (int i = 0; i < nfiles; ++i) { fh = ::fopen(fname[i], "r"); if (fh == NULL) { std::cerr << "Error opening '" << fname[i] << "' : errno=" << + errno << "\n"; continue; } while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) { found = ::strchr(line, '\t'); count = ::atoll( &line[found - line + 1] ); line[found - line] = '\0'; // word #ifdef MAX_STR_LEN_L str_type fixword { { '\0', '\0', '\0', '\0', '\0', '\0' } }; ::memcpy( fixword.data(), line, found - line ); vec_ret.emplace_back( -count, fixword ); #else vec_ret.emplace_back( -count, line ); #endif } ::fclose(fh); } // Needs to be sorted by word for later sum of adjacent count field +s to work #if defined(_OPENMP) __gnu_parallel::sort( #else std::sort( // std::sort(std::execution::par_unseq, #endif vec_ret.begin(), vec_ret.end(), [](const int_str_type& left, const int_str_type& right) { return + left.second < right.second; } ); } // ------------------------------------------------------------------- +-- int main(int argc, char* argv[]) { if (argc < 2) { std::cerr << "usage: llil3vec file1 file2 ... >out.txt\n"; return 1; } #ifdef MAX_STR_LEN_L std::cerr << "llil3vec (fixed string length=" << MAX_STR_LEN_L << " +) start\n"; #else std::cerr << "llil3vec start\n"; #endif time_t tstart1 = ::time(NULL); clock_t cstart1 = ::clock(); // Create the vector of properties vec_int_str_type vec_ret; get_properties(argc - 1, &argv[1], vec_ret); clock_t cend1 = ::clock(); double ctaken1 = (double) (cend1 - cstart1) / (double)CLOCKS_PER_SE +C; std::cerr << "get_properties CPU time : " << ctaken1 << " secs +\n"; clock_t cstart2 = ::clock(); // To avoid calling sort(), create an inverted std::set container // Note: negative count gives desired ordering set_int_str_type myset; auto it = vec_ret.cbegin(); str_type name_last = it->second; llil_int_type count = it->first; for (++it; it != vec_ret.cend(); ++it) { if ( it->second == name_last ) { count += it->first; } else { myset.emplace_hint( myset.end(), count, name_last ); name_last = it->second; count = it->first; } } myset.emplace_hint( myset.end(), count, name_last ); clock_t cend2s = ::clock(); // Output the (already sorted) std::set - no sort() function requir +ed // Note: fix up negative count via -n.first #ifdef MAX_STR_LEN_L // If name is not NULL-terminated: for ( auto const& n : myset ) ::printf( "%.*s\t%lld\n", MAX_STR_LEN +_L, n.second.data(), -n.first ); // If name is NULL-terminated: // for ( auto const& n : myset ) ::printf( "%s\t%lld\n", n.second.d +ata(), -n.first ); // for ( auto const& n : myset ) std::cout << n.second.data() << '\ +t' << -n.first << '\n'; #else // Can try printf vs std::cout to see which is faster for ( auto const& n : myset ) ::printf( "%s\t%lld\n", n.second.c_st +r(), -n.first ); // for ( auto const& n : myset ) std::cout << n.second << '\t' << - +n.first << '\n'; #endif clock_t cend2 = ::clock(); time_t tend2 = ::time(NULL); long ttaken = static_cast<long>(::difftime(tend2, tstart1) + 0 +.5); double ctaken = (double) (cend2 - cstart1) / (double)CLOCKS_PER_ +SEC; double ctaken2s = (double) (cend2s - cstart2) / (double)CLOCKS_PER +_SEC; double ctaken2o = (double) (cend2 - cend2s) / (double)CLOCKS_PER +_SEC; std::cerr << "emplace set sort CPU time : " << ctaken2s << " sec +s\n"; std::cerr << "write stdout CPU time : " << ctaken2o << " sec +s\n"; std::cerr << "total CPU time : " << ctaken << " sec +s\n"; std::cerr << "total wall clock time : " << ttaken << " sec +s\n"; // Hack to see Private Bytes in Windows Task Manager (uncomment nex +t line so process doesn't exit too quickly) // std::this_thread::sleep_for(std::chrono::milliseconds(90000000 +)); return 0; }
Timings of llil3vec
> g++ -o llil3vec -std=c++20 -Wall -O3 llil3vec.cpp > time ./llil3vec big1.txt big2.txt big3.txt >f.tmp llil3vec (fixed string length=6) start get_properties CPU time : 1.47895 secs emplace set sort CPU time : 0.592783 secs write stdout CPU time : 0.832071 secs total CPU time : 2.90392 secs total wall clock time : 3 secs real 0m3.217s user 0m2.806s sys 0m0.412s > diff f.tmp vec.tmp > g++ -o llil3vec -std=c++20 -fopenmp -Wall -O3 llil3vec.cpp > time ./llil3vec big1.txt big2.txt big3.txt >f.tmp llil3vec (fixed string length=6) start get_properties CPU time : 2.5809 secs emplace set sort CPU time : 0.793355 secs write stdout CPU time : 0.860855 secs total CPU time : 4.23521 secs total wall clock time : 3 secs real 0m2.673s user 0m4.073s sys 0m0.465s > diff f.tmp vec.tmp
For the -fopenmp version note that the Unix real time is lower, but the user time is higher.
Interim version of llil3grt.cpp
// llil3grt.cpp. // Inspired by llilgrt.pl: improve sort performance via a negative cou +nt. // g++ compile on Linux: // g++ -o llil3grt -std=c++20 -Wall -O3 llil3grt.cpp // g++ -o llil3grt -std=c++20 -fopenmp -Wall -O3 llil3grt.cpp // This g++ command also works with mingw C++ compiler (https://source +forge.net/projects/mingw-w64) // that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.ex +e). // Example run: llil3grt tt1.txt tt2.txt tt3.txt >out.txt #include <cstddef> #include <cstdint> #include <cstdlib> #include <cstring> #include <ctime> #include <cstdio> #include <string> #include <array> #include <vector> #include <set> #include <map> #include <algorithm> #include <utility> #include <iostream> #include <fstream> #include <sstream> static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, ne +ed a 64-bit compile"); // ------------------------------------------------------------------- +--------- // Crude hack to see Windows Private Bytes in Task Manager by sleeping + at // program end (see also sleep hack at end of main) // #include <chrono> // #include <thread> // ------------------------------------------------------------------- +--------- typedef long long llil_int_type; // Note: all words in big1.txt, big2.txt, big3.txt are <= 6 chars in l +ength // To use (limited length) fixed length strings uncomment the next lin +e // big.txt max word length is 6 // long.txt max word length is 208 // Based on rough benchmarking, this short fixed string hack // is only worth trying for MAX_STR_LEN_L up to about 64 #define MAX_STR_LEN_L 6 #ifdef MAX_STR_LEN_L using str_type = std::array<char, MAX_STR_LEN_L>; #else using str_type = std::string; #endif using str_int_type = std::pair<str_type, llil_int_type>; using int_str_type = std::pair<llil_int_type, str_type>; using vec_str_int_type = std::vector<str_int_type>; using vec_int_str_type = std::vector<int_str_type>; using set_int_str_type = std::set<int_str_type>; using map_str_int_type = std::map<str_type, llil_int_type>; // Mimic the Perl get_properties subroutine -------------------------- +-- // Limit line length and use ANSI C functions to try to boost performa +nce #define MAX_LINE_LEN_L 255 static void get_properties( int nfiles, // in: the number of input files char* fname[], // in: the input file names map_str_int_type& hash_ret) // out: a hash of properties { FILE* fh; char line[MAX_LINE_LEN_L+1]; char* found; llil_int_type count; for (int i = 0; i < nfiles; ++i) { fh = ::fopen(fname[i], "r"); if (fh == NULL) { std::cerr << "Error opening '" << fname[i] << "' : errno=" << + errno << "\n"; continue; } while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) { found = ::strchr(line, '\t'); count = ::atoll( &line[found - line + 1] ); line[found - line] = '\0'; // word #ifdef MAX_STR_LEN_L str_type fixword { { '\0', '\0', '\0', '\0', '\0', '\0' } }; ::memcpy( fixword.data(), line, found - line ); hash_ret[fixword] -= count; #else hash_ret[line] -= count; #endif } ::fclose(fh); } } // ------------------------------------------------------------------- +-- int main(int argc, char* argv[]) { if (argc < 2) { std::cerr << "usage: llil3grt file1 file2 ... >out.txt\n"; return 1; } #ifdef MAX_STR_LEN_L std::cerr << "llil3grt (fixed string length=" << MAX_STR_LEN_L << " +) start\n"; #else std::cerr << "llil3grt start\n"; #endif #if defined(_OPENMP) std::cerr << " - openmp version\n"; #endif time_t tstart1 = ::time(NULL); clock_t cstart1 = ::clock(); // Create the hash of properties map_str_int_type hash_ret; get_properties(argc - 1, &argv[1], hash_ret); clock_t cend1 = ::clock(); double ctaken1 = (double) (cend1 - cstart1) / (double)CLOCKS_PER_SE +C; std::cerr << "get_properties CPU time : " << ctaken1 << " secs +\n"; clock_t cstart2 = ::clock(); // To avoid calling sort(), try creating an inverted std::set conta +iner // Note: negative count gives desired ordering (just like Perl GRT +sort :) set_int_str_type myset; for ( auto const& kv : hash_ret ) { // Note: emplace_hint() was a lot faster than emplace() myset.emplace_hint( myset.end(), kv.second, kv.first ); } clock_t cend2s = ::clock(); // Output the (already sorted) std::set - no sort() function requir +ed! // Note: fix up negative count via -n.first #ifdef MAX_STR_LEN_L // If name is not NULL-terminated: for ( auto const& n : myset ) ::printf( "%.*s\t%lld\n", MAX_STR_LEN +_L, n.second.data(), -n.first ); // If name is NULL-terminated: // for ( auto const& n : myset ) ::printf( "%s\t%lld\n", n.second.d +ata(), -n.first ); // for ( auto const& n : myset ) std::cout << n.second.data() << '\ +t' << -n.first << '\n'; #else // Can try printf vs std::cout to see which is faster // for ( auto const& n : myset ) ::printf( "%s\t%lld\n", n.second.c +_str(), -n.first ); for ( auto const& n : myset ) std::cout << n.second << '\t' << -n.f +irst << '\n'; #endif clock_t cend2 = ::clock(); time_t tend2 = ::time(NULL); long ttaken = static_cast<long>(::difftime(tend2, tstart1) + 0 +.5); double ctaken = (double) (cend2 - cstart1) / (double)CLOCKS_PER_ +SEC; double ctaken2s = (double) (cend2s - cstart2) / (double)CLOCKS_PER +_SEC; double ctaken2o = (double) (cend2 - cend2s) / (double)CLOCKS_PER +_SEC; std::cerr << "emplace set sort CPU time : " << ctaken2s << " sec +s\n"; std::cerr << "write stdout CPU time : " << ctaken2o << " sec +s\n"; std::cerr << "total CPU time : " << ctaken << " sec +s\n"; std::cerr << "total wall clock time : " << ttaken << " sec +s\n"; // Hack to see Private Bytes in Windows Task Manager (uncomment nex +t line so process doesn't exit too quickly) // std::this_thread::sleep_for(std::chrono::milliseconds(90000000 +)); return 0; }
Interim version of llil3m.cpp
// llil3m.cpp. C++ 11 version of Perl llil.pl. // C++ version inspired by Mario's famous dualvar two-sort Perl soluti +on. // g++ compile on Linux: // g++ -o llil3m -std=c++20 -Wall -O3 llil3m.cpp // g++ -o llil3m -std=c++20 -fopenmp -Wall -O3 llil3m.cpp // This g++ command also works with mingw C++ compiler (https://source +forge.net/projects/mingw-w64) // that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.ex +e). // Example run: llil3m tt1.txt tt2.txt tt3.txt >out.txt // Uncomment next line to mimic marioroy two sort trick (needs stable +sort) #define MARIOROY_TWO_SORT_TRICK_L 1 #include <cstddef> #include <cstdint> #include <cstdlib> #include <cstring> #include <ctime> #include <cstdio> #include <string> #include <array> #include <vector> #include <set> #include <map> #include <utility> #include <iterator> #if defined(_OPENMP) #include <omp.h> #include <parallel/algorithm> #else #include <algorithm> #include <execution> #endif #include <iostream> #include <fstream> #include <sstream> static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, ne +ed a 64-bit compile"); // ------------------------------------------------------------------- +--------- // Crude hack to see Windows Private Bytes in Task Manager by sleeping + at // program end (see also sleep hack at end of main) // #include <chrono> // #include <thread> // For some performance hacks to speed up C++ I/O see: // https://www.reddit.com/r/rust/comments/9xedap/how_to_achieve_fast +_stdinstdout_io_suitable_for/ // The only one we use here is to prefer "\n" to std::endl to reduce s +tdout flushing // ------------------------------------------------------------------- +--------- typedef long long llil_int_type; // Note: all words in big1.txt, big2.txt, big3.txt are <= 6 chars in l +ength // To use (limited length) fixed length strings uncomment the next lin +e // big.txt max word length is 6 // long.txt max word length is 208 // Based on rough benchmarking, this short fixed string hack // is only worth trying for MAX_STR_LEN_L up to about 64 #define MAX_STR_LEN_L 6 #ifdef MAX_STR_LEN_L using str_type = std::array<char, MAX_STR_LEN_L>; #else using str_type = std::string; #endif using str_int_type = std::pair<str_type, llil_int_type>; using int_str_type = std::pair<llil_int_type, str_type>; using vec_str_int_type = std::vector<str_int_type>; using vec_int_str_type = std::vector<int_str_type>; using set_int_str_type = std::set<int_str_type>; using map_str_int_type = std::map<str_type, llil_int_type>; // Mimic the Perl get_properties subroutine -------------------------- +-- // Limit line length and use lower level ANSI C functions to try to bo +ost I/O performance #define MAX_LINE_LEN_L 255 static void get_properties( int nfiles, // in: the number of input files char* fname[], // in: the input file names map_str_int_type& hash_ret) // out: a hash of properties { FILE* fh; char line[MAX_LINE_LEN_L+1]; char* found; llil_int_type count; for (int i = 0; i < nfiles; ++i) { fh = ::fopen(fname[i], "r"); if (fh == NULL) { std::cerr << "Error opening '" << fname[i] << "' : errno=" << + errno << "\n"; continue; } while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) { found = ::strchr(line, '\t'); count = ::atoll( &line[found - line + 1] ); line[found - line] = '\0'; // word #ifdef MAX_STR_LEN_L str_type fixword { { '\0', '\0', '\0', '\0', '\0', '\0' } }; ::memcpy( fixword.data(), line, found - line ); hash_ret[fixword] -= count; #else hash_ret[line] -= count; #endif } ::fclose(fh); } } // ------------------------------------------------------------------- +-- int main(int argc, char* argv[]) { if (argc < 2) { std::cerr << "usage: llil3m file1 file2 ... >out.txt\n"; return 1; } #ifdef MAX_STR_LEN_L std::cerr << "llil3m (fixed string length=" << MAX_STR_LEN_L << ") +start\n"; #else std::cerr << "llil3m start\n"; #endif #if defined(_OPENMP) std::cerr << " - openmp version\n"; #endif time_t tstart1 = ::time(NULL); clock_t cstart1 = ::clock(); // Create the hash of properties map_str_int_type hash_ret; get_properties(argc - 1, &argv[1], hash_ret); clock_t cend1 = ::clock(); double ctaken1 = (double) (cend1 - cstart1) / (double)CLOCKS_PER_SE +C; std::cerr << "get_properties CPU time : " << ctaken1 << " secs +\n"; clock_t cstart2 = ::clock(); // Create a vector to sort (can't sort a std::map directly) vec_str_int_type v( hash_ret.begin(), hash_ret.end() ); clock_t cend2v = ::clock(); #ifdef MARIOROY_TWO_SORT_TRICK_L // std::map is already sorted by string key so just stable_sort by +value #if defined(_OPENMP) __gnu_parallel::stable_sort( #else std::stable_sort( #endif v.begin(), v.end(), [](const str_int_type& left, const str_int_type& right) { return + left.second < right.second; } ); #else // Sort descending by value, i.e. mimic this Perl code in C++: // sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %{$href +} // No surprise that string key comparison is way slower than numeri +c key comparison in std::sort() #if defined(_OPENMP) __gnu_parallel::sort( #else std::sort( #endif v.begin(), v.end(), [](const str_int_type& left, const str_int_type& right) { return + left.second != right.second ? left.second < right.second : left.firs +t < right.first; } ); #endif clock_t cend2s = ::clock(); #ifdef MAX_STR_LEN_L // If name is not NULL-terminated: for ( auto const& n : v ) ::printf( "%.*s\t%lld\n", MAX_STR_LEN_L, +n.first.data(), -n.second ); // If name is NULL-terminated: // for ( auto const& n : v ) ::printf( "%s\t%lld\n", n.first.data() +, -n.second ); // for ( auto const& n : v ) std::cout << n.first.data() << '\t' << + -n.second << '\n'; #else // Can try printf vs std::cout to see which is faster for ( auto const& n : v ) ::printf( "%s\t%lld\n", n.first.c_str(), +-n.second ); // for ( auto const& n : v ) std::cout << n.first << '\t' << -n.sec +ond << '\n'; #endif clock_t cend2 = ::clock(); time_t tend2 = ::time(NULL); long ttaken = static_cast<long>(::difftime(tend2, tstart1) + 0 +.5); double ctaken = (double) (cend2 - cstart1) / (double)CLOCKS_PER_ +SEC; double ctaken2v = (double) (cend2v - cstart2) / (double)CLOCKS_PER +_SEC; double ctaken2s = (double) (cend2s - cend2v) / (double)CLOCKS_PER +_SEC; double ctaken2o = (double) (cend2 - cend2s) / (double)CLOCKS_PER +_SEC; std::cerr << "vector copy CPU time : " << ctaken2v << " sec +s\n"; #ifdef MARIOROY_TWO_SORT_TRICK_L std::cerr << "vector stable sort CPU time : " << ctaken2s << " sec +s\n"; #else std::cerr << "vector sort CPU time : " << ctaken2s << " sec +s\n"; #endif std::cerr << "write stdout CPU time : " << ctaken2o << " sec +s\n"; std::cerr << "total CPU time : " << ctaken << " secs +\n"; std::cerr << "total wall clock : " << ttaken << " secs +\n"; // Hack to see Private Bytes in Windows Task Manager (uncomment nex +t line so process doesn't exit too quickly) // std::this_thread::sleep_for(std::chrono::milliseconds(90000000 +)); return 0; }
Timings
llil3grt (fixed string length=6) start - openmp version get_properties CPU time : 2.34487 secs emplace set sort CPU time : 0.942098 secs write stdout CPU time : 0.855157 secs total CPU time : 4.14223 secs total wall clock time : 4 secs real 0m4.752s user 0m4.241s sys 0m0.511s
$ time ./llil3m big1.txt big2.txt big3.txt >f.tmp llil3m (fixed string length=6) start - openmp version get_properties CPU time : 2.40089 secs vector copy CPU time : 0.544294 secs vector stable sort CPU time : 5.84981 secs write stdout CPU time : 0.805509 secs total CPU time : 9.6006 secs total wall clock : 4 secs real 0m4.713s user 0m9.238s sys 0m0.679s
llil3m (fixed string length=6) start - openmp version get_properties CPU time : 2.3031 secs vector copy CPU time : 0.544613 secs vector sort CPU time : 2.64047 secs write stdout CPU time : 0.791186 secs total CPU time : 6.27946 secs total wall clock : 4 secs real 0m4.182s user 0m6.128s sys 0m0.473s
$ time ./llil3m big1.txt big2.txt big3.txt >f.tmp llil3m (fixed string length=6) start get_properties CPU time : 2.28072 secs vector copy CPU time : 0.471632 secs vector stable sort CPU time : 0.735042 secs write stdout CPU time : 0.67514 secs total CPU time : 4.16263 secs total wall clock : 4 secs real 0m4.470s user 0m4.159s sys 0m0.311s $ time ./llil3m big1.txt big2.txt big3.txt >f.tmp llil3m (fixed string length=6) start get_properties CPU time : 2.30618 secs vector copy CPU time : 0.473185 secs vector sort CPU time : 1.13081 secs write stdout CPU time : 0.668702 secs total CPU time : 4.57897 secs total wall clock : 4 secs real 0m4.889s user 0m4.558s sys 0m0.331s $ time ./llil3vec big1.txt big2.txt big3.txt >f.tmp llil3vec (fixed string length=6) start get_properties CPU time : 1.46864 secs emplace set sort CPU time : 0.630914 secs write stdout CPU time : 0.847912 secs total CPU time : 2.9476 secs total wall clock time : 3 secs real 0m3.233s user 0m2.852s sys 0m0.381s $ time ./llil3grt big1.txt big2.txt big3.txt >f.tmp llil3grt (fixed string length=6) start get_properties CPU time : 2.34418 secs emplace set sort CPU time : 0.901415 secs write stdout CPU time : 0.90025 secs total CPU time : 4.14595 secs total wall clock time : 5 secs real 0m4.784s user 0m4.232s sys 0m0.551s
References Added Later
Updated: The source code for llil2vec.cpp was updated after the node was first created based on feedback from the replies. Added interim version llil3vec.cpp (with some marioroy improvements added). Plus interim versions of llil3grt.cpp and llil3m.cpp. Added a few missing #include <array> and reference for OpenMP Little Book (thanks marioroy).
|
---|