J script runs in ~5.7 sec using ~650 MB to produce exactly same output. Which makes it fastest of all solutions so far.
The good news is that I've now found two different C++ versions that are faster. The bad news is that they're much uglier than my original llil2grt.cpp (which ran in about 6.2 seconds on Ubuntu). After getting nowhere trying to speed up llil2grt.cpp, I reluctantly decided that the simple and elegant hash_ret[word] -= count had to go.
First the timings on Ubuntu. Because there's quite a bit of natural variation between runs, I did three runs of each:
llil2vec start get_properties CPU time : 3.06313 secs emplace set sort CPU time : 0.923435 secs write stdout CPU time : 1.392 secs total CPU time : 5.37868 secs total wall clock time : 6 secs llil2vec start get_properties CPU time : 3.1567 secs emplace set sort CPU time : 0.970294 secs write stdout CPU time : 1.22305 secs total CPU time : 5.35015 secs total wall clock time : 5 secs llil2vec start get_properties CPU time : 3.32019 secs emplace set sort CPU time : 1.08277 secs write stdout CPU time : 1.22461 secs total CPU time : 5.62766 secs total wall clock time : 5 secs
Ave CPU time: 5.5 secs (Memory use (Windows Private Bytes): 1,225,580K)
llil2vec (fixed string length=6) start get_properties CPU time : 2.09353 secs emplace set sort CPU time : 0.795144 secs write stdout CPU time : 1.20994 secs total CPU time : 4.09871 secs total wall clock time : 4 secs llil2vec (fixed string length=6) start get_properties CPU time : 2.2078 secs emplace set sort CPU time : 0.707252 secs write stdout CPU time : 1.14867 secs total CPU time : 4.06383 secs total wall clock time : 4 secs llil2vec (fixed string length=6) start get_properties CPU time : 2.39225 secs emplace set sort CPU time : 1.0033 secs write stdout CPU time : 1.22765 secs total CPU time : 4.62331 secs total wall clock time : 4 secs
Ave CPU time: 4.3 secs (Memory use (Windows Private Bytes): 814,940K)
The first one still uses std::string so there are no limitations on word length. The second one is limited to a word length of six characters, and so works with big1.txt, while crashing spectacularly with long1.txt.
Funny, I'd originally forgotten that way back in 2014, when desperately trying to make a search program run a hundred times faster, I'd stumbled upon this classic quote:
That's not a minor disadvantage, we're talking about things being 50 to 100 times slower with a linked list. Compactness matters. Vectors are more compact than lists. And predictable usage patterns matter enormously. With a vector, you have to shove a lot of elements over, but caches are really really good at that.
When you traverse lists you keep doing random access ... so you are actually random accessing your memory and you are maximizing your cache misses, which is exactly the opposite of what you want.
-- Bjarne Stroustrup: Why you should avoid Linked Lists (3:30-5:00) (update: see also)
Back then, because each memory access into my (huge) 4GB lookup tables was essentially random, most memory accesses missed the L1 cache, missed the L2 cache, missed the L3 cache, then waited for the cache line to be loaded into all three caches from main memory, while often incurring a TLB cache miss, just to rub salt into the wounds.
Thankfully, there are no 4 GB lookup tables in this problem. But hashes (and linked lists) still tend to be mind-bogglingly slower than vectors on modern hardware, as indicated by Stroustrup's quote above. So I reluctantly decided that the simple and elegant hash_ret[word] -= count had to go.
Though I really hate this new std::vector based solution, it does run faster than my earlier std::map based one. This is just a first attempt, further improvements are possible. The llil2vec.cpp source code is shown below. Though there are two sets of timings, there is only one program, compiled with or without MAX_STR_LEN_L defined. Suggestions for improvements welcome.
llil2vec.cpp
// llil2vec.cpp. // Vector version of llil2grt.pl. // g++ compile on Linux: // g++ -o llil2vec -std=c++11 -Wall -O3 llil2vec.cpp // This g++ command also works with mingw C++ compiler (https://source +forge.net/projects/mingw-w64) // that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.ex +e). // Example run: llil2vec big1.txt big2.txt big3.txt >vec.tmp #include <cstddef> #include <cstdint> #include <cstdlib> #include <cstring> #include <ctime> #include <cstdio> #include <string> #include <array> #include <vector> #include <set> #include <algorithm> #include <utility> #include <iterator> #include <iostream> #include <fstream> #include <sstream> static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, ne +ed a 64-bit compile"); // ------------------------------------------------------------------- +--------- // Crude hack to see Windows Private Bytes in Task Manager by sleeping + at // program end (see also sleep hack at end of main) // #include <chrono> // #include <thread> // ------------------------------------------------------------------- +--------- typedef long long llil_int_type; // Note: all words in big1.txt, big2.txt, big3.txt are <= 6 chars in l +ength // To use (limited length) fixed length strings uncomment the next lin +e // #define MAX_STR_LEN_L 6 #ifdef MAX_STR_LEN_L using str_arr_type = std::array<char, MAX_STR_LEN_L + 1>; // +1 + for trailing '\0' using str_int_type = std::pair<str_arr_type, llil_int_type>; using int_str_type = std::pair<llil_int_type, str_arr_type>; #else using str_int_type = std::pair<std::string, llil_int_type>; using int_str_type = std::pair<llil_int_type, std::string>; #endif using vec_str_int_type = std::vector<str_int_type>; using vec_int_str_type = std::vector<int_str_type>; using set_int_str_type = std::set<int_str_type>; // Mimic the Perl get_properties subroutine -------------------------- +-- // Limit line length and use ANSI C functions to try to boost performa +nce #define MAX_LINE_LEN_L 255 static void get_properties( int nfiles, // in: the number of input files char* fname[], // in: the input file names vec_int_str_type& vec_ret) // out: a vector of properties { FILE* fh; char line[MAX_LINE_LEN_L+1]; char* word; llil_int_type count; for (int i = 0; i < nfiles; ++i) { fh = ::fopen(fname[i], "r"); if (fh == NULL) { std::cerr << "Error opening '" << fname[i] << "' : errno=" << + errno << "\n"; continue; } while ( ::fgets(line, MAX_LINE_LEN_L, fh) != NULL ) { word = ::strtok(line, "\t"); count = ::atoll( ::strtok(NULL, "\n") ); #ifdef MAX_STR_LEN_L str_arr_type fixword { { '\0', '\0', '\0', '\0', '\0', '\0', +'\0' } }; ::strcpy( fixword.data(), word ); vec_ret.emplace_back( -count, fixword ); #else vec_ret.emplace_back( -count, word ); #endif } ::fclose(fh); } // Needs to be sorted by word for later sum of adjacent count field +s to work std::sort( vec_ret.begin(), vec_ret.end(), [](const int_str_type& left, const int_str_type& right) { return + left.second < right.second; } ); } // ------------------------------------------------------------------- +-- int main(int argc, char* argv[]) { if (argc < 2) { std::cerr << "usage: llil2vec file1 file2 ... >out.txt\n"; return 1; } #ifdef MAX_STR_LEN_L std::cerr << "llil2vec (fixed string length=" << MAX_STR_LEN_L << " +) start\n"; #else std::cerr << "llil2vec start\n"; #endif time_t tstart1 = ::time(NULL); clock_t cstart1 = ::clock(); // Create the vector of properties vec_int_str_type vec_ret; get_properties(argc - 1, &argv[1], vec_ret); clock_t cend1 = ::clock(); double ctaken1 = (double) (cend1 - cstart1) / (double)CLOCKS_PER_SE +C; std::cerr << "get_properties CPU time : " << ctaken1 << " secs +\n"; clock_t cstart2 = ::clock(); // To avoid calling sort(), create an inverted std::set container // Note: negative count gives desired ordering set_int_str_type myset; auto it = vec_ret.begin(); int_str_type kv_last = *it; llil_int_type count = it->first; for (++it; it != vec_ret.end(); ++it) { if ( it->second == kv_last.second ) { count += it->first; } else { myset.emplace_hint( myset.end(), count, kv_last.second ); kv_last = *it; count = it->first; } } myset.emplace_hint( myset.end(), count, kv_last.second ); clock_t cend2s = ::clock(); // Output the (already sorted) std::set - no sort() function requir +ed // Note: fix up negative count via -n.first #ifdef MAX_STR_LEN_L for ( auto const& n : myset ) std::cout << n.second.data() << '\t' +<< -n.first << '\n'; #else for ( auto const& n : myset ) std::cout << n.second << '\t' +<< -n.first << '\n'; #endif clock_t cend2 = ::clock(); time_t tend2 = ::time(NULL); long ttaken = static_cast<long>(::difftime(tend2, tstart1) + 0 +.5); double ctaken = (double) (cend2 - cstart1) / (double)CLOCKS_PER_ +SEC; double ctaken2s = (double) (cend2s - cstart2) / (double)CLOCKS_PER +_SEC; double ctaken2o = (double) (cend2 - cend2s) / (double)CLOCKS_PER +_SEC; std::cerr << "emplace set sort CPU time : " << ctaken2s << " sec +s\n"; std::cerr << "write stdout CPU time : " << ctaken2o << " sec +s\n"; std::cerr << "total CPU time : " << ctaken << " sec +s\n"; std::cerr << "total wall clock time : " << ttaken << " sec +s\n"; // Hack to see Private Bytes in Windows Task Manager (uncomment nex +t line so process doesn't exit too quickly) // std::this_thread::sleep_for(std::chrono::milliseconds(90000000 +)); return 0; }
In reply to Re^2: Rosetta Code: Long List is Long (faster - vec)
by eyepopslikeamosquito
in thread Rosetta Code: Long List is Long
by eyepopslikeamosquito
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |