
The J script runs in ~5.7 sec using ~650 MB to produce exactly the same output, which makes it the fastest of all solutions so far.

The good news is that I've now found two different C++ versions that are faster. The bad news is that they're much uglier than my original llil2grt.cpp (which ran in about 6.2 seconds on Ubuntu). After getting nowhere trying to speed up llil2grt.cpp, I reluctantly decided that the simple and elegant hash_ret[word] -= count had to go.

First the timings on Ubuntu. Because there's quite a bit of natural variation between runs, I did three runs of each:

llil2vec start
get_properties      CPU time : 3.06313 secs
emplace set sort    CPU time : 0.923435 secs
write stdout        CPU time : 1.392 secs
total               CPU time : 5.37868 secs
total        wall clock time : 6 secs

llil2vec start
get_properties      CPU time : 3.1567 secs
emplace set sort    CPU time : 0.970294 secs
write stdout        CPU time : 1.22305 secs
total               CPU time : 5.35015 secs
total        wall clock time : 5 secs

llil2vec start
get_properties      CPU time : 3.32019 secs
emplace set sort    CPU time : 1.08277 secs
write stdout        CPU time : 1.22461 secs
total               CPU time : 5.62766 secs
total        wall clock time : 5 secs

Ave CPU time: 5.5 secs
(Memory use (Windows Private Bytes): 1,225,580K)

llil2vec (fixed string length=6) start
get_properties      CPU time : 2.09353 secs
emplace set sort    CPU time : 0.795144 secs
write stdout        CPU time : 1.20994 secs
total               CPU time : 4.09871 secs
total        wall clock time : 4 secs

llil2vec (fixed string length=6) start
get_properties      CPU time : 2.2078 secs
emplace set sort    CPU time : 0.707252 secs
write stdout        CPU time : 1.14867 secs
total               CPU time : 4.06383 secs
total        wall clock time : 4 secs

llil2vec (fixed string length=6) start
get_properties      CPU time : 2.39225 secs
emplace set sort    CPU time : 1.0033 secs
write stdout        CPU time : 1.22765 secs
total               CPU time : 4.62331 secs
total        wall clock time : 4 secs

Ave CPU time: 4.3 secs
(Memory use (Windows Private Bytes): 814,940K)

The first one still uses std::string so there are no limitations on word length. The second one is limited to a word length of six characters, and so works with big1.txt, while crashing spectacularly with long1.txt.
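One way to turn that crash into a detectable error would be a length check before the copy. The sketch below is not part of llil2vec.cpp: kMaxStrLen, fixed_word and make_fixed_word are hypothetical names standing in for MAX_STR_LEN_L, str_arr_type and the inline strcpy, and whether the extra check costs measurable time has not been tested here.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for MAX_STR_LEN_L and str_arr_type in llil2vec.cpp
constexpr std::size_t kMaxStrLen = 6;
using fixed_word = std::array<char, kMaxStrLen + 1>;   // +1 for trailing '\0'

// Copy word into out if it fits; return false (leaving out untouched) otherwise.
// make_fixed_word is an illustrative helper, not part of the original program.
bool make_fixed_word(const char* word, fixed_word& out)
{
   const std::size_t len = std::strlen(word);
   if (len > kMaxStrLen) return false;      // too long: the caller decides what to do
   out.fill('\0');                          // zero padding keeps array comparisons deterministic
   std::memcpy(out.data(), word, len);
   return true;
}
```

A caller could then skip the line, or stop with an error message, whenever make_fixed_word returns false, instead of overflowing the array.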

Funny, I'd originally forgotten that way back in 2014, when desperately trying to make a search program run a hundred times faster, I'd stumbled upon this classic quote:

That's not a minor disadvantage, we're talking about things being 50 to 100 times slower with a linked list. Compactness matters. Vectors are more compact than lists. And predictable usage patterns matter enormously. With a vector, you have to shove a lot of elements over, but caches are really really good at that.

When you traverse lists you keep doing random access ... so you are actually random accessing your memory and you are maximizing your cache misses, which is exactly the opposite of what you want.

-- Bjarne Stroustrup: Why you should avoid Linked Lists (3:30-5:00) (update: see also)

Back then, because each memory access into my (huge) 4 GB lookup tables was essentially random, most memory accesses missed the L1 cache, missed the L2 cache, missed the L3 cache, then waited for the cache line to be loaded into all three caches from main memory, while often incurring a TLB miss, just to rub salt into the wounds.

Thankfully, there are no 4 GB lookup tables in this problem. But hashes (and linked lists) still tend to be mind-bogglingly slower than vectors on modern hardware, as Stroustrup's quote above indicates, which is why the simple and elegant hash_ret[word] -= count had to go.
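For anyone who wants to see the effect on their own machine, here is a rough sketch (not taken from llil2vec.cpp) that times a full traversal of a std::vector against a std::list holding the same values. The names time_sum, traversal_result and compare_traversal are made up for illustration. Note that a freshly built list often has its nodes allocated almost contiguously, so the gap measured this way may be far smaller than the 50 to 100 times mentioned in the quote; treat any numbers as indicative only.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <list>
#include <numeric>
#include <vector>

// Sum every element of a container and return the elapsed wall time in seconds.
template <typename Container>
double time_sum(const Container& c, std::int64_t& sum_out)
{
   const auto t0 = std::chrono::steady_clock::now();
   sum_out = std::accumulate(c.begin(), c.end(), std::int64_t{0});
   const auto t1 = std::chrono::steady_clock::now();
   return std::chrono::duration<double>(t1 - t0).count();
}

// Illustrative result record: the two sums must agree, the two times may not.
struct traversal_result {
   std::int64_t vec_sum;
   std::int64_t list_sum;
   double       vec_secs;
   double       list_secs;
};

// Fill a vector and a list with 1..n, then time one traversal of each.
traversal_result compare_traversal(std::size_t n)
{
   std::vector<std::int64_t> v(n);
   std::iota(v.begin(), v.end(), std::int64_t{1});
   std::list<std::int64_t> l(v.begin(), v.end());

   traversal_result r{};
   r.vec_secs  = time_sum(v, r.vec_sum);
   r.list_secs = time_sum(l, r.list_sum);
   return r;
}
```

Calling compare_traversal with a few million elements and printing vec_secs and list_secs gives a first impression; compile with -O3, as for llil2vec.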

Though I really hate this new std::vector based solution, it does run faster than my earlier std::map based one. This is just a first attempt; further improvements are possible. The llil2vec.cpp source code is shown below. Though there are two sets of timings, there is only one program, compiled with or without MAX_STR_LEN_L defined. Suggestions for improvements welcome.

llil2vec.cpp

// llil2vec.cpp.
// Vector version of llil2grt.pl.
// g++ compile on Linux:
//    g++ -o llil2vec -std=c++11 -Wall -O3 llil2vec.cpp
// This g++ command also works with mingw C++ compiler (https://sourceforge.net/projects/mingw-w64)
// that comes bundled with Strawberry Perl (C:\Strawberry\c\bin\g++.exe).
// Example run: llil2vec big1.txt big2.txt big3.txt >vec.tmp

#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <cstdio>

#include <string>
#include <array>
#include <vector>
#include <set>

#include <algorithm>
#include <utility>
#include <iterator>

#include <iostream>
#include <fstream>
#include <sstream>

static_assert(sizeof(size_t) == sizeof(int64_t), "size_t too small, need a 64-bit compile");

// ----------------------------------------------------------------------------

// Crude hack to see Windows Private Bytes in Task Manager by sleeping at
// program end (see also sleep hack at end of main)
//    #include <chrono>
//    #include <thread>

// ----------------------------------------------------------------------------

typedef long long llil_int_type;

// Note: all words in big1.txt, big2.txt, big3.txt are <= 6 chars in length
// To use (limited length) fixed length strings uncomment the next line
// #define MAX_STR_LEN_L 6

#ifdef MAX_STR_LEN_L
using str_arr_type     = std::array<char, MAX_STR_LEN_L + 1>;    // +1 for trailing '\0'
using str_int_type     = std::pair<str_arr_type, llil_int_type>;
using int_str_type     = std::pair<llil_int_type, str_arr_type>;
#else
using str_int_type     = std::pair<std::string, llil_int_type>;
using int_str_type     = std::pair<llil_int_type, std::string>;
#endif

using vec_str_int_type = std::vector<str_int_type>;
using vec_int_str_type = std::vector<int_str_type>;
using set_int_str_type = std::set<int_str_type>;

// Mimic the Perl get_properties subroutine ----------------------------

// Limit line length and use ANSI C functions to try to boost performance
#define MAX_LINE_LEN_L 255

static void get_properties(
   int                nfiles,      //  in: the number of input files
   char*              fname[],     //  in: the input file names
   vec_int_str_type&  vec_ret)     // out: a vector of properties
{
   FILE* fh;
   char line[MAX_LINE_LEN_L+1];
   char* word;
   llil_int_type count;

   for (int i = 0; i < nfiles; ++i) {
      fh = ::fopen(fname[i], "r");
      if (fh == NULL) {
         std::cerr << "Error opening '" << fname[i] << "' : errno=" << errno << "\n";
         continue;
      }
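      while ( ::fgets(line, sizeof(line), fh) != NULL ) {
         word = ::strtok(line, "\t");
         char* count_str = ::strtok(NULL, "\n");
         if (word == NULL || count_str == NULL) continue;   // skip malformed lines (no tab-separated count)
         count = ::atoll(count_str);
#ifdef MAX_STR_LEN_L
         str_arr_type fixword {};   // value-initialized: every char is '\0'
         // Note: no bounds check, so a word longer than MAX_STR_LEN_L overflows fixword
         ::strcpy( fixword.data(), word );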
         vec_ret.emplace_back( -count, fixword );
#else
         vec_ret.emplace_back( -count, word );
#endif
      }
      ::fclose(fh);
   }

   // Needs to be sorted by word for later sum of adjacent count fields to work
   std::sort( vec_ret.begin(), vec_ret.end(),
      [](const int_str_type& left, const int_str_type& right) { return left.second < right.second; }
   );
}

// ---------------------------------------------------------------------

int main(int argc, char* argv[])
{
   if (argc < 2) {
      std::cerr << "usage: llil2vec file1 file2 ... >out.txt\n";
      return 1;
   }

#ifdef MAX_STR_LEN_L
   std::cerr << "llil2vec (fixed string length=" << MAX_STR_LEN_L << ") start\n";
#else
   std::cerr << "llil2vec start\n";
#endif
   time_t  tstart1 = ::time(NULL);
   clock_t cstart1 = ::clock();

   // Create the vector of properties
   vec_int_str_type vec_ret;
   get_properties(argc - 1, &argv[1], vec_ret);

   clock_t cend1  = ::clock();
   double ctaken1 = (double) (cend1 - cstart1) / (double)CLOCKS_PER_SEC;
   std::cerr << "get_properties      CPU time : " << ctaken1 << " secs\n";

   clock_t cstart2 = ::clock();

   // To avoid calling sort(), create an inverted std::set container
   // Note: negative count gives desired ordering
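   if ( vec_ret.empty() ) {
      // Nothing was read (e.g. no file could be opened); *vec_ret.begin() below would be undefined
      std::cerr << "No input data found\n";
      return 1;
   }
   set_int_str_type myset;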
   auto it = vec_ret.begin();
   int_str_type  kv_last = *it;
   llil_int_type count   = it->first;
   for (++it; it != vec_ret.end(); ++it) {
      if ( it->second == kv_last.second ) {
         count += it->first;
      }
      else {
         myset.emplace_hint( myset.end(), count, kv_last.second );
         kv_last = *it;
         count   = it->first;
      }
   }
   myset.emplace_hint( myset.end(), count, kv_last.second );

   clock_t cend2s = ::clock();

   // Output the (already sorted) std::set - no sort() function required
   // Note: fix up negative count via -n.first
#ifdef MAX_STR_LEN_L
   for ( auto const& n : myset ) std::cout << n.second.data() << '\t' << -n.first << '\n';
#else
   for ( auto const& n : myset ) std::cout << n.second        << '\t' << -n.first << '\n';
#endif

   clock_t cend2 = ::clock();
   time_t  tend2 = ::time(NULL);

   long    ttaken   = static_cast<long>(::difftime(tend2, tstart1) + 0.5);
   double  ctaken   = (double) (cend2 - cstart1) / (double)CLOCKS_PER_SEC;

   double  ctaken2s = (double) (cend2s - cstart2) / (double)CLOCKS_PER_SEC;
   double  ctaken2o = (double) (cend2  - cend2s)  / (double)CLOCKS_PER_SEC;

   std::cerr << "emplace set sort    CPU time : " << ctaken2s << " secs\n";
   std::cerr << "write stdout        CPU time : " << ctaken2o << " secs\n";

   std::cerr << "total               CPU time : " << ctaken   << " secs\n";
   std::cerr << "total        wall clock time : " << ttaken   << " secs\n";

   // Hack to see Private Bytes in Windows Task Manager (uncomment next line so process doesn't exit too quickly)
   //   std::this_thread::sleep_for(std::chrono::milliseconds(90000000));

   return 0;
}
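
For readers who would like to see the core idea without the file handling and timing code, here is a stripped-down sketch of the same three steps on in-memory data: negate the counts, sort by word so duplicates become adjacent, then sum each run of equal words into a std::set ordered by (negated count, word). The function name summarize and the int_str alias are made up for this illustration, and it uses plain emplace rather than the emplace_hint call in llil2vec.cpp, so it shows the technique rather than the exact code that was timed.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <utility>
#include <vector>

using int_str = std::pair<long long, std::string>;   // (count, word) on input

// Sum the counts of duplicate words, then order by descending count, then by word.
// summarize is an illustrative name, not a function in llil2vec.cpp.
std::vector<std::pair<std::string, long long>> summarize(std::vector<int_str> v)
{
   for (auto& kv : v) kv.first = -kv.first;          // negate counts, as llil2vec does

   // Sort by word so that equal words sit next to each other
   std::sort(v.begin(), v.end(),
      [](const int_str& a, const int_str& b) { return a.second < b.second; });

   std::set<int_str> ordered;                        // ordered by (negated count, word)
   for (auto it = v.begin(); it != v.end(); ) {
      long long total = 0;
      auto run = it;                                 // first element of this run of equal words
      for ( ; it != v.end() && it->second == run->second; ++it) total += it->first;
      ordered.emplace(total, run->second);
   }

   // Undo the negation on the way out
   std::vector<std::pair<std::string, long long>> out;
   for (const auto& n : ordered) out.emplace_back(n.second, -n.first);
   return out;
}
```

Given the pairs (1, tango), (3, alpha), (4, tango), (3, bravo), this returns tango 5, alpha 3, bravo 3: highest count first, with ties broken alphabetically by word.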

In reply to Re^2: Rosetta Code: Long List is Long (faster - vec) by eyepopslikeamosquito
in thread Rosetta Code: Long List is Long by eyepopslikeamosquito
