Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re^4: Rosetta Code: Long List is Long (faster - vec)(faster++, and now parallel)

by Anonymous Monk
on Jan 10, 2023 at 17:54 UTC ( [id://11149507]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Rosetta Code: Long List is Long (faster - vec)(faster++, and now parallel)
in thread Rosetta Code: Long List is Long

What follows are mostly my attempts to describe and explain i.e. fair amount of prose, perhaps not worth a copper (insert name for local coin of lowest denomination).

First of all, thanks for warm words marioroy, I'm glad you liked this cute toy. I think you'll be particularly interested in parallelization, how they do it in J. And BTW the key to press is Ctrl-D, to "slurp" from STDIN.

Plus, to keep this node from being completely off-topic among Perl congregation -- some monks might also be interested in, that, if any FFI module for Perl is allowed to be used for work or play, it's very easy to execute J phrases, calling DLL and reading/writing packed Perl scalars as J data. It's been awhile since I had that fun, but, e.g., a link [1] to get started, it is about calling J from J REPL, but I remember those experiments were quite useful for learning. Whoever gets interested will easily find more tutorials.

In my 1st script, I actually posted code which is borderline buggy, and felt uneasy about it since previous year:

text =: , freads " 0 fn_in NB. wrong text =: ; < @ freads " 0 fn_in NB. fixed

What's right of comma in 1st line, reads into N x M character table (then comma converts it into 1D list), number of files by longest file size, with short files padded. If files were not same size, result would be erroneous. Further, if 100 files were 1 MB, and 101-st file is 1 GB, then this would end with "out of RAM" or similar error. Neither line is used in modified code, but with this off my chest, let's move on.

I noticed that very slow part of my solution was (and still is) reading and parsing. Moreover this trivial part (not subsequent sorting!) is main RAM consumer during script run. Whatever I tried didn't help much. Then I decided if that's so -- let's make it at least potentially useful for the future.

Instead of ad hoc code to consume strictly 2-column input, I wrote a function ("verb") to read from (well-formed) TSV file with any number of columns, with a "pattern" provided to describe how to read columns, as either text or numbers. In our case, pattern is obviously 0 1. To keep script interface consistent, this value is hard-coded, but should be e.g. CLI argument, of course.

Part of what is a "well-formed" TSV file for this verb, is that "text" doesn't contain ASCII-32 space character. You maybe noticed I commented-out PAD_CHAR definition, because changing it won't be very useful. Some more substantial changes would be required to allow ASCII-32 (and then to pad with ASCII-0, perhaps), because frame-fill for literal data type is space character, nothing else. I'm keeping the line commented-out as a reminder.

Which leads us to the essential question -- why keep textual data as padded N x M table, instead of keeping strings of text (they may be any different lengths then) each in their own "box"? Box in J is atomic data type and sort of pointer to whatever is inside. Data structures of any complexity are implemented with nested boxes.

The answer is -- because boxes are relatively expensive [2]. A few thousands (or 1e5 or so) are OK. An array of 10 million boxes (total number of lines in our dataset) easily eats a few GB of RAM and is very slow.

This also explains why I didn't use DSV addon [3] from standard library. For similar reasons (while I'm at explaining this stage of my route of decision making) I decided against symbols [4] (irreducible and shared GST (global symbol table)? Don't know what to think of it).

So, textual columns are parsed into padded 2D tables, but then I saw that slurping all files into single string prior to parsing is not required at all, if files are not too tiny. And indeed, RAM requirements for the script dropped significantly then. Further, parsing file by file makes obvious choice as to where to try to parallelize.

To finish with data input, the CRs are filtered out, in modified script, unconditionally, and, also unconditionally, NOT injected on output, in both OSes. And BTW, for fread and fwrite from standard library there exist their counterparts freads and fwrites to strip CRs. Don't know why, they are slow compared to manual filtering. The latter many seconds slower, not even used in original script. The modified script got rid of freads, also.

Moving on to how sorting was done in 1st script version -- actually with somewhat complicated and slow code, trying to sort word sub-ranges (per the same numeric count) without creating intermediate boxed arrays. In modified script, this is replaced with just a short simple phrase. As long as there are less than a few millions of unique counts, this would be OK w.r.t. RAM usage (but 1st version would get very slow then anyway).

This leads to new verb in 9.04 version (Key dyad [5]). I thought using it would help to "pack" pairs of words and numbers; then do kind of "packed sort" trick. It turned out to be a few times slower, I didn't investigate and I abandoned this route.

Moving to data output, requiring user to provide "numeric width" or whatever it was called was quite silly on my part; computing decimal logarithm of maximum would be OK :) And even so, it turned out that J formats numeric column (as opposed to row i.e. 1D array) automatically padded as required.

My final thoughts would be that multithreading is new feature in 9.04 version, as I already said. Documentation says [6]:

Even if your code does not create tasks, the system can take advantage of threadpool 0 to make primitives run faster

At first I thought it means that simply spawning workers would be enough for J to run parallel. Kind of how e.g. PDL does [7] (or tries to do). It turned out to be not so, either because script is very simple, or this automatic parallelization is not yet enabled in beta. We'll see.

1. https://code.jsoftware.com/wiki/Guides/DLLs/Calling_the_J_DLL
2. https://www.jsoftware.com/help/jforc/performance_measurement__tip.htm#_Toc191734575
3. https://code.jsoftware.com/wiki/Addons/tables/dsv
4. https://code.jsoftware.com/wiki/Vocabulary/sco
5. https://code.jsoftware.com/wiki/Vocabulary/slashdot#dyadic
6. https://code.jsoftware.com/wiki/Vocabulary/tcapdot#dyadic
7. https://metacpan.org/dist/PDL/view/Basic/Pod/ParallelCPU.pod

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11149507]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (3)
As of 2024-03-28 18:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found