
What a delight to see our Anonymonk friend come back. Thanks to you, we tried parallel. :)

... but files can be unequal sizes, or just one huge single file. I think a serious solution would probe inside to find newlines at approximate addresses, then pass chunk coordinates to workers to parse in parallel.

Chuma mentions 2,064 input files in the original "Long list is long" thread. With that many files, processing the list of files in parallel suits this use case well. Back in 2014, I wrote two utilities, mce_grep and egrep.pl, that support both chunking and list modes via --chunk-level={auto|file|list}.
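For the single-huge-file case, the probing idea quoted above (find newlines near approximate offsets, then hand chunk coordinates to workers) might be sketched in J as follows. This is a hypothetical helper, not part of llil5.ijs: the name chunk_coords and the 4096-byte probe window are assumptions, and it assumes no line is longer than the probe window. It reads only a small window at each approximate split offset using indexed read (1!:11) and returns start,length rows that could each be parsed as a separate task.

```j
NB. Hypothetical helper (not in llil5.ijs): split file y into at most x chunks
NB. whose boundaries fall just after an LF. Returns a table of start,length rows.
NB. Assumes every line is shorter than the probe window.
chunk_coords =: {{
  size  =. 1!:4 < y                          NB. file size in bytes
  probe =. 4096                              NB. assumed window scanned past each guess
  ends  =. i. 0
  for_g. <. (size % x) * 1 + i. x - 1 do.    NB. approximate split offsets
    buf  =. 1!:11 y ; g , probe <. size - g  NB. indexed read of a small window only
    ends =. ends , g + 1 + buf i. LF         NB. byte just after the first LF at/after g
  end.
  ends   =. ~. size <. ends , size           NB. clamp to EOF; last chunk ends at EOF
  starts =. 0 , }: ends
  starts ,. ends - starts                    NB. one start,length row per chunk
}}
```

For example, 4 chunk_coords 'big.txt' yields up to four rows; neighbouring guesses that land on the same line collapse into one chunk, so fewer rows can come back for small files.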

llil5p.ijs

I took llil5.ijs and created a parallel version named llil5p.ijs, based on code bits from your prior post. The number of worker threads is set via the NUM_THREADS environment variable, and each input file is then read as a separate task (t.'').

$ diff -u llil5.ijs llil5p.ijs
--- llil5.ijs	2023-01-18 09:25:14.041515970 -0600
+++ llil5p.ijs	2023-01-18 09:25:58.889669110 -0600
@@ -9,6 +9,12 @@
 
 pattern =: 0 1
 
+nthrs =: 2!:5 'NUM_THREADS'      NB. get_env NUM_THREADS
+{{
+  if. nthrs do. nthrs =: ".nthrs end.   NB. string to integer conversion
+  for. i. nthrs do. 0 T. 0 end.         NB. spin nthrs
+}} ''
+
 args =: 2 }. ARGV
 fn_out =: {: args
 fn_in =: }: args
@@ -44,7 +50,7 @@
 
 read_many_files =: {{
   'fnames pattern' =. y
-  ,&.>/"2 (-#pattern) ]\ ,(read_file @:(; &pattern)) "0 fnames
+  ,&.>/"2 (-#pattern) ]\ ,;(read_file @:(; &pattern)) t.'' "0 fnames
 }}
 
 'words nums' =: read_many_files fn_in ; pattern

llil5tp.ijs

Next, I applied the turbo update to the parallel version and named it llil5tp.ijs.

$ diff -u llil5p.ijs llil5tp.ijs
--- llil5p.ijs	2023-01-18 09:25:58.889669110 -0600
+++ llil5tp.ijs	2023-01-18 09:26:01.553736512 -0600
@@ -21,6 +21,16 @@
 
 filter_CR =: #~ ~: & CR
 
+turbo_mode_ON =: {{
+  assert. 0 <: c =. 8 - {: $y
+  h =. (3 (3!:4) 16be2), ,|."1 [3 (3!:4)"0 (4:,#,1:,#) y
+  3!:2 h, ,y ,"1 _ c # ' '
+}}
+
+turbo_mode_OFF =: {{
+  (5& }. @: (_8& (]\)) @: (2& (3!:1))) &.> y
+}}
+
 read_file =: {{
   'fname pattern' =. y
 
@@ -56,6 +66,7 @@
 'words nums' =: read_many_files fn_in ; pattern
 
 t1 =: (6!:1) ''   NB. time since engine start
+words =: turbo_mode_ON words
 
 idx =: i.~ words
 nums =: idx +//. nums
@@ -65,6 +76,7 @@
 nums =: ~. nums
 
 'words nums' =: (\: nums)& { &.:>"_1 words ; nums
+words =: turbo_mode_OFF words
 
 t2 =: (6!:1) ''   NB. time since engine start
 text =: ; words (, @: (,"1 _))&.(>`a:)"_1 TAB ,. (": ,. nums) ,. LF
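The turbo diff above appears to hand-build a binary header for 3!:2 so that each space-padded 8-byte word becomes one 64-bit integer, letting i.~ and key (/.) work on a flat integer list instead of a character table. The same round trip can be sketched more compactly with 3!:4; this is an illustration only, not the code used in llil5tp.ijs, and pack8/unpack8 are hypothetical names. It assumes 64-bit J and words of at most 8 bytes.

```j
NB. Hypothetical illustration (not llil5tp.ijs): pack a char table of width <= 8
NB. into one 64-bit integer per row, and unpack it back. Requires 64-bit J.
pack8 =: {{
  assert. 8 >: {: $ y          NB. every word must fit in 8 bytes
  _3 (3!:4) , 8 {."1 y         NB. pad rows to 8 with spaces, ravel, 8 bytes -> 1 int
}}
unpack8 =: {{ _8 ]\ 3 (3!:4) y }}   NB. ints -> 8 bytes each, reshaped into rows of 8
```

For example, with w =: > ;: 'apple banana cherry', i.~ pack8 w classifies three integers rather than three character rows, and unpack8 pack8 w gives back w padded to 8 columns.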