Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^4: Suffix-prefix matching done right

by Anonymous Monk
on Nov 14, 2021 at 03:02 UTC ( [id://11138800]=note: print w/replies, xml ) Need Help??


in reply to Re^3: Suffix-prefix matching done right
in thread Suffix-prefix matching done right

Dear friends of the monastery,

Upon searching, there lives a MCE wc.pl example. Please note that timings for wc.pl include the time to load Perl itself plus modules and spawn/reap workers.

Preparation:

$ curl -O http://www.textfiles.com/etext/FICTION/lesms10.txt $ curl -O https://raw.githubusercontent.com/marioroy/mce-examples/mast +er/other/wc.pl $ chmod 755 wc.pl $ rm -f lesms10_big.txt $ for i in `seq 1 200`; do cat lesms10.txt >> lesms10_big.txt; done $ ls -lh lesms10* -rw-r--r-- 1 pmuser staff 3.1M Nov 13 19:27 lesms10.txt -rw-r--r-- 1 pmuser staff 623M Nov 13 19:39 lesms10_big.txt # Read once before testing to be fair for 1st and subsequent wc/wc.pl. # Sorry, I didn't want to go through the trouble clearing OS cache. $ cat lesms10_big.txt >/dev/null

Results: wc

$ time wc lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m2.132s user 0m2.076s sys 0m0.056s $ time wc < lesms10_big.txt 14152200 113528600 652751200 real 0m2.140s user 0m2.085s sys 0m0.055s

Results: wc.pl

$ time ./wc.pl lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m0.536s (3.98x) user 0m3.570s sys 0m0.591s $ time ./wc.pl < lesms10_big.txt 14152200 113528600 652751200 real 0m0.740s (2.89x) user 0m3.784s sys 0m1.166s

Results: line count

$ time wc -l lesms10_big.txt 14152200 lesms10_big.txt real 0m0.351s user 0m0.296s sys 0m0.055s $ time ./wc.pl -l lesms10_big.txt 14152200 lesms10_big.txt real 0m0.208s user 0m0.678s sys 0m0.835s

Results: word count

$ time wc -w lesms10_big.txt 113528600 lesms10_big.txt real 0m2.125s user 0m2.071s sys 0m0.054s $ time ./wc.pl -w lesms10_big.txt 113528600 lesms10_big.txt real 0m0.535s user 0m3.574s sys 0m0.599s

Results: byte count

$ time wc -c lesms10_big.txt 652751200 lesms10_big.txt real 0m0.003s user 0m0.001s sys 0m0.002s $ time ./wc.pl -c lesms10_big.txt 652751200 lesms10_big.txt real 0m0.054s user 0m0.042s sys 0m0.010s

Clearing OS cache:

The beliefs of yesterday may no longer be true today. These days, running on one core may be the bottleneck versus IO. Especially for hardware IO reaching gigabytes and beyond. Long ago, I removed old beliefs no longer matching with reality. This mind shift energized me to try, only to be surprised. So kudos to the OP for trying.

Linux: sudo echo 1 >/proc/sys/vm/drop_caches macOS: sudo purge $ time wc lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m2.435s user 0m2.330s sys 0m0.098s Linux: sudo echo 1 >/proc/sys/vm/drop_caches macOS: sudo purge $ time ./wc.pl lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m0.662s (3.68x) user 0m3.598s sys 0m0.736s

Bless to you all,

Replies are listed 'Best First'.
Re^5: Suffix-prefix matching done right
by Anonymous Monk on Nov 14, 2021 at 04:33 UTC

    An attempt on Linux failed clearing the file cache. Thus an update and results from a Linux box.

    This works: sudo bash -c "echo 1 >/proc/sys/vm/drop_caches"

    Results:

    $ sudo bash -c "echo 1 >/proc/sys/vm/drop_caches" $ time wc lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m3.412s user 0m3.307s sys 0m0.097s $ sudo bash -c "echo 1 >/proc/sys/vm/drop_caches" $ time ./wc.pl lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m0.613s (5.57x) user 0m4.168s sys 0m0.432s

    Beyond 8 cores:

    The --max-workers option defaults to auto (8 workers max). Running with --max-workers=N beyond 8 may provide extra performance on a box with more than 8 "real" cores.

    $ sudo bash -c "echo 1 >/proc/sys/vm/drop_caches" $ time ./wc.pl --max-workers=8 lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m0.613s (5.57x) user 0m4.168s sys 0m0.432s $ sudo bash -c "echo 1 >/proc/sys/vm/drop_caches" $ time ./wc.pl --max-workers=12 lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m0.454s (7.52x) user 0m4.464s sys 0m0.536s $ sudo bash -c "echo 1 >/proc/sys/vm/drop_caches" $ time ./wc.pl --max-workers=16 lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m0.360s (9.48x) user 0m4.344s sys 0m0.625s

    Compare wc with wc.pl 1 and 2 cores:

    To complete the demonstration, I also tried wc.pl with --maxworkers=1 and --maxworkers=2. The timings for wc.pl include loading Perl plus modules and spawning/reaping workers.

    $ sudo bash -c "echo 1 >/proc/sys/vm/drop_caches" $ time wc lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m3.412s user 0m3.307s sys 0m0.097s $ sudo bash -c "echo 1 >/proc/sys/vm/drop_caches" $ time ./wc.pl --max-workers=1 lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m3.553s user 0m3.363s sys 0m0.329s $ sudo bash -c "echo 1 >/proc/sys/vm/drop_caches" $ time ./wc.pl --max-workers=2 lesms10_big.txt 14152200 113528600 652751200 lesms10_big.txt real 0m1.804s user 0m3.339s sys 0m0.357s

    See also, egrep.pl from mce-examples.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11138800]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (2)
As of 2024-04-25 20:09 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found