in reply to Frequency Analysis Of A Subset Of A File
This will print a pretty good approximation to a randomly distributed 10% of the lines in any file, regardless of its size:
C:\test>wc -l 986831-01.dat
268 986831-01.dat

C:\test>perl -nle" rand() < 0.1 and print" 986831-01.dat | wc -l
33

C:\test>perl -nle" rand() < 0.1 and print" 986831-01.dat | wc -l
26

C:\test>perl -nle" rand() < 0.1 and print" 986831-01.dat | wc -l
32

C:\test>perl -nle" rand() < 0.1 and print" 986831-01.dat | wc -l
24
Once you have randomly selected X% of the lines in the file, you only need to randomly select X% of the characters (or pairs/triples) in each of those lines to satisfy your overall goal.
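As a rough sketch of that two-stage idea for single characters only (pairs/triples would need a sliding window instead of the `split`), something along these lines might do. The sub name `sample_char_freq` and the command-line handling are made up for illustration, and the same rate is assumed for both stages:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): keep each line with
# probability $p, then keep each character of a kept line with the same
# probability, and count the characters that survive both stages.
sub sample_char_freq {
    my ($fh, $p) = @_;
    my %freq;
    while (my $line = <$fh>) {
        next unless rand() < $p;        # stage 1: sample lines
        chomp $line;
        for my $ch (split //, $line) {
            next unless rand() < $p;    # stage 2: sample characters
            $freq{$ch}++;
        }
    }
    return \%freq;
}

# Run as: perl sample.pl FILE [RATE]   (RATE defaults to 0.1)
if (!caller() && @ARGV) {
    my ($file, $p) = @ARGV;
    $p //= 0.1;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $freq = sample_char_freq($fh, $p);
    # Most frequent characters first, ties broken alphabetically.
    printf "%s\t%d\n", $_, $freq->{$_}
        for sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq;
}
```

As with the one-liner, the counts will vary from run to run, since every line and character is kept or dropped independently.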
Re^2: Frequency Analysis Of A Subset Of A File
by Limbic~Region (Chancellor) on Apr 24, 2013 at 18:51 UTC
by BrowserUk (Patriarch) on Apr 24, 2013 at 20:45 UTC