I wrote the following script to generate a large set of text files. The generated files look like real text files and are compressible, but not too much (about 50%). It should work on any Unix-like system (or on Windows, with an additional dictionary file as a source of words). Feel free to test and adapt.
#!/usr/bin/perl
use strict;
use warnings;
use Carp;

# Read the dictionary file into an array of words, one word per line.
sub loaddict {
    my $dict = shift;
    open my $fh, '<', $dict or croak "can't open $dict: $!";
    my @words = <$fh>;
    close $fh;
    chomp @words;
    return \@words;
}

#######################
# main
my $usage     = "usage : $0 <test folder> <number of files> [seed]\n";
my $testdir   = $ARGV[0] or die $usage;
my $filecount = $ARGV[1] or die $usage;
my $seed      = 0;
$seed = $ARGV[2] if defined $ARGV[2];

# force number
$filecount += 0;

if ( not -d $testdir ) {
    mkdir $testdir or die "can't mkdir $testdir: $!";
}

my $wordlist = loaddict("/usr/share/dict/words");

# Fixed base seed, so the same arguments always produce the same files.
srand( 42 + $seed );

for my $n ( 1 .. $filecount ) {
    open my $file, '>', "$testdir/$n" or croak "can't open file : $!";

    # Number of words in this file: between 5000 and 14999.
    my $filesize = int( rand(10000) ) + 5000;
    for my $i ( 1 .. $filesize ) {
        # rand(@$wordlist) covers every index, including the last word.
        my $dice = int( rand( @{$wordlist} ) );
        print $file $wordlist->[$dice] . " ";

        # Start a new line every 12 words.
        print $file "\n" if $i % 12 == 0;
    }
    close $file or croak "can't close file : $!";
}
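If you want to check the "about 50%" figure on your own dictionary, here is a rough sketch using the core IO::Compress::Gzip module. The helper name compression_ratio and the idea of passing the generated files on the command line are my own assumptions for illustration, not part of the script above, and the ratio you get will depend on your word list and on the compressor.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: compressed size divided by original size.
# Returns 0 for an empty string to avoid dividing by zero.
sub compression_ratio {
    my ($text) = @_;
    return 0 if length($text) == 0;
    gzip( \$text => \my $packed ) or die "gzip failed: $GzipError";
    return length($packed) / length($text);
}

# Print the gzip-compressed size of each file given on the command line,
# as a percentage of its original size.
for my $path (@ARGV) {
    open my $fh, '<', $path or die "can't open $path: $!";
    local $/;    # slurp the whole file at once
    my $text = <$fh>;
    close $fh;
    printf "%s: %.0f%%\n", $path, 100 * compression_ratio($text);
}
```

Run it on a few generated files, for example with the file names 1, 2 and 3 from your test folder as arguments. Each line shows the compressed size as a percentage of the original, so smaller numbers mean the file compresses better.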

In reply to Re: Script to create huge sample files by wazoox
in thread Script to create huge sample files by paragkalra
