in reply to Randomize CSV word lists

Previous comments were all informative, but I gather you may still be wondering how to deal with 80 columns or so of data... You want to transpose the CSV data so that each column is stored in its own array, which you can then shuffle independently of the others.

I tried the following on (a copy of) a csv dump of my last bank statement -- seems to do the job (could make taxes interesting this year...)

BTW, I suppose Fisher_Yates is good enough, but my own favorite has always been to prepend a random number to the string (default output of rand() is between 0.0 and 0.999...), then sort, then remove the random number.
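
For illustration, here is a minimal sketch of that prepend-sort-strip idea. The name sort_shuffle and the tab separator are my own assumptions, not anything from the earlier comments; the fixed-width sprintf format is there so that a plain string sort orders the keys consistently.

```perl
use strict;
use warnings;

# Hypothetical helper: shuffle a list by tagging each item with a random
# fixed-width key, sorting the tagged strings, then stripping the key.
sub sort_shuffle {
    # "0.123456789012345<TAB>item" -- same key width for every item
    my @tagged = map { sprintf( "%.15f\t%s", rand(), $_ ) } @_;

    # Sort as strings, then drop everything up to and including the first tab
    return map { substr( $_, index( $_, "\t" ) + 1 ) } sort @tagged;
}

my @words = qw(alpha beta gamma delta);
print join( ",", sort_shuffle(@words) ), "\n";
```

Because only the text up to the first tab is removed, items that themselves contain tabs should survive intact.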

use strict;

my @transpose;   # this will be an array of arrays
my $ncols = 0;

while (<>) {
    chomp;
    my @cols = split(/,/);
    if ( $ncols ) {
        die "Line $. doesn't have $ncols columns\n" if ( $ncols != scalar @cols );
    }
    else {
        $ncols = scalar @cols;
    }
    foreach my $i (0..$#cols) {
        push( @{$transpose[$i]}, $cols[$i] );
    }
}
my $nrows = $.;

for (0..$ncols-1) {
    &fisher_yates_shuffle( $transpose[$_] );
}

foreach my $i (0..$nrows-1) {
    my @cols = ();
    foreach my $j (0..$ncols-1) {
        push( @cols, $transpose[$j][$i] );
    }
    print join( ",", @cols ) . "\n";
}
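
The script calls fisher_yates_shuffle without defining it, so it needs that sub from somewhere. In case it helps, this is a sketch of one way to write it, assuming it takes an array reference and shuffles in place (which is how the script uses it). It is a loose adaptation of the well-known perlfaq4 version, not necessarily the exact code posted earlier in the thread.

```perl
use strict;
use warnings;

# Shuffle the array behind $array in place (Fisher-Yates).
# Assumed interface: one array reference, no useful return value.
sub fisher_yates_shuffle {
    my ($array) = @_;
    for ( my $i = $#$array; $i > 0; $i-- ) {
        my $j = int rand( $i + 1 );              # random index, 0 <= $j <= $i
        @$array[ $i, $j ] = @$array[ $j, $i ];   # swap the two elements
    }
    return;
}

my @column = qw(one two three four five);
fisher_yates_shuffle( \@column );
print "@column\n";
```

If your Perl ships List::Util, its shuffle function is another option, though it returns a new list rather than working in place.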

Replies are listed 'Best First'.
Re: Re: Randomize CSV word lists
by Juerd (Abbot) on Apr 05, 2002 at 06:33 UTC

    BTW, I suppose Fisher_Yates is good enough, but my own favorite has always been to prepend a random number to the string (default output of rand() is between 0.0 and 0.999...), then sort, then remove the random number.

    Not only is it good enough, it's also a lot more efficient and scalable.

    The Fisher_Yates algorithm shuffles in place, swapping array elements in a single pass. Your solution first alters all elements, then sorts them, assigns the result of the sort to an array, and finally removes the random prefix from each element. I have not benchmarked it, but it sounds like a slow procedure (the sort alone is O(n log n), against O(n) for the swap-based shuffle), though it may still be very useful for small arrays.

    my @cols = split(/,/);

    That's not CSV parsing. CSV isn't just comma-separated; the format also supports quoted strings and escaping of quotes with other quotes. See Re (tilly) 1: csv output.
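
    To make the difference concrete, here is a rough sketch of a field splitter that honours quoted fields and doubled quotes. The name parse_csv_line is made up for this example, and it assumes one record per line (no newlines inside quoted fields). For real data a dedicated module such as Text::CSV from CPAN is probably the safer choice.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: split one CSV record into fields.
    # Handles "quoted, fields" and "" as an escaped quote inside quotes.
    sub parse_csv_line {
        my ($line) = @_;
        my @fields    = ('');
        my $in_quotes = 0;
        my @chars     = split //, $line;

        for ( my $i = 0; $i < @chars; $i++ ) {
            my $c = $chars[$i];
            if ($in_quotes) {
                if ( $c eq '"' ) {
                    if ( $i + 1 < @chars && $chars[ $i + 1 ] eq '"' ) {
                        $fields[-1] .= '"';    # "" means a literal quote
                        $i++;
                    }
                    else {
                        $in_quotes = 0;        # closing quote
                    }
                }
                else {
                    $fields[-1] .= $c;         # commas are data here
                }
            }
            elsif ( $c eq '"' ) { $in_quotes = 1 }          # opening quote
            elsif ( $c eq ',' ) { push @fields, '' }        # field separator
            else                { $fields[-1] .= $c }
        }
        return @fields;
    }

    my @cols = parse_csv_line('plain,"with, comma","say ""hi"""');
    print scalar(@cols), " fields: ", join( " | ", @cols ), "\n";
    ```

    A plain split(/,/) would cut the second field of that sample line in two.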

    U28geW91IGNhbiBhbGwgcm90MTMgY
    W5kIHBhY2soKS4gQnV0IGRvIHlvdS
    ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
    geW91IHNlZSBpdD8gIC0tIEp1ZXJk
    

Re: Re: Randomize CSV word lists
by Grendel2112 (Initiate) on Apr 05, 2002 at 13:19 UTC
    Thank you very much. That was very useful. Oh, frabjous day, Calloo, callay. :)