Grendel2112 has asked for the wisdom of the Perl Monks concerning the following question:

I am a beginner to Perl, so please be gentle with me. :) Here is my problem: I have a CSV file with around 80 columns of word lists. I need to randomize the words in each column (keeping them in their original column) and write the result to a new CSV file. I managed to write a randomizing snippet that works on a file with one column of words, but processing a CSV with multiple columns is beyond me at this time. Any help and hand-holding would be greatly appreciated.

Replies are listed 'Best First'.
Re: Randomize CSV word lists
by Juerd (Abbot) on Apr 04, 2002 at 18:14 UTC
      Good list, Juerd.

      But you forgot "Make sure you have -w and use strict" :)
      --
      Mike

        But you forgot "Make sure you have -w and use strict" :)

        I assume every beginner already has seen that thousands of times. Besides, I'm convinced one has to experience strictless hell before enjoying strict - that's how I did it, and it made me love strict even more.

        U28geW91IGNhbiBhbGwgcm90MTMgY
        W5kIHBhY2soKS4gQnV0IGRvIHlvdS
        ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
        geW91IHNlZSBpdD8gIC0tIEp1ZXJk
        

      I saw "How do I shuffle an array randomly?", but my concern is that it might shuffle the words out of their original columns, and I don't know enough yet to tell whether that is the case.
Re: Randomize CSV word lists
by traveler (Parson) on Apr 04, 2002 at 19:08 UTC
    Juerd's list is good, but there is no need to copy and paste the shuffle algorithm. You can use Algorithm::Numerical::Shuffle. Despite the name, it does shuffle lists of strings. This module has been of great use to me.

    HTH, --traveler
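
As a rough illustration of shuffling with a module instead of pasted code, here is a minimal sketch. It uses shuffle from List::Util (bundled with modern Perl) rather than the Algorithm::Numerical::Shuffle module named above, and the sample columns are made up for the example. Each column lives in its own array, so a word can never move to another column.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical sample data: one array per CSV column.
my @columns = (
    [qw(apple banana cherry)],
    [qw(red green blue)],
);

# Shuffle every column independently of the others.
my @shuffled = map { [ shuffle @$_ ] } @columns;

# Print the first row of the shuffled data.
print join(",", map { $_->[0] } @shuffled), "\n";
```

The output differs from run to run, but the first field always comes from the first column and the second field from the second.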

Re: Randomize CSV word lists
by graff (Chancellor) on Apr 05, 2002 at 03:29 UTC
    Previous comments were all informative, but I gather you may still be wondering how to deal with 80 columns or so of data... You want to transpose the CSV array, so that each column is stored in its own array so you can shuffle it.

    I tried the following on (a copy of) a csv dump of my last bank statement -- seems to do the job (could make taxes interesting this year...)

    BTW, I suppose Fisher_Yates is good enough, but my own favorite has always been to prepend a random number to the string (default output of rand() is between 0.0 and 0.999...), then sort, then remove the random number.

    use strict;

    my @transpose;   # this will be an array of arrays
    my $ncols = 0;

    while (<>) {
        chomp;
        my @cols = split(/,/);
        if ( $ncols ) {
            die "Line $. doesn't have $ncols columns\n"
                if ( $ncols != scalar @cols );
        }
        else {
            $ncols = scalar @cols;
        }
        foreach my $i (0..$#cols) {
            push( @{$transpose[$i]}, $cols[$i] );
        }
    }
    my $nrows = $.;

    for (0..$ncols-1) {
        &fisher_yates_shuffle( $transpose[$_] );
    }

    foreach my $i (0..$nrows-1) {
        my @cols = ();
        foreach my $j (0..$ncols-1) {
            push( @cols, $transpose[$j][$i] );
        }
        print join( ",", @cols ) . "\n";
    }
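
The script calls fisher_yates_shuffle but the post does not define it. It is presumably the routine from the "How do I shuffle an array randomly?" FAQ entry mentioned earlier in the thread. A sketch of such a routine, which could be appended to the script, might look like this (the guard for empty arrays and the sample word list are additions for the example):

```perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle; takes a reference to an array.
sub fisher_yates_shuffle {
    my $deck = shift;
    return unless @$deck;             # nothing to do for an empty array
    my $i = @$deck;
    while (--$i) {
        my $j = int rand($i + 1);     # random index in 0..$i
        @$deck[$i, $j] = @$deck[$j, $i];
    }
}

# Hypothetical sample column, shuffled in place.
my @words = qw(alpha beta gamma delta);
fisher_yates_shuffle(\@words);
print "@words\n";
```

Because the routine works on a reference, passing $transpose[$_] shuffles each column array directly without copying it.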

      BTW, I suppose Fisher_Yates is good enough, but my own favorite has always been to prepend a random number to the string (default output of rand() is between 0.0 and 0.999...), then sort, then remove the random number.

      Not only is it good enough, it's also a lot more efficient and scalable.

      The Fisher_Yates algorithm shuffles in place, swapping array elements. Your solution first alters every element, then sorts them, assigns the result of the sort to an array, and finally strips the random number off each string. I have not benchmarked it, but it sounds like a slow procedure, though it may still be perfectly usable for small arrays.
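
For comparison, the sort-based approach might be sketched as below. This is a variant, not graff's exact code: it keeps the random number in a separate slot next to each string instead of prepending it, so nothing has to be parsed back out. The name sort_shuffle is made up for the example. The sort makes it O(n log n), whereas Fisher_Yates makes a single pass over the array.

```perl
use strict;
use warnings;

# Shuffle by sorting on a random key (hypothetical helper name).
sub sort_shuffle {
    my @items = @_;
    return map  { $_->[1] }                 # drop the random key again
           sort { $a->[0] <=> $b->[0] }     # order the pairs by their key
           map  { [ rand(), $_ ] } @items;  # pair each item with a random key
}

my @mixed = sort_shuffle(qw(one two three four));
print "@mixed\n";
```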

      my @cols = split(/,/);

      That's not CSV parsing. CSV isn't just comma-separated: the format also supports quoted strings, and quotes inside a quoted string are escaped by doubling them. See Re (tilly) 1: csv output.
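
To show what the quoting rules mean in practice, here is a minimal hand-rolled sketch of a single-line parser. The name parse_csv_line is hypothetical, and the sketch assumes a record never spans more than one line. For real data, a CPAN module such as Text::CSV is the more robust choice.

```perl
use strict;
use warnings;

# Split one CSV line into fields, honouring "quoted, fields" and
# doubled quotes ("") inside them. Hypothetical helper for illustration.
sub parse_csv_line {
    my ($line) = @_;
    my @chars = split //, $line;
    my @fields;
    my $field     = '';
    my $in_quotes = 0;

    for (my $i = 0; $i < @chars; $i++) {
        my $c = $chars[$i];
        if ($in_quotes) {
            if ($c eq '"') {
                if ($i + 1 < @chars && $chars[$i + 1] eq '"') {
                    $field .= '"';    # doubled quote is a literal quote
                    $i++;
                }
                else {
                    $in_quotes = 0;   # closing quote
                }
            }
            else {
                $field .= $c;         # commas are ordinary text in here
            }
        }
        elsif ($c eq '"') {
            $in_quotes = 1;           # opening quote
        }
        elsif ($c eq ',') {
            push @fields, $field;     # an unquoted comma ends the field
            $field = '';
        }
        else {
            $field .= $c;
        }
    }
    push @fields, $field;             # the last field has no trailing comma
    return @fields;
}

my @fields = parse_csv_line('word,"two, words","say ""hi"""');
print scalar(@fields), " fields\n";
```

A plain split(/,/) would break the second field at its embedded comma. It would also silently drop trailing empty fields, which would trip the column-count check in the script.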

      Thank you very much. That was very useful. Oh, frabjous day, Calloo, callay. :)