Two problems: one mine, one yours.

The code was untested, and I wrote it as if pack 'b*', ... and unpack 'b*', ... worked the way I would have liked them to work, rather than the way they do. (They would have been more useful!)

You need a couple of subroutines based around vec to build the bitstrings; see below. Your mistake (masked totally by mine) was that you were reusing @bits before constructing the first bitstring.

Try this:

    #!/usr/bin/perl
    use strict;

    # Pack a list of 0/1 values into a bitstring, one bit per value.
    sub toBitstring {
        my $bits = chr(0) x int( ( $#_ / 8 ) + 1 );
        vec( $bits, $_, 1 ) = $_[ $_ ] for 0 .. $#_;
        return $bits;
    }

    # Expand a bitstring back into a string of '0' and '1' characters.
    # The result is padded with zeros to a multiple of 8 bits.
    sub fromBitstring {
        join '', map vec( $_[0], $_, 1 ), 0 .. length( $_[0] ) * 8 - 1;
    }

    my ( @patNos, @data );

    my $line = "44444,1,1,0,0,0,1,1,1";
    my @bits = split ',', $line;
    $patNos[0] = shift @bits;
    print "$patNos[0] - @bits\n";
    $data[0] = toBitstring @bits;    # build the bitstring before @bits is reused

    $line = "55555,0,1,1,0,0,1,0,1";
    @bits = split ',', $line;
    $patNos[1] = shift @bits;
    print "$patNos[1] - @bits\n";
    $data[1] = toBitstring @bits;

    my $line1 = fromBitstring $data[0];
    my $line2 = fromBitstring $data[1];

    # XOR the two bitstrings, then count the set bits (the differing positions).
    my $variance = unpack '%32b*', ( $data[0] ^ $data[1] );

    print "\nline 1: $line1\n";
    print "line 2: $line2\n";
    print "\nvariance: $variance\n";

What is the significance of 32 here? If I'm using a 64 bit machine, should I change it to 64?

It is the width (in bits) of the accumulator used for the checksum. You can use 8, 16, 32 or 64. If the number of set bits in your bitstring exceeds the capacity of the accumulator, the result is silently truncated to that many bits (as with the '%8b*' example below):

    [0] Perl> $x = join '', map chr( rand 256 ), 1 .. 1000;;
    [0] Perl> print unpack '%8b*', $x;;
    47
    [0] Perl> print unpack '%16b*', $x;;
    3887
    [0] Perl> print unpack '%32b*', $x;;
    3887
    [0] Perl> print unpack '%64b*', $x;;
    3887

If your strings are less than 0.5GB in length, 32 bits is sufficient. For your 480 bits, 16 would suffice, but not if you move to eliminating the loop using the bigstring method I described. Using 64 bits won't hurt and may even be slightly quicker on a 64-bit machine, though only very slightly.
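To see the truncation without relying on random data, here is a minimal sketch. The helper name count_set_bits is mine, purely for illustration. It counts the bits in a string that has exactly 8000 of them set, using accumulators of different widths.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count the set bits in
# $string using a checksum accumulator that is $width bits wide.
sub count_set_bits {
    my ( $string, $width ) = @_;
    return unpack "%${width}b*", $string;
}

# 1000 bytes with every bit set: 8000 set bits in total.
my $all_ones = "\xFF" x 1000;

# The 8-bit accumulator can only hold 0 .. 255, so it wraps:
# 8000 % 256 == 64. The wider accumulators report the full 8000.
printf "%2d-bit accumulator: %d\n", $_, count_set_bits( $all_ones, $_ )
    for 8, 16, 32;
```

The 16-bit accumulator is enough here only because 8000 is below 65536; a string with more set bits than that would wrap in the same way.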

Why use 5.010?

Because it enables the 5.10 feature bundle (say, state and given/when) and makes the script refuse to run on older perls, which also lack the defined-or operator //. It is unnecessary if you do not use any of these.
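A small sketch of what that line buys you, using say and the defined-or operator. The helper label_or_default is a made-up name for illustration only.

```perl
#!/usr/bin/perl
use 5.010;    # require perl 5.10 or later; makes say available
use strict;
use warnings;

# Hypothetical helper: fall back to a default when the value is undef.
# Unlike ||, the // operator keeps defined-but-false values such as 0.
sub label_or_default {
    my ($label) = @_;
    return $label // 'unknown';
}

say label_or_default('44444');    # say appends the newline for you
say label_or_default(0);
say label_or_default(undef);
```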

What is the advantage of setting the length of the @patNos and @data arrays at the start?

It pre-allocates the basic internal structures of the arrays and prevents a little memory thrash as they are populated. The benefit is insignificant for this particular application, but can help for applications where the time taken to build large arrays is a significant portion of the overall runtime.
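One way to pre-extend an array is to assign to its last index. This is a sketch only; the record count of 1000 is an assumed figure, not something taken from your data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed number of records, for illustration only.
my $expected = 1000;

my ( @patNos, @data );

# Assigning to $#array sets the index of the last element, so the
# array is extended to $expected (undefined) slots in one step.
$#patNos = $expected - 1;
$#data   = $expected - 1;

print scalar(@patNos), " slots pre-allocated\n";

# Fill the slots by index; a push would append after the
# pre-allocated slots instead of filling them.
$patNos[0] = 44444;
```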

Is this just printing a status update every 1000 lines?

Yes. I used it to get a feel for how long things would take to run. It outputs the number of the line currently being processed. Because of the "\r", it overwrites the line number in place on the terminal rather than scrolling up the screen.
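The idea can be sketched roughly like this. The helper name progress and the loop over 5000 fake line numbers are assumptions for the sake of a runnable example, not the code from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a status string for every $every-th line,
# and an empty string otherwise. The leading "\r" moves the cursor back
# to the start of the terminal line, so each update overwrites the last.
sub progress {
    my ( $line_no, $every ) = @_;
    return '' if $line_no % $every;
    return "\r$line_no";
}

$| = 1;    # unbuffer STDOUT so each update appears immediately

# Stand-in for reading a large file line by line.
for my $line_no ( 1 .. 5000 ) {
    print progress( $line_no, 1000 );
}
print "\n";
```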

say "\n", time;

That prints a newline, followed by the current time in seconds since the epoch. Again, just part of the simplistic timing I did.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In reply to Re^5: Huge data file and looping best practices by BrowserUk
in thread Huge data file and looping best practices by carillonator
