
That's because pack returns a string. $a holds the 0 as an IV (integer value), but $b holds the packed result as a PV (string value), which carries extra, behind-the-scenes overhead:

    [0] Perl> $a = 0; print size $a;;
    24
    [0] Perl> $b = pack 'b*', 0; print size $b;;
    56
    [0] Perl> print Dump $a;;
    SV = IV(0x3e74270) at 0x3e74278
      REFCNT = 1
      FLAGS = (IOK,pIOK)
      IV = 0
    [0] Perl> print Dump $b;;
    SV = PV(0x11c110) at 0x3e6c6d8
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x3ecd828 "\0"\0
      CUR = 1
      LEN = 8

As you can see, the PV (string) has a couple of extra internal control fields (CUR and LEN) plus a pointer to the actual string buffer. That buffer holds one 0 byte for the (single) bit, because you cannot pack fewer than 8 bits (that's the way computers work!), followed by a second 0 byte that makes the string "null terminated", as all strings must be for many C library routines to work; internally, Perl uses the C runtime library. LEN = 8 shows that the buffer actually allocated is larger still.
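To see the whole-byte rounding without any CPAN modules, here is a small sketch in core Perl. (The `size` and `Dump` calls in the session above presumably come from Devel::Size and Devel::Peek; that is an assumption, and this sketch uses neither.) It packs a few bit strings with 'b*' and prints how many bytes each result occupies, in hex:

```perl
use strict;
use warnings;

# pack 'b*' consumes one input character per bit (bit 0 of each byte first),
# but the result is always a whole number of bytes.
for my $bits ('0', '1', '10110011', '101100111') {
    my $packed = pack 'b*', $bits;
    printf "%-10s -> %d byte(s): %s\n",
        $bits, length $packed, unpack('H*', $packed);
}
# '0' -> 00, '1' -> 01, '10110011' -> cd, '101100111' -> cd01 (two bytes)
```

A single bit still costs a full byte, and the ninth bit forces a second byte.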

But, you are being deceived by the simplicity of your test. Let's try something a little more representative of your application:

    [0] Perl> $x = join '', map{ rand() < 0.5 ? 0 : 1 } 1 .. 880;;
    [0] Perl> print size $x;;
    936
    [0] Perl> print Dump $x;;
    SV = PV(0x3e964e0) at 0x337708
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x3e8d8c8 "00110110111001110000001010111000111111011000111111011011110011000001111110110101111011001101011010010110000101001110011101100000001000100011110111000101111111110101010010100111101111001100011111001001100001011000010010111001101000110000110110011001011101111000000101000111100101011111101010111111001001000001111101100011011111101110001000001110101010000100010000011101001111100101001001010110111001010110000110001001011101010010000010001011011
      CUR = 880
      LEN = 888
    [0] Perl> $y = pack 'b*', $x;;
    [0] Perl> print size $y;;
    160
    [0] Perl> print Dump $y;;
    SV = PV(0x3e967b0) at 0x3e6c3f0
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x3dff038 "l\347@\35\277\361\3333\370\2557ki(\347\6D\274\243\377*\345=\343\223\241!\235\305\260\231\356\201\342\251_\375$\370\306~Gp\25\"\270|Jj\247\206\221\256\4\321\16\262?\334\355\22\304S\370nh\325%\232K\376\235\34\35L\377\330\4a6\314_\314\222\336\373\375\371m[\246g\326e\37\36\23j5\2\346\324O\v|\272s\236\3"\0
      CUR = 110
      LEN = 112

And there you have it. The fixed internal overhead hasn't changed, but the length of the string (CUR) has dropped from 880 to 110 (880 bits at 8 bits per byte), and the actual memory use has dropped from 936 bytes to 160 bytes.
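The same 8-to-1 reduction can be checked in core Perl with `length` alone. This sketch rebuilds a random 880-character bit string as in the session above, packs it, and unpacks it again. Because `unpack 'b*'` always returns a multiple of 8 bits, the result is trimmed back to the original bit count; that only matters when the count is not a multiple of 8:

```perl
use strict;
use warnings;

# Random string of 880 '0'/'1' characters, one character (byte) per bit.
my $n    = 880;
my $bits = join '', map { rand() < 0.5 ? 0 : 1 } 1 .. $n;

# Packed: 8 bits per byte, so 880 / 8 = 110 bytes.
my $packed = pack 'b*', $bits;
printf "%d chars -> %d bytes\n", length $bits, length $packed;

# unpack 'b*' returns whole bytes' worth of bits; trim to the original count.
my $back = substr unpack('b*', $packed), 0, $n;
print $back eq $bits ? "round trip ok\n" : "round trip FAILED\n";
```

This prints `880 chars -> 110 bytes` and `round trip ok`.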

And if all that still doesn't make sense to you, download and spend a week reading this, then come back with any remaining questions.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^6: Bit handling in Perl (illguts)
by Anonymous Monk on Oct 11, 2014 at 09:53 UTC
Re^6: Bit handling in Perl
by vaidhy_m (Novice) on Oct 11, 2014 at 10:30 UTC

    Although I do not understand all of it, I can still see that pack does reduce the size as required in my application. Thank you, BrowserUk, for the reference and clarification! :)

Re^6: Bit handling in Perl
by vaidhy_m (Novice) on Oct 25, 2014 at 11:37 UTC

    Dear Monks, I used pack to write a binary file from the CSV file and then used vec to compare the bit strings. Interestingly, a few lines were not written to the bit file as-is: some of the 1s were written as 0s in those lines. I am not sure why this happens! Also, in the binary file one of the bit strings is stored across two lines, while the other strings are each written on a single line. Can anyone please clarify? Thanks!

    This is one such line that gave an issue:

    111000000111101100110001100000000100010000000000000000000000 000000000000000000000000000000000000000000000000000000000000 000000000000000000000001000011000000000000000000000000000010 001111000100100000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000011111000001100001 000001000000000000000000000000001100000011101100010111011011 001011001011111111110000110111101100100110010000000000101010 100000000010101101100111011101100100010100001100001010000000 001010011011010100000111101000000000100111011001101000010010 100001100110110110001000100000101000111110101100000110011111 100100011000010000100001100010000110100010010000000000101100 100011001001111001110001110010001000110000001000111010000100 000000000000000001000000000000000000000000000000000000000000 10000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000

      Interestingly, a few lines were not written to the bit file as-is: some of the 1s were written as 0s in those lines. I am not sure why this happens!

      That sounds very strange! My best guess is that either your workstation or server is positioned too close to your local turboencabulator, and thus the memory is being affected by stray barescent skor motion produced by the modial interaction of magneto-reluctance and capacitive diractance.

      On the other hand, it could be something wrong with your code, which would be much easier to debug, if I could see it?



        BrowserUk:

        You should've perhaps included the link showing the video discussion of the magneto-reluctance/capacitive diractance effect: Turboencabulator.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        I might consider moving my Turboencabulator to the kitchen, thanks for the suggestion! On the other hand, this is the script I used to convert the csv file to binary, which works flawlessly for most rows in my csv file.

        open(A,"6k6.csv");
        open(B,">6k6.fp");
        binmode B;
        while(<A>)
        {
            if($_=~m/(".*"),(.*)/gi)
            {
                $temp=$2;
                $temp=~tr[,\n][]d;
                print B "$1".pack 'b*',$temp;
                print B "\n";
            }
        }
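The script ends each packed record with "\n", and the file is presumably read back line by line. One thing worth ruling out (a guess that the thread does not confirm) is that a packed byte can itself equal 0x0A. With 'b*', the eight bits "01010000" pack to exactly "\n", so a line-oriented read would split that record in two. This sketch uses an in-memory filehandle and a made-up key (`"key"` is hypothetical) to show the effect. It then shows one possible alternative, a 4-byte length prefix via pack's `N/a*` template, as an illustration rather than the poster's format:

```perl
use strict;
use warnings;

# 'b*' puts bit 0 first, so "01010000" packs to 0x0A ("\n"); "11111111" to 0xFF.
my $packed = pack 'b*', '0101000011111111';
printf "packed record bytes: %s\n", unpack 'H*', $packed;   # 0aff

# Hypothetical record in the script's layout: quoted key, packed bits, "\n".
my $file = qq{"key"} . $packed . "\n";

# Read it back line by line, as a while(<FH>) loop would.
open my $fh, '<', \$file or die "cannot open in-memory file: $!";
binmode $fh;
my @lines = <$fh>;
close $fh;
printf "%d line(s) read back for one record\n", scalar @lines;   # 2, not 1

# Sketch of an alternative: prefix each packed record with its byte length,
# so no byte value inside the record has a special meaning.
my $record     = pack 'N/a*', $packed;
my ($restored) = unpack 'N/a*', $record;
print $restored eq $packed ? "length-prefixed record ok\n" : "mismatch\n";
```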

        Then I used vec to write out each bit of the bit string to a separate file, which showed me that some lines are not converted into binary as-is.
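On the vec side: with BITS set to 1, vec numbers the bits of each byte starting from the low-order bit. That is the same order pack 'b*' uses (see `perldoc -f vec`), so `vec($packed, $i, 1)` should match character `$i` of the original string. This sketch checks that on the first 25 bits of the sample line posted above. It also packs the same bits with 'B*' (high bit first) to show that vec does not line up with that order; mixing the two is one possible, unconfirmed, way for 1s to turn into 0s:

```perl
use strict;
use warnings;

my $bits   = '1110000001111011001100011';   # first 25 bits of the sample line
my $packed = pack 'b*', $bits;

# vec(..., 1) reads bit $i starting from the low bit of the first byte,
# the same order 'b*' writes, so every position should agree.
my @mismatch = grep { vec($packed, $_, 1) != substr($bits, $_, 1) }
               0 .. length($bits) - 1;
print @mismatch ? "mismatch at: @mismatch\n" : "vec agrees with pack 'b*'\n";

# 'B*' writes the high bit first, so vec reads those bytes in a different order.
my $packed_B = pack 'B*', $bits;
my $same = grep { vec($packed_B, $_, 1) == substr($bits, $_, 1) }
           0 .. length($bits) - 1;
printf "with 'B*', only %d of %d bits line up\n", $same, length $bits;
```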