
Here is the code I'm using, and for a 1 gig file this takes a few hours.

Pardon me for saying so, but I am not surprised.

Not only are you unpacking the data to asciized binary, you then go on to split that into an array of digits.

Then you use a loop to go through that array one character at a time, looking for pairs that match '00' so that you can replace them with '11', or '11' so that you can replace them with '00'.
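To make the size of that intermediate concrete: `unpack 'b*'` turns each byte into eight '0'/'1' characters, least significant bit first, so the unpacked string is eight times as long as the data it came from. A small illustration with two arbitrary bytes:

```perl
use strict;
use warnings;

# 'b*' = bit string, ascending bit order within each byte (LSB first).
my $bytes = "\x01\xC0";
my $bits  = unpack 'b*', $bytes;
print "$bits\n";               # 1000000000000011
print length($bits), "\n";     # 16: eight characters per input byte

# pack with the same template reverses the conversion.
print pack('b*', $bits) eq $bytes ? "round trip ok\n" : "mismatch\n";
```

Because every byte yields exactly eight characters, the pairs the loop steps through (characters 0-1, 2-3, and so on) never straddle two bytes.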

All of this could be done with a couple of regexes.

Replacing this:

    while (<IN>) {
        my $bits = unpack("b*", $_);
        @array = split(//, $bits);
        foreach $value (@array) {
            $y++;
            $tmp = $tmp . $value;
            if ($y == 2) {
                if ($tmp =~ /00/) {
                    $tmp = '11';
                } elsif ($tmp =~ /11/) {
                    $tmp = '00';
                }
                print OUT $tmp;
                $y = 0;
                $tmp = '';
            }
        }
    }

with this:

    while (<IN>) {
        my $bits = unpack("b*", $_);
        $bits =~ s[(00|11)][ $1 eq '00' ? '11' : '00']ge;
        print OUT $bits;
    }

should do (untested) the same thing and will run very much more quickly.
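Going a step further than the regex (an untested sketch, assuming the intended transform is the original loop's: within each aligned pair of bits, 00 becomes 11, 11 becomes 00, and 01/10 are left alone), the swap can be done directly on the raw bytes with a 256-entry lookup table, so the data never has to be expanded to eight characters per byte. `swap_pairs_byte` and `swap_pairs` are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper: for one byte, flip each aligned 2-bit pair
# (bits 0-1, 2-3, 4-5, 6-7) whose two bits are equal: 00 -> 11, 11 -> 00.
sub swap_pairs_byte {
    my ($v) = @_;
    my $eq   = ~($v ^ ($v >> 1)) & 0x55;   # bit 2k set when bits 2k and 2k+1 are equal
    my $mask = $eq | ($eq << 1);           # extend to both bits of each equal pair
    return $v ^ $mask;                     # XOR flips exactly those pairs
}

# Precompute the result for every possible byte value once.
my @table = map { swap_pairs_byte($_) } 0 .. 255;

# Apply the table to a whole record of raw bytes.
sub swap_pairs {
    my ($data) = @_;
    return pack 'C*', @table[ unpack 'C*', $data ];
}

printf "%02X\n", ord(swap_pairs("\x01"));   # FD: pairs 10 00 00 00 -> 10 11 11 11
```

In the reading loop this would be `print OUT swap_pairs($_);` with `binmode` on both handles. The sync search still needs the bits, so it would be done on `unpack 'b*'` of the swapped record.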

Do I understand the logic of this code correctly?

    $/ = \7680;
    my $x = 0;
    #I LEFT THE OUTPUT AS ASCII-IZED BINARY AT THIS POINT
    #THUS THE LARGE INCREASE IN FILE SIZE
    open IN, "$tmp2";
    open OUT, ">$tmp3";
    while (<IN>) {
        $_ =~ s/^.*(11111011000010001111011100010000)/$1/ if $x == 0;
        $x = 1;
        print OUT pack("b*", $_);
    }
    close IN;
    close OUT;

You are checking the first record only for the first occurrence of the sync pattern, and then discarding anything that precedes it?

I.e., if the first record contains a partial frame, you throw it away so that the rest of the file is in sync?

If so, then the following code should be a complete replacement and run in a fraction of the time. The output file "tmp2" will be the final file you are after without creating the 9 GB intermediate.

Please let me know whether it works, and also how long it takes. There are other things that could be coded to speed this up, I think, but if the new runtime is acceptable, they may not be worth the extra effort.

    die "USAGE: $0 input\n" if scalar(@ARGV) < 1;
    $| = 1;
    $/ = \960;    ## 960 bytes per record (7680 bits once unpacked)
    open IN, '<', $ARGV[0] or die "$ARGV[0] : $!";
    open OUT, '>', 'tmp2' or die "tmp2 : $!";
    binmode IN;   ## binary data: no newline translation
    binmode OUT;
    while (<IN>) {
        my $bits = unpack("b*", $_);
        ## Replace '00' with '11' and vice versa
        $bits =~ s[(00|11)][ $1 eq '00' ? '11' : '00']ge;
        ## Discard any partial frame from the front of the file
        ## (non-greedy, so it stops at the first sync pattern).
        $bits =~ s/^(.*?)(?=11111011000010001111011100010000)// if $. == 1;
        print OUT pack 'b*', $bits;
    }
    close IN;
    close OUT;
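On the sync step: `^.*` is greedy, so in a record that happens to contain two sync patterns, the `s/^.*(...)/$1/` in the earlier script keeps only from the last one, which could throw away a complete frame. For the first occurrence, a non-greedy `.*?` or a plain `index` is what is needed. A small check with a made-up bit string (the sync value is the one from the thread; `trim_to_first_sync` is a hypothetical helper name):

```perl
use strict;
use warnings;

my $SYNC = '11111011000010001111011100010000';

# Hypothetical helper: drop everything before the FIRST sync pattern.
sub trim_to_first_sync {
    my ($bits) = @_;
    my $pos = index $bits, $SYNC;
    return $pos < 0 ? $bits : substr($bits, $pos);   # no sync found: leave unchanged
}

# Made-up record: 2 junk bits, sync, 4 bits, sync again, 1 bit.
my $record = '01' . $SYNC . '0000' . $SYNC . '1';

(my $greedy = $record) =~ s/^.*($SYNC)/$1/;   # greedy: keeps from the LAST sync
my $first = trim_to_first_sync($record);      # index: keeps from the FIRST sync

print length($greedy), "\n";    # 33
print length($first),  "\n";    # 69
```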

Examine what is said, not who speaks.
Silence betokens consent.
Love the truth but pardon error.

Re^8: pack/unpack binary editing
by tperdue (Sexton) on Feb 10, 2005 at 14:17 UTC
    I will give this a shot. What I really need to do with the sync pattern is find every occurrence and extract it along with the following 476 bytes, discarding everything after the 476th byte up to the next sync.

      In that case, the code in Re^5: pack/unpack binary editing should be easily adaptable to your purpose. You'll need to understand how it works, but the code from the previous post can be combined with it to do everything, including the syncing, in a single pass.
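A rough single-pass sketch of the frame extraction, under a few assumptions not stated in the thread: the sync word can start at any bit offset, each record has already been unpacked with 'b*' and had the 00/11 swap applied, and a sync pattern that happens to occur inside a frame's 476 payload bytes is not treated as a new frame. `take_frames` is a hypothetical name; it works on a growing buffer so it can be fed one record at a time.

```perl
use strict;
use warnings;

my $SYNC       = '11111011000010001111011100010000';   # sync word as a 'b*' bit string
my $FRAME_BITS = length($SYNC) + 476 * 8;               # sync + the 476 bytes after it

# Hypothetical helper: remove every complete frame from the front of $$buf
# (a string of '0'/'1') and return them packed back into bytes. Bits that
# cannot be decided yet stay in $$buf for the next call.
sub take_frames {
    my ($buf) = @_;
    my @frames;
    while (1) {
        my $pos = index $$buf, $SYNC;
        if ($pos < 0) {
            # No sync yet: keep just enough tail to hold a sync split across reads.
            my $keep = length($SYNC) - 1;
            substr($$buf, 0, length($$buf) - $keep, '') if length($$buf) > $keep;
            last;
        }
        substr($$buf, 0, $pos, '');            # drop everything before the sync
        last if length($$buf) < $FRAME_BITS;   # frame incomplete: wait for more data
        push @frames, pack 'b*', substr($$buf, 0, $FRAME_BITS, '');
    }
    return @frames;
}

# Intended use (file handling as in the earlier script, with binmode):
#   my $buf = '';
#   while (<IN>) {
#       my $bits = unpack 'b*', $_;
#       # ... 00/11 swap on $bits here ...
#       $buf .= $bits;
#       print OUT $_ for take_frames(\$buf);
#   }
```

Each returned frame is 32 + 3808 bits, i.e. 480 whole bytes, so packing it never needs padding. Anything left in the buffer at end of file is an incomplete frame and is simply dropped.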


        Thanks for all your help. Regarding the code you posted, you'll have to forgive my ignorance: could you break down your substitution line? I don't really understand what it's doing. I ran the code. I'm not sure how much faster it is, because I stopped it. My results were wrong. The substitution isn't quite right. I need to be able to look at every 2 bits, determine whether they are 11 or 00, replace them, and then move on to the next 2 bits.
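Taking the substitution apart, as a sketch of an answer: `s[...][...]ge` is `s///` with square brackets as delimiters; `(00|11)` captures either pair into `$1`; the `e` flag makes the replacement part Perl code, here the conditional `$1 eq '00' ? '11' : '00'`; and `g` repeats the match along the whole string. The catch is that the regex engine tries a match at every character position, not every second one, so it can pick up a '00' or '11' that straddles two of your pairs. Matching `(..)` instead consumes exactly two characters on every attempt, so it stays on the pair boundaries (strings from `unpack 'b*'` always have even length). Both on the same four bits:

```perl
use strict;
use warnings;

# The posted substitution: can match across a pair boundary.
my $posted = '1001';                    # pairs: "10" "01"
$posted =~ s[(00|11)][ $1 eq '00' ? '11' : '00']ge;
print "$posted\n";                      # 1111: it found the "00" in the middle

# Pair-aligned version: (..) takes two characters at a time,
# and pairs that are neither 00 nor 11 are put back unchanged.
my $aligned = '1001';
$aligned =~ s/(..)/$1 eq '00' ? '11' : $1 eq '11' ? '00' : $1/ge;
print "$aligned\n";                     # 1001: neither pair is 00 or 11

my $both = '0011';                      # pairs: "00" "11"
$both =~ s/(..)/$1 eq '00' ? '11' : $1 eq '11' ? '00' : $1/ge;
print "$both\n";                        # 1100
```

In the scripts above only the substitution line would change; the rest stays the same.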