Foreach/While - Why does this run AFTER that?

by CalebH (Acolyte)
on Oct 05, 2014 at 20:04 UTC ( [id://1102886] )

CalebH has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Monks!
Recently I was asked to write a script for a friend to hash check a torrent file against data on their drive.

I found a script online that did something similar, and edited it a bit. (I added the portion that splits the file on disk and hash-checks it.)

#!/usr/bin/perl
$|++;
use Bencode qw/bdecode/;
use Digest::SHA1 qw(sha1_hex);

my $base    = shift @ARGV;
my $torrent = "$base.torrent";

open( T, $torrent ) or die $!;
my $torrent_data = join '', <T>;
close( T );

my $metainfo     = bdecode( $torrent_data );
my $file_name    = "$base/" . $metainfo->{'info'}->{'name'};
my $file_length  = $metainfo->{'info'}->{'length'};
my $piece_length = $metainfo->{'info'}->{'piece length'};
my $pieces       = $metainfo->{'info'}->{'pieces'};

my @pieces = ();
my $offset = 0;
while ( $offset < length( $pieces ) ) {
    my $piece = substr( $pieces, $offset * 20, 20 );
    push @pieces, $piece;
    $pieces = substr( $pieces, 20 );
}

my $path = "C:/users/caleb/desktop/";
open( F, "<", $path . $file_name )
    or die "Cannot open file - (Tried to open $path$file_name and got this error : $!\n";
print "MyFile - $path$file_name\n";

my $counted = 0;
foreach my $p ( @pieces ) {
    $counted++;
    $p =~ s/(.)/sprintf("%02x",ord($1))/egs;
    print "[dothis:] Piece $counted: $p \n";
    my ($buf, $data, $n);
    binmode F;
    $excount;
    while ($n = read (F, $data, $piece_length) != 0) {
        $excount++;
        print "[dothat] Hash2:" . sha1_hex($data) . "\n\n";
        $countpieces = $#pieces;
        print "Piece Count: $countpieces\n";
        print "N = $n\n";
        print "DataCount:" . length($data) . "\n";
        print "Count:" . $excount . "\n";
        print "Counted: $counted\n";
        print "Hash:" . sha1_hex($p) . "\n\n";
        print "HashMatch!\n" if ($p eq sha1_hex($data));
        print "No Match!\n"  if ($p ne sha1_hex($data));
        print "----------------------\n";
        $buf .= $data;
    }
}
close( F );

Now, my intended goal was for the script to compare the pieces from the .torrent file against the file stored on disk, one piece at a time, and if the hashes matched let the user know.

The problem is that the script will only output the following -

MyFile - C:/users/caleb/desktop/test/testfile.mp3
[dothis:] Piece 1: c0ba18a0db37c7d6c8c4c267355627dd5b98473f
[dothat] Hash2:c0ba18a0db37c7d6c8c4c267355627dd5b98473f
Piece Count: 1192
N = 1
DataCount:32768
Count:1
Counted: 1
Hash:ab50b58f162501072fe1ad6c2120e92eba500afd
HashMatch!
----------------------
[dothat] Hash2:d902928543f3e0aa4092534c075957e800a5e3c6
Piece Count: 1192
N = 1
DataCount:32768
Count:2
Counted: 1
Hash:ab50b58f162501072fe1ad6c2120e92eba500afd
No Match!
----------------------
....(Edited to not include 1192 data entries)..
....
....
[dothat] Hash2:5402f35d944e62bfc750e5a8b5f3be6e84f6cb3b
Piece Count: 1192
N = 1
DataCount:31663
Count:1193
Counted: 1
Hash:ab50b58f162501072fe1ad6c2120e92eba500afd
No Match!
----------------------
[dothis:] Piece 2: d902928543f3e0aa4092534c075957e800a5e3c6
[dothis:] Piece 3: 1bf32c88a4d6ba89223dfbc680a310c805e54e66

Now, the hashes themselves check out, but unfortunately the script is not running as it should: 'dothis' should be executing in step with 'dothat', but it doesn't run again until 'dothat' has finished.

I realize that the while loop is run once the first piece ($p) is counted, and that the script then stays in the while loop until it finishes before going back up. (This is obvious from the logfile: 'Hash:' and 'Counted' never change, whereas 'Hash:' should be matching 'Hash2:' and printing 'HashMatch!', and 'Counted:' should be keeping pace with 'Count:'.)

What I don't understand is WHY, since the while loop sits inside a foreach loop and $p should advance with every iteration, correct? I also don't understand how to fix it. I have tried several different things: moving the closing braces around in case I had put them in the wrong spots, and at one point wrapping 'dothis' and 'dothat' in subroutines and calling them from each other. Aside from changing the order the two run in, I get the same results with every change I make. During the first round of edits I also had a problem where every run would grow $p, until its length went over 250 characters and kept growing with every iteration.

I appreciate all the help, and after a week of trying to figure out what I am doing wrong (and going through about 12 different Perl books that I own), I am stumped.

Replies are listed 'Best First'.
Re: Foreach/While - Why does this run AFTER that?
by jonadab (Parson) on Oct 06, 2014 at 02:02 UTC
    What I don't understand is WHY, since it's in a foreach loop first

    The way I read your code, the (second) while loop is inside of the foreach loop. Snipping out the rest, the basic loop structure of your code looks like this:

    while ( $offset < length( $pieces ) ) {
        # do some stuff
    }
    # do some more stuff
    foreach my $p ( @pieces ) {
        # do some more stuff
        print "[dothis:] Piece $counted: $p \n";
        # do some more stuff
        while ($n = read (F, $data, $piece_length) != 0) {
            # yet more stuff
            print "[dothat] Hash2:" . sha1_hex($data) . "\n\n";
            # yet more stuff
        }
    }

    So for each value of $p in the foreach loop, the inner while loop is going to run until it can't read any more data from F. On the first time through the foreach loop, this will read all the data from F that's available. On subsequent times through the foreach loop, all the data from F have already been read.
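
    Here's a tiny standalone demonstration of that effect, separate from the torrent code (the file name is made up; any small text file will do):

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<', 'three_lines.txt' or die $!;   # hypothetical input file

    for my $i ( 1 .. 3 ) {
        print "outer iteration $i\n";
        while ( my $line = <$fh> ) {    # the inner loop drains the handle
            print "  read: $line";
        }
        # On iterations 2 and 3 the handle is already at EOF,
        # so the inner while body never runs again.
    }
    close $fh;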

      ... in which case the immediate solution to the problem is to:

      seek F, 0, 0;

      immediately before the inner while loop (see seek).
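
      To see what the rewind does in isolation, here is a minimal standalone sketch (the file name is a placeholder):

      #!/usr/bin/perl
      use strict;
      use warnings;

      open my $fh, '<', 'demo.bin' or die $!;     # hypothetical file
      binmode $fh;

      my $first  = do { local $/; <$fh> } // '';  # slurp to EOF
      seek $fh, 0, 0;                             # rewind to byte 0
      my $second = do { local $/; <$fh> } // '';  # reads the data again

      print "passes match\n" if $first eq $second;
      close $fh;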

      To the OP:

      But reading the same input file multiple times is a poor design. Reading from disk is order(s) of magnitude more expensive than reading from RAM. Much better to read the file once, storing its data in a suitable data structure, and then iterate over that data structure as needed.
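
      A minimal sketch of that approach, assuming the whole file fits comfortably in memory ($file, $piece_length and @piece_hashes stand in for the values the OP's script pulls from the torrent metainfo):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Digest::SHA1 qw(sha1_hex);

      my $file         = 'testfile.mp3';   # placeholder
      my $piece_length = 32768;            # placeholder
      my @piece_hashes = ();               # hex SHA1 per piece, from the .torrent

      open my $fh, '<', $file or die "Cannot open $file: $!";
      binmode $fh;
      my $contents = do { local $/; <$fh> } // '';   # read the file exactly once
      close $fh;

      for my $i ( 0 .. $#piece_hashes ) {
          my $data = substr $contents, $i * $piece_length, $piece_length;
          print 'piece ', $i + 1, ': ',
              sha1_hex($data) eq $piece_hashes[$i] ? 'HashMatch!' : 'No Match!', "\n";
      }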

      BTW, some consistent indentation would go a lo-o-ong way towards making the code more readable (and the problem more tractable).

      Hope that helps,

      Athanasius <°(((>< contra mundum

        Ahh, thanks.

        I agree that it is poor design, but at the time it was all I could think of. For the example I posted, I was testing on a 38 MB .mp3 file, and the memory use for perl.exe goes to around 42 MB while it's running.

        Memory usage is a big thing to me (I tend to use Firefox or Chrome, and I'm limited to around 2 GB of usable memory at any given time. Add to that the fact that Firefox will take up around 300 MB, and the system takes a considerable amount, and I'm not left with a lot), so I am open to suggestions on how to make the code more efficient.

        Per your suggestion, and please correct me if I am wrong, you are saying that it would take less memory to read the file into an array and then split it from there? I figured that reading it in all at once would still spike the memory usage to whatever the size of the file was. That is not so bad if the file is under 100 MB, but when you have a 1 GB file, for example, it would slow my computer to a crawl. I probably misunderstood and am overlooking a way to do this effectively. If you could point me to any write-ups or tutorials on ways to do this without huge memory use, I would be super grateful :)

        Rather than try to edit the foreach loop around and waste time with it, I ended up erasing it and doing the following:
        seek F, 0, 0;
        while ( $n = read( F, $data, $piece_length ) != 0 ) {
            $excount++;
            $currentpiece = shift(@pieces);
            $counted++;
            $currentpiece =~ s/(.)/sprintf("%02x",ord($1))/egs;
            # ... hash $data and compare it against $currentpiece, as before ...
        }

        This seems to work out well enough, since I only want one piece at a time from the .torrent file; after that it's not needed anymore. And, added bonus, all the hashes match up! I wanted to get this section working before moving on to a finder section: using File::Find, I want to walk a list of subdirectories to find a torrent's data directory, then hash-check all of the data; if it is complete, I will move the data directory to a specified area for better organization. https://github.com/thoj/torrentmv-perl/blob/master/torrentmv.pl does a similar job of verifying data, but it doesn't descend through multiple directories, has to have a path specified on the command line (which could change), and would not run on my Windows 7 64-bit system without some modifying, which is what prompted me to whip this script up.
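
        For that finder section, a rough File::Find sketch (the root directory and the name test are placeholders; the real script would compare directory names against $metainfo->{info}{name} from each .torrent):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Find;

        my $root = 'C:/users/caleb/downloads';   # hypothetical starting point

        my @data_dirs;
        find(
            sub {
                # $_ is the basename, $File::Find::name the full path.
                push @data_dirs, $File::Find::name
                    if -d && $_ eq 'testfile';   # placeholder name test
            },
            $root,
        );

        print "candidate: $_\n" for @data_dirs;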

Re: Foreach/While - Why does this run AFTER that?
by james28909 (Deacon) on Oct 06, 2014 at 03:50 UTC
    so you are wanting to split up the file into chunks and compare them against one another. sounds very familiar to some stuff i do. why not just split up the file into chunks manually and check it?
    open $hddfile, '<whateverfile';
    binmode it;
    my $length = -s $hddfile;
    my $chunks = $length/ #divided by how ever many chunks you want
    my $num = #make this to the amount you divided the file into
    my $counter = 0;
    my $name = whatever;
    foreach (1 .. $num){
        open $temp, '>', "$name$counter"; #should split up file based on whatever size you want
        read $hddfile, my $buf, $chunks; #someone correct me if im wrong haha
        print $temp, $buf;
        $counter++;
    }

    open $cmpfile, '<file to compare it with';
    binmode it;
    my $length = -s $cmpfile;
    my $chunks = $length/ #divided by how ever many chunks you want
    my $num = #match to the division number
    my $counter = 0;
    my $name = whatever;
    foreach (1 .. $num){
        open $temp, '>', "$name$counter"; #should split up file based on whatever size you want
        read $hddfile, my $buf, $chunks; #someone correct me if im wrong haha
        print $temp, $buf;
        $counter++
    }


    then compare them with an sha or md5 checksum tool :) tho this might not be working code, to me it is a little easier to understand. if it were me i would split the files up into separate folders in a shared root directory, then compare them.

      Just a few thoughts on your pseudo-code:

      my $chunks = $length/ #divided by how ever many chunks you want

      This would seem to give you a chunk length rather than a number of chunks, but you are subsequently iterating over $chunks as if it were a number of chunks.

      foreach (0 .. $chunks){
          ...
      }

      This gives an off-by-one error if $chunks is the number of chunks over which you want to iterate. Either
          foreach (0 .. $chunks - 1) { ... }
      or
          foreach (1 .. $chunks) { ... }
      would seem to do a better job (if knowing the chunk index is not an issue).
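
      For what it's worth, here is a small sketch of the chunk-size versus chunk-count distinction (all numbers are made up), using POSIX::ceil so a trailing partial chunk is not dropped:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use POSIX qw(ceil);

      my $length     = 1_000_000;   # file size in bytes (example)
      my $chunk_size = 32_768;      # bytes per chunk (example)
      my $num_chunks = ceil( $length / $chunk_size );   # 30 full + 1 partial = 31

      for my $i ( 0 .. $num_chunks - 1 ) {   # exactly $num_chunks iterations
          my $offset = $i * $chunk_size;
          # read/process the chunk starting at $offset here
      }
      print "$num_chunks chunks\n";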

        yeah, thought about that last night and was just able to get back and change it :) i added a $num; match it to the amount you divided by. so if you divided the file into 4... then foreach (1 .. 4){.

      This approach would work, but it seems like it would do double the work. For example, it would create several small files on a drive, which would then have to be deleted.

      Let's say, as an example, that this was run on 20+ .torrent files, each around 500 MB. In the end, without deletion, I would have around 9 GB of extra data stored on the drive. And even if the data was deleted as the script ran, it could still be taking up 9 GB in the Recycle Bin (at least I think; I'm not sure whether perl deletes to the Recycle Bin or just removes files entirely). FWIW, I'm running perl on Win7 64-bit.

      I do like your idea, though :)
