Foreach/While - Why does this run AFTER that?

by CalebH (Acolyte)
on Oct 05, 2014 at 20:04 UTC ( [id://1102886] )

CalebH has asked for the wisdom of the Perl Monks concerning the following question:

Greetings, Monks!
Recently I was asked to write a script for a friend to hash check a torrent file against data on their drive.

I found a script online that did something similar, and edited it a bit. (I added the portion that splits the file on disk and hash-checks it.)

#!/usr/bin/perl
$|++;
use Bencode qw/bdecode/;
use Digest::SHA1 qw(sha1_hex);

my $base    = shift @ARGV;
my $torrent = "$base.torrent";

open( T, $torrent ) or die $!;
my $torrent_data = join '', <T>;
close( T );

my $metainfo     = bdecode( $torrent_data );
my $file_name    = "$base/" . $metainfo->{'info'}->{'name'};
my $file_length  = $metainfo->{'info'}->{'length'};
my $piece_length = $metainfo->{'info'}->{'piece length'};
my $pieces       = $metainfo->{'info'}->{'pieces'};

my @pieces = ();
my $offset = 0;
while ( $offset < length( $pieces ) ) {
    my $piece = substr( $pieces, $offset * 20, 20 );
    push @pieces, $piece;
    $pieces = substr( $pieces, 20 );
}

my $path = "C:/users/caleb/desktop/";
open( F, "<", $path . $file_name )
    or die "Cannot open file - (Tried to open $path$file_name and got this error : $!\n";
print "MyFile - $path$file_name\n";

my $counted = 0;
foreach my $p ( @pieces ) {
    $counted++;
    $p =~ s/(.)/sprintf("%02x",ord($1))/egs;
    print "[dothis:] Piece $counted: $p \n";
    my ($buf, $data, $n);
    binmode F;
    $excount;
    while ($n = read (F, $data, $piece_length) != 0) {
        $excount++;
        print "[dothat] Hash2:" . sha1_hex($data) . "\n\n";
        $countpieces = $#pieces;
        print "Piece Count: $countpieces\n";
        print "N = $n\n";
        print "DataCount:" . length($data) . "\n";
        print "Count:" . $excount . "\n";
        print "Counted: $counted\n";
        print "Hash:" . sha1_hex($p) . "\n\n";
        print "HashMatch!\n" if ($p eq sha1_hex($data));
        print "No Match!\n"  if ($p ne sha1_hex($data));
        print "----------------------\n";
        $buf .= $data;
    }
}
close( F );

Now, my intended goal was for the script to compare the pieces from the .torrent file against the file stored on disk, one piece at a time, and if the hashes matched let the user know.

The problem is that the script will only output the following -

MyFile - C:/users/caleb/desktop/test/testfile.mp3
[dothis:] Piece 1: c0ba18a0db37c7d6c8c4c267355627dd5b98473f
[dothat] Hash2:c0ba18a0db37c7d6c8c4c267355627dd5b98473f
Piece Count: 1192
N = 1
DataCount:32768
Count:1
Counted: 1
Hash:ab50b58f162501072fe1ad6c2120e92eba500afd
HashMatch!
----------------------
[dothat] Hash2:d902928543f3e0aa4092534c075957e800a5e3c6
Piece Count: 1192
N = 1
DataCount:32768
Count:2
Counted: 1
Hash:ab50b58f162501072fe1ad6c2120e92eba500afd
No Match!
----------------------
....(Edited to not include 1192 data entries)..
....
....
[dothat] Hash2:5402f35d944e62bfc750e5a8b5f3be6e84f6cb3b
Piece Count: 1192
N = 1
DataCount:31663
Count:1193
Counted: 1
Hash:ab50b58f162501072fe1ad6c2120e92eba500afd
No Match!
----------------------
[dothis:] Piece 2: d902928543f3e0aa4092534c075957e800a5e3c6
[dothis:] Piece 3: 1bf32c88a4d6ba89223dfbc680a310c805e54e66

Now, the hashes themselves check out, but unfortunately the script is not running as it should: 'dothis' should be executing in step with 'dothat', but it doesn't run again until 'dothat' has finished.

I realize that the while loop is run once the first piece ($p) is counted, and that the script then stays in the while loop until it finishes before going back up. (This is obvious from the logfile: 'Hash:' and 'Counted' never change, whereas 'Hash:' should be matching 'Hash2:' and printing 'HashMatch!', and 'Counted:' should be keeping pace with 'Count:'.)

What I don't understand is WHY, since the while loop sits inside a foreach loop and $p should advance with every iteration, correct? I also don't understand how to fix it. I have tried several different things: moving the closing braces around in case I had put them in the wrong spots, and at one point wrapping 'dothis' and 'dothat' in subroutines and calling them from each other. Aside from changing the order the two run in, I get the same results with every change I make. During the first round of edits I also had a problem where every run would grow $p, until its length went over 250 characters and kept growing with every iteration.

I appreciate all the help, and after a week of trying to figure out what I am doing wrong (and going through about 12 different Perl books that I own), I am stumped.

Replies are listed 'Best First'.
Re: Foreach/While - Why does this run AFTER that?
by jonadab (Parson) on Oct 06, 2014 at 02:02 UTC
    What I don't understand is WHY, since it's in a foreach loop first

    The way I read your code, the (second) while loop is inside of the foreach loop. Snipping out the rest, the basic loop structure of your code looks like this:

    while ( $offset < length( $pieces ) ) {
        # do some stuff
    }
    # do some more stuff
    foreach my $p ( @pieces ) {
        # do some more stuff
        print "[dothis:] Piece $counted: $p \n";
        # do some more stuff
        while ($n = read (F, $data, $piece_length) != 0) {
            # yet more stuff
            print "[dothat] Hash2:" . sha1_hex($data) . "\n\n";
            # yet more stuff
        }
    }

    So for each value of $p in the foreach loop, the inner while loop is going to run until it can't read any more data from F. On the first time through the foreach loop, this will read all the data from F that's available. On subsequent times through the foreach loop, all the data from F have already been read.
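
    Here's a tiny standalone demonstration of that effect, separate from the torrent code (the file name is made up; any small text file will do):

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $fh, '<', 'three_lines.txt' or die $!;   # hypothetical input file

    for my $i ( 1 .. 3 ) {
        print "outer iteration $i\n";
        while ( my $line = <$fh> ) {    # the inner loop drains the handle
            print "  read: $line";
        }
        # On iterations 2 and 3 the handle is already at EOF,
        # so the inner while body never runs again.
    }
    close $fh;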

      ... in which case the immediate solution to the problem is to:

      seek F, 0, 0;

      immediately before the inner while loop (see seek).
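
      To see what the rewind does in isolation, here is a minimal standalone sketch (the file name is a placeholder):

      #!/usr/bin/perl
      use strict;
      use warnings;

      open my $fh, '<', 'demo.bin' or die $!;     # hypothetical file
      binmode $fh;

      my $first  = do { local $/; <$fh> } // '';  # slurp to EOF
      seek $fh, 0, 0;                             # rewind to byte 0
      my $second = do { local $/; <$fh> } // '';  # reads the data again

      print "passes match\n" if $first eq $second;
      close $fh;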

      To the OP:

      But reading the same input file multiple times is a poor design. Reading from disk is order(s) of magnitude more expensive than reading from RAM. Much better to read the file once, storing its data in a suitable data structure, and then iterate over that data structure as needed.
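
      A minimal sketch of that approach, assuming the whole file fits comfortably in memory ($file, $piece_length and @piece_hashes stand in for the values the OP's script pulls from the torrent metainfo):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Digest::SHA1 qw(sha1_hex);

      my $file         = 'testfile.mp3';   # placeholder
      my $piece_length = 32768;            # placeholder
      my @piece_hashes = ();               # hex SHA1 per piece, from the .torrent

      open my $fh, '<', $file or die "Cannot open $file: $!";
      binmode $fh;
      my $contents = do { local $/; <$fh> } // '';   # read the file exactly once
      close $fh;

      for my $i ( 0 .. $#piece_hashes ) {
          my $data = substr $contents, $i * $piece_length, $piece_length;
          print 'piece ', $i + 1, ': ',
              sha1_hex($data) eq $piece_hashes[$i] ? 'HashMatch!' : 'No Match!', "\n";
      }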

      BTW, some consistent indentation would go a lo-o-ong way towards making the code more readable (and the problem more tractable).

      Hope that helps,

      Athanasius <°(((>< contra mundum

        Ahh, thanks.

        I agree that it is poor design, but at the time it was all I could think of. For the example I posted, I was testing on a 38 MB .mp3 file, and the memory use for perl.exe goes to around 42 MB while it's running.

        Memory usage is a big thing to me (I tend to use Firefox or Chrome, and I'm limited to around 2 GB of usable memory at any given time. Add to that the fact that Firefox will take up around 300 MB, and the system takes a considerable amount, and I'm not left with a lot), so I am open to suggestions on how to make the code more efficient.

        Per your suggestion, and please correct me if I am wrong, you are saying that it would take less memory to read the file into an array and then split it from there? I figured that reading it in all at once would still spike the memory usage to whatever the size of the file was. That is not so bad if the file is under 100 MB, but when you have a 1 GB file, for example, it would slow my computer to a crawl. I probably misunderstood and am overlooking a way to do this effectively. If you could point me to any write-ups or tutorials on ways to do this without huge memory use, I would be super grateful :)

        Rather than try to edit the foreach loop around and waste time with it, I ended up erasing it and doing the following:
        seek F, 0, 0;
        while ( $n = read( F, $data, $piece_length ) != 0 ) {
            $excount++;
            $currentpiece = shift(@pieces);
            $counted++;
            $currentpiece =~ s/(.)/sprintf("%02x",ord($1))/egs;
            # ... hash $data and compare it against $currentpiece, as before ...
        }

        This seems to work out well enough, since I only want one piece at a time from the .torrent file; after that it's not needed anymore. And, added bonus, all the hashes match up! I wanted to get this section working before moving on to a finder section: using File::Find, I want to walk a list of subdirectories to find a torrent's data directory, then hash-check all of the data; if it is complete, I will move the data directory to a specified area for better organization. https://github.com/thoj/torrentmv-perl/blob/master/torrentmv.pl does a similar job of verifying data, but it doesn't descend through multiple directories, has to have a path specified on the command line (which could change), and would not run on my Windows 7 64-bit system without some modifying, which is what prompted me to whip this script up.
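
        For that finder section, a rough File::Find sketch (the root directory and the name test are placeholders; the real script would compare directory names against $metainfo->{info}{name} from each .torrent):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Find;

        my $root = 'C:/users/caleb/downloads';   # hypothetical starting point

        my @data_dirs;
        find(
            sub {
                # $_ is the basename, $File::Find::name the full path.
                push @data_dirs, $File::Find::name
                    if -d && $_ eq 'testfile';   # placeholder name test
            },
            $root,
        );

        print "candidate: $_\n" for @data_dirs;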

Re: Foreach/While - Why does this run AFTER that?
by james28909 (Deacon) on Oct 06, 2014 at 03:50 UTC
    so you are wanting to split up the file into chunks and compare them against one another. sounds very familiar to some stuff i do. why not just split up the file into chunks manually and check it?
    open $hddfile, '<whateverfile';
    binmode it;
    my $length = -s $hddfile;
    my $chunks = $length/ #divided by how ever many chunks you want
    my $num = #make this to the amount you divided the file into
    my $counter = 0;
    my $name = whatever;
    foreach (1 .. $num){
        open $temp, '>', "$name$counter"; #should split up file based on whatever size you want
        read $hddfile, my $buf, $chunks; #someone correct me if im wrong haha
        print $temp, $buf;
        $counter++;
    }

    open $cmpfile, '<file to compare it with';
    binmode it;
    my $length = -s $cmpfile;
    my $chunks = $length/ #divided by how ever many chunks you want
    my $num = #match to the division number
    my $counter = 0;
    my $name = whatever;
    foreach (1 .. $num){
        open $temp, '>', "$name$counter"; #should split up file based on whatever size you want
        read $hddfile, my $buf, $chunks; #someone correct me if im wrong haha
        print $temp, $buf;
        $counter++
    }


    then compare them with an sha or md5 checksum tool :) tho this might not be working code, to me it is a little easier to understand. if it were me i would split the files up into separate folders in a shared root directory, then compare them.

      Just a few thoughts on your pseudo-code:

      my $chunks = $length/ #divided by how ever many chunks you want

      This would seem to give you a chunk length rather than a number of chunks, but you are subsequently iterating over $chunks as if it were a number of chunks.

      foreach (0 .. $chunks){
          ...
      }

      This gives an off-by-one error if $chunks is the number of chunks over which you want to iterate. Either
          foreach (0 .. $chunks - 1) { ... }
      or
          foreach (1 .. $chunks) { ... }
      would seem to do a better job (if knowing the chunk index is not an issue).
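
      For what it's worth, here is a small sketch of the chunk-size versus chunk-count distinction (all numbers are made up), using POSIX::ceil so a trailing partial chunk is not dropped:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use POSIX qw(ceil);

      my $length     = 1_000_000;   # file size in bytes (example)
      my $chunk_size = 32_768;      # bytes per chunk (example)
      my $num_chunks = ceil( $length / $chunk_size );   # 30 full + 1 partial = 31

      for my $i ( 0 .. $num_chunks - 1 ) {   # exactly $num_chunks iterations
          my $offset = $i * $chunk_size;
          # read/process the chunk starting at $offset here
      }
      print "$num_chunks chunks\n";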

        yeah, thought about that last night and was just able to get back and change it :) i added a $num; match it to the amount you divided by. so if you divided the file into 4... then foreach (1 .. 4){.

      This approach would work, but it seems like it would do double the work. For example, it would create several small files on a drive, which would then have to be deleted.

      Let's say, as an example, that this was run on 20+ .torrent files, each around 500 MB. In the end, without deletion, I would have around 9 GB of extra data stored on the drive. And even if the data was deleted as the script ran, it could still be taking up 9 GB in the Recycle Bin (at least I think; I'm not sure whether perl deletes to the Recycle Bin or just removes files entirely). FWIW, I'm running perl on Win7 64-bit.

      I do like your idea, though :)
