matrix_killer has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, I am trying to write a tool that will download the latest exploits and their descriptions from milw0rm.com and write them into two .txt files. I have written the tool, but I have a problem when it comes to comparing the titles in the array if they already exist with the file which contains the description. The problem is that it does not check two or three of the strings, and when it does find matches it writes values that already exist in the file but not the remaining, not-yet-existing ones. Here is the code:
#!perl
use IO::Socket;
use HTML::Entities;
$|=1;
@milw0rm = ( );
@milw0rm_title = ( );
$id1=0;
$id2=0;
$xpldir = "/opt/lampp/htdocs/parser/test";
while (1) {
    $sock = IO::Socket::INET->new(Proto => "tcp", PeerAddr => "milw0rm.com", PeerPort => "80") || die("Could not connect to milw0rm.org");
    print $sock "GET /rss.php HTTP/1.0\nHost: milw0rm.com\nConnection: close\n\n";
    while ($foo = <$sock>) {
        if ($foo =~ /(<guid>http:\/\/www.milw0rm.com\/exploits\/(\S+)<\/guid><\/item>)/gi) {
            $xpl = $1;
            $xpl =~ s/<guid>http:\/\/www.milw0rm.com\/exploits\///eg;
            $xpl =~ s/<\/guid><\/item>//eg;
            print "$xpl\n";
            push(@milw0rm, "$xpl");
        }
        if ($foo =~ /(<title>(.*?)<\/title>)/gi) {
            $title = $1;
            next if $title =~ /<title>milw0rm.com<\/title>/gi;
            $title =~ s/<title>//eg;
            $title =~ s/<\/title>//eg;
            $title = decode_entities( $title );
            print "$title\n";
            push(@milw0rm_title, "$title");
        }
    }
    open (CH, "</opt/lampp/htdocs/parser/test/exploit_description.txt");
    @check = <CH>;
    close CH;
    @unique_milw0rm = ();
    for (my $i;$i<=$#milw0rm_title;$i++) {
        $seen = 0;
        for (my $j;$j<=$#check;$j++) {
            if ($milw0rm_title[$i] =~ /$check[$j]/) {
                $seen = 1;
                next;
            }
        };
        if ($seen == 0) { push @unique_milw0rm, $milw0rm_title[$i]};
    };
    foreach $title1 (@milw0rm_title) {
        chomp $title1;
        $id1++;
        open (LOG, ">>/opt/lampp/htdocs/parser/test/exploit_description.txt");
        print LOG "$id1|$title1\n";
        close LOG;
    }
    foreach $xpl1 (@milw0rm) {
        chomp $xpl1;
        if (!-e "/opt/lampp/htdocs/parser/test/$xpl1") {
            if (!-e "/opt/lampp/htdocs/parser/test/$xpl1.rar") {
                $id2++;
                open (LOG, ">>/opt/lampp/htdocs/parser/test/list.txt");
                print LOG "$id2|$xpl1\n";
                close LOG;
                system("wget --no-clobber --directory-prefix=$xpldir http://milw0rm.com/exploits/download/$xpl1");
                system("chmod 0644 $xpldir/$xpl1");
                #system("cd $xpldir/; rar a -m5 -ed -o- -sve -tl -tsa -rr -vp -dh -ri1 -k -inul $xpldir/$xpl1.rar $xpldir/$xpl1 ; rm -rf $xpl1");
            }
        }
    }
    print "DONE FOR NOW\n";
    sleep(20);
    #sleep(10800);
}

Re: Help me please for milw0rm.com downloader
by Corion (Patriarch) on Apr 12, 2008 at 12:41 UTC
    I have a problem when it comes to comparing the titles in the array if they already exist with the file which contains the description.

    Whenever you think "already exist", think "hash". Replace your array with a hash, and look at exists.
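    As a rough illustration of the hash idea, here is a minimal sketch. The sub name new_titles and the sample titles are made up for this example and are not part of the original script; it assumes both lists hold plain title strings without trailing newlines.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): returns the titles in
# @$candidates that are not yet present in @$known.
sub new_titles {
    my ($known, $candidates) = @_;

    # Build the lookup once: every known title becomes a hash key.
    my %seen;
    $seen{$_} = 1 for @$known;

    my @new;
    for my $title (@$candidates) {
        next if exists $seen{$title};   # already known, skip it
        $seen{$title} = 1;              # also drops duplicates among the candidates
        push @new, $title;
    }
    return @new;
}

# Sample data, invented for the example.
my @known = ('some milw0rm data', 'more milw0rm data');
my @fresh = new_titles(\@known, ['more milw0rm data', 'brand new entry']);
print "$_\n" for @fresh;
```

    Each lookup with exists is a single hash access, so the nested loop over the second array goes away, and the comparison is an exact string match instead of a regex match.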

    I didn't look closer at your code, because it's really long and I'm not interested in wading through long code where most of the lines have no bearing on the problem. You haven't shown any effort to reduce the problem to the minimal lines necessary. Most likely, the minimal problem would look like:

    use strict;

    #open (CH, "</opt/lampp/htdocs/parser/test/exploit_description.txt");
    my @check = <DATA>;
    #close CH;
    my @unique_milw0rm = ();
    my @milw0rm_title;
    for (my $i = 0; $i <= $#milw0rm_title; $i++) {
        my $seen = 0;    # declared with my so the snippet compiles under strict
        for (my $j = 0; $j <= $#check; $j++) {
            if ($milw0rm_title[$i] =~ /$check[$j]/) {
                $seen = 1;
                next;
            }
        };
        if ($seen == 0) { push @unique_milw0rm, $milw0rm_title[$i] };
    };
    __DATA__
    some milw0rm data
    more milw0rm data

    Then, I would have spotted the two problems in the code right away, and likely you would have too. First, use a hash for the titles to check for the existence. Second, you are reading in the titles with newlines at the end, but likely you're comparing them to other data that doesn't have newlines at the end. Check that by printing the two items:

    print "First entry of check array: >$check[0]<\n";

    Then, you want to read up on chomp.
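    To make the newline point concrete, here is a small sketch of reading the stored lines into a lookup hash. It assumes the description file holds lines of the form "id|title", which is what the posted script prints to it; the sub name known_titles and the sample lines are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): turns raw lines from
# the description file into a hash keyed by title.
sub known_titles {
    my @lines = @_;
    chomp @lines;    # strip the trailing newline from every line

    my %known;
    for my $line (@lines) {
        # Assumption: each line looks like "id|title". The limit of 2 keeps
        # any further "|" characters inside the title.
        my ($id, $title) = split /\|/, $line, 2;
        next unless defined $title;    # skip lines without a "|"
        $known{$title} = $id;
    }
    return %known;
}

# Sample lines as they would come back from <CH>, newlines included.
my %known = known_titles("1|some milw0rm data\n", "2|more milw0rm data\n");
for my $title ('more milw0rm data', 'brand new entry') {
    print ">$title< is ", (exists $known{$title} ? 'already known' : 'new'), "\n";
}
```

    Without the chomp, every key would end in a newline and would never equal a title taken from the feed, which is the same mismatch the >...< debugging print makes visible.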