in reply to Re: Find & Delete by comparing two files
in thread Find & Delete by comparing two files

Hello, thanks for getting back to my problem. Ok, let's see - I have this pretty ugly code I managed to write and it is working. I'm sure there are nicer ways to achieve my goal but I'm a complete beginner - sorry:
#!/usr/bin/perl
use LWP::Simple;

$elem1 = "http://www.test.de/subfolder/";
$elem3 = "http://www.test.de/subfolder/de/";

# --------- Logfile ----------------------------
my $delimiter = "\t";
my $logfile = "errorlog.txt";
my $datum = localtime();
my $logmsg = "$datum $ENV{USER} Broken Links";
open LOGFILE, ">$logfile" or die $!;
print LOGFILE $logmsg, "\n";

# --------- Open File -----------------------
open(READ,"Linkfile.csv") or die $!;
while (my $line = <READ>){
    if($line=~/\d+\t/){
        my @liste = split($delimiter,$line);
        push(@urls,$liste[2]);
    }
}
close READ;

# ----------------------------------------------
foreach (@urls) {
    chomp($_);
    if (head($_)) {
        print $_." is working.\n\n";
    }
    else {
        print $_." is broken.\n\n";
        open LOGFILE, ">>$logfile" or die $!;
        print LOGFILE "\n".$_." is broken.";
        close LOGFILE;
        $tmp_url = "$_.";
        @array=split(/\?/,$_);
        $myString = $array[0];
        if ($myString =~ m#http:\/\/www.test.de\/subfolder\/de\/#) {
            $myString =~ s#http:\/\/www.test.de\/subfolder\/de\/##;
            $newUrl = $elem1.$myString;
        }
        else {
            $myString =~ s#http:\/\/www.test.de\/subfolder\/##;
            $newUrl = $elem3.$myString;
            print $newUrl." generated \n\n";
        }
        if (head($newUrl)) {
            print $newUrl." is working.\n\n";
            open LOGFILE, ">>$logfile" or die $!;
            print LOGFILE "\n".$newUrl." is working.\n";
            close LOGFILE;
        }
        else {
            print $newUrl." isn't working, too.\n\n ";
            open LOGFILE, ">>$logfile" or die $!;
            print LOGFILE "\n".$newUrl." isn't working, too.\n";
            close LOGFILE;
        }
    }
}

Basically it does the following: it opens the file, takes the third tab-separated field of each line, and tests whether the server responds with 200 (OK) or 404 (not found).

If the link is broken, the script checks whether the problem lies only with the subfolder: it generates a new URL and tests the head of that link.

So now I have a file with all the broken links. What I want to do is delete these broken links from my original file.

The PROBLEM is that I don't want to delete just the link, but the whole line that contains it in the original file.

I'm sorry if I can't describe it better - English is not my mother tongue ;)

Thanks in advance. Regards, Robert

Re^3: Find & Delete by comparing two files
by Athanasius (Archbishop) on Sep 11, 2012 at 12:15 UTC

    Hello again perlpoda,

    Glad your code is working. Here are a few ways to improve it, in addition to the suggestions made by nemesdani:

    1. Always begin your scripts with:

      use strict; use warnings;

      strict will force you to pay attention to the scope of your variables, which is a good thing.

    2. Prefer lexical filehandles, and use the 3-argument form of open:

      open(my $log, '>', $logfile) or die "Cannot open file '$logfile' for writing: $!";

    3. As nemesdani noted, it is better to avoid opening and closing files any more than necessary. In this case, you open LOGFILE for writing and then don’t close it. So later, in the foreach loop, there is no need to open it again for appending: it’s still open, just write to it! Leave it open within the loop, then close it explicitly — once — after the loop.

    4. Add comments. For example, what is the code that rewrites $newUrl all about? I have no idea, and chances are neither will you when you come back to this script in, say, 6 months' time.
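
    Points 1 to 3 above could be put together roughly as follows. This is only a minimal sketch, not a drop-in replacement for the script: is_alive() is a hypothetical stand-in for the real head() check from LWP::Simple (so the example runs without network access), and the two URLs are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical stand-in for LWP::Simple's head(), so that this
    # sketch runs without network access.
    sub is_alive {
        my ($url) = @_;
        return $url !~ /broken/;
    }

    my $logfile = 'errorlog.txt';
    my @urls    = (
        'http://www.test.de/subfolder/ok.html',        # made-up example
        'http://www.test.de/subfolder/broken.html',    # made-up example
    );

    # Open the log once, before the loop, with a lexical filehandle
    # and the 3-argument form of open.
    open(my $log, '>', $logfile)
        or die "Cannot open file '$logfile' for writing: $!";
    print {$log} scalar(localtime()), " Broken Links\n";

    for my $url (@urls) {
        next if is_alive($url);
        print {$log} "$url is broken.\n";    # still open: just write to it
    }

    # Close explicitly, once, after the loop.
    close($log) or die "Cannot close file '$logfile': $!";
    ```

    With strict enabled, every variable has to be declared with my, which is why @urls and $logfile are declared before they are used.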

    nemesdani has given you some good ideas about deleting whole lines. You’re making progress, keep going!

    Athanasius <°(((><contra mundum

      Hi Athanasius,

      thanks for your help. I will change my code, comment the lines and try to update it as you suggest. Being a Perl novice, it is sometimes not that easy to learn as fast as I want my scripts to work ;)

      Thanks - Robert
Re^3: Find & Delete by comparing two files
by nemesdani (Friar) on Sep 11, 2012 at 10:00 UTC
    A few general suggestions (I haven't read your code thoroughly, sorry):
    Pack related steps (e.g. "if the link is broken, check whether the problem is only with the subfolder, generate a new URL, and test the head of that link") together in subroutines; your code will be clearer and more scalable.

    Open each file once and keep writing to it; don't reopen it every time (that costs time and performance).
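
    As an illustration of the subroutine suggestion, the URL-rewriting step could be wrapped up like this. It is a sketch based on my reading of the posted script; the name alternate_url is made up, and unlike the original it anchors the match at the start of the URL and quotes the prefix with \Q...\E, which is a small tightening rather than the original behaviour.

    ```perl
    use strict;
    use warnings;

    my $base    = 'http://www.test.de/subfolder/';
    my $base_de = 'http://www.test.de/subfolder/de/';

    # Return the same page under the other folder:
    # ".../subfolder/de/x" becomes ".../subfolder/x" and vice versa.
    # The query string is dropped, as in the posted script.
    sub alternate_url {
        my ($url)  = @_;
        my ($path) = split /\?/, $url;    # keep only the part before '?'
        if ($path =~ s/^\Q$base_de\E//) {
            return $base . $path;
        }
        $path =~ s/^\Q$base\E//;
        return $base_de . $path;
    }

    print alternate_url('http://www.test.de/subfolder/de/page.html?id=1'), "\n";
    print alternate_url('http://www.test.de/subfolder/page.html'), "\n";
    ```

    The main loop then only needs to call alternate_url($_) and test the result with head(), which keeps the rewriting logic in one commented place.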

    About the question: if you find a broken link, you could save its line number in an array, and after you have checked every line, delete those lines.
    One solution that comes to mind is Tie::File.
    Example of deleting the last line from a file, stolen from the Cookbook (hellyea, I am lazy):
      use Tie::File;
      tie @lines, Tie::File, $file or die "can't update $file: $!";
      delete $lines[-1];
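
    Applied to the actual question (removing the whole line, not just the link), the same idea might look like the sketch below. The file name Linkfile_sample.csv, its three sample rows and the %broken hash are assumptions made up for illustration; in the real script %broken would be filled by the link check.

    ```perl
    use strict;
    use warnings;
    use Tie::File;

    my $file = 'Linkfile_sample.csv';    # hypothetical sample file

    # Write a small tab-separated sample so the sketch is self-contained.
    open(my $out, '>', $file) or die "Cannot write '$file': $!";
    print {$out} "1\tHome\thttp://www.test.de/subfolder/a.html\n";
    print {$out} "2\tNews\thttp://www.test.de/subfolder/b.html\n";
    print {$out} "3\tHelp\thttp://www.test.de/subfolder/c.html\n";
    close($out) or die "Cannot close '$file': $!";

    # Assumed result of the link check: the URLs found to be broken.
    my %broken = ('http://www.test.de/subfolder/b.html' => 1);

    tie my @lines, 'Tie::File', $file or die "can't update $file: $!";

    # Walk backwards so that removing a line does not shift the
    # indices of the lines still to be checked.
    for my $i (reverse 0 .. $#lines) {
        my $url = (split /\t/, $lines[$i])[2];    # third tab-separated field
        splice(@lines, $i, 1) if defined $url && $broken{$url};
    }

    untie @lines;
    ```

    Tie::File strips the line ending from each record by default, so the third field can be compared with the URL directly, and splice removes the complete line from the file.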

    I'm too lazy to be proud of being impatient.
      Hi nemesdani, thanks for your input. I will try to work that into my code. Let's see if it works out :) Thanks - Regards, Robert