Vonunov has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm not too experienced with Perl and am never motivated to learn until I actually need something done, so sorry this isn't all eloquent and concise. :P

I have a list of URLs in a text file. I often add incomplete URLs or directory paths to the list in haste (for instance, missing http:// or the entire domain name itself), so I wrote up a perl script to add this automatically. I also want it all kept in lowercase to make it easier to grep, and I tend to copy lists of URLs that are not all lowercase. I originally used this:

#!/usr/bin/perl
#This is c.pl
open (file, "petpages");
open NEW, ">", "petpages.new" or die $!;
while ($lines = <file>) {
    $lines =~ tr/A-Z/a-z/;
    print NEW $lines;
}
close (file);
exec("rm petpages ; mv petpages.new petpages ; perl c2.pl");

Which referred to this file:

#!/usr/bin/perl
#This is c2.pl
open (file, "petpages");
open NEW, ">", "petpages.new" or die $!;
while ($lines = <file>) {
    print NEW "http://" . $lines;
}
close (file);
exec("rm petpages ; mv petpages.new petpages");

I thought just today to put it all into one file this way:

#!/usr/bin/perl

open (file, "petpages");
open NEW, ">", "petpages.new" or die $!;
while ($lines = <file>) {
    $lines =~ tr/A-Z/a-z/;
    print NEW $lines;
}
close (file);
exec("rm petpages ; mv petpages.new petpages");
open (file, "petpages");
open NEW, ">", "petpages.new" or die $!;
while ($lines = <file>) {
    print NEW "http://" . $lines;
}
close (file);
exec("rm petpages ; mv petpages.new petpages");

This completed case conversion but failed to prepend http:// to each line as the pair of files did successfully.

With warning flag on, it threw me this:

Unquoted string "file" may clash with future reserved word at c.pl line 3.
Unquoted string "file" may clash with future reserved word at c.pl line 9.
Unquoted string "file" may clash with future reserved word at c.pl line 11.
Unquoted string "file" may clash with future reserved word at c.pl line 16.
Statement unlikely to be reached at c.pl line 11.
    (Maybe you meant system() when you said exec()?)

I followed the meaning of this somewhat, and changed the handles (note file vs. file2):

#!/usr/bin/perl

open (file, "petpages");
open NEW, ">", "petpages.new" or die $!;
while ($lines = <file>) {
    $lines =~ tr/A-Z/a-z/;
    print NEW $lines;
}
close (file);
exec("rm petpages ; mv petpages.new petpages");
open (file2, "petpages");
open NEW, ">", "petpages.new" or die $!;
while ($lines = <file2>) {
    print NEW "http://" . $lines;
}
close (file2);
exec("rm petpages ; mv petpages.new petpages");

However, this also fails to prepend http://, and returns:

Unquoted string "file" may clash with future reserved word at c.pl line 3.
Unquoted string "file" may clash with future reserved word at c.pl line 9.
Statement unlikely to be reached at c.pl line 11.
    (Maybe you meant system() when you said exec()?)

I'm pretty much lost at this point. Any suggestions? Also, is exec() outdated?

Thanks,

Jack

Replies are listed 'Best First'.
Re: File manipulation only works when I split it into two files.
by lostjimmy (Chaplain) on Dec 03, 2008 at 01:03 UTC

    First, you should always use strict; and use warnings;. Second, you should use lexical file handles. Third, you can perform this in one loop over the file. And fourth, you can use unlink to delete the file, and rename to rename the temp file.

    This should do what you want:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $file, "<", "petpages" or die "Could not open petpages: $!";
    open my $new, ">", "petpages.new" or die "Could not open petpages.new: $!";

    while (my $line = <$file>) {
        $line =~ tr/A-Z/a-z/;
        print $new "http://$line";
    }

    close $file;
    close $new;

    rename "petpages.new", "petpages";

    Update: I forgot to mention that exec doesn't do what you think. As the first line in the doc says, exec never returns. That is why you got the warning: Statement unlikely to be reached at c.pl line 11. (Maybe you meant system() when you said exec()?). Because the rest of the code is never executed, the "http://" never gets prepended.
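
    To make the difference concrete, here is a minimal sketch of system() handing control back to the script. The helper name run_and_continue is made up for illustration, and $^X (the path of the running perl) is used only so the example does not depend on any other program being installed.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: run a command with system() and return its exit code.
    # The list form of system() passes the arguments without going through a shell.
    sub run_and_continue {
        my @cmd = @_;
        my $status = system(@cmd);      # returns once the command has finished
        die "Could not run @cmd: $!" if $status == -1;
        return $status >> 8;            # the exit code lives in the high byte
    }

    my $exit_code = run_and_continue($^X, "-e", "exit 3");
    print "child exited with $exit_code, and this script is still running\n";

    # exec() would instead replace this process and return only on failure,
    # so nothing placed after a successful exec() ever runs:
    # exec($^X, "-e", "exit 0") or die "exec failed: $!";
    ```

    Swapping exec for system in the original script should therefore let the second half run, although the single-loop version above avoids the external commands altogether.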

      Ah, nice, that does it. I vaguely remember trying to do it in one loop over but I don't remember how I did that. It didn't work, anyway.

      Thank you.

        Doing it in one loop:

        #!/usr/bin/perl -w
        use strict;

        die "Usage: ", $0 =~ /([^\/]+)$/, " <filename>\n" unless @ARGV;

        $^I = '';    # We're really brave - no backup file!
        while (<>){
            next unless /./;            # Skip empty lines
            tr/A-Z/a-z/;                # Lowercase the lot
            s#^(?!http://)#http://#;    # Prefix 'http://' if one is missing
            print;
        }

        If 'petpages' contains the following:

        foobar.com
        http://foo.int
        fOoBaR.nEt
        FoObAr.OrG

        then running the above script with 'petpages' as an argument results in the content being changed to this:

        http://foobar.com
        http://foo.int
        http://foobar.net
        http://foobar.org
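
        If you would rather try the in-place edit on a throwaway file before pointing it at the real list, something along these lines should work. It is only a sketch: the sub name normalize_file and the use of File::Temp are assumptions added for illustration, and the loop body is the same as in the script above.

        ```perl
        use strict;
        use warnings;
        use File::Temp qw(tempfile);

        # Hypothetical helper: apply the same edits as the script above to one file.
        sub normalize_file {
            my ($path) = @_;
            # Localising $^I and @ARGV confines the in-place editing to this sub.
            local ($^I, @ARGV) = ('', $path);   # '' means no backup copy is kept
            while (<>) {
                next unless /./;                # skip empty lines
                tr/A-Z/a-z/;                    # lowercase the line
                s#^(?!http://)#http://#;        # prefix only when missing
                print;                          # written to the file, not the terminal
            }
            return;
        }

        # Build a scratch file holding the sample lines from above.
        my ($fh, $path) = tempfile(UNLINK => 1);
        print {$fh} "foobar.com\nhttp://foo.int\nfOoBaR.nEt\nFoObAr.OrG\n";
        close $fh or die "Could not close $path: $!";

        normalize_file($path);

        # Read the edited file back and show it.
        open my $in, "<", $path or die "Could not open $path: $!";
        my @result = <$in>;
        close $in;
        print @result;
        ```

        Note that the lowercasing has to happen before the substitution, otherwise a line starting with 'HTTP://' would get a second prefix.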

        Update: Clarified phrasing a bit.


        --
        "Language shapes the way we think, and determines what we can think about."
        -- B. L. Whorf
Re: File manipulation only works when I split it into two files.
by jethro (Monsignor) on Dec 03, 2008 at 01:16 UTC

    Your script didn't work because exec never returns: it replaces the running Perl process with the rm and mv commands, so nothing after it is executed. Note that the warning gives a strong hint when it asks whether you meant system() instead of exec().

    This is something to remember, especially if you don't have the motivation to learn beforehand: always read warnings and error messages. They are there to tell you something, and often (as in this case) they hint at what you need to look up to understand the problem.

    Update: You also might want to add http:// only when it is necessary:

    $line= 'http://' . $line if (not $line=~/^http:/);
    print $new $line;
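
    One caveat with the check above is that a line starting with https:// would still get http:// put in front of it. A possible variation, assuming you also want to leave https:// URLs alone (the original question only mentions http://), could look like this. The sub name add_scheme is made up for the example.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: prefix 'http://' unless the line already starts
    # with http:// or https://. Accepting https:// is an assumption of this
    # sketch, not something the original question asked for.
    sub add_scheme {
        my ($line) = @_;
        return $line =~ m{^https?://} ? $line : "http://$line";
    }

    print add_scheme($_), "\n" for ("foo.int", "http://foo.int", "https://foo.int");
    ```

    Since the list is lowercased first, the pattern does not need to be case-insensitive.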