Re: data management script

I have to agree with the comments that chargrill made regarding the layout of your code. I have looked at your reply to his post and, although the new layout is somewhat improved, I still find it difficult to read. I don't know if problems are being introduced in the process of posting this code or whether it looks like this in reality. For instance, you have a foreach loop that is indented by two levels (four spaces each time, that's good) yet it has a closing curly brace that is indented with nine TAB characters; that's manic.

Enough about the layout, let's get to the logic. I think that you would do much better to separate the reading of the LINK file from the reading and modification of the SQL file. Read the LINK file first, constructing a hash table with the titles as the keys and the corresponding URLs as the values then close that file. After that go on to read the SQL file in a separate loop, modifying it and writing a new SQL file (both modified and unmodified lines) as you go.

TIPS

Always do

use strict;
use warnings;
[download]

at the top of your scripts to enforce some discipline and to catch typo slips.

Comment your code generously and help the comments to stand out by preceding them with a blank line.

Try using the three argument form of open to avoid problems like writing to a file that should only have been read. You correctly tested for the success of your opens but it would be useful to include the operating system error, $!, in the die message.

I would try to avoid all of the messing about transforming the title and URL from the LINK file by crafting a better regular expression to pull out just what you want. I would also leave off including the quote marks until the output stage. The following code illustrates the open and the regex, capturing the information in a hash reference

my $linkFile = q{linksource.dat};
open my $linkFH, q{<}, $linkFile
   or die qq{open: $linkFile: $!\n};

my $rhTitleToURL = {};
LINKLINE:
while (<$linkFH>)
{
    next LINKLINE
       unless m
       {(?x)
           title\s*=\s*"
           ([^"]+)
           .*?
           href\s*=\s*"
           ([^"]+)
       };
    my ($title, $url) = ($1, $2);
    $rhTitleToURL->{$title} = $url;
}

close $linkFH
   or die qq{close: $linkFile: $!\n};
[download]

You are wasting effort when extracting your @fields from the SQL file. Firstly, split acts by default on $_ so you don't have to specify it, @fields = split /,/; would do. However you then have to waste time stripping leading and trailing spaces from each field. The first argument to split is a regular expression so if you include the spaces around the comma in the regex, they won't get into the fields in the first place. Like this

my @fields = split m{\s*,\s*};
[download]

The modification of the SQL file probably ought to go like this:-

open old SQL file for reading
open new SQL file for writing
while loop reading old SQL line by line
    split line into fields
    if this is a line to be modified
        modify fields
        construct new line
        print to new SQL file
    else
        re-construct old line
        print to new SQL file
close files
[download]

I hope you find this useful and I hope you take the criticisms of your code layout in the constructive spirit they are intended.

Cheers,

JohnGG

Update: Corrected indentation in code snippet caused by inadvertant TABs.

Comment on Re: data management script Select or Download Code