comment on

I have to agree with the comments that chargrill made regarding the layout of your code. I have looked at your reply to his post and, although the new layout is somewhat improved, I still find it difficult to read. I don't know if problems are being introduced in the process of posting this code or whether it looks like this in reality. For instance, you have a foreach loop that is indented by two levels (four spaces each time, that's good) yet it has a closing curly brace that is indented with nine TAB characters; that's manic.

Enough about the layout, let's get to the logic. I think that you would do much better to separate the reading of the LINK file from the reading and modification of the SQL file. Read the LINK file first, constructing a hash table with the titles as the keys and the corresponding URLs as the values then close that file. After that go on to read the SQL file in a separate loop, modifying it and writing a new SQL file (both modified and unmodified lines) as you go.

TIPS

Always do

use strict;
use warnings;
[download]

at the top of your scripts to enforce some discipline and to catch typo slips.

Comment your code generously and help the comments to stand out by preceding them with a blank line.

Try using the three argument form of open to avoid problems like writing to a file that should only have been read. You correctly tested for the success of your opens but it would be useful to include the operating system error, $!, in the die message.

I would try to avoid all of the messing about transforming the title and URL from the LINK file by crafting a better regular expression to pull out just what you want. I would also leave off including the quote marks until the output stage. The following code illustrates the open and the regex, capturing the information in a hash reference

my $linkFile = q{linksource.dat};
open my $linkFH, q{<}, $linkFile
   or die qq{open: $linkFile: $!\n};

my $rhTitleToURL = {};
LINKLINE:
while (<$linkFH>)
{
    next LINKLINE
       unless m
       {(?x)
           title\s*=\s*"
           ([^"]+)
           .*?
           href\s*=\s*"
           ([^"]+)
       };
    my ($title, $url) = ($1, $2);
    $rhTitleToURL->{$title} = $url;
}

close $linkFH
   or die qq{close: $linkFile: $!\n};
[download]

You are wasting effort when extracting your @fields from the SQL file. Firstly, split acts by default on $_ so you don't have to specify it, @fields = split /,/; would do. However you then have to waste time stripping leading and trailing spaces from each field. The first argument to split is a regular expression so if you include the spaces around the comma in the regex, they won't get into the fields in the first place. Like this

my @fields = split m{\s*,\s*};
[download]

The modification of the SQL file probably ought to go like this:-

open old SQL file for reading
open new SQL file for writing
while loop reading old SQL line by line
    split line into fields
    if this is a line to be modified
        modify fields
        construct new line
        print to new SQL file
    else
        re-construct old line
        print to new SQL file
close files
[download]

I hope you find this useful and I hope you take the criticisms of your code layout in the constructive spirit they are intended.

Cheers,

JohnGG

Update: Corrected indentation in code snippet caused by inadvertant TABs.

In reply to Re: data management script by johngg
in thread data management script by mkahn

Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!

Titles consisting of a single word are discouraged, and in most cases are disallowed outright.

Read Where should I post X? if you're not absolutely sure you're posting in the right place.

Please read these before you post! —

Posts may use any of the Perl Monks Approved HTML tags:

a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr

You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)

	For:		Use:
	&		`&`
	<		`<`
	>		`>`
	[		`[`
	]		`]`

Link using PerlMonks shortcuts! What shortcuts can I use for linking?

See Writeup Formatting Tips and other pages linked from there for more info.