I have to agree with the comments that chargrill made regarding the layout of your code. I have looked at your reply to his post and, although the new layout is somewhat improved, I still find it difficult to read. I don't know if problems are being introduced in the process of posting this code or whether it looks like this in reality. For instance, you have a foreach loop that is indented by two levels (four spaces each time, that's good) yet it has a closing curly brace that is indented with nine TAB characters; that's manic.

Enough about the layout, let's get to the logic. I think that you would do much better to separate the reading of the LINK file from the reading and modification of the SQL file. Read the LINK file first, constructing a hash table with the titles as the keys and the corresponding URLs as the values then close that file. After that go on to read the SQL file in a separate loop, modifying it and writing a new SQL file (both modified and unmodified lines) as you go.

TIPS

Always do

use strict; use warnings;

at the top of your scripts to enforce some discipline and to catch typo slips.

Comment your code generously and help the comments to stand out by preceding them with a blank line.

Try using the three argument form of open to avoid problems like writing to a file that should only have been read. You correctly tested for the success of your opens but it would be useful to include the operating system error, $!, in the die message.

I would try to avoid all of the messing about transforming the title and URL from the LINK file by crafting a better regular expression to pull out just what you want. I would also leave off including the quote marks until the output stage. The following code illustrates the open and the regex, capturing the information in a hash reference

my $linkFile = q{linksource.dat}; open my $linkFH, q{<}, $linkFile or die qq{open: $linkFile: $!\n}; my $rhTitleToURL = {}; LINKLINE: while (<$linkFH>) { next LINKLINE unless m {(?x) title\s*=\s*" ([^"]+) .*? href\s*=\s*" ([^"]+) }; my ($title, $url) = ($1, $2); $rhTitleToURL->{$title} = $url; } close $linkFH or die qq{close: $linkFile: $!\n};

You are wasting effort when extracting your @fields from the SQL file. Firstly, split acts by default on $_ so you don't have to specify it, @fields = split /,/; would do. However you then have to waste time stripping leading and trailing spaces from each field. The first argument to split is a regular expression so if you include the spaces around the comma in the regex, they won't get into the fields in the first place. Like this

my @fields = split m{\s*,\s*};

The modification of the SQL file probably ought to go like this:-

open old SQL file for reading open new SQL file for writing while loop reading old SQL line by line split line into fields if this is a line to be modified modify fields construct new line print to new SQL file else re-construct old line print to new SQL file close files

I hope you find this useful and I hope you take the criticisms of your code layout in the constructive spirit they are intended.

Cheers,

JohnGG

Update: Corrected indentation in code snippet caused by inadvertant TABs.


In reply to Re: data management script by johngg
in thread data management script by mkahn

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.