data management script

mkahn has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks,

I have 2 data files: One is a sql table dump, which I would like to search and replace a couple of the fields from the second file; which has a corresponding title in one field, and an updated directory path in the other.

I am reading in $title and $url, then reading in each record from the data file, testing to see if the fields match, then dumping the fields back together and writing to an output file.

The purpose here is to update a menuing system with database generated urls from the old values.

So that part is actually working. If there is a match, it replaces it and writes the file. However the menu also contains items that dont match, but still need to be rewritten back to the output file. There are menu items which do not represent a database query, but are simply static links.

This would probably be easier if I could just substitute directly in the source file (though not for development purposes..) instead of writing a new file. But thats the approach I took. What I am trying to do here is create an array of matches, then cycle through the file again, skipping any records that previously matched, write the record from any that dont, then push that index field to the matched array.

And now I must apologise for what I'm sure is blasphemous coding technique, but if you've read this far, perhaps you'll take a look at what I've got.

I'm sure there are several other things that will come up before this does what I want it to do. But it will theoretically be a bridge between 2 of joomlas components, so the shop links can reside in a customized menu. What I should really do is integrate this with the script thats generating the menu in the first place. but for now, this is my coding lesson.

Thank you!

#!usr/bin/perl
open LINKFILE, "linksource.dat"  || die "no such file here";
while (<LINKFILE>) {
#memorize title, then url;
my ($title) = /title=(".*?")/;  ($title) =~ s/"/'/g;       ($title) =~
+ s/^\s+//g;($title) =~ s/\s+$//g;
my ($url) = /(component.*")/;   ($url) = '"/' . $url;  ($url) =~ s/"/'
+/;
#print $title , "\n" , $url, ,"\n";
my $matched = 0;
open SQL, "mark.sql";
while (<SQL>) {
@fields = split /,/ , $_ ;
foreach $i(0 .. $#fields) {
s/^\s+//, s/\s+$// for $fields[$i];
#print "\$fields[$i]:$fields[$i] \n";
}
print "\$fields[2]:|$fields[2]|   \$title:|$title| \n"  if ($title eq 
+"'Alternative Reds'");
next unless ($fields[2] eq $title) or ($i == $#fields);
push @used, $fields[0];
($fields[3] = $url) if ($fields[2] eq $title);
 my $sqlout = join "," , @fields;
             open NEWFILE , ">>newsql.sql";
            print NEWFILE "$sqlout \n";
            #print NEWFILE "\$fields[2]:$fields[2], \$title:$title \n"
+;
            close NEWFILE;
            last;
             }
}
close SQL;
close LINKFILE;

open SQL, "mark.sql";
while (<SQL>){
@fields = split /,/ , $_ ;
foreach $i(0 .. $#fields) {
s/^\s+//, s/\s+$// for $fields[$i];}
foreach $j(0..$#used) {
$match++ && (print "$_:$fields[0]") if ($fields[0] eq $used[$j]) && ne
+xt;
print "\$used:|$used[$j]| \$fields[0]:|$fields[0]| \n";
my $sqlout = join "," , @fields;
open NEWFILE , ">>newsql.sql";
print NEWFILE  "$sqlout \n" && (push @used, $fields[0])  unless $match
+;
$match = 0;
close NEWFILE;
last;
                       }
 }
[download]

Here is some sample data (2 records) from the source sql table:

(6, 'usermenu', 'Upload', 'index.php?option=com_content&task=view&id=5
+', 'content_item_link', 1, 0, 5, 0, 2, 62, '2006-06-24 11:42:36', 0, 
+0, 2, 0, 'menu_image=-1'),
(7, 'mainmenu', 'Alternative Red', 'index.php?option=com_lxmenu', 'com
+ponents', -2, 2, 19, 0, 1, 62, '2006-04-23 22:47:37', 0, 0, 0, 0, '')
+,
[download]

These records are from the linkdata source:

<a title="Wine Accessories" style="display:block;" class="mainlevel" h
+ref="/component/page,shop.browse/category_id,1/option,com_virtuemart/
+Itemid,16/" >Wine Accessories</a>

<a title="Gewurztraminer" style="display:block;" class="mainlevel" hre
+f="/component/page,shop.browse/category_id,2/option,com_virtuemart/It
+emid,16/" >Gewurztraminer</a>
[download]

READMORE added by Arunbear

Comment on data management script Select or Download Code

Replies are listed 'Best First'.
Re: data management script by chargrill (Parson) on Jul 22, 2006 at 15:17 UTC
Apologies for not answering your question, but I was distracted by your "blasphemous coding technique" (as you put it) - or as I'd put it - bad indentation, style, etc. Please don't place 4 separate statements on the same line! `my ($title) = /title=(".?")/; ($title) =~ s/"/'/g; ($title) =~ + s/^\s+//g;($title) =~ s/\s+$//g;` [download] Please properly indent things like for loops and other blocks. This is just one example, but you really lose a lot of information/visual cues that can help make heads or tails out of your code. `foreach $i(0 .. $#fields) { s/^\s+//, s/\s+$// for $fields[$i]; #print "\$fields[$i]:$fields[$i] \n"; }` [download] Honestly, it's not that difficult to clean things up manually - but if you just can't coerce yourself into some structure, at the very least, please run your code through perltidy. I (and perhaps many others) won't read your code, much less help you on it, otherwise. --chargrill `$,=42;for(34,0,-3,9,-11,11,-17,7,-5){$.=pack'c'=>$,+=$_}for(reverse s +plit//=>$* ){$%++?$ %%2?push@C,$_,$":push@c,$_,$":(push@C,$_,$")&&push@c,$"}$C[$# +C]=$/;($#C >$#c)?($ c=\@C)&&($ C=\@c):($ c=\@c)&&($C=\@C);$%=$\|;for(@$c){print$_^ +$$C[$%++]}` [download]	[reply] [d/l] [select]
Re^2: data management script by mkahn (Beadle) on Jul 22, 2006 at 16:42 UTC
Thanks for telling me. I'm using a perl editor, so the loops were obvious to me. I've made it a bit eye friendlier. The blasphemy I was referring to was my excessive use of loops, lack of subs, etc. Commenting is helping me. There are still a couple of problems. #!usr/bin/perl open LINKFILE, "linksource.dat" \|\| die "no such file here"; while (<LINKFILE>) { #memorize title, then url; my ($title) = /title=(".?")/; ($title) =~ s/"/'/g; ($title) =~ s/^\s+//g; ($title) =~ s/\s+$//g; my ($url) = /(component.")/; ($url) = '"/' . $url; ($url) =~ s/"/'/; my $match = 0; open SQL, "mark.sql"; while (<SQL>) { @fields = split /,/ , $_ ; foreach $i(0 .. $#fields) { s/^\s+//, s/\s+$// for $fields[$i]; } print "\$fields[2]:\|$fields[2]\| \$title:\|$title\| \n" if ($t +itle eq "'Alternative Reds'");# monitor - stops after first A R hit?? next unless ($fields[2] eq $title); #skip it if it doesnt mat +ch next if ($fields[4] eq "'separator'"); #skip it if its not an +URL push @used, $fields[0]; ($fields[3] = $url) if ($fields[2] eq $title); my $sqlout = join "," , @fields; open NEWFILE , ">>newsql.sql"; print NEWFILE "$sqlout \n"; close NEWFILE; last; } # finished going through the SQL } #finished cycling through linkfile close SQL; close LINKFILE; #Now go through each record, check to see if the index field matches t +he used array, and if not, write the record open SQL, "mark.sql"; while (<SQL>){ @fields = split /,/ , $_ ; #compare first record element with the used array foreach (@used) { next if $fields[0] eq $_; # if the record has been used, dont use +it again print "\$used:\|$_\| \$fields[0]:\|$fields[0]\| \n"; #monitor my $sqlout = join "," , @fields; open NEWFILE , ">>newsql.sql"; print NEWFILE "$sqlout"; push @used, $fields[0]; close NEWFILE; last; } } [download]	[reply] [d/l]
Re^3: data management script by planetscape (Chancellor) on Jul 23, 2006 at 05:51 UTC
What chargrill said... A link to perltidy and its accompanying tutorial. HTH, planetscape	[reply]
Re: data management script by johngg (Canon) on Jul 22, 2006 at 23:33 UTC
I have to agree with the comments that chargrill made regarding the layout of your code. I have looked at your reply to his post and, although the new layout is somewhat improved, I still find it difficult to read. I don't know if problems are being introduced in the process of posting this code or whether it looks like this in reality. For instance, you have a `foreach` loop that is indented by two levels (four spaces each time, that's good) yet it has a closing curly brace that is indented with nine TAB characters; that's manic. Enough about the layout, let's get to the logic. I think that you would do much better to separate the reading of the LINK file from the reading and modification of the SQL file. Read the LINK file first, constructing a hash table with the titles as the keys and the corresponding URLs as the values then close that file. After that go on to read the SQL file in a separate loop, modifying it and writing a new SQL file (both modified and unmodified lines) as you go. TIPS Always do `use strict; use warnings;` [download] at the top of your scripts to enforce some discipline and to catch typo slips. Comment your code generously and help the comments to stand out by preceding them with a blank line. Try using the three argument form of `open` to avoid problems like writing to a file that should only have been read. You correctly tested for the success of your `open`s but it would be useful to include the operating system error, `$!`, in the `die` message. I would try to avoid all of the messing about transforming the title and URL from the LINK file by crafting a better regular expression to pull out just what you want. I would also leave off including the quote marks until the output stage. The following code illustrates the `open` and the regex, capturing the information in a hash reference `my $linkFile = q{linksource.dat}; open my $linkFH, q{<}, $linkFile or die qq{open: $linkFile: $!\n}; my $rhTitleToURL = {}; LINKLINE: while (<$linkFH>) { next LINKLINE unless m {(?x) title\s=\s" ([^"]+) .? href\s=\s" ([^"]+) }; my ($title, $url) = ($1, $2); $rhTitleToURL->{$title} = $url; } close $linkFH or die qq{close: $linkFile: $!\n};` [download] You are wasting effort when extracting your `@fields` from the SQL file. Firstly, `split` acts by default on `$_` so you don't have to specify it, `@fields = split /,/;` would do. However you then have to waste time stripping leading and trailing spaces from each field. The first argument to `split` is a regular expression so if you include the spaces around the comma in the regex, they won't get into the fields in the first place. Like this `my @fields = split m{\s,\s};` [download] The modification of the SQL file probably ought to go like this:- `open old SQL file for reading open new SQL file for writing while loop reading old SQL line by line split line into fields if this is a line to be modified modify fields construct new line print to new SQL file else re-construct old line print to new SQL file close files` [download]* I hope you find this useful and I hope you take the criticisms of your code layout in the constructive spirit they are intended. Cheers, JohnGG Update: Corrected indentation in code snippet caused by inadvertant TABs.	[reply] [d/l] [select]