PerlMonks  

timtowtdi but which is the [quote fingers]best[/quote fingers]

by PyrexKidd (Monk)
on Apr 14, 2011 at 07:33 UTC ( [id://899371]=perlquestion )

PyrexKidd has asked for the wisdom of the Perl Monks concerning the following question:

my initial question:

what is the best way to translate "TRUE" and "FALSE" to "1" or "0" respectively? or "string a", "String B", "String C" to "1-3" respectively.

my first thought was do a search/replace regex:

s/TRUE/1/imgsx; s/FALSE/0/imgsx;

but my thinking is invoking the regex engine twice causes twice the overhead. while this might not be a problem for the first 10k lines, what about 100k or 100,000k lines? So I thought I could use the tr engine, but that only translates characters. So I tried:

tr/TRUEFALSE/111100000/;

which, well... didn't work.
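
For what it's worth, tr/// pairs up single characters by position, so the attempt above maps each letter to a digit rather than each word to a digit. A minimal sketch of what it actually does (the sub name tr_attempt and the sample inputs are only for illustration):

```perl
use strict;
use warnings;

# tr/// pairs characters up by position: T, R, U, E => 1 and F, A, L, S => 0.
# The second E in the search list is ignored because E is already mapped to 1.
sub tr_attempt {
    my ($text) = @_;
    (my $copy = $text) =~ tr/TRUEFALSE/111100000/;
    return $copy;
}

print tr_attempt('TRUE'),  "\n";   # 1111
print tr_attempt('FALSE'), "\n";   # 00001
```

So every letter is rewritten individually, and any other word containing those letters would be mangled too.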

my relatively closely related tangent

I'm working on an application to build a database. I have a pre existing ''csv database'' which consists of a number of ''csv tables''. the end goal is a MySQL back-end and a web GUI front-end... :]

sidenote: I am not a DBA, any one have any _good_ perl DBI links? I've searched google and the perldocs extensively; I think it's my SQL skills that are lacking...

anyway, during the process of writing the program to convert the database I have had to translate company names to company id's etc. strings to values. see the above for the question about true false values I had that started this post.

since i'm building my query as a string I open my string as a file handle and then print to my string as a file... any one have any comments about that? (not trolling, genuinely interested in feedback)

open my $Q_FH, '>', \my $query;
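
A small sketch of how that in-memory filehandle behaves, assuming Perl 5.8 or later. The SELECT statement is a made-up placeholder, not part of the real application:

```perl
use strict;
use warnings;

# Open a write handle whose "file" is the scalar $query.
my $query = '';
open my $Q_FH, '>', \$query or die "Cannot open in-memory handle: $!";

print {$Q_FH} "SELECT customer_id\n";
print {$Q_FH} "FROM CUSTOMER;\n";

# Closing flushes the handle, so $query is complete from here on.
close $Q_FH or die "Cannot close in-memory handle: $!";

print $query;
```

Plain concatenation with .= or a join over a list of lines would build the same string; the filehandle form is mostly useful when the code that writes the query already expects a handle.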

which is better: query the database for the customer id's based on the customer name or create a hash and use the hash to translate the customer name to customer_id? i.e.:

#create customer DB
print $Q_FH "INSERT INTO CUSTOMER (customer_id, company_name)\n";
print $Q_FH "VALUES";
print $Q_FH join(',', map{"\n($cust_id_href->{$_}, $_)"} keys %$cust_id_href);
print $Q_FH ";\n";

#create vendor DB
print $Q_FH "INSERT INTO VENDOR (vendor_id, company_name)\n";
print $Q_FH "VALUES";
print $Q_FH join(',', map{"\n($vendor_id_href->{$_}, $_)"} keys %$vendor_id_href);
print $Q_FH ";\n";

I use map here because: it's cleaner than a foreach loop, and I only have two values.
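
One thing to watch in the string-building version above: the company names are interpolated into the SQL without quotes, so a name containing a space or an apostrophe would likely break the statement. A hedged sketch of the same multi-row INSERT built with ? placeholders instead; build_insert and the sample data are hypothetical, and the statement is only printed here, not sent to a database:

```perl
use strict;
use warnings;

# Hypothetical sample data: company name => id, as in $cust_id_href above.
my $cust_id_href = { 'Acme Ltd' => 1, "O'Brien & Sons" => 2 };

# Build one "(?, ?)" group per row and a flat list of values to bind.
sub build_insert {
    my ($table, $id_col, $name_col, $id_for) = @_;
    my @names = sort keys %$id_for;
    my $sql   = "INSERT INTO $table ($id_col, $name_col) VALUES "
              . join(', ', ('(?, ?)') x @names);
    my @bind  = map { ($id_for->{$_}, $_) } @names;
    return ($sql, @bind);
}

my ($sql, @bind) = build_insert('CUSTOMER', 'customer_id', 'company_name', $cust_id_href);
print "$sql\n";
print join(' | ', @bind), "\n";
```

With DBI, the pair would typically be handed to $dbh->prepare($sql) and $sth->execute(@bind), which leaves the quoting of the names to the driver.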

here I am only parsing one csv at a time. how would you parse multiple CSV's to create/update/insert data in multiple tables in one DB?

I tried to use Tie::Handle::CSV but when I would attempt to convert the table to an array I would get the following error:

sub table_to_array {
    my $table_fh = shift;
    my $table_aref = ();
    foreach (<$table_fh>){
        push @{$table_aref}, $_ if $_;
    }
    #@{$table_aref} = <$table_fh>;
    return $table_aref;
}

Use of uninitialized value in lc at /usr/local/lib/perl5/site_perl/5.12.3/Tie/Handle/CSV/Hash.pm line 31, <$csv_fh> line 8014

which goes away if you treat the DB as a flat file instead of a CSV... go figure.

the funny thing is, no matter if the FH was a Tie'd FH or a standard flat FH the sub file_to_array worked: I was able to dereference the $file_aref and print out the array... haha, and looking back on it, the commented line is the original method, I came up with the WTFey method in a vain attempt to fix the 'bug'...

while this is even more off topic: what would you say is a _better_ practice: --creating multiple INSERT statements for all of the data, --or-- --creating one large INSERT statement for all of the data (as I have done above)?
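
A possible middle path between those two options is to send the rows in batches, one INSERT per batch. Which variant is fastest depends on the database and its settings (MySQL, for instance, caps statement size through max_allowed_packet), so treat this as a sketch to measure rather than a recommendation. The batches sub, the batch size of 2, and the sample rows are all made up for illustration:

```perl
use strict;
use warnings;

# Split a list of rows into batches of at most $size rows each.
# $size must be at least 1, otherwise the loop would never finish.
sub batches {
    my ($size, @rows) = @_;
    die "batch size must be positive" if $size < 1;
    my @out;
    push @out, [ splice @rows, 0, $size ] while @rows;
    return @out;
}

# Hypothetical rows of [customer_id, company_name].
my @rows = map { [ $_, "Company $_" ] } 1 .. 5;

for my $batch (batches(2, @rows)) {
    my $sql = 'INSERT INTO CUSTOMER (customer_id, company_name) VALUES '
            . join(', ', ('(?, ?)') x @$batch);
    printf "%d row(s): %s\n", scalar @$batch, $sql;
}
```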

as always thanks for the assistance.

Replies are listed 'Best First'.
Re: timtowtdi but which is the [quote fingers]best[/quote fingers]
by GrandFather (Saint) on Apr 14, 2011 at 07:48 UTC

    Compared to IO time any technique from carving on a stone tablet up is likely to be fast enough. Until you see a speed problem in practice with real data don't worry about it. If you are worried, profile the code to find out where the time is actually spent rather than guessing. And always remember that a change of algorithm will generally get better results than fiddling with details.
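
    Short of a full profiler run, one way to measure rather than guess is the core Benchmark module. A rough sketch, assuming a made-up sample line; the sub names two_pass and one_pass are only for illustration, and the numbers will vary with real data and hardware:

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Hypothetical sample line; real data may behave differently.
    my $line = 'a,TRUE,b,FALSE,c,true,d,false';
    my %value_for = (true => 1, false => 0);

    # Approach 1: two separate substitutions, as in the original question.
    sub two_pass {
        my ($text) = @_;
        $text =~ s/TRUE/1/ig;
        $text =~ s/FALSE/0/ig;
        return $text;
    }

    # Approach 2: one substitution with a hash lookup on the matched word.
    sub one_pass {
        my ($text) = @_;
        $text =~ s/(true|false)/$value_for{lc $1}/ige;
        return $text;
    }

    # Run each sub for about one CPU second and print a comparison table.
    cmpthese(-1, {
        two_pass => sub { two_pass($line) },
        one_pass => sub { one_pass($line) },
    });
    ```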

    Having said that, the following technique may be of interest:

    #!/usr/bin/perl -w
    use warnings;
    use strict;

    my %translate = (
        "string a" => 1,
        "string b" => 2,
        "string c" => 3,
        true => 1,
        false => 0,
    );
    my $match = "\Q" . join ("\E|\Q", keys %translate) . "\E";

    while (<DATA>) {
        s/($match)/$translate{lc $1}/ige;
        print;
    }

    __DATA__
    what is the best way to translate "TRUE" and "FALSE" to "1" or "0" respectively? or "string a", "String B", "String C" to "1-3" respectively.

    Prints:

    what is the best way to translate "1" and "0" to "1" or "0" respectively? or "1", "2", "3" to "1-3" respectively.
    True laziness is hard work
      my $match = "\Q" . join ("\E|\Q", keys %translate) . "\E";

      There's a problem here. In interpolated strings, the  \Q and  \E escapes act as interpolation 'controls' (if that's the right term) and softly and silently vanish away. Except for raising a ruckus, using non-interpolating strings doesn't help matters:

      >perl -wMstrict -le
      "my %translate = (
         'tr??u++e' => 1,
         'fa+l?se' => 0,
         );
       ;;
       my $match = qq{\Q} . join(qq{\E|\Q}, keys %translate) . qq{\E};
       print qq{'$match'};
       $match = qr{$match};
       print $match;
       ;;
       my $mooch = '\Q' . join('\E|\Q', keys %translate) . '\E';
       print qq{'$mooch'};
       $mooch = qr{$mooch};
       print $mooch;
      "
      'tr??u++e|fa+l?se'
      (?-xism:tr??u++e|fa+l?se)
      '\Qtr??u++e\E|\Qfa+l?se\E'
      Unrecognized escape \Q passed through in regex...
      Unrecognized escape \E passed through in regex...
      Unrecognized escape \Q passed through in regex...
      Unrecognized escape \E passed through in regex...
      (?-xism:\Qtr??u++e\E|\Qfa+l?se\E)
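
      One way around the vanishing escapes, sketched here with the same two keys as the session above, is to call quotemeta on each key before joining. The sample text being substituted is made up:

      ```perl
      use strict;
      use warnings;

      my %translate = ('tr??u++e' => 1, 'fa+l?se' => 0);

      # quotemeta escapes every regex metacharacter in each key.
      my $match = join '|', map { quotemeta } sort keys %translate;
      print "$match\n";    # fa\+l\?se|tr\?\?u\+\+e

      # The keys now match only as literal text.
      (my $text = 'tr??u++e or fa+l?se') =~ s/($match)/$translate{$1}/g;
      print "$text\n";     # 1 or 0
      ```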

      Update:

      There's another problem. The unordered keys of a hash are being used to build an ordered regex alternation, which always matches the first alternation match found even if there is a longer match in a subsequent alternation possibility. The problem may not bite in this application because the strings being searched (per the example in the OP) seem to be nonword-bounded words (and throwing in some  \b assertions would probably help, as Your Mother points out in Re: timtowtdi but which is the [quote fingers]best[/quote fingers]), but in another case (and if the longest match is, indeed, desired) it might:

      >perl -wMstrict -le
      "my %xlate = (
         A => '1',
         AA => '2',
         AAA => '3',
         AAAA => '4',
         );
       my $mooch = join '|', keys %xlate;
       $mooch = qr{ (?i) $mooch }xms;
       my $match = join '|', reverse sort keys %xlate;
       $match = qr{ (?i) $match }xms;
       print $mooch;
       print $match;
       ;;
       my $s = 'xAxAAxAAAxAAAAx';
       print qq{'$s'};
       $s =~ s{ ($mooch) }{$xlate{ uc $1 }}xmsg;
       print qq{'$s'};
       ;;
       $s = 'xAxaaxAaAxaAaAx';
       print qq{'$s'};
       $s =~ s{ ($match) }{$xlate{ uc $1 }}xmsg;
       print qq{'$s'};
      "
      (?msx-i: (?i) A|AA|AAAA|AAA )
      (?msx-i: (?i) AAAA|AAA|AA|A )
      'xAxAAxAAAxAAAAx'
      'x1x11x111x1111x'
      'xAxaaxAaAxaAaAx'
      'x1x2x3x4x'

      Since the OP seems to be looking for longest words and word phrases, I think I would go for the  \b-elt-and-suspenders approach of something like:

      >perl -wMstrict -le
      "my %xlate = (
         'string a' => '3',
         'string b' => '2',
         true => '1',
         false => '0',
         tt => '22',
         ttt => '33',
         );
       my $xl = join '|', map { quotemeta } reverse sort keys %xlate;
       $xl = qr{ (?i) \b (?: $xl) \b }xms;
       ;;
       my $s = 'vv TTT ww TT String A xx sTrInG b xtruex True yy FALSE zz';
       print qq{'$s'};
       ;;
       $s =~ s{ ($xl) }{$xlate{ lc $1 }}xmsg;
       print qq{'$s'};
      "
      'vv TTT ww TT String A xx sTrInG b xtruex True yy FALSE zz'
      'vv 33 ww 22 3 xx 2 xtruex 1 yy 0 zz'
Re: timtowtdi but which is the [quote fingers]best[/quote fingers]
by JavaFan (Canon) on Apr 14, 2011 at 08:44 UTC
    what is the best way to translate "TRUE" and "FALSE" to "1" or "0" respectively?
    Considering that it's speed you're after, the answer is: use a different language than Perl.

    Really, if this is your bottleneck, and it's a bottleneck needing fixing, Perl was a very bad pick as the language to use on the project.

      Considering that it's speed you're after, the answer is: use a different language than Perl.

      I agree with many of your more controversial assertions; but that one is a gross oversimplification.

      For many things, especially text manipulations, well thought through Perl programs will run a highly tuned C program a very close second. And if you factor in the development time, maintainability and portability, they'll usually win hands down.

      But the trick to good performance--as you well know and frequently demonstrate--is to write the Perl program in the way that best plays to its strengths. In general, that means avoiding the O'Woe trap and writing the simplest code that gets the job done.

      Just because someone is prepared to spend an extra hour or two to get the best performance available from their perl script, it does not mean it warrants spending a week or two trying to write and tune the equivalent C(Java/Other) program.

      In this case, as GrandFather points out, making two passes over each line rather than one will likely make very little difference, assuming that is the total extent of the processing required between reading and writing each record. But that is not always the case, and becoming familiar with which Perl techniques are more efficient than others is as good a function of this place as any other.

      Asking certainly should not be cause for your derision, just because you know that in this case the suggested alternatives are unlikely to be of benefit. Much less because you are sufficiently experienced to consider writing things in other languages when you consider that would be advantageous to your needs. Many who come here do not readily have that option. And others would prefer to avoid it where possible.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: timtowtdi but which is the [quote fingers]best[/quote fingers]
by Your Mother (Archbishop) on Apr 14, 2011 at 14:45 UTC

    Going back to your first thought-

    s/TRUE/1/imgsx; s/FALSE/0/imgsx;

    Except for the g, those flags are all noise or a mistake given your spec. E.g., "Falsely" will become "0ly".
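
    A quick sketch of that failure and of the \b word-boundary fix mentioned elsewhere in the thread. The sample strings are made up:

    ```perl
    use strict;
    use warnings;

    # Without anchors, the pattern also matches inside longer words.
    (my $loose = 'Falsely TRUE') =~ s/FALSE/0/ig;

    # With \b on both sides, only the whole word FALSE is replaced.
    (my $strict = 'Falsely TRUE FALSE') =~ s/\bFALSE\b/0/ig;

    print "$loose\n";    # 0ly TRUE
    print "$strict\n";   # Falsely TRUE 0
    ```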
