I have the following code to search a tab-delimited text file (the first column only) for PAIRS of names. The pairs of names are passed on the command line. There is a match when both names of any pair are found in that first column. A match 'CODE' starts as '' (empty string). If the 1st pair matches, an 'A' is added to the match CODE; if the 2nd pair matches, a 'B' is added; the 3rd pair adds a 'C', the 4th a 'D' and the 5th an 'E'. So, depending on which pairs match, the match code can be any combination of the five letters ABCDE: one, two, three, four or all five letters, or none. If a match is found, the match code and the line are sent to the output file.
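
The rule above can be illustrated with a minimal Perl sketch. The helper `match_code`, its argument layout (the first-column value followed by a flat list of names, two per pair) and the sample names are hypothetical, chosen only to make the letter assignment concrete.

```perl
use strict;
use warnings;

# Hypothetical helper: return the match code for one first-column value.
# @names holds up to five pairs (first, second, first, second, ...);
# pair 1 contributes 'A', pair 2 'B', and so on up to 'E'.
# A pair matches only when BOTH of its names occur as plain substrings.
sub match_code {
    my ($value, @names) = @_;
    my @letters = ('A' .. 'E');
    my $code    = '';
    for my $i (0 .. $#letters) {
        last if 2 * $i + 1 > $#names;    # no more complete pairs
        my ($first, $second) = @names[2 * $i, 2 * $i + 1];
        $code .= $letters[$i]
            if index($value, $first) >= 0 && index($value, $second) >= 0;
    }
    return $code;
}

# Illustrative pairs: A = SMITH/JOHN, B = DOE/JANE, C = SMITH/JANE
my @pairs = qw(SMITH JOHN DOE JANE SMITH JANE);
print match_code('JOHN SMITH', @pairs), "\n");       # prints "A"
print match_code('JANE SMITH-DOE', @pairs), "\n";    # prints "BC"
print match_code('BOB JONES', @pairs), "\n";         # prints an empty line
```

Each pair is tested independently, so one value can collect several letters, as 'JANE SMITH-DOE' does with B and C.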

I simply need a way to make this overall process as fast as possible. Help in any area is welcome: string searching, concatenation or anything else.
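
A minimal sketch of one way to reduce per-line work, assuming only the first column is ever searched: the first column is taken with `index`/`substr` instead of splitting every line, the full `split` happens only on matching lines, the pairs are built once up front, and each match is printed straight to the output handle instead of being concatenated into one large string. The subs `filter_lines` and `pairs_from_args` are hypothetical names; whether this is measurably faster depends on your data and match rate, so it should be timed against the original.

```perl
use strict;
use warnings;

# Hypothetical sketch: read tab-delimited lines from $in, search only the
# first column for the name pairs, and print each match to $out at once.
# @pairs is a list of [letter, first_name, second_name] built once up front.
sub filter_lines {
    my ($in, $out, @pairs) = @_;
    while (my $line = <$in>) {
        chomp $line;
        # First column = everything before the first tab (or the whole line).
        my $tab   = index($line, "\t");
        my $first = $tab >= 0 ? substr($line, 0, $tab) : $line;

        my $code = '';
        for my $p (@pairs) {
            $code .= $p->[0]
                if index($first, $p->[1]) >= 0 && index($first, $p->[2]) >= 0;
        }
        next if $code eq '';

        # Only matching lines pay for the full split; print the code plus
        # columns 0..13, padding missing columns with '' as before.
        my @cols = split /\t/, $line;
        print {$out} join("\t", $code, map { $_ // '' } @cols[0 .. 13]), "\n";
    }
}

# Build the pair table once from the arguments: A => args 0,1; B => 2,3; ...
sub pairs_from_args {
    my @args    = @_;
    my @letters = ('A' .. 'E');
    my @pairs;
    for my $i (0 .. $#letters) {
        last if 2 * $i + 1 > $#args;    # no more complete pairs
        push @pairs, [ $letters[$i], @args[2 * $i, 2 * $i + 1] ];
    }
    return @pairs;
}

# Demo with in-memory filehandles instead of SearchTable.txt.
my $table  = "JOHN SMITH\tr1\tx\nBOB JONES\tr2\ty\n";
my $result = '';
open my $in,  '<', \$table  or die $!;
open my $out, '>', \$result or die $!;
filter_lines($in, $out, pairs_from_args(qw(SMITH JOHN DOE JANE)));
close $out;
print $result;
```

In the real script, `$in` and `$out` would be opened on `SearchTable.txt` and `nameS_RECORDS.txt`, the header printed to `$out` before the loop, and the pairs built with `pairs_from_args(@ARGV)`.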

One other important point/question: I must search using string literals, not regexes. Would a language like C, C++ or C# offer faster string-literal searching than Perl? If so, is there a source of information on how to go about this? Thanks in advance.
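
Some context for the C question: Perl's `index` is a built-in implemented in C, so the substring search itself is not interpreted Perl code; a good share of the run time may instead go to the per-line work around it (reading, splitting, concatenating). That is worth measuring before porting anything. The sketch below uses the core `Benchmark` module to time `index` against a regex whose names are wrapped in `\Q...\E`, which escapes metacharacters so the match stays literal. The sample values, names and iteration count are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: first-column values and one pair of names.
my @values = map { "RUNNER$_ SMITH JOHN" } 1 .. 1000;
my ($first, $second) = ('SMITH', 'JOHN');

# Literal substring test with the index built-in.
sub by_index {
    my $hits = 0;
    for my $v (@values) {
        $hits++ if index($v, $first) >= 0 && index($v, $second) >= 0;
    }
    return $hits;
}

# Same test with regexes; \Q...\E quotes metacharacters such as '.' or '(',
# so this is still a literal match, not a pattern.
sub by_quoted_regex {
    my $hits = 0;
    for my $v (@values) {
        $hits++ if $v =~ /\Q$first\E/ && $v =~ /\Q$second\E/;
    }
    return $hits;
}

# Sanity check: both approaches must agree before timing them.
by_index() == by_quoted_regex() or die "approaches disagree";

# Run each approach 500 times and print a rate comparison table.
cmpthese(500, { index => \&by_index, regex => \&by_quoted_regex });
```

The same pattern (two subs doing identical work, checked for agreement, then passed to `cmpthese`) can be used to compare the original loop with any rewritten version on a sample of the real `SearchTable.txt`.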

---------------------------------------------------------
```perl
use strict;
use warnings;
use v5.10;
use autodie;    # open/close die automatically on failure

my $start_run = time();

# Up to five pairs of names come from the command line:
# pair A = $ARGV[0], $ARGV[1]; pair B = $ARGV[2], $ARGV[3]; ... pair E = $ARGV[8], $ARGV[9].
my @letters = ('A' .. 'E');

# Return the match code for one first-column value: the letter of each
# pair whose two names BOTH occur in $search_string (literal substring
# search with index, no regexes), or '' if no pair matches.
sub name_search {
    my ($search_string) = @_;
    my $found_code = '';
    for my $i (0 .. $#letters) {
        last if 2 * $i + 1 > $#ARGV;    # fewer pairs were given
        if (   index($search_string, $ARGV[2 * $i]) >= 0
            && index($search_string, $ARGV[2 * $i + 1]) >= 0)
        {
            $found_code .= $letters[$i];
        }
    }
    return $found_code;
}

# Create header for output file.
my $print_string = '';
for my $i (0 .. $#letters) {
    $print_string .= "\t\t$letters[$i]: "
        . ($ARGV[2 * $i] // '') . ' ' . ($ARGV[2 * $i + 1] // '') . "\n";
}
$print_string .= join("\t", 'CODE', 'NAME', 'RUNNER', map { "INFO$_" } 1 .. 12) . "\n";

open(my $data, '<', 'SearchTable.txt');
while (<$data>) {
    chomp;
    my @line      = split /\t/, $_;
    my $found_tag = name_search($line[0] // '');
    if ($found_tag ne '') {
        # Match code followed by columns 0..13 of the input line.
        $print_string .= join("\t", $found_tag, map { $_ // '' } @line[0 .. 13]) . "\n";
    }
}
close($data);

open(my $out, '>', 'nameS_RECORDS.txt');
print $out $print_string;
close($out);

my $run_time = time() - $start_run;
print "\n\nJob took $run_time seconds\n";
```

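If the same first-column value appears on many lines, one possible speed-up to `name_search` is to compute each distinct value's code only once and keep it in a hash. This is a sketch under that assumption: the `%code_for` cache, the `compute_code` and `cached_code` subs and the sample data are hypothetical, and the cache only pays off when values repeat (with all-unique values it adds a hash lookup and memory per line).

```perl
use strict;
use warnings;

# Hypothetical pair table: [letter, first_name, second_name].
my @pairs = (['A', 'SMITH', 'JOHN'], ['B', 'DOE', 'JANE']);

# Plain literal search, as in name_search: both names must occur.
sub compute_code {
    my ($value) = @_;
    my $code = '';
    for my $p (@pairs) {
        $code .= $p->[0]
            if index($value, $p->[1]) >= 0 && index($value, $p->[2]) >= 0;
    }
    return $code;
}

# Memoised wrapper: each distinct first-column value is searched once;
# later lines with the same value reuse the stored code (which may be '').
my %code_for;
my $searches = 0;    # counts real searches, for illustration only
sub cached_code {
    my ($value) = @_;
    return $code_for{$value} //= do { $searches++; compute_code($value) };
}

my @column = ('JOHN SMITH', 'BOB JONES', 'JOHN SMITH', 'JANE DOE', 'BOB JONES');
my @codes  = map { cached_code($_) } @column;
print join(',', map { $_ eq '' ? '-' : $_ } @codes), "\n";    # prints "A,-,A,B,-"
print "searches: $searches\n";                                 # prints "searches: 3"
```

In the main loop, the call to `name_search` would be replaced by a lookup through a cache like this; `//=` is used rather than `||=` so that a cached empty code ('') is not searched again.
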
In reply to Need Speed:Search Tab-delimited File for pairs of names by mnnb
