Matching First Character of Strings Efficiently

Limbic~Region has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Matching First Character of Strings Efficiently (yuck) by tye (Sage) on Mar 15, 2004 at 19:58 UTC
Premature nano-optimization... `ord($a) == ord($b)` [download] - tye	[reply] [d/l]
Re^2: Matching First Character of Strings Efficiently (yuck) by Aristotle (Chancellor) on Mar 15, 2004 at 20:24 UTC
Lovely. :) Makeshifts last the longest.	[reply]
Re: Matching First Character of Strings Efficiently by BrowserUk (Patriarch) on Mar 15, 2004 at 19:55 UTC
Trade a little setup time for speed in the loop and use a hash? `my @list = ...; my %list; @list{ map{ substr $_, 0, 1 } @list } = (); my $str_a = 'Foo'; my $first_c = substr $str_a, 0, 1; for my $str_b ( @list ) { next if exists $list{ $first_char }; expensive_function($str_a , $str_b); }` [download] Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail	[reply] [d/l]
Re: Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 15, 2004 at 20:16 UTC
BrowserUk, I am very interested to see if this solution has an efficiency breakpoint. It certainly would not be as memory efficient as some of the other solutions, but that was not a requirement - just speed. Cheers - L~R	[reply]
Re: Re: Re: Matching First Character of Strings Efficiently by BrowserUk (Patriarch) on Mar 15, 2004 at 21:53 UTC
Sorry Limbic~Region, I got carried away with your "Only modify this line" (and had a brainfart too!). There is no point in looping over the array, that's why we put in a hash. It should have looked something like this. `my @list = ...; my %list; @list{ map{ substr $_, 0, 1 } @list } = (); my $str_a = 'Foo'; expensive_function($str_a , $str_b) unless exists $list{ substr $str_a, 0, 1 };` [download] Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail	[reply] [d/l]
Re: Re: Re: Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 15, 2004 at 22:35 UTC
Re+: Matching First Character of Strings Efficiently by BrowserUk (Patriarch) on Mar 16, 2004 at 00:07 UTC
Re: Matching First Character of Strings Efficiently by hardburn (Abbot) on Mar 15, 2004 at 19:51 UTC
If we play with the definition of "first character", Perl will match a number at the beginning of a string: `$_ = "13foo"; if($_ == 13) { print "Works\n"; } __OUTPUT__ Works` [download] But doing it under warnings gives you `Argument "13foo" isn't numeric in numeric eq (==)`. The ++ operator also has some DWIM magic involved for incrementing strings that begin with numbers (but -- doesn't). For any strings, you can also use `split` like this: `my ($first, $rest) = split '', $str, 2;` [download] You can also play string reversal tricks, if you don't mind modifying the string: `$_ = reverse $str; print chop;` [download] ---- `: () { :\|:& };:` Note: All code is untested, unless otherwise stated	[reply] [d/l] [select]
Re: Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 15, 2004 at 19:58 UTC
hardburn, I did say first letter, not "first character". Regardless, your first proposal will not work because it is checking one string against a hard coded value. The requirement is to test the first letter/character of one string against the first letter/character of a second string. I will add your split solution to the benchmark. While I did not explicitly state modifying the string is not allowed, the expensive_function will not work unless the strings were "as received". I will however add this solution to the benchmark by appending the chop'd character back on to the re-reversed string. Cheers - L~R	[reply]
Re: Matching First Character of Strings Efficiently by kvale (Monsignor) on Mar 15, 2004 at 20:01 UTC
`substr` is usually pretty fast: `my $str_a = 'Foo'; my $letter = substr $str_a, 0, 1; for my $str_b ( @list ) { next if $letter eq substr $str_b, 0, 1; expensive_function($str_a , $str_b); }` [download] -Mark	[reply] [d/l] [select]
Re: Re: Matching First Character of Strings Efficiently by bageler (Hermit) on Mar 15, 2004 at 20:38 UTC
why not combine the two commands in the loop? `my $str_a = 'Foo'; my $letter = substr $str_a, 0, 1; for my $str_b ( @list ) { unless ($letter eq substr $str_b,0,1) { expensive_function($str_a +, $str_b); } }` [download] update: unless eq is going to be slightly better than if ne	[reply] [d/l]
Re:x3 Matching First Character of Strings Efficiently by grinder (Bishop) on Mar 15, 2004 at 22:24 UTC
Well, if this is in a tight loop, then you might gain a bit by not building a scope each time through the loop: `expensive_function($str_a , $str_b) unless ($letter eq substr $str_b,0 +,1);` [download] IOW, get rid of the curlies. I'd be curious to find out if this really has any significant effect... Or even... `expensive_function($str_a , $_) unless ($letter eq substr $_,0,1) for @list;` [download] ... if we are free to play with $_ (otherwise predeclaring $str_b will work too).	[reply] [d/l] [select]
Re: Matching First Character of Strings Efficiently by delirium (Chaplain) on Mar 15, 2004 at 19:55 UTC
Untested: `my $str_a = 'Foo'; my $fc = substr($str_a,0,1); expensive_function($str_a, $_) for grep {! /^$fc/o} @list;` [download]	[reply] [d/l]
Re: Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 15, 2004 at 20:01 UTC
delerium, As I stated, you are not allowed to change the looping construct. See the comment that says # only change this. I will include this solution (combination of substr and a regex) into the benchmark list though. Cheers - L~R	[reply]
Re: Matching First Character of Strings Efficiently by Aristotle (Chancellor) on Mar 15, 2004 at 20:17 UTC
In the worst case, index will crawl the entirety of `$str_a` looking for the character you extracted from `$str_b` so it can't be the fastest solution. You want to make sure you only make a single one-character comparison. And because one of the strings is invariant during the loop, you want to move its first-character extraction out of the loop. That gives us the canonical solution of `my $str_a = 'Foo'; my $chr_a = substr $str_a, 0, 1; for my $str_b ( @list ) { next if $chr_a eq substr($str_b, 0, 1); expensive_function($str_a , $str_b); }` [download] Update: beaten to the punch by kvale. Makeshifts last the longest.	[reply] [d/l]
Re: Matching First Character of Strings Efficiently by tachyon (Chancellor) on Mar 15, 2004 at 20:28 UTC
Well the inline C solution is pretty short, dunno about faster or particularly creative though.... `use Inline 'C'; my $str1 = 'foo'; for my $str2( qw( foo bar baz foxtrot perlmonks f ) ) { print "'$str1' cmp '$str2' result: ", same_scan($str1, $str2) ? "same\n" : "different\n"; } __END__ __C__ int same_scan(char* str1, char* str2) { return str1[0] == str2[0] ? 1 : 0; } /* 'foo' cmp 'foo' result: same 'foo' cmp 'bar' result: different 'foo' cmp 'baz' result: different 'foo' cmp 'foxtrot' result: same 'foo' cmp 'perlmonks' result: different 'foo' cmp 'f' result: same */` [download] cheers tachyon	[reply] [d/l]
Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 15, 2004 at 21:19 UTC
All, Here are the preliminary results: Read more... (2 kB) Here is the script I used to build the list: Read more... (576 Bytes) And finally, here is the benchmark itself Read more... (3 kB) Note: I have not had a chance to verify that each one of these tests is calling expensive_function the same number of times so please check back if you are interested. I need to head home now. Cheers - L~R Update: See updated benchmark.	[reply] [d/l] [select]
Re^2: Matching First Character of Strings Efficiently (benchmarking) by tye (Sage) on Mar 15, 2004 at 22:01 UTC
Including 'expensive_function' in a benchmark means that you are mostly measuring how often that function gets called (which isn't what you asked for) and reduces the differences in the remaining items to the level of noise (which a worse benchmark). I commented out the guts of expensive_function, shortened the names, added tye2 which does what I thought was the obvious optimization of pulling out the first char of $str_a at the top: `'tye2' => sub { my $f = ord($str_a); for my $str_b ( @list ) { next if $f != ord( $str_b ); expensive_function ( $str_a, $str_b ); } },` [download] and got these results: Rate hb1 hb2 L~R3 BUk L~R2 dlrm L~R tye1 kvale ty +e2 hb1 46.5/s -- -35% -35% -38% -60% -62% -70% -73% -73% -7 +4% hb2 71.4/s 53% -- -0% -6% -38% -42% -54% -59% -59% -6 +1% L~R3 71.4/s 53% 0% -- -6% -38% -42% -54% -59% -59% -6 +1% BUk 75.7/s 63% 6% 6% -- -35% -39% -51% -56% -56% -5 +8% L~R2 116/s 149% 62% 62% 53% -- -6% -25% -33% -33% -3 +6% dlrm 124/s 166% 73% 73% 63% 7% -- -20% -28% -29% -3 +2% L~R 155/s 233% 117% 117% 105% 34% 25% -- -10% -11% -1 +4% tye1 172/s 270% 141% 141% 128% 48% 39% 11% -- -1% - +5% kvale 174/s 274% 143% 143% 130% 50% 41% 12% 1% -- - +4% tye2 181/s 289% 154% 154% 139% 56% 46% 17% 5% 4% +-- [download] But I still consider such nano-optimization games to be more a waste than of value. (: - tye	[reply] [d/l] [select]
Re: Re^2: Matching First Character of Strings Efficiently (benchmarking) by Limbic~Region (Chancellor) on Mar 15, 2004 at 22:58 UTC
tye, Calculating the ordinal value of $str_a only once was an obvious optimization. I was hurrying to get out of the office - that will teach me. But I still consider such nano-optimization games to be more a waste than of value. I completely disagree with you. You don't seem to condemn golfing, which I feel is more of a waste (probably because I am no good at it). It is not like I was complaining that my solution wasn't fast enough. I feel that such games are of value for a few of reasons: It can be fun It can help disprove myths Figuring out the most efficient way once allows it to be reused in the future effortlessly. Now I am not saying one should sacrifice readability in order to save a few cycles or that a person should spend more programmer time than they will save in run time. As a periodic academic excersise, I see it has a lot of value. Of course, YMMV since you know a lot more than I do. Cheers - L~R	[reply]
Re^4: Matching First Character of Strings Efficiently (nano-opt) by tye (Sage) on Mar 15, 2004 at 23:15 UTC
Re: Re^2: Matching First Character of Strings Efficiently (benchmarking) by graff (Chancellor) on Mar 16, 2004 at 01:11 UTC
`Rate ... hb1 46.5/s ... ... tye2 181/s ...` [download] But I still consider such nano-optimization games to be more a waste than of value. Since when is a 4-to-1 speed-up "nano"? Granted, in the OP's case, maybe this particular 4:1 ratio would end up affecting only 10% of the overall run-time, once "expensive_process" (whatever it is) plays its role. Also, I wouldn't really want to bust my butt running an exhaustive test of 12 alternatives -- just compare the first (or current) idea against the next idea that looks like it could do better, in case the current idea seems to be taking longer than I like. But given that this sort of vetting task is likely to come up, if I had to do it over an input list of, say, 2 million items, I'd much prefer that it take 3 hours rather than 12. update: Having seen the replies to my post from tye and Aristotle, I have to say (sheepishly) thanks for the wake-up -- I should have known better than to take this sort of benchmark tabulation at "face value". In fact, I have used Benchmark myself on other problems -- but I opted for the "timethese" method (not "cmpthese"), which offers the advantage of showing the overall wall-clock difference as well as the "relative proportional" time difference for each approach. In one attempt, a comparison of two approaches showed "5.44/s" for one and "94.34/s" for the other -- like wow, huh? -- but the wall-clock difference was 2 sec vs. 9 sec... go figure. If an app really needs to be optimized, the first tool of choice should be profiling, of course, to see what segment of the code is eating all the time, so you can rethink that part first.	[reply] [d/l]
Re^4: Matching First Character of Strings Efficiently (4 times!) by tye (Sage) on Mar 16, 2004 at 02:21 UTC
Re^4: Matching First Character of Strings Efficiently (benchmarking) by Aristotle (Chancellor) on Mar 16, 2004 at 03:21 UTC
Re: Re: Matching First Character of Strings Efficiently by Roy Johnson (Monsignor) on Mar 15, 2004 at 21:45 UTC
It looks like the "BrowserUK" solution will never call expensive_function. If $fc ever showed up in @list, that test will always yield true. If not, it will call expensive_function every time through. The PerlMonk `tr///` Advocate	[reply]
Re: Re: Matching First Character of Strings Efficiently by tachyon (Chancellor) on Mar 16, 2004 at 04:09 UTC
Just looking at the faster versions and removing everything that is not the function (ie one true, one false test makes a set)... #!/usr/bin/perl use Inline C =>; use Benchmark 'cmpthese'; # Rate tchn tchn2 kval LR tye2 tye1 # tchn 556060/s -- -9% -19% -20% -50% -64% # tchn2 611783/s 10% -- -11% -12% -45% -60% # kval 690724/s 24% 13% -- -0% -38% -55% # LR 691904/s 24% 13% 0% -- -38% -55% # tye2 1114446/s 100% 82% 61% 61% -- -28% # tye1 1542899/s 177% 152% 123% 123% 38% -- my $str_a = 'a'x255; my $str_b = 'b'x255; my $str_c = $str_a; my ( $fc ); cmpthese -5, { 'tchn' => sub { $a = same_scan( $str_a, $str_b ); $a = same_scan( $str_a, $str_c ); }, 'tchn2' => sub { $a = same_scan2( $str_a, $str_b ); $a = same_scan2( $str_a, $str_c ); }, 'LR' => sub { $a = index($str_a, substr($str_b, 0, 1)); $a = index($str_a, substr($str_b, 0, 1)); }, 'tye1' => sub { $a = ord $str_a != ord $str_b; $a = ord $str_a != ord $str_c; }, 'tye2' => sub { $fc = ord $str_a; $a = $fc != ord( $str_b ); $fc = ord $str_a; $a = $fc != ord( $str_b ); }, 'kval' => sub { $fc = substr $str_a, 0, 1; $a = $fc ne substr $str_b, 0, 1; $fc = substr $str_a, 0, 1; $a = $fc ne substr $str_c, 0, 1; }, }; __END__ __C__ int same_scan(char* str1, char* str2) { return str1[0] == str2[0] ? 1 : 0; } int same_scan2(SV* str1, SV* str2) { return SvPV(str1, PL_na)[0] == SvPV(str2,PL_na)[0] ? 1 : 0; } [download] cheers tachyon	[reply] [d/l]
Re: Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 15, 2004 at 22:47 UTC
tye mentioned in the CB that he suspected one of them was broke and expensive_function wasn't being called. I knew immediately he was right, but didn't have a chance to figure it out before heading home. Of course I figured it out on the way home but not in time for Roy Johnson to have already pointed it out. I took some of tye's comments and produced the following results: Read more... (3 kB) Cheers - L~R	[reply] [d/l] [select]
Re: Re: Matching First Character of Strings Efficiently by kral (Monk) on Mar 16, 2004 at 14:53 UTC
In my precedent message, I proposed a solution that was not considered maybe because it was in a test context. As a newbie in Perl, I would like to know if my solution is good or wrong. `my $fc = substr($str_a,0,1); for my $str_b ( @list ) { next if ord($fc) == ord((split(//, $str_b))[0]); expensive_function ( $str_a, $str_b ); }` [download] Tnx to all. :^) ---------- kral (sorry for my english)	[reply] [d/l]
Re: Re: Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 16, 2004 at 15:02 UTC
kral, Actually, I just missed it when I was doing the benchmark. I would give you an award for having the most un-necessary steps. Let me explain why. You use substr to get 1st char even though ord returns numeric value of 1st char You use split to get a list of individual characters and throw them away using a slice You then use ord again for the test There is however, nothing wrong with it. Cheers - L~R	[reply]
Re: Re: Re: Re: Matching First Character of Strings Efficiently by kral (Monk) on Mar 16, 2004 at 16:59 UTC
Re: Re: Re: Re: Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 16, 2004 at 17:18 UTC
Some notes below your chosen depth have not been shown here
Re: Matching First Character of Strings Efficiently by kral (Monk) on Mar 15, 2004 at 21:36 UTC
A summary of all solutions... Here there's a benchmarked code: Read more... (2 kB) BrowserUk's solution seems to be the best! Update: I miss the Limbic~Region post, sorry... ---------- kral (sorry for my english)	[reply] [d/l]
Re: Matching First Character of Strings Efficiently by jmcnamara (Monsignor) on Mar 15, 2004 at 21:40 UTC
Here is an efficient snippet that I posted a while back, complete with benchmark: Read the first character of a string. `$char = chr ord $string;` [download] But if you don't care what the character actually is you could just use ord (as tye shows above). -- John.	[reply] [d/l]
Re: Matching First Character of Strings Efficiently by NetWallah (Canon) on Mar 15, 2004 at 20:00 UTC
How about a simple grep: `my $str_a = 'Foo'; # Can be any length for my $str_b (grep /^$str_a/, @list ) { # Can + be any number of items expensive_function($str_a , $str_b); }` [download] Will call expensive_function only on items that start with 'Foo' (Which can be limited to one character if thats what you Really want) Update: This is the same as delirium's solution, above. Offense, like beauty, is in the eye of the beholder, and a fantasy. By guaranteeing freedom of expression, the First Amendment also guarntees offense.	[reply] [d/l]
Re: Re: Matching First Character of Strings Efficiently by Limbic~Region (Chancellor) on Mar 15, 2004 at 20:10 UTC
NetWallah, Actually it is not the same. You are checking to see if string b starts with the entire string a. This would not meet the criteria of only checking first letters. Besides, changing the looping construct was not permitted. Cheers - L~R	[reply]