exchange words in text

Murcia has asked for the wisdom of the Perl Monks concerning the following question:

Hi confreres,

I'd like your hints:
how can I replace the identifiers in one list with the corresponding entries from a reference list?

The lists are files of different lengths!

example:
List
Guido bla1 ...
Mike  bla2 ...
Klaus bla3 ...

reference list
Guido Meyer
Mike Smith
Klaus Rothschild

so the new list should be
Meyer bla1 ...
Smith bla2 ...
Rothschild bla3 ...

Currently I put the reference list in a hash,
then parse the list line by line: split each line, exchange the identifier,
and write a new list.
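A minimal sketch of the hash-based approach described above, assuming whitespace-separated fields with the identifier in the first column, and assuming identifiers missing from the reference list should be left unchanged. The sub names `load_reference` and `exchange_ids` and the in-memory sample data (including the unmapped `Otto` line) are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read "old new" pairs from a filehandle into a hash.
sub load_reference {
    my ($fh) = @_;
    my %map;
    while (my $line = <$fh>) {
        my ($old, $new) = split ' ', $line;
        next unless defined $new;    # skip blank or malformed lines
        $map{$old} = $new;
    }
    return \%map;
}

# Hypothetical helper: replace the first field of each line, keep the rest verbatim.
sub exchange_ids {
    my ($map, $in, $out) = @_;
    while (my $line = <$in>) {
        # exists (rather than ||) also keeps mappings to false values such as "0".
        $line =~ s/^(\S+)/exists $map->{$1} ? $map->{$1} : $1/e;
        print {$out} $line;
    }
}

# Demo with in-memory filehandles; in practice open the two files instead.
my $ref_text  = "Guido Meyer\nMike Smith\nKlaus Rothschild\n";
my $list_text = "Guido bla1 ...\nMike  bla2 ...\nKlaus bla3 ...\nOtto bla4 ...\n";

open my $ref_fh,  '<', \$ref_text  or die $!;
open my $list_fh, '<', \$list_text or die $!;
my $result = '';
open my $out_fh,  '>', \$result    or die $!;

exchange_ids(load_reference($ref_fh), $list_fh, $out_fh);
close $out_fh;
print $result;
```

To run it on real files, open them with `open my $fh, '<', $filename` instead of the in-memory strings. Only the reference list is held in memory, so the two files can differ in length and the list itself can be arbitrarily long.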

How can I make it quicker, better, faster? Thanks


Re: exchange words in text by borisz (Canon) on Nov 23, 2004 at 14:26 UTC
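Here is one way:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the sample text into lines, keeping the newlines as separate elements.
my @l = split /(\n)/, <<ENDE;
Guido bla1 ...
Mike bla2 ...
Klaus bla3 ...
ENDE

my %h = qw/ Guido Meyer Mike Smith Klaus Rothschild /;

for (@l) {
    # Replace the first word with its mapping, or leave it if it is unmapped.
    s/(\w+)/$h{$1} || $1/e;
    print;
}

# Output:
# Meyer bla1 ...
# Smith bla2 ...
# Rothschild bla3 ...
```

Boris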
Re: exchange words in text by Limbic~Region (Chancellor) on Nov 23, 2004 at 14:39 UTC
Murcia,
If the two lists are of equal length and already in the same order (item 1 of list 1 corresponds to item 1 of list 2), this is an easy problem.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list;
my $index = 0;
while ( <DATA> ) {
    chomp;
    # A blank line separates the first list from the second.
    if ( /^\s*$/ ) { $index++; next }
    # Keep everything after the identifier.
    push @{ $list[ $index ] }, (split " ", $_, 2)[1];
}

# Pair line N of the second list with line N of the first list.
my @new_list = map { $list[1][$_] . ' ' . $list[0][$_] } 0 .. $#{ $list[0] };
print "$_\n" for @new_list;

__DATA__
Guido bla1 ...
Mike bla2 ...
Klaus bla3 ...

Guido Meyer
Mike Smith
Klaus Rothschild
```

Cheers - L~R

Disclaimer: Murcia did not originally specify that the lists were of different lengths, nor give any specifics about ordering. While this approach is not valid given the new information, its assumptions were valid at the time it was written. A classic example of knowing what information to include when asking a question.
Re: exchange words in text - how not to !! by Random_Walk (Prior) on Nov 23, 2004 at 15:20 UTC
I don't think you are going to get much better than tweaking the hash solution, unless your data is nice enough to use Limbic~Region's method. I did try another way (building a string of the required substitutions and eval-ing it against the data), more to prove it was a non-starter than because I thought it would be faster. Code and benchmark for a laugh:

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Benchmark;

sub simple_hash {
    seek DATA, 0, 0;                      # rewind to the start of this file...
    while (<DATA>) { last if /^Names/ }   # ...and skip ahead to the name list
    my %lookup;
    while (<DATA>) {
        next if /^\s$/;
        last if /^Example List/;
        chomp;
        my ($first, $second) = split;
        $lookup{$first} = $second;
    }
    while (<DATA>) {
        next if /^\s$/;
        chomp;
        my ($name, $rest) = split /\s+/, $_, 2;
        print $lookup{$name}, "\t", $rest, "\n";
    }
}

sub funky_regex {
    seek DATA, 0, 0;
    while (<DATA>) { last if /^Names/ }
    my $regex_string = "";
    while (<DATA>) {
        next if /^\s*$/;
        last if /^Example List/;
        chomp;
        my ($first, $second) = split;
        $regex_string .= "s/$first/$second/;";
    }
    local $/;          # slurp the rest of the data in one go
    $_ = <DATA>;
    eval $regex_string;
    print;
}

# Pass code references (\&name); a bare &name would call each sub once
# and hand its return value to timethese instead.
timethese(5000000, {
    'simple_hash' => \&simple_hash,
    'funky_regex' => \&funky_regex,
});

__DATA__
Names
Guido Meyer
Mike Smith
Klaus Rothschild
Mick Mouse
Daffy LeCannard

Example List
Guido bla1 ...
Mike bla2 ...
Klaus bla3 ...
Mick blahsome more
Daffy lookout Duck !
```

Results:

```
funky_regex:  0 wallclock secs ( 0.30 usr +  0.00 sys =  0.30 CPU) @ 16666666.67/s (n=5000000)
            (warning: too few iterations for a reliable count)
simple_hash:  0 wallclock secs ( 0.01 usr +  0.00 sys =  0.01 CPU) @ 500000000.00/s (n=5000000)
            (warning: too few iterations for a reliable count)
```

Cheers, R.
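A variation on the regex idea that avoids eval-ing one `s///` per name: build a single alternation from the hash keys (escaped with `quotemeta`, longest first), anchor it at the start of the line, and take the replacement from the hash. This is a different technique from the eval-based code above, sketched under the same assumption (identifier in the first column); the extra `Mickey blah` sample line is hypothetical, and whether this beats the plain split-and-lookup loop would need benchmarking on real data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %lookup = (
    Guido => 'Meyer',
    Mike  => 'Smith',
    Klaus => 'Rothschild',
    Mick  => 'Mouse',
    Daffy => 'LeCannard',
);

# One alternation for all names: longest keys first, metacharacters escaped.
my $alternation = join '|',
    map { quotemeta } sort { length $b <=> length $a } keys %lookup;

# Anchor at line start and require whitespace (or end of string) after the name.
my $re = qr/^($alternation)(?=\s|\z)/;

my @lines = ("Guido bla1 ...\n", "Mike bla2 ...\n", "Mickey blah\n", "Daffy lookout Duck !\n");
my @out;
for my $line (@lines) {
    (my $copy = $line) =~ s/$re/$lookup{$1}/;
    push @out, $copy;
}
print @out;
```

The lookahead `(?=\s|\z)` keeps `Mick` from matching inside `Mickey`; the longest-first ordering alone would not prevent such partial-word matches.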
Re: exchange words in text by rev_1318 (Chaplain) on Nov 23, 2004 at 14:28 UTC
I think your approach is basically correct (I would go the same route). Whether it can be tweaked depends on your exact code; if you would like comments on it, post it here. There may be faster alternatives, but are they easier to maintain? Paul
Re: exchange words in text by artist (Parson) on Nov 23, 2004 at 17:22 UTC
If you are on Unix and both files are sorted on the identifier (as `join` requires), try: `join file2 file1 | cut -d ' ' -f2-`