Murcia has asked for the wisdom of the Perl Monks concerning the following question:

Hi confreres,

I would like your hints on
how to replace the identifiers in a list with other identifiers taken from a reference list.

The lists are files with different lengths!

example:

List
    Guido bla1 ...
    Mike bla2 ...
    Klaus bla3 ...

reference list
    Guido Meyer
    Mike Smith
    Klaus Rothschild

so the new list should be
    Meyer bla1 ...
    Smith bla2 ...
    Rothschild bla3 ...

I do it now by putting the reference list in a hash; then I parse the list line by line, split the line, exchange the identifier and write a new list.
How can I make it quicker, better, faster ...? Thanks
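
The approach described above (reference list in a hash, then one pass over the list) could be sketched roughly as follows. This is only an illustration, not Murcia's actual code: the `rename_ids` helper is hypothetical, the data is held in arrays instead of files, and the `Anna` line is an invented extra record showing what happens to an identifier that is missing from the reference list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace the first whitespace-delimited field of each
# line with its mapping from %$map; unknown identifiers are left unchanged.
sub rename_ids {
    my ($map, @lines) = @_;
    my @out;
    for my $line (@lines) {
        my ($id, $rest) = split ' ', $line, 2;
        if (!defined $id) {            # blank line: pass through untouched
            push @out, $line;
            next;
        }
        $id = $map->{$id} if exists $map->{$id};
        push @out, defined $rest ? "$id $rest" : $id;
    }
    return @out;
}

# Sample data from the question (assumed: one record per line, already chomped).
my %map  = (Guido => 'Meyer', Mike => 'Smith', Klaus => 'Rothschild');
my @list = ('Guido bla1 ...', 'Mike bla2 ...', 'Klaus bla3 ...', 'Anna bla4 ...');

my @renamed = rename_ids(\%map, @list);
print "$_\n" for @renamed;
```

Because the lookup is keyed on the identifier, neither the order nor the length of the two lists matters; each line costs one split and one hash lookup.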

Replies are listed 'Best First'.
Re: exchange words in text
by borisz (Canon) on Nov 23, 2004 at 14:26 UTC
    Here is one way:
    #!/usr/bin/perl
    my @l = split /(\n)/, <<ENDE;
    Guido bla1 ...
    Mike bla2 ...
    Klaus bla3 ...
    ENDE
    my %h = qw/ Guido Meyer Mike Smith Klaus Rothschild/;
    for (@l) {
        s/(\w+)/$h{$1} || $1/e;
        print
    }
    __OUTPUT__
    Meyer bla1 ...
    Smith bla2 ...
    Rothschild bla3 ...
    Boris
Re: exchange words in text
by Limbic~Region (Chancellor) on Nov 23, 2004 at 14:39 UTC
    Murcia,
    If the two lists are equal lengths and are already in order (item 1 from list1 corresponds to item 1 in list2), this is an easy problem.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @list;
    my $index = 0;
    while ( <DATA> ) {
        chomp;
        # A blank line separates the two lists; skip it so that no
        # undefined entry is pushed onto the second list.
        if ( /^\s*$/ ) { $index++; next; }
        push @{ $list[ $index ] }, (split " ", $_, 2)[1];
    }
    my @new_list = map { $list[1][$_] . ' ' . $list[0][$_] } 0 .. $#{$list[0]};
    __DATA__
    Guido bla1 ...
    Mike bla2 ...
    Klaus bla3 ...

    Guido Meyer
    Mike Smith
    Klaus Rothschild

    Cheers - L~R

    Disclaimer: Murcia did not originally specify that the lists were of different lengths nor were any specifics given regarding ordering. While this approach is not valid given the new information - they were valid assumptions at the time it was written. A classic example of knowing what is the right information to include when asking a question.
Re: exchange words in text - how not to !!
by Random_Walk (Prior) on Nov 23, 2004 at 15:20 UTC

    I don't think you are going to get much better than tweaking the hash solution unless your data is rather nice and you can use Limbic~Region's method. I did try another way (build a string of substitutions from the reference list and eval it against the data), more to prove it was a non-starter than because I thought it would be faster. Code and benchmark for a laugh.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Benchmark;

    sub simple_hash {
        seek DATA, 0, 0;
        while (<DATA>) {last if /^Names/}
        my %lookup;
        while (<DATA>) {
            next if /^\s*$/;
            last if /^Example List/;
            chomp;
            my ($first, $second)=split;
            $lookup{$first}=$second;
        }
        while (<DATA>) {
            next if /^\s*$/;
            chomp;
            my ($name, $rest)=split /\s+/, $_, 2;
            print $lookup{$name}, "\t", $rest, "\n";
        }
    }

    sub funky_regex {
        seek DATA, 0, 0;
        while (<DATA>) {last if /^Names/}
        my $regex_string="";
        while (<DATA>) {
            next if /^\s*$/;
            last if /^Example List/;
            chomp;
            my ($first, $second)=split;
            $regex_string.="s/$first/$second/;";
        }
        local $/;
        $_=(<DATA>);
        eval $regex_string;
        print ;
    }

    # Note: &simple_hash and &funky_regex call each sub once, right here;
    # \&simple_hash and \&funky_regex would pass code references to time.
    timethese(5000000, {
        'simple_hash' => &simple_hash,
        'funky_regex' => &funky_regex
        }
    );

    __DATA__
    Names
    Guido Meyer
    Mike Smith
    Klaus Rothschild
    Mick Mouse
    Daffy LeCannard
    Example List
    Guido bla1 ...
    Mike bla2 ...
    Klaus bla3 ...
    Mick blahsome more
    Daffy lookout Duck !

    # results
    funky_regex: 0 wallclock secs ( 0.30 usr + 0.00 sys = 0.30 CPU) @ 16666666.67/s (n=5000000)
        (warning: too few iterations for a reliable count)
    simple_hash: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 500000000.00/s (n=5000000)
        (warning: too few iterations for a reliable count)

    Cheers,
    R.
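
    For comparison, here is a rough sketch of a variant of the regex idea that avoids the string eval: all identifiers go into one precompiled alternation, and the replacement comes from the hash. The variable names and the inline sample text are assumptions for illustration, and no claim is made that this beats the plain hash lookup; it would need to be benchmarked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %lookup = (Guido => 'Meyer', Mike => 'Smith', Klaus => 'Rothschild');

# Build one alternation of all known identifiers, longest first so that a
# longer name is tried before any shorter name that is its prefix.
# quotemeta guards against regex metacharacters inside an identifier.
my $alt = join '|',
          map  { quotemeta($_) }
          sort { length($b) <=> length($a) } keys %lookup;

# Match an identifier only at the start of a line (/m makes ^ per-line).
my $re = qr/^($alt)\b/m;

my $text = "Guido bla1 ...\nMike bla2 ...\nKlaus bla3 ...\n";

# Copy the text, then substitute every leading identifier in one pass.
(my $new = $text) =~ s/$re/$lookup{$1}/g;
print $new;
```

    Anchoring at the start of the line keeps a name that happens to appear in the rest of a record from being rewritten, which the chained s/// statements above would do.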

Re: exchange words in text
by rev_1318 (Chaplain) on Nov 23, 2004 at 14:28 UTC
    I think your approach is basically correct. (I would go the same route.)
    Whether it can be tweaked depends on your exact code. If you would like to receive comments on it, post it here.
    There may be alternatives which are faster, but would they be easier to maintain?

    Paul

Re: exchange words in text
by artist (Parson) on Nov 23, 2004 at 17:22 UTC
    If you are on unix and your data are ordered as you have shown, try:
    join file2 file1 | cut -f2-4 -d ' '