Substitution in conjunction with split or regx

qnguyen has asked for the wisdom of the Perl Monks concerning the following question:

I've built the hash table below, and I need to replace the Spanish words inside the ##...## markers with their English translations. Sometimes I will not have a translation, so the output will be a mix of Spanish and English words. What is the most efficient way to do this, given that my logs are very large? The code I have so far CANNOT replace the individual words between the double pound signs, only the entire string. Would it be better to use a regex or split/join for this? Any examples would be helpful.

Example: 
(5/4/07 1:45:29 PM CDT) 7819953:##uno cinco cuatro totes##;pp
I would like the following result:
(5/4/07 1:45:29 PM CDT) 7819953:##1 5 4 totes##;pp
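A minimal sketch of the split/join approach, assuming %langhash maps Spanish => English as in the script below (only a few entries are copied here). `translate_words` is a hypothetical helper name, not part of the original script. The s///e pulls out just the text between the ## markers, `split ' '` breaks it into words, and any word without a translation is passed through unchanged, which gives the mixed Spanish/English output described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A few entries from the hash table in the question (Spanish => English).
my %langhash = (
    uno    => '1',
    dos    => '2',
    cuatro => '4',
    cinco  => '5',
);

# Hypothetical helper: translate the text of one ##...## segment word by word.
# Words with no entry in %langhash are left in Spanish.
sub translate_words {
    my ($text) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my @words = split ' ', $text;
    return join ' ', map { exists $langhash{$_} ? $langhash{$_} : $_ } @words;
}

my $line = '(5/4/07 1:45:29 PM CDT) 7819953:##uno cinco cuatro totes##;pp';

# Replace only the ##...## segment; /e evaluates the replacement as Perl code.
$line =~ s/##([^#]*)##/'##' . translate_words($1) . '##'/e;

print "$line\n";
```

Note that `split ' '` followed by `join ' '` also collapses runs of spaces and trims the padding inside the markers.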


#!/usr/bin/perl

use warnings;
use strict;

## LAN filename for translation
my $filetwo="VoiceLink only Soriana.txt";

## Debug file translated
my $fileout3="fileout3.txt";

## Debug file to be translated, must be saved as ANSI
my $linein3="Log_097102053_2007-05-04_20-34-51.txt";

## FIL file with translations
my $linein = "COM_211_RAD_07A_ES_MX.FIL";

my $englishword;
my $spanishword;
my %langhash;
my @arrayin;


open (FILEIN, '<', $linein)  or die "Can't open FIL file $linein: $!\n";

## Read in each line from FIL file and create a hash table
while ($linein = <FILEIN>) 
{    
    chomp $linein;
    @arrayin = split /\|/, $linein;       ## creates an array with split function using | as separator
    $englishword = trim ( $arrayin[0] );
    $spanishword = trim ( $arrayin[1] );
    print "$spanishword -> $englishword\n"; ## test print
    $langhash{$spanishword} = $englishword;  
}               
close(FILEIN);


## Open Lan file to read in the English and Spanish words
open (FILEIN2, '<', $filetwo)  or die "Can't open LAN file $filetwo: $!\n";

## Go through each line and replace

while (<FILEIN2>)
{
    next unless ( /.:\s*\S/ );

    my ( $lang, $text ) = split /:/, $_, 2;
    if ( $lang eq "Default" )
    {
        $englishword = trim( $text ); # takes care of chomping
    }
    elsif ( $lang eq "Spanish_LatinAmerican|es_MX" )
    {
        $spanishword = trim( $text );
        $langhash{$spanishword} = $englishword;
    }
}
##      while (($spanishword, $englishword) = each %langhash){    ## Test print of entire hash table
##        print "$englishword => $spanishword\n";
##      }

close(FILEIN2);


## Open debug file to convert from Spanish to English

open (FILEIN3, '<', $linein3)  or die "Can't open debug file $linein3: $!\n";
open (FILEOUT3, '>', $fileout3)  or die "Can't open output file $fileout3: $!\n";

## Go through each line and replace

while ($linein3 = <FILEIN3>)
{
    chomp $linein3;

    if ($linein3 =~ /##\s*([^#]+?)\s*##/)
    {
        if (exists ($langhash{$1}))
        {
            $linein3 =~ s/##\s*([^#]+?)\s*##/##$langhash{$1}##/;
            print FILEOUT3 "$linein3\n";
        }
        else
        {
            $linein3 =~ s/##\s*([^#]+?)\s*##/##$1##/;
            print FILEOUT3 "$linein3\n";
        }
    }
    else
    {
        print FILEOUT3 "$linein3\n";   ## print unmodified line where no double pound exists
    }
}
 
close(FILEIN3);
close(FILEOUT3);

sub trim 
{
   for (my $s = $_[0]) 
   {
   s/^\s+//;
   s/\s+$//;
   return $_;
   }
}
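A sketch of how the if/else in the last loop could be replaced by a single regex substitution that works word by word. It assumes the same %langhash (Spanish => English) built from the FIL and LAN files above, but uses a small hypothetical subset of it and an array of sample log lines in place of FILEIN3; `translate_segment` is an illustrative name, not part of the original script. Trying the whole trimmed phrase first keeps any multi-word entries from the LAN file working; only if that fails is each word looked up on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subset of %langhash; the script above fills it from its input files.
my %langhash = (
    uno  => '1',
    tres => '3',
    seis => '6',
);

# Translate the text of one ##...## segment.
sub translate_segment {
    my ($text) = @_;
    $text =~ s/^\s+//;    # trim, like the original trim()
    $text =~ s/\s+$//;
    # Whole-phrase lookup first, so multi-word entries still match.
    return $langhash{$text} if exists $langhash{$text};
    # Otherwise look up each run of non-space characters; \S+ keeps accented
    # letters and punctuation attached to their word.
    $text =~ s/(\S+)/exists $langhash{$1} ? $langhash{$1} : $1/ge;
    return $text;
}

# Stand-ins for lines read from FILEIN3.
my @log = (
    '(5/4/07 1:44:44 PM CDT) 7782719:##uno seis uno      ##;pp',
    '(5/4/07 1:48:01 PM CDT) 7979657:##    tres  empaques##;pp',
    '(5/4/07 1:48:48 PM CDT) 8010439:##BEEP (375,2)##',
);

my @out;
for my $line (@log) {
    # One substitution replaces the if/else; lines without a ##...## pair
    # do not match and pass through unchanged. /g handles several pairs per line.
    (my $new = $line) =~ s/##([^#]*)##/'##' . translate_segment($1) . '##'/ge;
    push @out, $new;
}
print "$_\n" for @out;
```

In the real script the body of the `for` loop would sit inside `while ($linein3 = <FILEIN3>)` and print each line to FILEOUT3 as it goes, so memory use stays flat however large the log is.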

Here is an example of the hash table:

barra -> /
cero -> 0
uno -> 1
dos -> 2
tres -> 3
cuatro -> 4
cinco -> 5
seis -> 6
siete -> 7
ocho -> 8
nueve -> 9
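Since the logs are large, another option worth measuring is to compile one pattern from the hash keys before the loop, so each segment is handled by a single regex pass. This is a sketch under stated assumptions: it writes the table above as a Perl hash literal, and `translate` and `$word_re` are illustrative names. The `(?<!\S)` and `(?!\S)` lookarounds only accept whole whitespace-separated words, so "uno" inside a longer word is left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The hash table above as a Perl hash literal (Spanish => English).
my %langhash = (
    barra => '/', cero   => '0', uno   => '1', dos   => '2',
    tres  => '3', cuatro => '4', cinco => '5', seis  => '6',
    siete => '7', ocho   => '8', nueve => '9',
);

# Build one alternation from all keys and compile it once with qr//.
# quotemeta makes keys match literally; longest-first ordering lets a
# multi-word key win over a shorter key that starts the same way.
my $alt = join '|', map { quotemeta } sort { length $b <=> length $a } keys %langhash;
my $word_re = qr/(?<!\S)($alt)(?!\S)/;

# Replace every whole word that has a translation; others are untouched.
sub translate {
    my ($text) = @_;
    $text =~ s/$word_re/$langhash{$1}/g;
    return $text;
}

my $result = translate('uno cinco cuatro totes');
print "$result\n";
```

Recent versions of perl can compile an alternation of plain literal strings into a trie, but whether this beats the split/join version on real logs is best checked with the core Benchmark module.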

Here is a sample of the log:

(5/4/07 1:44:44 PM CDT) 7782719:##uno seis uno                                                                                                   ##;pp
(5/4/07 1:45:29 PM CDT) 7819953:##uno cinco cuatro##;pp
(5/4/07 1:45:29 PM CDT) 7821389:##uno seis uno                                                                                                   ##;pp
(5/4/07 1:48:01 PM CDT) 7976990:##uno seis uno                                                                                                   ##;pp
(5/4/07 1:48:01 PM CDT) 7979657:##    tres  empaques##;pp
(5/4/07 1:48:16 PM CDT) 7980901:##uno seis uno                                                                                                   ##;pp
(5/4/07 1:48:33 PM CDT) 8007951:##Tome tres empaques  ##;pp
(5/4/07 1:48:48 PM CDT) 8010439:##BEEP (375,2)##
(5/4/07 1:48:48 PM CDT) 8011336:##Selección completa. ##;pp
(5/4/07 1:48:48 PM CDT) 8012745:##  andén Anden uno tres cinco##;pp

Re: Substitution in conjunction with split or regx by toolic (Bishop) on Oct 30, 2007 at 17:17 UTC
If I understand your question correctly, the following code should translate certain Spanish words into numerical values:

Produces this output:

(5/4/07 1:44:44 PM CDT) 7782719:##1 6 1##;pp
(5/4/07 1:45:29 PM CDT) 7819953:##1 5 4##;pp
(5/4/07 1:45:29 PM CDT) 7821389:##1 6 1##;pp
(5/4/07 1:48:01 PM CDT) 7976990:##1 6 1##;pp
(5/4/07 1:48:01 PM CDT) 7979657:## 3 empaques##;pp
(5/4/07 1:48:16 PM CDT) 7980901:##1 6 1##;pp
(5/4/07 1:48:33 PM CDT) 8007951:##Tome 3 empaques##;pp
(5/4/07 1:48:48 PM CDT) 8010439:##BEEP (375,2)##
(5/4/07 1:48:48 PM CDT) 8011336:##Selección completa.##;pp
(5/4/07 1:48:48 PM CDT) 8012745:## andén Anden 1 3 5##;pp

This may not be the most efficient way to do this, considering that you have large files, but our fellow Monks will no doubt be able to help you optimize this.
