qnguyen has asked for the wisdom of the Perl Monks concerning the following question:

I have the code below. I am trying to match a pattern between two pairs of pound signs, ##pattern##, but the second \s* is not working as expected. It seems like it doesn't like the *, or maybe something needs to be escaped? It works when there is no whitespace between the period and the ##, for example 'incorrecto.##pp'. Does anyone have any suggestions?
$langhash{'dígito verificador incorrecto.'} = 'digit incorrect.';
my $string = '(5/4/07 4:24:03 PM CDT) 1473721: ## dígito verificador incorrecto.##pp';
if ($string =~ s/##\s*([^#]+)\s*##/##$langhash{$1}##/) {
    print "$langhash{$1}\n";
    print "$1\n";
    print $string;
}

Replies are listed 'Best First'.
Re: substitution not working correctly
by FunkyMonk (Bishop) on Sep 22, 2007 at 22:46 UTC
    If I've understood you correctly, you need [^#]+ to match non-greedily, which you get by adding ? after the quantifier:
    my %langhash;
    $langhash{'dígito verificador incorrecto.'} = 'digit incorrect.';
    my $string = '(5/4/07 4:24:03 PM CDT) 1473721: ## dígito verificador incorrecto. ##pp';
    if ($string =~ s/##\s*([^#]+?)\s*##/##$langhash{$1}##/) {
        print "$langhash{$1}\n";
        print "/$1/\n";
        print $string;
    }

    Prints:

    digit incorrect.
    /dígito verificador incorrecto./
    (5/4/07 4:24:03 PM CDT) 1473721: ##digit incorrect.##pp

    Is that what you're after?

      If you're going to use a non-greedy match, you might as well get rid of the character class.
      $string =~ s/##\s*(.*?)\s*##/##$langhash{$1}##/s
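
      One difference between the two forms is worth checking before swapping them: [^#]+? can never cross a single #, while .*? only stops at the next ##, and /s additionally lets . match newlines. A minimal sketch, using a made-up sample string that is not from the original post:

      ```perl
      use strict;
      use warnings;

      # Hypothetical sample: the text between the markers contains a lone '#'.
      my $text = 'note: ## item #3 missing ##end';

      # [^#]+? cannot cross the lone '#', so this pattern finds no match here.
      my ($by_class) = $text =~ /##\s*([^#]+?)\s*##/;

      # .*? stops only at the first "##" after the opener, so '#3' is allowed.
      my ($by_dot) = $text =~ /##\s*(.*?)\s*##/s;

      print defined $by_class ? "class: [$by_class]\n" : "class: no match\n";
      print "dot:   [$by_dot]\n";
      ```

      Which behaviour is right depends on whether a single # can legitimately appear inside a prompt.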
Re: substitution not working correctly
by ikegami (Patriarch) on Sep 23, 2007 at 04:31 UTC
    Alternatively, remove the space afterwards.
    sub trim {
        for (my $s = $_[0]) {
            s/^\s+//;
            s/\s+$//;
            return $_;
        }
    }
    $string =~ s/##([^#]+)##/##$langhash{trim($1)}##/;
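
    The same trimming can also be applied once to the table itself, so that stray whitespace in the keys or values stops mattering at lookup time. A rough sketch under assumptions: the %raw hash and its padded entry are invented for illustration, and trim is written here in a more conventional copy-and-return style than the one above.

    ```perl
    use strict;
    use warnings;

    # Return a copy of the argument with leading and trailing whitespace removed.
    sub trim {
        my ($s) = @_;
        $s =~ s/^\s+//;
        $s =~ s/\s+$//;
        return $s;
    }

    # Hypothetical raw entries, padded the way a hand-edited file might be.
    my %raw = ('  puerta cerrada ' => ' door closed  ');

    # Trim keys and values once, while the lookup table is built.
    my %langhash;
    $langhash{ trim($_) } = trim($raw{$_}) for keys %raw;

    # Trim the captured text as well, so both sides of the lookup agree.
    my $string = 'status: ##  puerta cerrada   ##pp';
    $string =~ s/##([^#]+)##/##$langhash{trim($1)}##/;
    print "$string\n";
    ```

    Cleaning the table once is likely cheaper than trimming on every lookup, though for a file of this size either approach should be fine.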
Re: substitution not working correctly
by mr_mischief (Monsignor) on Sep 22, 2007 at 23:04 UTC
    Try changing:
    $string =~ s/##\s*([^#]+)\s*##/##$langhash{$1}##/
    to:
    #                       v
    $string =~ s/##\s*([^#]+?)\s*##/##$langhash{$1}##/
    #                       ^

    You have asked for all of the non-number-symbol characters available and then spaces, but spaces are non-number-symbol characters too. You're capturing the trailing spaces in $1, so the '\s*' that follows matches nothing. There is no $langhash{'dígito verificador incorrecto. '} or $langhash{'dígito verificador incorrecto.   '} or anything of that sort. Making the capture of non-number symbols non-greedy allows the '\s*' towards the end to match the spaces to the right of your match.
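
    The difference is easy to see by capturing with both patterns side by side. A minimal sketch, assuming a shortened version of the string from the question with three trailing spaces added before the closing ##:

    ```perl
    use strict;
    use warnings;
    use utf8;

    # Shortened sample from the thread, with three spaces before the closing ##.
    my $text = '1473721: ## dígito verificador incorrecto.   ##pp';

    # Greedy: [^#]+ also swallows the trailing spaces, so \s* matches nothing.
    my ($greedy) = $text =~ /##\s*([^#]+)\s*##/;

    # Non-greedy: [^#]+? stops as early as it can, leaving the spaces to \s*.
    my ($lazy) = $text =~ /##\s*([^#]+?)\s*##/;

    binmode STDOUT, ':encoding(UTF-8)';
    print "greedy: [$greedy]\n";
    print "lazy:   [$lazy]\n";
    ```

    The brackets in the output make the trailing spaces in the greedy capture visible; only the non-greedy capture is usable as a hash key here.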

    If you had used strictures and warnings, as is often recommended here on PM, you would have received the same "Use of uninitialized value in concatenation..." warnings I did when running your original code.

      Yes, it was a problem with greediness, thank you. I ran into another problem when I ran my code with warnings on: I get the warning 'Use of uninitialized value in concatenation' and no value in my printout. I know it's because the key and value are not present in my hash table.
      1. Is there a way to skip the substitution if the hash value does not exist?
      2. Also, is there a way to ignore leading and trailing whitespace in my hash value when matching or substituting, or would it be more efficient to remove the whitespace and rewrite the hash table?
      #!/usr/bin/perl
      use warnings;

      ##Objective is to translate Spanish Prompts to English Prompts

      ## FIL filename for translation
      $fileone="COM_211_RAD_07A_ES_MX.FIL";
      ## LAN filename for translation
      $filetwo="VoiceLink only Soriana.lan";
      ## Debug file to be translated
      $filetotranslate = "test.txt";
      ## English to Spanish translations file
      $fileout="out.txt";
      $fileout2="out2.txt";
      ## Debug file with translated from Spanish to English
      $fileout3="debug.txt";

      ## Open the lan file to read in the English and Spanish words
      ##open (FILEIN, $fileone) or die "Can't open first file.\n";
      ## Read in each line and create a hash table
      ##while ($linein = <FILEIN>) {
      ##    chomp $linein;
      ##    @arrayin = split /\|/, $linein; ##creates an array with split function using | as separator
      ##    $englishword = $arrayin[0];
      ##    $spanishword = $arrayin[1];
      ##    print "$englishword -> $spanishword\n";
      ##    $langhash{$spanishword} = $englishword;
      ##    print "$langhash{$spanishword}\n";
      ##}
      ##close(FILEIN);

      ## Open Lan file to read in the English and Spanish words
      open (FILEIN2, $filetwo) or die "Can't open debug file to be translated.\n";
      open (FILEOUT, ">$fileout2") or die "Can't open output file.\n";
      ## Go through each line and replace
      while ($linein = <FILEIN2>) {
          chomp $linein;
          @arrayin = split /:/, $linein; ##creates an array with split function using : as separator
          if ($arrayin[0] eq "Default") {
              $englishword = $arrayin[1];
          }
          if ($arrayin[0] eq "Spanish_LatinAmerican|es_MX") {
              $spanishword = $arrayin[1];
          }
          if ($arrayin[0] eq "Spanish_LatinAmerican|es_MX") {
              $langhash{$spanishword} = $englishword;
              ## print FILEOUT "$englishword -> $spanishword\n";
          }
      }
      while (($spanishword, $englishword) = each %langhash) { ##Print entire hash table
          print FILEOUT "$spanishword=>$englishword\n";
      }
      close(FILEIN2);
      close(FILEOUT);

      ## Open debug file to convert from English to Spanish
      open (FILEIN3, $filetotranslate) or die "Can't open debug file.\n";
      open (FILEOUT3, ">$fileout3") or die "Can't open output file.\n";
      ## Go through each line and replace
      while ($linein3 = <FILEIN3>) {
          if ($linein3 =~ s/##\s*([^#]+?)\s*##/##$langhash{$1}##/) {
              print FILEOUT3 "$linein3";
          }
          else {
              print FILEOUT3 "$linein3";
          }
      }
        s/##\s*(.+?)\s*##/'##' . (exists($langhash{$1}) ? $langhash{$1} : $1) . '##'/es
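
        For anyone who wants to try that substitution in isolation, here is a small sketch. The one-entry table and the translate_line helper are made up for illustration; in the script above the table is filled from the .lan file.

        ```perl
        use strict;
        use warnings;
        use utf8;

        # Illustrative table; in the thread it is built from the .lan file.
        my %langhash = ('dígito verificador incorrecto.' => 'digit incorrect.');

        # translate_line is a hypothetical helper name, not from the thread.
        sub translate_line {
            my ($line) = @_;
            # /e evaluates the replacement as code, so exists() can pick the
            # translation when there is one and keep the captured text otherwise.
            $line =~ s/##\s*(.+?)\s*##/'##' . (exists($langhash{$1}) ? $langhash{$1} : $1) . '##'/es;
            return $line;
        }

        my $known   = translate_line('1473721: ## dígito verificador incorrecto. ##pp');
        my $unknown = translate_line('1473722: ## frase desconocida ##pp');

        binmode STDOUT, ':encoding(UTF-8)';
        print "$known\n";
        print "$unknown\n";
        ```

        Note that an untranslated phrase still loses the whitespace around it, because \s* sits outside the capture. With the fallback in place the lookup never yields an undefined value, so the "uninitialized value" warning should no longer appear.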
Re: substitution not working correctly
by mwah (Hermit) on Sep 22, 2007 at 23:16 UTC

    I'd include the dot (\.) in the capture:
    $langhash{'dígito verificador incorrecto.'} = 'digit incorrect.';
    my $string = '(5/4/07 4:24:03 PM CDT) 1473721: ## dígito verificador incorrecto. ##pp';
    if( $string =~ s/##\s*([^#]+\.)\s*##/##$langhash{$1}##/ ){
        print "|$langhash{$1}|\n";
        print "|$1|\n";
        print $string;
    }
    That will do it.

    Regards,

    M.