in reply to Replacing Types

Go read "How do I post a question effectively?" first. Have you? Homework? Hm.

Next, to be more accurate, the sentence should read "my name is <name> and I live in <continent>", 'cause that Bush bloke doesn't live in, e.g., Venezuela.

Read up on s/// for sub/sti/tute/.
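A minimal illustration of s/// with the /g modifier, using the sentence from the question (the variable names are made up for this sketch). The \Q...\E pair quotes any regex metacharacters in the token, so it is matched literally.

```perl
use strict;
use warnings;

my $sentence = 'my name is George Bush and I live in America';
my $token    = 'George Bush';

# Copy $sentence into $result, then substitute on the copy.
# /g replaces every match; \Q...\E makes the token match literally.
(my $result = $sentence) =~ s/\Q$token\E/<name>/g;

print "$result\n";    # my name is <name> and I live in America
```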

You have a list of patterns, and a list of replacements. To tie them together in a convenient way there are hashes (see perldata). You could write

    %hash = (
        America       => 'country',
        France        => 'country',
        'George Bush' => 'name',
    );

and so on. Writing 'country' over and over is tiresome. It seems easier to key the hash with the replacement patterns and have anonymous arrays as values (see perlref):

    %hash = (
        country => [ 'USA', 'France', 'Austria', 'Israel' ],
        name    => [ 'George Bush', 'Marie LePen', 'Jörg Haider', 'Ehud Olmert' ],
        ...
    );
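Both layouts hold the same information. As a sketch (the names are a shortened subset of the list above, and %by_token is a variable made up here), flattening the hash of arrays back into a token => replacement lookup takes one loop and an array dereference:

```perl
use strict;
use warnings;

# Hash of arrays keyed by replacement, as above (shortened).
my %hash = (
    country => [ 'USA', 'France' ],
    name    => [ 'George Bush', 'Ehud Olmert' ],
);

# Invert it: every token becomes a key whose value is its replacement.
my %by_token;
for my $repl (keys %hash) {
    # @{ ... } dereferences the array reference stored under $repl
    $by_token{$_} = $repl for @{ $hash{$repl} };
}

print "$_ => $by_token{$_}\n" for sort keys %by_token;
```

The flat form is what you want when you have matched a token and need to know what to put in its place.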

Having set up that structure, let's go. Suppose we're reading from a file that is already open, with the filehandle FH ready for reading:

    while (defined(my $line = <FH>)) {              # read one line into $line
        while (my ($repl, $ary) = each %hash) {     # iterate over %hash
            foreach my $token (@$ary) {             # @$ary: dereference the array in $ary
                $line =~ s/\Q$token\E/<$repl>/g;    # substitute each token with the hash key
            }
        }
        print $line;                                # done
    }

This is an example to start with; there are more, and probably better, ways to do it. Read up on the operators s///, m//, y// and tr// in perlop, and see perlretut and perlre for more about regular expressions.
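One of those other ways, sketched here under assumptions rather than as a drop-in replacement: build a single alternation regex from all tokens (longest first, metacharacters escaped with quotemeta) and look up the replacement through an inverted hash, so each line is scanned once instead of once per token. The in-memory filehandle and the two sample lines are made up to stand in for the real file.

```perl
use strict;
use warnings;

my %hash = (
    country => [ 'USA', 'France', 'Austria', 'Israel' ],
    name    => [ 'George Bush', 'Marie LePen', 'Ehud Olmert' ],
);

# token => replacement lookup
my %by_token;
for my $repl (keys %hash) {
    $by_token{$_} = $repl for @{ $hash{$repl} };
}

# One regex matching any token. Longer tokens come first so overlapping
# names match as a whole; \b keeps tokens from matching inside longer
# words (assumes every token starts and ends with a word character).
my $alt = join '|', map { quotemeta } sort { length($b) <=> length($a) } keys %by_token;
my $re  = qr/\b($alt)\b/;

# Sample input via an in-memory filehandle (stands in for the real file).
my $text = "George Bush visited France\nEhud Olmert lives in Israel\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @out;
while (defined(my $line = <$fh>)) {
    $line =~ s/$re/<$by_token{$1}>/g;    # one pass over the line
    push @out, $line;
}
close $fh;

print @out;
```

The regex engine still does the work, but it walks each line once; whether that matters depends on how many tokens and lines you have.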

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re^2: Replacing Types
by Anonymous Monk on Sep 21, 2006 at 19:21 UTC
    I think that your approach is better than a regexp. Is it the best approach, or can we do better? I have millions of sentences and want to find common patterns of "informative" sentences this way.
      Any solution is quite fine until it has to scale, but that depends on the dataset and on the goals. You were talking about simple search and replace operations; now it's about finding interesting patterns via search operations through a huge dataset. That usually requires indexing of tokens, database-like operations, or vectorizing terms.
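To make "indexing of tokens" a bit more concrete, here is a toy inverted index: each lower-cased word maps to the numbers of the sentences containing it, so finding sentences that share terms becomes a hash lookup and a list intersection rather than a scan over everything. The whitespace tokenizer and the sample sentences are assumptions for illustration; at millions of sentences a dedicated search engine is the better tool.

```perl
use strict;
use warnings;

my @sentences = (
    'my name is George Bush',
    'I live in America',
    'my name is Ehud Olmert',
);

# Inverted index: token => list of sentence numbers containing it.
my %index;
for my $i (0 .. $#sentences) {
    my %seen;
    for my $token (map { lc $_ } split ' ', $sentences[$i]) {
        # record each sentence at most once per token
        push @{ $index{$token} }, $i unless $seen{$token}++;
    }
}

# Sentences containing both "my" and "name": intersect their posting lists.
my %in_my;
$in_my{$_} = 1 for @{ $index{'my'} || [] };
my @both = grep { $in_my{$_} } @{ $index{'name'} || [] };

print "@both\n";    # 0 2
```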

      I begin to suspect an XY Problem... maybe you should use a search engine like Swish-E or Lucene.

      What are you really trying to do?

      --shmem
