Re: Replacing Types

Go read How do I post a question effectively? first. Have you? Homework? Hm.

Next, more accurately the sentence should read "my name is <name> and I live in <continent>", 'cause that Bush bloke doesn't live in e.g. Venezuela.

Read up on s/// for sub/sti/tute/ (see perlop).

You have a list of patterns and a list of replacements. To tie them together in a convenient way there are hashes (see perldata). You could write

%hash = (
    America       => 'country',
    France        => 'country',
    'George Bush' => 'name',
);

and so on. Writing 'country' over and over is tiresome. It seems easier to key the hash with the replacement labels and use anonymous arrays of tokens as values (see perlref):

%hash = (
    country   => [ 'USA', 'France', 'Austria', 'Israel'],
    name      => [ 'George Bush', 'Marie LePen', 'Jörg Haider', 'Ehud Olmert' ],
    ...
);
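The two layouts are easy to convert between. Here is a minimal sketch, assuming only a couple of sample entries, that turns the hash of arrays back into a flat token-to-label lookup. The name %type_of is made up for illustration.

```perl
use strict;
use warnings;

# Replacement label => list of tokens (sample entries only).
my %hash = (
    country => [ 'USA', 'France' ],
    name    => [ 'George Bush', 'Marie LePen' ],
);

# Invert it: token => replacement label.
# %type_of is a hypothetical name, not part of the original post.
my %type_of;
while (my ($repl, $ary) = each %hash) {
    $type_of{$_} = $repl for @$ary;    # @$ary: dereference the array in $ary
}

# Show the resulting flat lookup.
print "$_ => $type_of{$_}\n" for sort keys %type_of;
```

So you can type the compact form and still get a one-step lookup per token when you need it.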

Having set up that structure, let's go. Suppose we're reading from a file, have it open, and the filehandle FH is ready for reading:

while (defined(my $line = <FH>)) {         # read one line into $line
  while (my ($repl, $ary) = each %hash) {  # iterate over %hash
    foreach my $token (@$ary) {            # @$ary: dereference the array in $ary
      $line =~ s/\Q$token\E/<$repl>/g;     # substitute each token with its hash key
    }
  }
  print $line;                             # done
}

This is an example to start with; there are more and probably better ways to do it. Read up on the operators s///, m// and tr/// (also spelled y///) in perlop, and see perlretut and perlre for more about regular expressions.
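One of those other ways, as a rough sketch rather than a recommendation: build a single alternation out of all tokens and do one s///g pass per line, looking up the label for whatever matched. The names %type_of, $re and replace_types are made up for illustration, and the sample data is cut down. Tokens are sorted longest first so that a longer token is tried before a shorter one that is its prefix.

```perl
use strict;
use warnings;

# Replacement label => list of tokens (sample entries only).
my %hash = (
    country => [ 'USA', 'France' ],
    name    => [ 'George Bush', 'Marie LePen' ],
);

# Flat lookup: token => replacement label (hypothetical helper structure).
my %type_of;
for my $repl (keys %hash) {
    $type_of{$_} = $repl for @{ $hash{$repl} };
}

# One alternation of all tokens, each escaped with quotemeta,
# longest tokens first.
my $alt = join '|',
          map  { quotemeta($_) }
          sort { length($b) <=> length($a) }
          keys %type_of;
my $re = qr/($alt)/;

# Replace every known token in a line with <label>.
sub replace_types {
    my ($line) = @_;
    $line =~ s/$re/<$type_of{$1}>/g;
    return $line;
}

print replace_types("My name is George Bush and I live in USA.\n");
# prints: My name is <name> and I live in <country>.
```

This scans each line once instead of once per token. Like the loop above, it does not anchor on word boundaries, so 'USA' would also match inside a longer word; add \b around the alternation if that matters for your data.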

--shmem

_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}

Re^2: Replacing Types by Anonymous Monk on Sep 21, 2006 at 19:21 UTC
I think that your approach is better than regexp. Is it the best approach, or can we do anything better? I have millions of sentences and want to find common patterns of "informative" sentences this way.
Re^3: Replacing Types by shmem (Chancellor) on Sep 21, 2006 at 20:38 UTC
Any solution is quite fine until it has to scale. But that depends on the dataset and on the goals. You were talking of simple search and replace operations; now it's about finding interesting patterns via search operations through a huge dataset. This usually requires indexing of tokens / database-like operations / vectorizing terms. I begin to suspect an XY Problem... maybe you should use a search engine like Swish-E or Lucene. What are you really trying to do?

--shmem