replacing while keeping the position of the element in the file

dra2pac has asked for the wisdom of the Perl Monks concerning the following question:

Hi, guys. I have two files. One is in this format:

I_pron
would_mod
like_v
to_to
learn_v
this_pron
._sent
OK_ok
?_quest

and the second one is in this format:

I would like to learn this .
OK ?

What I have been trying to do (unsuccessfully so far) is combine the two files so that the output looks like this:

I_pron would_mod like_v to_to learn_v this_pron ._sent OK_ok ?_quest

So far I have come up with the following script:

$DOC1="1.txt";
$DOC2="2.txt";

open (DOC1,$DOC1);
@words1=<DOC1>;
close (DOC1);

open (DOC2,$DOC2);
@lines2=<DOC2>;
close (DOC2);

foreach $line2(@lines2) {
 @words2=split / /, $line2;
  foreach $word2(@words2) {
   s/$word2/$words1[0]/;
   splice (@words1, 0, 1);
  }
 print @words2;
}

The result shows I am nowhere near what I need. Thank you in advance for any suggestions.


Replies are listed 'Best First'.
Re: replacing while keeping the position of the element in the file by Roy Johnson (Monsignor) on Jul 27, 2004 at 14:17 UTC
It's not clear to me how, if you want to maintain the word lists strictly in parallel, your desired output isn't just `@words1`. I think you might want something like this:

my %xlate;
# open DOC1 here
while (<DOC1>) {
  chomp;
  my($word_part) = split /_/;
  $xlate{$word_part} = $_;
}
# open DOC2 here
while (<DOC2>) {
  s/(\S+)/$xlate{$1}/ge;
  print;
}

Though that won't strip any newlines. Doing so isn't difficult.

We're not really tightening our belts, it just feels that way because we're getting fatter.
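For the newline-stripping variant mentioned above, here is a minimal sketch. It uses hypothetical in-memory arrays (`@tagged`, `@lines`) in place of 1.txt and 2.txt, and it passes tokens with no lookup entry through unchanged; both are assumptions, not part of the original reply.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the contents of 1.txt and 2.txt.
my @tagged = qw(I_pron would_mod like_v to_to learn_v this_pron ._sent OK_ok ?_quest);
my @lines  = ("I would like to learn this .\n", "OK ?\n");

# Map each bare word to its tagged form, e.g. "like" => "like_v".
my %xlate;
for my $entry (@tagged) {
    my ($word) = split /_/, $entry;
    $xlate{$word} = $entry;
}

# Translate every token; split ' ' also discards the trailing newline.
# Unknown tokens are kept as they are (an assumption).
my @out;
for my $line (@lines) {
    push @out, map { exists $xlate{$_} ? $xlate{$_} : $_ } split ' ', $line;
}

# Join everything on a single line.
my $result = join ' ', @out;
print "$result\n";
```

Collecting the tokens in `@out` and joining once at the end avoids a leading or trailing space in the output.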
Re: replacing while keeping the position of the element in the file by Jasper (Chaplain) on Jul 27, 2004 at 14:48 UTC
I think what you want to do is split the first file into a hash of what each word is coded to:

my %sentence_codes = map { /(.+)_(.+)/ ? ($1 => $2) : () } split /\n/, $codefile;

Then just do a global replace on the second file:

$sentence_file =~ s/([\w'-]+|[.?:;])/$1 . '_' . ($sentence_codes{$1} || 'unknown')/eg;

I've no idea if this is what you wanted, but that's what I read into it. (This is all pseudo-code, really - `$codefile` and `$sentence_file` are assumed to hold the contents of the two files, and the first file is mapped one line at a time.)
Re: replacing while keeping the position of the element in the file by TrekNoid (Pilgrim) on Jul 27, 2004 at 16:16 UTC
File one is:

I_pron
would_mod
like_v
to_to
learn_v
this_pron
._sent
OK_ok
?_quest

and the second one is in this format:

I would like to learn this .
OK ?

Okay... let me be sure I understand what you're asking...

I'm guessing that your goal is to parse file two, and match each word against a lookup file (file one, which is in word_article format) and output the lookup file version of file two.

So, a more generalized way of asking is: Take a file/sentence, and lookup each token of the file/sentence and output the look-up version of the tokens. In other words, an instant Perl sentence diagrammer :)

Assuming that's true, here's what I come up with (keeping the basic structure of your code intact... and fully admitting I'm not the uber coder some here are):

$DOC1="1.txt";
$DOC2="2.txt";

# Create an associative lookup array from file 1
open (DOC1,$DOC1);
while (<DOC1>) {
  chomp($_);
  $ind = (split /_/)[0];
  $lookup{$ind} = $_;
}
close (DOC1);

open (DOC2,$DOC2);
@lines=<DOC2>;
close (DOC2);

$outln = '';
foreach $line (@lines) {
  @words = split(/ /, $line);
  foreach $word (@words) {
    chomp($word); # Get rid of stray carriage return
    $outln = $outln . " " . $lookup{$word};
  }
}
print "$outln\n";

That close to what you're after?

Trek
Re: replacing while keeping the position of the element in the file by husker (Chaplain) on Jul 27, 2004 at 14:26 UTC
I don't see what purpose the second file has. It looks like you just want to remove the newlines from the first file.
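One possible purpose for the second file, assuming the goal is to keep its line breaks: it tells you how many tagged words belong on each output line. A minimal sketch of that reading follows, with hypothetical in-memory arrays (`@tagged`, `@lines`) standing in for 1.txt and 2.txt, and assuming the two files list the same tokens in the same order.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the contents of 1.txt and 2.txt.
my @tagged = qw(I_pron would_mod like_v to_to learn_v this_pron ._sent OK_ok ?_quest);
my @lines  = ("I would like to learn this .\n", "OK ?\n");

my @out;
for my $line (@lines) {
    # Count the tokens on this line of the second file.
    my @tokens = split ' ', $line;

    # Take that many tagged words, in order, from the front of the list.
    push @out, join ' ', splice(@tagged, 0, scalar @tokens);
}

print "$_\n" for @out;
```

Because the words are consumed strictly by position, this does not check that each token actually matches its tagged counterpart; a hash lookup is safer if the two files can drift apart.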
Re: replacing while keeping the position of the element in the file by Art_XIV (Hermit) on Jul 27, 2004 at 15:13 UTC
use strict;
use warnings;

#emulate reading from 1st file
my %lookup = ();
while (<DATA>) {
  chomp;
  my $key = (split /_/)[0];
  $lookup{$key} = $_;
}

#emulate 2nd file
my @input = ("I would like to learn this ! .", "OK ?");

foreach my $line (@input) {
  my @tokens = map {$lookup{$_} || 'unknown'} split /\s+/, $line;
  print "@tokens\n";
}

1;

__DATA__
I_pron
would_mod
like_v
to_to
learn_v
this_pron
._sent
OK_ok
?_quest

Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"