dra2pac has asked for the wisdom of the Perl Monks concerning the following question:

Hi, guys. I have two files. One is in this format:

I_pron
would_mod
like_v
to_to
learn_v
this_pron
._sent
OK_ok
?_quest

and the second one is in this format:

I would like to learn this .
OK ?

What I have been trying to do (so far unsuccessfully) is combine the two files so that the output looks like this:

I_pron would_mod like_v to_to learn_v this_pron ._sent OK_ok ?_quest

So far I have come up with the following script:

    $DOC1="1.txt";
    $DOC2="2.txt";
    open (DOC1,$DOC1);
    @words1=<DOC1>;
    close (DOC1);
    open (DOC2,$DOC2);
    @lines2=<DOC2>;
    close (DOC2);
    foreach $line2(@lines2) {
        @words2=split / /, $line2;
        foreach $word2(@words2) {
            s/$word2/$words1[0]/;
            splice (@words1, 0, 1);
        }
        print @words2;
    }

The result shows me I am nowhere near what I need. Thank you in advance for any suggestions.

Re: replacing while keeping the position of the element in the file
by Roy Johnson (Monsignor) on Jul 27, 2004 at 14:17 UTC
    It's not clear to me how, if you want to maintain the word lists strictly in parallel, your desired output isn't just @words1.

    I think you might want something like this:

        my %xlate;
        # open DOC1 here
        while (<DOC1>) {
            chomp;
            my($word_part) = split /_/;
            $xlate{$word_part} = $_;
        }
        # open DOC2 here
        while (<DOC2>) {
            s/(\S+)/$xlate{$1}/ge;
            print;
        }

    Though that won't strip any newlines. Doing so isn't difficult.
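
For what it's worth, here is one way the newline stripping could look. It is only a sketch built on the lookup idea above: the sub names `build_xlate` and `translate_lines` are made up for illustration, the input is passed in as lists instead of being read from DOC1 and DOC2, and leaving unknown tokens untouched is an assumption of mine, not something the original snippet does.

```perl
use strict;
use warnings;

# Build the word => tagged-word lookup from the lines of file one.
sub build_xlate {
    my @tagged = @_;
    my %xlate;
    for my $entry (@tagged) {
        chomp $entry;
        my ($word_part) = split /_/, $entry;
        $xlate{$word_part} = $entry;
    }
    return %xlate;
}

# Translate every line of file two and join the results on one line.
sub translate_lines {
    my ($xlate, @lines) = @_;
    my @out;
    for my $line (@lines) {
        chomp $line;    # this is the newline stripping
        # Assumption: a token with no lookup entry is left as it is.
        $line =~ s/(\S+)/exists $xlate->{$1} ? $xlate->{$1} : $1/ge;
        push @out, $line;
    }
    return join ' ', @out;
}

my %xlate = build_xlate(
    map { "$_\n" } qw(I_pron would_mod like_v to_to learn_v this_pron ._sent OK_ok ?_quest)
);
print translate_lines(\%xlate, "I would like to learn this .\n", "OK ?\n"), "\n";
# prints: I_pron would_mod like_v to_to learn_v this_pron ._sent OK_ok ?_quest
```

With real files you would pass `<DOC1>` and `<DOC2>` in list context instead of the literal lists.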

    We're not really tightening our belts, it just feels that way because we're getting fatter.
Re: replacing while keeping the position of the element in the file
by Jasper (Chaplain) on Jul 27, 2004 at 14:48 UTC
    I think what you want to do is split the first file into a hash of what each word is coded to:
        my %sentence_codes = map { /(.*)_(.*)/ ? ($1, $2) : () } split /\n/, $codefile;
    Then just do a global replace on the second file:
        $sentence_file =~ s/(\w+(?:['-]\w+)*|[.?:;])/$1 . '_' . ($sentence_codes{$1} || 'unknown')/eg;
    I've no idea if this is what you wanted, but that's what I read into it. (This is all pseudo-code, really - you'd need to map the first file one line at a time)
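
A rough sketch of the "one line at a time" version, in case it helps. The names `read_codes` and `tag_text` are hypothetical, the in-memory filehandle just stands in for the real first file, and matching on whitespace-separated tokens is my assumption about how the second file is tokenised.

```perl
use strict;
use warnings;

# Read "word_code" entries one line at a time into a word => code hash.
sub read_codes {
    my ($fh) = @_;
    my %codes;
    while (my $line = <$fh>) {
        chomp $line;
        # Split on the last underscore; skip lines that do not match.
        next unless $line =~ /^(.+)_([^_]+)$/;
        $codes{$1} = $2;
    }
    return %codes;
}

# Append "_code" (or "_unknown") to every whitespace-separated token.
sub tag_text {
    my ($codes, $text) = @_;
    $text =~ s/(\S+)/$1 . '_' . ($codes->{$1} || 'unknown')/ge;
    return $text;
}

# Demo: an in-memory filehandle stands in for the first file.
my $codefile = join '', map { "$_\n" }
    qw(I_pron would_mod like_v to_to learn_v this_pron ._sent OK_ok ?_quest);
open my $fh, '<', \$codefile or die "Cannot open in-memory file: $!";
my %sentence_codes = read_codes($fh);
close $fh;

print tag_text(\%sentence_codes, "I would like to learn this .\nOK ?\n");
```

The line breaks of the second file are kept here; chomp and join the lines first if everything should end up on one line.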
Re: replacing while keeping the position of the element in the file
by TrekNoid (Pilgrim) on Jul 27, 2004 at 16:16 UTC
    File one is:
        I_pron
        would_mod
        like_v
        to_to
        learn_v
        this_pron
        ._sent
        OK_ok
        ?_quest
    and the second one is in this format:
        I would like to learn this .
        OK ?
    Okay... let me be sure I understand what you're asking...

    I'm guessing that your goal is to parse file two, and match each word against a lookup file (file one, which is in word_article format) and output the lookup file version of file two.

    So, a more generalized way of asking is: Take a file/sentence, and lookup each token of the file/sentence and output the look-up version of the tokens.

    In other words, an instant Perl sentence diagrammer :)

    Assuming that's true, here's what I come up with (keeping the basic structure of your code intact... and fully admitting I'm not the uber coder some here are):

        $DOC1="1.txt";
        $DOC2="2.txt";

        # Create an associative lookup array from file 1
        open (DOC1,$DOC1);
        while (<DOC1>) {
            chomp($_);
            $ind = (split /_/)[0];
            $lookup{$ind} = $_;
        }
        close (DOC1);

        open (DOC2,$DOC2);
        @lines=<DOC2>;
        close (DOC2);

        $outln = '';
        foreach $line (@lines) {
            @words = split(/ /, $line);
            foreach $word (@words) {
                chomp($word); # Get rid of stray carriage return
                $outln = $outln . " " . $lookup{$word};
            }
        }
        print "$outln\n";

    That close to what you're after?

    Trek

Re: replacing while keeping the position of the element in the file
by husker (Chaplain) on Jul 27, 2004 at 14:26 UTC
    I don't see what purpose the second file has. It looks like you just want to remove the newlines from the first file.
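
If that reading is right, the whole job could be as small as the first sub below. The second sub is a guess at the one thing file two might still contribute, namely where the line breaks go: each line of file two takes as many tagged words, in order, as it has tokens. Both sub names are hypothetical, and the strictly positional pairing is an assumption that the two files really are parallel.

```perl
use strict;
use warnings;

# Join the tagged words of file one on a single line.
sub join_tags {
    my @tagged = @_;
    chomp @tagged;
    return join ' ', @tagged;
}

# Use file two only for its line structure: each of its lines consumes,
# in order, as many tagged words as it has whitespace-separated tokens.
sub tags_by_line {
    my ($tagged, @lines) = @_;
    my @queue = @$tagged;
    chomp @queue;
    my @out;
    for my $line (@lines) {
        my $count = () = $line =~ /\S+/g;    # number of tokens on this line
        push @out, join ' ', splice(@queue, 0, $count);
    }
    return @out;
}

my @file_one = map { "$_\n" }
    qw(I_pron would_mod like_v to_to learn_v this_pron ._sent OK_ok ?_quest);

print join_tags(@file_one), "\n";
print "$_\n" for tags_by_line(\@file_one, "I would like to learn this .\n", "OK ?\n");
```

Because the pairing is by position and not by lookup, a word that occurs twice with different tags keeps the tag that belongs to each occurrence.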
Re: replacing while keeping the position of the element in the file
by Art_XIV (Hermit) on Jul 27, 2004 at 15:13 UTC
        use strict;
        use warnings;

        #emulate reading from 1st file
        my %lookup = ();
        while (<DATA>) {
            chomp;
            my $key = (split /_/)[0];
            $lookup{$key} = $_;
        }

        #emulate 2nd file
        my @input = ("I would like to learn this ! .", "OK ?");
        foreach my $line (@input) {
            my @tokens = map {$lookup{$_} || 'unknown'} split /\s+/, $line;
            print "@tokens\n";
        }

        1;

        __DATA__
        I_pron
        would_mod
        like_v
        to_to
        learn_v
        this_pron
        ._sent
        OK_ok
        ?_quest

    Hanlon's Razor - "Never attribute to malice that which can be adequately explained by stupidity"