
Please try again: as frozenwithjoy said, give us a sample of file1, a sample of file2, and a sample of what you want the output to be. Wrap each of these three samples in <code></code> tags to preserve their formatting. By a "sample," I mean a few lines that are enough to demonstrate your problem. One line is rarely enough; more than a dozen is usually too many.

Aaron B.
Available for small or large Perl jobs; see my home node.

Re^4: merging a file with a value present in another file
by lakssreedhar (Acolyte) on Jul 16, 2012 at 07:00 UTC

    file1 is

    {RP}makaravilYakkin Sabarimala ayyappanu cArZwwAnulYlYa wiruviwAMkUrZ rAjAvAyirunna SrI ciwwirawirunnAlYZ bAlarAmavarZmma natakk vacca  420 kilogrAM wUkkamulYlYa wafkayafki{/RP}{MCL} sUkRikkunnaw I kRewrawwilAN.{/MCL}

    file2 is

    <Sentence id="1">
    1 (( NP
    1.1 makaravilYakkin NN <fs af='makaravilYakk,n,any,sg,,d,,kk' conj="blank" spec="blank" CASE_NAME="dat" dubi="blank">
    ))
    2 (( NP
    2.1 Sabarimala NNP <fs af='Sabarimala,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    2.2 ayyappanu NN <fs af='ayyappanu,unkn,,,,,,' poslcat="NM">
    ))
    3 (( VGF
    3.1 cArZwwAnulYlYa VM <fs af='cArZww,v,any,any,any,,AnulYlYa,AnulYlYa'>
    ))
    4 (( NP
    4.1 wiruviwAMkUrZ QF <fs af='wiruviwAMkUrZ,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank" poslcat="NM">
    4.2 rAjAvAyirunna NN <fs af='rAjAv,n,m,sg,,o,,yAyirunna' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    ))
    5 (( NP
    5.1 SrI UNK <fs af='SrI,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank" poslcat="NM">
    5.2 ciwwirawirunnAlYZ NN <fs af='ciwwirawirunnAlYZ,unkn,,,,,,' poslcat="NM">
    5.3 bAlarAmavarZmma NNP <fs af='bAlarAmavarZmma,unkn,,,,,,' poslcat="NM">
    5.4 natakk NN <fs af='nata,n,any,sg,,d,,kk' conj="blank" spec="blank" CASE_NAME="dat" dubi="blank">
    ))
    6 (( VGF
    6.1 vacca VM <fs af='vaykk,v,any,any,any,,ta,ta' CASE_NAME="nom">
    ))
    7 (( NP
    7.1 420 QC <fs af='420,num,,,,,,'>
    7.2 kilogrAM NN <fs af='kilogrAM,unkn,,,,,,' poslcat="NM">
    ))
    8 (( NP
    8.1 wUkkamulYlYa NN <fs af='wUkkaM,n,any,sg,,d,,yulYlYa' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    8.2 wafkayafki NNP <fs af='wafkayafki,unkn,,,,,,' poslcat="NM">
    ))
    9 (( VGNF
    9.1 sUkRikkunnaw VM <fs af='sUkRikk,v,any,any,any,,unnaw,unnaw'>
    ))
    10 (( NP
    10.1 I DEM <fs af='I,pn,any,sg,,,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    10.2 kRewrawwilAN NN <fs af='kRewraM,n,any,sg,,d,,yilAN' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    10.3 . SYM <fs af='.,punc,,,,,,' poslcat="NM">
    ))
    </Sentence>

    my output file should be

    <Sentence id="1">
    1 (( NP
    1.1 makaravilYakkin NN <fs af='makaravilYakk,n,any,sg,,d,,kk' conj="blank" spec="blank" CASE_NAME="dat" dubi="blank" clause_start="rp">
    ))
    2 (( NP
    2.1 Sabarimala NNP <fs af='Sabarimala,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    2.2 ayyappanu NN <fs af='ayyappanu,unkn,,,,,,' poslcat="NM">
    ))
    3 (( VGF
    3.1 cArZwwAnulYlYa VM <fs af='cArZww,v,any,any,any,,AnulYlYa,AnulYlYa'>
    ))
    4 (( NP
    4.1 wiruviwAMkUrZ QF <fs af='wiruviwAMkUrZ,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank" poslcat="NM">
    4.2 rAjAvAyirunna NN <fs af='rAjAv,n,m,sg,,o,,yAyirunna' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    ))
    5 (( NP
    5.1 SrI UNK <fs af='SrI,n,any,sg,,d,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank" poslcat="NM">
    5.2 ciwwirawirunnAlYZ NN <fs af='ciwwirawirunnAlYZ,unkn,,,,,,' poslcat="NM">
    5.3 bAlarAmavarZmma NNP <fs af='bAlarAmavarZmma,unkn,,,,,,' poslcat="NM">
    5.4 natakk NN <fs af='nata,n,any,sg,,d,,kk' conj="blank" spec="blank" CASE_NAME="dat" dubi="blank">
    ))
    6 (( VGF
    6.1 vacca VM <fs af='vaykk,v,any,any,any,,ta,ta' CASE_NAME="nom">
    ))
    7 (( NP
    7.1 420 QC <fs af='420,num,,,,,,'>
    7.2 kilogrAM NN <fs af='kilogrAM,unkn,,,,,,' poslcat="NM">
    ))
    8 (( NP
    8.1 wUkkamulYlYa NN <fs af='wUkkaM,n,any,sg,,d,,yulYlYa' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    8.2 wafkayafki NNP <fs af='wafkayafki,unkn,,,,,,' poslcat="NM" clause_end="rp">
    ))
    9 (( VGNF
    9.1 sUkRikkunnaw VM <fs af='sUkRikk,v,any,any,any,,unnaw,unnaw'>
    ))
    10 (( NP
    10.1 I DEM <fs af='I,pn,any,sg,,,,0' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    10.2 kRewrawwilAN NN <fs af='kRewraM,n,any,sg,,d,,yilAN' conj="blank" spec="blank" CASE_NAME="nom" dubi="blank">
    10.3 . SYM <fs af='.,punc,,,,,,' poslcat="NM">
    ))
    </Sentence>

      I'm not sure why the MCL clause doesn't show up in your output sample. But I'd say you have a two-step process, possibly involving two hashes:

      1. Go through file1, parsing out the first and last word of each clause and storing them in a %start hash and an %end hash respectively, with the word as the key and the tag (RP, MCL) as the value.
      2. Go through file2, checking the word on each line (the second field in your sample, after the numeric index) to see whether it exists in one of these hashes, and if so, add the appropriate attribute (clause_start or clause_end) at the end of that line's <fs ...> tag.

      The rest is just implementation.
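
For what it's worth, here is one way those two steps might look in Perl. This is only a minimal sketch under a few assumptions, not a solution tested on your real data: the sub names clause_boundaries and annotate_line are made up for illustration; file2 is assumed to have one token per line with the word in the second column; the tag is lowercased because your output sample shows "rp"; and trailing punctuation is stripped from clause words because file2 lists "." as its own token. The sample data in the demo is abridged from your post. Note that because the hashes are keyed by the word alone, a word that occurs more than once in file2 would be tagged every time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: map the first and last word of each {TAG}...{/TAG} clause to its tag.
sub clause_boundaries {
    my ($text) = @_;
    my (%start, %end);
    while ($text =~ m!\{(\w+)\}(.*?)\{/\1\}!gs) {
        my ($tag, $body) = (lc($1), $2);
        my @words = split ' ', $body;
        next unless @words;
        my ($first, $last) = @words[0, -1];
        # Assumption: punctuation glued to a word here is a separate token in file2.
        s/[[:punct:]]+\z// for $first, $last;
        $start{$first} = $tag;
        $end{$last}    = $tag;
    }
    return (\%start, \%end);
}

# Step 2: if a token line's word is a clause boundary, add the attribute
# just before the closing '>' of its <fs ...> tag.
sub annotate_line {
    my ($line, $start, $end) = @_;
    # Token lines are assumed to look like: 1.1 word POS <fs ...>
    if ($line =~ m/^\s*\d+\.\d+\s+(\S+)\s+\S+\s+<fs\b[^>]*>\s*$/) {
        my $word  = $1;
        my $attrs = '';
        $attrs .= qq( clause_start="$start->{$word}") if exists $start->{$word};
        $attrs .= qq( clause_end="$end->{$word}")     if exists $end->{$word};
        $line =~ s/>(\s*)\z/$attrs>$1/ if $attrs;
    }
    return $line;
}

# Demo on in-memory data abridged from the samples in this thread.
my $file1 = '{RP}makaravilYakkin Sabarimala wafkayafki{/RP}'
          . '{MCL} sUkRikkunnaw I kRewrawwilAN.{/MCL}';

my $file2 = <<'END';
<Sentence id="1">
1 (( NP
1.1 makaravilYakkin NN <fs af='makaravilYakk,n,any,sg,,d,,kk' CASE_NAME="dat">
))
8.2 wafkayafki NNP <fs af='wafkayafki,unkn,,,,,,' poslcat="NM">
9.1 sUkRikkunnaw VM <fs af='sUkRikk,v,any,any,any,,unnaw,unnaw'>
10.2 kRewrawwilAN NN <fs af='kRewraM,n,any,sg,,d,,yilAN' CASE_NAME="nom">
10.3 . SYM <fs af='.,punc,,,,,,' poslcat="NM">
</Sentence>
END

my ($start, $end) = clause_boundaries($file1);
print annotate_line($_, $start, $end) for split /^/, $file2;
```

For real input you would read file1 into a string, then pass each line of file2 through annotate_line and print the result to the output file. Handling several file pairs would then just mean wrapping that in a loop.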

      Aaron B.
      Available for small or large Perl jobs; see my home node.

        I want the MCL tag as well. I am new to Perl; can I use regular expressions to parse the start and end of each clause? I also want the program to run on many such files.