
Hi everyone.
Here I am with a new problem I can't solve.
I have two input files. The first one (let's call it INPUT1) contains a list of semantic relations structured like this:

alligator-n        amphibian_reptile    attri    long-j
alligator-n        amphibian_reptile    attri    old-j
alligator-n        amphibian_reptile    coord    crocodile-n
alligator-n        amphibian_reptile    coord    frog-n
alligator-n        amphibian_reptile    event    walk-v
alligator-n        amphibian_reptile    hyper    animal-n

The second one (INPUT2) looks like this (obviously, this is just a heavily reduced sample):

frog-n    about    adage-n    8.8016
frog-n    appearance-1    broad-j    11.9640
frog-n    coord    albino-n    6.7667
frog-n    be    jumper-n    6.0272
frog-n    be    key-n    3.8779
frog-n    of    body-n    8.3063
frog-n    of    bone-n    20.7982
frog-n    of    book-n    0.4229
crocodile-n    be    key-n    3.2572
crocodile-n    of    chorus-n    24.9515
crocodile-n    of    book-n    2.3460
crocodile-n    obj    sit-v    3.1857
crocodile-n    obj    size-v    57.3257
crocodile-n    obj    skewer-v    6.1105
animal-n    coord-1    investigation-n    0.9666
animal-n    coord-1    irrigation-n    2.6058
animal-n    coord-1    isolation-n    1.4074
animal-n    coord-1    isotope-n    2.7420

I need to scan INPUT1 for rows whose relation (the third field) is "coord", and then search INPUT2 for the words found in the fourth field of those rows. In this example, those words are crocodile-n and frog-n. I have to build a new file that looks like INPUT2 but contains every row whose first field is crocodile-n or frog-n, with the first field replaced by not_alligator-n. If a (relation, word) pair has already been found, I must not repeat it; instead I need to add its score to the one already found.
I realize this explanation is not very clear, so here is an example of the desired output:

not_alligator-n    about    adage-n    8.8016
not_alligator-n    appearance-1    broad-j    11.9640
not_alligator-n    coord    albino-n    6.7667
not_alligator-n    be    jumper-n    6.0272
not_alligator-n    be    key-n    7.1351(3.8779+3.2572)
not_alligator-n    of    body-n    8.3063
not_alligator-n    of    chorus-n    24.9515
not_alligator-n    of    bone-n    20.7982
not_alligator-n    of    book-n    2.7689(0.4229+2.3460)
not_alligator-n    obj    sit-v    3.1857
not_alligator-n    obj    size-v    57.3257
not_alligator-n    obj    skewer-v    6.1105
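A minimal Perl sketch of the steps described above might look like the following. It assumes that fields in both files are separated by whitespace (tabs or spaces), that the output label is always `not_` followed by the INPUT1 noun, that summed scores should be printed with 4 decimals, and that rows for each noun are printed in first-seen order. The helper names `read_coords`, `merge_scores` and `print_merged` are hypothetical. INPUT2 is read only once, so the approach should still work when that file is large.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read INPUT1 and map each noun (1st field) to the set of words found in the
# 4th field of its rows whose relation (3rd field) is 'coord'.
sub read_coords {
    my ($fh) = @_;
    my %coords;    # noun => { coord_word => 1 }
    while (my $line = <$fh>) {
        chomp $line;
        my ($noun, undef, $rel, $other) = split ' ', $line;
        next unless defined $other && $rel eq 'coord';
        $coords{$noun}{$other} = 1;
    }
    return \%coords;
}

# Read INPUT2 once and, for every noun, sum the scores of identical
# (relation, word) pairs found in the rows of its coord words.
sub merge_scores {
    my ($coords, $fh) = @_;
    my %wanted;    # coord_word => [nouns it is a coord of]
    for my $noun (keys %$coords) {
        push @{ $wanted{$_} }, $noun for keys %{ $coords->{$noun} };
    }
    my %sum;      # noun => { "relation\tword" => summed score }
    my %order;    # noun => [ "relation\tword" keys in first-seen order ]
    while (my $line = <$fh>) {
        chomp $line;
        my ($word, $rel, $target, $score) = split ' ', $line;
        next unless defined $score && exists $wanted{$word};
        my $key = "$rel\t$target";
        for my $noun (@{ $wanted{$word} }) {
            # Remember the position only the first time a pair is seen.
            push @{ $order{$noun} }, $key unless exists $sum{$noun}{$key};
            $sum{$noun}{$key} += $score;
        }
    }
    return (\%sum, \%order);
}

# Print one tab-separated line per noun and (relation, word) pair.
sub print_merged {
    my ($sum, $order, $out) = @_;
    for my $noun (sort keys %$sum) {
        for my $key (@{ $order->{$noun} }) {
            printf {$out} "not_%s\t%s\t%.4f\n", $noun, $key, $sum->{$noun}{$key};
        }
    }
}

# Run only when file names are given on the command line.
if (@ARGV) {
    die "usage: $0 INPUT1 INPUT2\n" unless @ARGV == 2;
    open my $in1, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    open my $in2, '<', $ARGV[1] or die "Cannot open $ARGV[1]: $!\n";
    my ($sum, $order) = merge_scores(read_coords($in1), $in2);
    print_merged($sum, $order, \*STDOUT);
}
```

Saved under a hypothetical name such as merge_coord.pl, it could be run as `perl merge_coord.pl input1.txt input2.txt > output.txt`. The summed scores are printed without the "(a+b)" annotations shown in the example above. Keying the inner hash on the relation plus the word is what makes "be key-n" from frog-n and crocodile-n collapse into one line.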

I have no idea where to start. It has been less than a month since I started using Perl again, and I still have a lot to learn.
Any suggestion, tip, or pointer on what to do would be really appreciated.
I need this because I'm analyzing some statistical measures to be applied to semantic relations for my Ph.D. thesis.
Thanks to all,
Giulia

In reply to Select only desired features from a text by remluvr
