in reply to Re: Comparing a list to a tab delimited text file
in thread Comparing a list to a tab delimited text file
I apologise for the delay, and hope you're still willing to help me.
I understand you need more details about my script and resources; here they are:
- 8 GB of RAM (this is a problem for me now: I tend to write quite greedy code, and I'm getting out-of-memory errors)
- a 70 KB XML file, almost 2M lines long, each line corresponding to one word / node, like this:

<DocumentSet>
<document>
<w lemma="appeler" type="VER:pres">appelle</w>
<w lemma="quand" type="KON">quand</w>
<w lemma="gronder" type="VER:infi">gronder</w>
</document>
</DocumentSet>

- a 10 KB tab-separated text file, 150k lines long, looking like this:

tunisiennes tynizjEn tunisien ADJ f p 0,3 3,51 0 0,2 0,2 undef
remplît R@pli remplir VER undef undef 61,21 81,42 0 0,2 0,2 "sub:imp:3s;"
remuons R°my§ remuer VER undef undef 24,42 62,84 0,2 0 0,2 "imp:pre:1p;ind:pre:1p;"
remuât R°m8a remuer VER undef undef 24,42 62,84 0 0,2 0,2 "sub:imp:3s;"
renaudant R°nod@ renauder VER undef undef 0 2,64 0 0,2 0,2 "par:pre;"
ébouriffées ebuRife ébouriffé ADJ f p 0,22 3,45 0 0,2 0,2 undef
rendissent R@dis rendre VER undef undef 508,81 468,11 0 0,2 0,2 "sub:imp:3p;"

I'm using the XML::Twig module to go through the XML tree and modify nodes. I use a foreach loop over $w to visit each <w> node, then check whether its content matches a word in the first column of the tab-separated file. If so, I want to add some attributes from the other columns to the XML node, for a result like this:

<w conjugaison="imp:pre:2s;ind:pre:1s;ind:pre:3s;sub:pre:1s;sub:pre:3s;" genre="" lemma="appeler" nombre="" type="VER:pres">appelle</w>
<w genre="" lemma="quand" nombre="" type="KON">quand</w>
<w conjugaison="inf;" genre="" lemma="gronder" nombre="" type="VER:infi">gronder</w>

Ask me for more info if needed.
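
To make the goal concrete, here is a minimal sketch of the lookup-and-annotate step described above. It is not my actual script. It assumes that genre, nombre and conjugaison come from the 5th, 6th and last columns of the tab-separated file (my reading of the samples above), that the first line found for a given word wins, and that both files are UTF-8. The sub names load_lexicon and annotate_file and the file names in the usage comment are placeholders. The word list is read once into a hash, and each <w> node is flushed right after it is handled so the whole tree is never held in memory.

```perl
use strict;
use warnings;

# Build a word => { attribute => value } lookup from tab-separated lines.
# Assumed layout: column 1 = word, column 5 = genre, column 6 = nombre,
# last column = conjugaison (quoted, or the literal string "undef").
sub load_lexicon {
    my ($fh) = @_;
    my %lex;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        my @f = split /\t/, $line, -1;
        my ($word, $genre, $nombre, $conj) = @f[0, 4, 5, -1];
        next if exists $lex{$word};    # assumption: first occurrence wins
        for ($genre, $nombre, $conj) {
            $_ = '' if !defined $_ || $_ eq 'undef';
            s/^"|"$//g;                # strip surrounding quotes
        }
        my %att = (genre => $genre, nombre => $nombre);
        $att{conjugaison} = $conj if length $conj;
        $lex{$word} = \%att;
    }
    return \%lex;
}

# Stream the XML file, annotate each <w> found in the lookup, and
# flush as we go so memory use stays flat.
sub annotate_file {
    my ($xml_file, $lex) = @_;
    require XML::Twig;
    my $twig = XML::Twig->new(
        twig_handlers => {
            w => sub {
                my ($t, $w) = @_;
                if (my $entry = $lex->{ $w->text }) {
                    $w->set_att(%$entry);
                }
                $t->flush;    # print what has been parsed so far, then free it
            },
        },
    );
    $twig->parsefile($xml_file);
    $twig->flush;             # print whatever is left (closing tags)
}

# Usage sketch (file names are placeholders):
#   open my $fh, '<:encoding(UTF-8)', 'lexicon.tsv' or die $!;
#   my $lex = load_lexicon($fh);
#   annotate_file('corpus.xml', $lex);    # annotated XML goes to STDOUT
```

With this layout only the hash of 150k words stays in memory; the 2M <w> nodes are written out one at a time.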
Re^3: Comparing a list to a tab delimited text file
by Laurent_R (Canon) on Mar 15, 2018 at 17:03 UTC | |
by Azaghal (Novice) on Mar 19, 2018 at 10:22 UTC |