in reply to regexp not greedy
XML would be worth looking into for this, or even just labeled parentheses, like "(STAN con la certeza absoluta) de .. que (VERB_COMPLEX no hay-e+) (SUBJ nadie) (LOC:ST en la casa)". Anything along those lines would make the data easier to process and less prone to the simple mistakes that can cause catastrophic damage.
(If your goal is to transform the data into some better format, this is an excellent idea, and I wish you the best of luck.)
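To show why the labeled-paren idea helps, here's a rough sketch in Python of reading that example line back into (label, text) pairs. It assumes labels are made of uppercase letters, underscores and colons (that's just what the example uses), that groups don't nest, and that anything outside a group is unlabeled text. `parse_labeled` and `LABELED` are hypothetical names, not from any existing tool. The point is that a stray or unbalanced paren gets caught as an error instead of silently gobbling the wrong span, which is exactly the kind of mistake a non-greedy regex over unstructured text lets slip through.

```python
import re

# Hypothetical label syntax inferred from the example: uppercase letters,
# underscores or colons, then whitespace, then text with no parentheses.
LABELED = re.compile(r"\(([A-Z_:]+)\s+([^()]*)\)")


def _check_plain(text):
    """Reject unlabeled text that still contains a paren (malformed group)."""
    if "(" in text or ")" in text:
        raise ValueError(f"unbalanced or malformed group near: {text!r}")
    return text


def parse_labeled(line):
    """Split a line into (label, text) pairs; unlabeled stretches get label None."""
    pieces = []
    pos = 0
    for m in LABELED.finditer(line):
        gap = line[pos:m.start()].strip()
        if gap:
            pieces.append((None, _check_plain(gap)))
        pieces.append((m.group(1), m.group(2).strip()))
        pos = m.end()
    tail = line[pos:].strip()
    if tail:
        pieces.append((None, _check_plain(tail)))
    return pieces


if __name__ == "__main__":
    sample = ("(STAN con la certeza absoluta) de .. que (VERB_COMPLEX no hay-e+) "
              "(SUBJ nadie) (LOC:ST en la casa)")
    for label, text in parse_labeled(sample):
        print(label, "|", text)
```

For the sample line this gives `STAN`, then the unlabeled `de .. que`, then `VERB_COMPLEX`, `SUBJ` and `LOC:ST` with their text, while something like `(SUBJ nadie` with a missing close paren raises `ValueError`. If the groups ever need to nest, that's the point where a real XML parser starts paying for itself.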