Hello, I want to add diacritics to a 30 MB text file called "input". I have a file "cetnosti" with the most frequently used words, which looks like this:
    a	14458708
    se	10848091
    v	10688846
    na	8721120
    je	5353514
    že	4991304
The left column is a word, and the right column is its number of occurrences in a different text in the same language. The file "input" looks like this:
    Je mi urcite cti, avsak predstavit pomerne strucne, a navic bez moznos praktickych ukazek, nas hlavni a nosny produkt, muze zpusobit I male komplikace. Proto prijmete prosim tento clanek jako snahu, poskytnout
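The input is the same Czech text with the diacritics stripped, so each word in "input" is the diacritic-free form of some word in "cetnosti". A minimal sketch of that mapping, assuming the text is UTF-8 and using the core module Unicode::Normalize (the sub name `strip_diacritics` is hypothetical):

```perl
use strict;
use warnings;
use utf8;                        # the word list below is UTF-8 source text
use Unicode::Normalize qw(NFD);

# Decompose each character (NFD), so "ř" becomes "r" followed by a
# combining caron, then delete all combining marks (\p{Mn}).
sub strip_diacritics {
    my ($word) = @_;
    (my $plain = NFD($word)) =~ s/\p{Mn}//g;
    return $plain;
}

# Prints ze, urcite, prijmete, clanek (one per line).
print strip_diacritics($_), "\n" for qw(že určitě přijměte článek);
```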
I need to replace each word in "input" with the word from "cetnosti" that has the highest number of occurrences among those that match it when diacritics are ignored, and write the result to "output". The problem is that "cetnosti" is too big to read into memory all at once, so I read only its beginning, which holds the most frequently used words:
    use strict;
    use warnings;
    use Tree::Trie;

    my $trie  = Tree::Trie->new;
    my $filei = "cetnosti";
    my $filer = "input";
    my $filew = "output";

    open(my $info, '<', $filei) or die "error: couldn't open $filei: $!";
    my $lineno = 1;
    # read only the first lines, i.e. the most frequent words
    while (defined(my $line = <$info>) && $lineno < 100000) {
        chomp $line;
        $line =~ s/\t.*//;    # keep the word, drop the tab and the count
        $trie->add($line);
        $lineno++;
    }
    close($info);
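A trie answers "is this word in the list?", but here the lookup has to go from the plain form back to the form with diacritics. One possible alternative (a sketch, not the only way) is a hash keyed by the lowercased, diacritic-free word that stores the first, i.e. most frequent, word reducing to that key. This assumes "cetnosti" is tab-separated, as the `s/\t.*//` above implies, and sorted by descending count; `build_lookup` and `strip_diacritics` are hypothetical names. It keeps at most one entry per plain form for the lines read, so the line limit still controls memory use.

```perl
use strict;
use warnings;
use utf8;                        # allow UTF-8 string literals in this file
use Unicode::Normalize qw(NFD);

# Decompose (NFD), then delete combining marks: "že" -> "ze".
sub strip_diacritics {
    (my $plain = NFD($_[0])) =~ s/\p{Mn}//g;
    return $plain;
}

# Map lowercase, diacritic-free word => first matching word in the list.
# Assumes lines look like "word<TAB>count" and are sorted by descending
# count, so the first word seen for a key is the most frequent one.
# $limit: stop after that many lines (0 or undef = read everything).
sub build_lookup {
    my ($fh, $limit) = @_;
    my %best;
    while (my $line = <$fh>) {
        last if $limit && $. > $limit;
        chomp $line;
        my ($word) = split /\t/, $line;
        next unless defined $word && length $word;
        $best{ lc strip_diacritics($word) } //= $word;   # keep first form only
    }
    return \%best;
}

# Usage (file name from the question; assumes it is UTF-8 encoded):
#   open(my $fh, '<:encoding(UTF-8)', 'cetnosti') or die "cetnosti: $!";
#   my $lookup = build_lookup($fh, 100_000);
#   close($fh);
```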
Now I'm trying to work out how to replace the words from "input" and write them to "output":
    open(my $read,  '<', $filer) or die "error: couldn't open $filer: $!";
    open(my $write, '>', $filew) or die "error: couldn't open $filew: $!";
    while (defined(my $line = <$read>)) {
        my @words = split(/ /, $line);
        # here I'd like to compare each word from each line of "input" with
        # "cetnosti" and write it to "output", but I have no idea how to do it
    }
    close($read);
    close($write);
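Given a lookup hash from plain words to their preferred spelling, the loop body can replace each word in place. A sketch, assuming a hash `%lookup` whose keys are lowercase words without diacritics and whose values are the spelling to use; it matches runs of letters (`\p{L}+`) instead of using `split / /`, so a token like "cti," still matches and the comma is kept. `restore_line` and `match_case` are hypothetical names.

```perl
use strict;
use warnings;
use utf8;                        # allow UTF-8 string literals in this file

# Give $best the capitalisation pattern of $orig.
sub match_case {
    my ($orig, $best) = @_;
    return uc $best      if length($orig) > 1 && $orig eq uc $orig;  # CTI -> CTÍ
    return ucfirst $best if $orig =~ /^\p{Lu}/;                      # Je  -> Je
    return $best;
}

# Replace each run of letters with its entry from %$lookup (keys are
# lowercase words without diacritics); unknown words, spaces and
# punctuation are copied through unchanged.
sub restore_line {
    my ($line, $lookup) = @_;
    $line =~ s{(\p{L}+)}{
        exists $lookup->{ lc $1 } ? match_case($1, $lookup->{ lc $1 }) : $1
    }ge;
    return $line;
}

# Usage with the file names from the question (assumes UTF-8 files and a
# %lookup hash filled beforehand from "cetnosti"):
#   open(my $in,  '<:encoding(UTF-8)', 'input')  or die "input: $!";
#   open(my $out, '>:encoding(UTF-8)', 'output') or die "output: $!";
#   print {$out} restore_line($_, \%lookup) while <$in>;
```

Words missing from the lookup are copied unchanged, and the case handling covers only lowercase, Capitalized and ALL-CAPS words.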
Can anybody think of a good way to do this? Thank you for your help.

In reply to fill diacritic into text by jajaja
