
Hello, I want to restore diacritics in a 30 MB text file called "input". I have a file "cetnosti" with the most used words, which looks like this:

a       14458708
se      10848091
v       10688846
na      8721120
je      5353514
že      4991304

The left column is a word and the right column is the number of occurrences in some other text in the same language. The file "input" looks like this:

Je mi urcite cti, avsak predstavit pomerne strucne, a navic bez moznos
praktickych ukazek, nas hlavni a nosny produkt, muze zpusobit I male
komplikace. Proto prijmete prosim tento clanek jako snahu, poskytnout

I need to replace the words in "input" with the matching words that have the highest occurrence count in "cetnosti" and write the result to "output". The problem is that "cetnosti" is too big to read entirely into memory, so I read only the beginning of it, which holds the most used words:

use Tree::Trie;

$trie  = Tree::Trie->new;
$filei = "cetnosti";
$filer = "input";
$filew = "output";

# Load only the most frequent words; the whole file does not fit in memory.
open(INFO, "<", $filei) || die "error: couldn't open $filei: $!";
$lineno = 1;
while ((defined ($line = <INFO>)) && ($lineno < 100000))
{
    $line =~ s/\s.*//s;    # keep only the word (first column), drop count and newline
    $trie->add($line);
    $lineno++;
}
close(INFO);
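Since "input" contains only unaccented spellings, the lookup probably has to go from the unaccented form to the accented one. A minimal sketch of that idea, assuming the frequency list is sorted by descending count and is read through a filehandle that already decodes UTF-8; it uses a plain hash instead of the trie, and the helper names `strip_diacritics` and `build_lookup` are hypothetical:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Decompose the word and drop combining marks, e.g. "že" becomes "ze".
sub strip_diacritics {
    my ($word) = @_;
    my $decomposed = NFD($word);
    $decomposed =~ s/\p{Mn}//g;
    return $decomposed;
}

# Map each lower-cased unaccented spelling to the first accented spelling
# seen for it. Assumption: lines look like "word count" and the most
# frequent words come first, so the first hit is the most frequent one.
sub build_lookup {
    my ($fh, $max_lines) = @_;
    my %lookup;
    my $seen = 0;
    while (defined(my $line = <$fh>)) {
        last if ++$seen > $max_lines;
        my ($word) = split ' ', $line;    # first column only
        next unless defined $word;
        my $plain = lc strip_diacritics($word);
        $lookup{$plain} //= $word;        # keep the earlier (more frequent) form
    }
    return \%lookup;
}
```

It could be called as `build_lookup($fh, 100_000)` after opening "cetnosti" with `open my $fh, '<:encoding(UTF-8)', 'cetnosti'`. A word that is correct without diacritics (such as "je") simply maps to itself.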

Now I am wondering how to replace the words from "input" and write them to "output":

open(READ, "<", $filer) || die "error: couldn't open $filer: $!";
open(WRITE, ">", $filew) || die "error: couldn't open $filew: $!";

while (defined ($line = <READ>))
{
    @words = split(/ /, $line);

    # here I'd like to compare each word from this line of "input" with
    # "cetnosti" and write it to "output", but I have no idea how to do it

}
close(READ);
close(WRITE);
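One possible shape for the replacement step, as a rough sketch only: substitute word by word with a regex rather than `split`, so that punctuation and spacing survive, and carry the capitalization of the input word over to the replacement. It assumes a hash reference whose keys are lower-case unaccented spellings and whose values are the accented spellings; the names `restore_word` and `restore_line` are hypothetical:

```perl
use strict;
use warnings;

# Return the accented form of one word, or the word itself if unknown.
sub restore_word {
    my ($word, $lookup) = @_;
    my $hit = $lookup->{ lc $word };
    return $word unless defined $hit;
    # Carry the capitalization of the input over to the replacement.
    return uc $hit      if length($word) > 1 && $word eq uc $word;
    return ucfirst $hit if $word =~ /^\p{Lu}/;
    return $hit;
}

# Replace every run of word characters in the line; everything between
# words (spaces, commas, newline) is left exactly as it was.
sub restore_line {
    my ($line, $lookup) = @_;
    $line =~ s/(\w+)/restore_word($1, $lookup)/ge;
    return $line;
}
```

Inside the read loop this would amount to `print WRITE restore_line($line, $lookup);`, with the output handle opened using an encoding layer such as `:encoding(UTF-8)`. Words with several valid accented forms always get the single most frequent one, so some mistakes are to be expected.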

Can anybody think of a good way to do this? Thank you for your help.

In reply to fill diacritic into text by jajaja
