
It's a good thing you provided this information:

ps : this script is supposed to get all the lines from a file and refer each word to the sentence that starts with that word (and, there aren't 2 sentences that start identically).

Without that, there'd be no hope of helping with the problem. But even with that, there's still not quite enough to go on. (Looks like perldeveloper made a lucky guess, but I confess that I am still confused.)

Does the input data file really contain exactly one "sentence" per line? Are you certain that the "words" in each sentence are always separated by exactly a single space character? Are the words in "mixed case", and do they include punctuation marks? (And does this have an effect on what you are trying to do?) Why should it matter if a sentence contains a "word" that consists of the single letter "v"?

Let's suppose a particular word (e.g. "bar") occurs at the beginning of one sentence (e.g. sentence #23), and also occurs in the middle or at the end of 4 other sentences (e.g. #5, #12, #47, #69). What do you want to accomplish with regard to this word? Locate just the one sentence that begins with "bar"? Locate just the other four sentences that contain "bar"? Locate all five sentences (and identify the one that begins with "bar")? What do you want to do with words that only occur in the middle or at the end of sentences but never at the beginning of any sentence? Ignore them?

How you answer those questions will determine how you should read through the sentences and words, what sort of data structure you should create from the input data, and how you would use that data structure after you've built it.
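To make that concrete, here is a minimal sketch of one possible data structure. It assumes the "locate all five sentences, and identify the one that begins with the word" reading of the question. The sample sentences and the names %starts_at and %occurs_in are my own inventions for illustration, not anything from the original script, and splitting on whitespace is an assumption about the data.

```perl
use strict;
use warnings;

# Hypothetical sample data: one sentence per line, words split on whitespace.
my @sentences = (
    'foo bar baz',
    'bar is here',
    'baz likes bar',
);

my %starts_at;    # word => index of the sentence that begins with it
my %occurs_in;    # word => [ indexes of every sentence that contains it ]

for my $i ( 0 .. $#sentences ) {
    my @words = split ' ', $sentences[$i];
    next unless @words;    # skip blank lines

    # Assumes no two sentences start with the same word, as the OP stated.
    $starts_at{ $words[0] } = $i;

    my %seen;              # record each sentence only once per word
    for my $w (@words) {
        push @{ $occurs_in{$w} }, $i unless $seen{$w}++;
    }
}

for my $w ( sort keys %occurs_in ) {
    my $first = exists $starts_at{$w} ? $starts_at{$w} : 'none';
    print "$w: begins sentence $first; occurs in @{ $occurs_in{$w} }\n";
}
```

If it turns out that only the sentence-initial words matter, %occurs_in can be dropped entirely. If case or punctuation matter, the words would need normalizing before they are used as hash keys.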

As for the code you posted at the start of this thread, the reason the run time grows so quickly as you add sentences is the nesting of your "for" loops:

  foreach sentence in the file {
    ...
    foreach word in the sentence {
      ...
      foreach sentence in the file {
        ... # given n sentences with an average of m words each,
            # this block has to execute n*m*n times
      }
    }
  }

As you have learned from experience, this sort of approach "does not scale well" to large numbers of sentences. But to work out a good approach, you need to clarify your goals. You seem to be content with perldeveloper's solution (assuming his additional reply makes sense to you), but it's not clear to me that it is the best approach, or that it does what you really want -- mostly because you haven't provided a clear description of what you really want.
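For comparison, here is a sketch of how the innermost loop could be replaced by a hash lookup. It assumes the goal really is "refer each word to the sentence that starts with that word". The sample data and the names %sentence_for and @report are hypothetical, and I am guessing that words are separated by whitespace.

```perl
use strict;
use warnings;

# Hypothetical input; the real script would read these lines from the file.
my @sentences = (
    'alpha beta gamma',
    'beta delta',
    'gamma alpha',
);

# Pass 1: remember which sentence starts with each word (n steps).
my %sentence_for;
for my $s (@sentences) {
    my ($first) = split ' ', $s;
    $sentence_for{$first} = $s if defined $first;
}

# Pass 2: refer each word to "its" sentence with one hash lookup
# per word (about n*m steps) instead of rescanning every sentence.
my @report;
for my $s (@sentences) {
    for my $w ( split ' ', $s ) {
        my $target = exists $sentence_for{$w}
            ? $sentence_for{$w}
            : '(no sentence starts with this word)';
        push @report, "$w => $target";
    }
}

print "$_\n" for @report;
```

The two passes together do work proportional to n*m rather than n*m*n, so the cost of adding sentences grows roughly linearly. Whether this is what you actually want still depends on the answers to the questions above.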

In reply to Re: makeing refering faster ? by graff
in thread makeing refering faster ? by chiburashka
