in reply to Suggestions re parsing one large file with elements of another large file

Hm... why go high-tech when low-tech will do it for you? Just read the username file line by line and add the names to a hash. Why a hash? Because you can test hash entries for existence.

    while (<FH>) {
        next if /^#/;    # skip comment lines
        chomp;
        if (exists $usernames{$_}) {
            die "found username duplicate in first file, script halted!\n";
        }
        $usernames{$_} = 1;
    }

Now you can open the other file, read it line by line (with or without buffering it into an array), parse each line into its words, and then compare them against the usernames hash (exists $usernames{$word}).
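A minimal sketch of that lookup (the whitespace splitting and the sample names are my assumptions, not from the thread):

```perl
use strict;
use warnings;

# Assume %usernames was built from the first file as shown above.
my %usernames = map { $_ => 1 } qw(jdoe asmith);   # hypothetical names

sub line_has_username {
    my ($line, $names) = @_;
    # Split the line into words and test each against the hash;
    # exists() is an O(1) lookup, so big files stay cheap.
    return grep { exists $names->{$_} } split /\s+/, $line;
}

print line_has_username("session for jdoe opened", \%usernames)
    ? "match\n" : "no match\n";
```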

If no match comes up, write the line to an output file. If you find a match, keep reading until you hit the next match, then start again from the beginning (of this paragraph, not the file). :)

/oliver/

Re: Re: Suggestions re parsing one large file with elements of another large file
by billie_t (Sexton) on Jan 13, 2004 at 06:42 UTC

    Thanks for the suggestions, guys. It's true that there is probably enough memory to read both of them at once. I can open both files in poxy Notepad, so they must fit into memory.

    And I'll see how far I can get with your suggestion, oliver; at least it has syntax I can understand. The big problem with outputting non-matches into another file is that there is a variable number of lines between matching the <username> the first time and the next (last) time - all of these lines need to be omitted. I suppose you match once, read all subsequent lines into a dummy hash until the next match, where you start reading lines into your output again. I can see the concept, but not quite how to execute it...
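    One way to execute that concept is a simple toggle flag rather than a dummy hash: flip it on at the first match, off at the next, and only keep lines while it's off. A sketch, with the username match standing in for whatever delimits your blocks (the sample data is made up):

    ```perl
    use strict;
    use warnings;

    my %usernames = ( jdoe => 1 );   # hypothetical username list
    my $skipping  = 0;               # true while inside a block to omit
    my @kept;

    while (my $line = <DATA>) {
        chomp $line;
        my $match = grep { exists $usernames{$_} } split /\s+/, $line;
        if ($match) {
            # Flip the flag on the opening match and again on the closing one;
            # the matching lines themselves are dropped either way.
            $skipping = !$skipping;
            next;
        }
        push @kept, $line unless $skipping;
    }
    print "$_\n" for @kept;

    __DATA__
    keep this line
    jdoe
    drop this line
    drop this one too
    jdoe
    keep this as well
    ```

    This handles a variable number of lines between the two matches, since the flag stays set no matter how many lines go by before the closing match.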

    Thanks again for the food for thought