To build a suitable "background" ngram model, it might be good to supplement (or replace) your dictionary with a "corpus" of non-temp-file names. For example, if you take all the file names that include punctuation (e.g. [-_+=. :]), split on punctuation, and count trigrams within chunks of 3 or more alphanumerics, you should have a more "realistic" set of probabilities for trigrams that make up non-temp file names.
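A minimal sketch of that background model in Python. The punctuation set and the 3-or-more-alphanumerics rule come straight from the suggestion above; the corpus names are made-up examples, and `build_trigram_model` is a hypothetical helper name:

```python
import re
from collections import Counter

def build_trigram_model(filenames):
    """Count trigrams within punctuation-delimited chunks of 3+ alphanumerics."""
    counts = Counter()
    for name in filenames:
        # Split on the punctuation set suggested above: [-_+=. :]
        for chunk in re.split(r"[-_+=. :]+", name.lower()):
            if len(chunk) >= 3 and chunk.isalnum():
                for i in range(len(chunk) - 2):
                    counts[chunk[i:i + 3]] += 1
    return counts

# Toy "non-temp" corpus standing in for a real directory listing.
corpus = ["annual-report_2023.pdf", "meeting-notes.txt", "budget=final.xls"]
model = build_trigram_model(corpus)
```

Counts from a real corpus would then serve as the "realistic" trigram probabilities once normalized by the total count.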
Then it's just a matter of assigning a score to each file name in the candidate list (i.e., the names that contain no punctuation), such that names built from a lot of improbable trigrams score very low, and those made up mostly of plausible (likely, frequent) trigrams score very high. Sort the list by score (lowest first), and the files that come out on top are the ones most likely to be easiest for human judges to dismiss as obvious temp files.
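One way to sketch that scoring step, assuming a model of raw trigram counts like the one built above. Averaging log-probabilities and giving unseen trigrams a small floor probability are smoothing choices of mine, not part of the original suggestion; the toy model and names are illustrative:

```python
import math

def score_name(name, model, total=None):
    """Average log-probability of a name's trigrams under `model`
    (a dict of trigram counts). Lower score = more temp-like.

    Unseen trigrams get a small floor probability so one novel
    trigram doesn't drag the whole score to -inf."""
    if total is None:
        total = sum(model.values())
    name = name.lower()
    trigrams = [name[i:i + 3] for i in range(len(name) - 2)]
    if not trigrams:
        return float("-inf")
    floor = 0.5 / total
    logp = 0.0
    for t in trigrams:
        p = model.get(t, 0) / total
        logp += math.log(p if p > 0 else floor)
    return logp / len(trigrams)

# Toy counts standing in for a real background model.
model = {"rep": 3, "epo": 2, "por": 2, "ort": 2, "log": 4}
names = ["zxqjvk", "report"]
ranked = sorted(names, key=lambda n: score_name(n, model))
# Lowest-scoring (most temp-like) names come first in `ranked`.
```

Averaging (rather than summing) keeps long names from being penalized just for having more trigrams.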
From there, the judges simply decide how far down the list they need to go in order to "finish": either they've already found enough temp files to free up adequate space, or they've reached a point where too few temp files remain to be worth the effort.
Of course, I'd be tempted to include file size in the sorting somehow -- deleting bigger temp files first would be a big help. But I don't know how well that would apply to your case.
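One hedged way to fold size into the ranking: subtract a log of the file size from the trigram score, so large improbable files float to the top of the delete list. The equal weighting and the sample numbers below are purely illustrative, not from the original post:

```python
import math

def deletion_priority(trigram_score, size_bytes):
    """Lower value = stronger delete candidate. Improbable names
    (low trigram score) and large sizes both push the value down.
    The 1:1 weighting is a knob to tune, not a derived constant."""
    return trigram_score - math.log(size_bytes + 1)

# Hypothetical (name, size in bytes, trigram score) candidates.
candidates = [
    ("qzxv8831", 500_000_000, -9.2),
    ("report-draft", 20_000, -3.1),
    ("tmpk2j9", 1_200_000, -8.7),
]
ranked = sorted(candidates, key=lambda c: deletion_priority(c[2], c[1]))
```

Using log(size) rather than raw bytes keeps one huge file from swamping the name-based signal entirely.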
In reply to Re: Finding Temporary Files
by graff
in thread Finding Temporary Files
by eff_i_g