I have a large mass of files (mostly PDF) that I want to reorganize based on keyword frequency.
As a first step, I need to analyze each file. Basically, I'd like to do something akin to what SAS calls a "PROC FREQ" on each file and extract a list of the top 10 keywords for each. Those stats would be plugged into a spreadsheet, which would eventually be parsed separately.
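To make the goal concrete, here is a rough Python sketch of the counting step I have in mind. It assumes the text has already been extracted from each PDF by some other tool (that part is not shown), and the names `top_keywords`, `write_report`, and the tiny `STOP_WORDS` list are just placeholders I made up for illustration.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "that"}


def top_keywords(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs.

    Stop words and words of one or two characters are ignored.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)


def write_report(docs, out, n=10):
    """Write one CSV row per (file, keyword, count).

    docs maps a file name to its already-extracted text;
    out is any writable text file object.
    """
    writer = csv.writer(out)
    writer.writerow(["file", "keyword", "count"])
    for name, text in docs.items():
        for word, count in top_keywords(text, n):
            writer.writerow([name, word, count])


if __name__ == "__main__":
    buf = io.StringIO()
    write_report({"example.pdf": "alpha beta alpha gamma alpha beta"}, buf, n=2)
    print(buf.getvalue())
```

The CSV output could then be opened directly in a spreadsheet. What I am missing is mainly the PDF-to-text step and a sensible way to decide what counts as a "keyword".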
Any suggestions or links on how to get started on this?
Thanks in advance.