It is usually a bad practice to parse HTML with a regex. It appears that you are extracting pure text (no markup) from one tag everywhere it occurs. Your regex approach is fine as long as that text is always all on one line. I do not see any problem with the way you expand contractions. Now you ask about removing 'stop words'. I have never heard this term, but I assume that you have a list of words (stored in the file stopwords.txt) which you want to filter out of your text. There are several ways you might do this. You have not given us enough information to make a recommendation. How long is your quotes.txt file? What exactly is a 'word'? How many words are in stopwords.txt? How is that file organized? One word per line?. What have you tried and how did it fail?
Bill

In reply to Re: How Could I Remove Stopwords using stopwords.txt? by BillKSmith
in thread How Could I Remove Stopwords using stopwords.txt? by mynameisbob

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.