Parsing html is always more problematic than it first seems, and rrwo is quite right to suggest that HTML::Parser is going to save you headaches later, but that's probably overkill for now.
The problem is that no single regex will reliably remove HTML. The most commonly used construction is:
s/<[^>]+>//gs
But that fails on input like the following, which is valid HTML even though the >s could as easily have been written as entities (&gt;):
<input type="submit" value="go >>>">
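To make the failure concrete, here is a minimal sketch that runs the substitution above against that exact tag. The variable names are only for illustration.

```perl
use strict;
use warnings;

# The submit button from the example above, with literal > characters
# inside a quoted attribute value.
my $html = '<input type="submit" value="go >>>">';

# The naive tag-stripping substitution, applied to a copy of the string.
(my $stripped = $html) =~ s/<[^>]+>//gs;

# The match ends at the first >, so the tail of the tag is left behind.
print "$stripped\n";    # prints: >>">
```

The regex has no idea that the first > sits inside a quoted attribute, which is exactly the sort of thing a real parser keeps track of.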
The best answer seems to be HTML::FormatText, which will strip the HTML and optionally wrap the output for you. If there are particular tags that you want to keep, then the quickest thing to do is probably a set of simple regexes that swap each one for an innocuous placeholder before the text is parsed and then restore it afterwards.
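A rough sketch of the HTML::FormatText route, assuming HTML::TreeBuilder and HTML::FormatText are installed from CPAN (neither is a core module). The sample markup and the margin values are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical sample input, including a > inside a quoted attribute.
my $html = '<p>The <b>world</b> is all that <a href="#" title="a > b">is</a> the case.</p>';

# Build a parse tree, then render it as plain text wrapped at column 50.
my $tree      = HTML::TreeBuilder->new_from_content($html);
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
my $text      = $formatter->format($tree);

# HTML::Element trees hold circular references, so free them explicitly.
$tree->delete;

print $text;
```

Because the tree is built by a real parser, the > inside the title attribute is not mistaken for the end of the tag.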
Once the html removal is taken care of, truncation should be simple:
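my $text = 'The world is all that is the case';
my $sniplength = 10;
# Cut at the first space at or after $sniplength. index() returns -1 when
# there is no such space, so keep the whole string in that case.
my $cut = index($text, ' ', $sniplength);
my $truncated = $cut < 0 ? $text : substr($text, 0, $cut);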
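If the snippet must never run past the limit, one possible variation is to cut at the last space at or before the limit instead, using rindex. This is only a sketch; the sub name snip is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return at most $len characters of $text,
# preferring to break at the last space at or before position $len.
sub snip {
    my ($text, $len) = @_;

    # Nothing to do if the text already fits.
    return $text if length($text) <= $len;

    # rindex() searches backwards from position $len; -1 means no space found.
    my $cut = rindex($text, ' ', $len);

    # With no space to break on, fall back to a hard cut at $len.
    return $cut < 0 ? substr($text, 0, $len) : substr($text, 0, $cut);
}

print snip('The world is all that is the case', 10), "\n";    # prints: The world
```

The trade-off is that this version can come out shorter than the limit, whereas cutting at the first space after the limit can come out longer.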
muffled update from within paper bag: s/should be/could as easily have been/