
Thanks a lot, but I don't understand the line my @words = $_ =~ m/\b[A-Z]+\b/g;

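I used a regular expression here to capture every run of consecutive uppercase letters delimited by word boundaries (\b). Because the match uses the /g modifier and is evaluated in list context, it returns all of the matches at once, and that list is what gets assigned to @words. Please check the Perl regular expressions documentation (perlre).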
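To make the list-context behaviour concrete, here is a minimal sketch. The helper name uppercase_words and the sample sentence are made up for illustration and are not from the original script:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns every
# all-uppercase word found in a string.
sub uppercase_words {
    my ($line) = @_;
    # In list context, m//g with no capture groups returns all matches.
    my @words = $line =~ m/\b[A-Z]+\b/g;
    return @words;
}

my @found = uppercase_words("The GTF family includes TFIIA and TFIIB.");
print "@found\n";
```

"The" is not captured because there is no word boundary between "T" and "h". Note that a single capital letter standing on its own (such as the article "A") would be captured, so you may want to filter by length afterwards.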

You don't use split?

You could use split if you like, but I think it is better to split on \W+ (runs of non-word characters) rather than \s+ (whitespace). This keeps the pattern matching in the next step simple. For the sample text below, splitting on \s+ leaves punctuation attached to the tokens ("(GTF)", "GTF's", "TFIIA,", "TFIIE.And"), so the anchored test ^[A-Z]+$ would find no acronyms at all unless we used a more complicated pattern later.

use strict;
use warnings;

my %acronyms;

my $text = "An important class of transcription factors called general "
         . "transcription factors (GTF) are necessary for transcription to occur. "
         . "The Most common GTF's are TFIIA, TFIIB, and TFIIE.And a few more not "
         . "mentioned here.";

# Split on runs of non-word characters so punctuation never sticks to a token.
my @words = split /\W+/, $text;

# Count the tokens that consist solely of uppercase letters.
foreach (@words) {
    if ($_ =~ m/^[A-Z]+$/) { $acronyms{$_}++; }
}
foreach ( keys(%acronyms) ) { print "$_ seen $acronyms{$_} times\n"; }
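If you want to see the difference between the two split patterns for yourself, the following sketch counts acronyms both ways. The helper count_acronyms and the shortened sample string are assumptions made for this illustration, not part of the script above:

```perl
use strict;
use warnings;

# Hypothetical helper: split $text on $pattern and count the tokens
# that consist solely of uppercase letters.
sub count_acronyms {
    my ($pattern, $text) = @_;
    my %count;
    for my $token (split $pattern, $text) {
        $count{$token}++ if $token =~ /^[A-Z]+$/;
    }
    return \%count;
}

# Shortened variant of the sample text, kept on one line for clarity.
my $sample = "general transcription factors (GTF) are necessary. GTF's are TFIIA, TFIIB, and TFIIE.And more.";

my $by_space   = count_acronyms(qr/\s+/, $sample);
my $by_nonword = count_acronyms(qr/\W+/, $sample);

printf "split on \\s+ found %d acronyms\n", scalar keys %$by_space;
printf "split on \\W+ found %d acronyms\n", scalar keys %$by_nonword;
```

With this sample, the \s+ version finds 0 acronyms and the \W+ version finds 4 (GTF, TFIIA, TFIIB, TFIIE), with GTF counted twice.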

In reply to Re^3: find acronyms in a text by arun_kom
in thread find acronyms in a text by steph_bow

	PerlMonks