in reply to Regular expression for finding acronyms
What do you think this should say? Your code does not produce acronyms.

    "L F" "LF" "L.F" "L. F" "HTML" "XML" "X.H.T.M.L" "X. H. T. M. L" "X H T M L"
"An acronym is an initial abbreviation that can be pronounced as a word, such as NASA or WASP. This term is also used to refer to a series of initials pronounced individually, such as FBI or TGIF, but the technical term is initialism."
Update: Perhaps consider this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    $|=1;

    while (my $line = <DATA>)
    {
        next if $line =~ /^\s*$/;     #skip blank lines
        $line =~ s/\s*$//;            #delete line endings
        print "INPUT LINE: \'$line\'\n";

        $line =~ s/\s+|\,|\.//g;      #remove spaces, commas, periods

        my @acron = $line =~ m/([A-Z]{2,})/g;  #get sequence of 2 or
                                               #more uppercase chars
        print "Acronym $_\n" for @acron;
    }

    =Prints
    INPUT LINE: 'L F and LF and L.F. and L. F. and not L, F.'
    Acronym LF
    Acronym LF
    Acronym LF
    Acronym LF
    Acronym LF
    INPUT LINE: 'some HTML some XML.'
    Acronym HTML
    Acronym XML
    INPUT LINE: 'or X.H.T.M.L. or X. H. T. M. L. or even X H T M L'
    Acronym XHTML
    Acronym XHTML
    Acronym XHTML
    INPUT LINE: 'but not U and I,'
    INPUT LINE: 'or You and I.'
    INPUT LINE: '...'
    =cut

    __DATA__
    L F and LF and L.F. and L. F. and not L, F.
    some HTML some XML.
    or X.H.T.M.L. or X. H. T. M. L. or even X H T M L
    but not U and I,
    or You and I.
    ...

Update: Oh, I guess you don't want 'L,' to count... in that case, don't delete the comma (see above).
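The "don't delete the comma" variant could look roughly like this. It is only a sketch: the sub name `acronyms_keeping_commas` is made up for illustration, and the only change from the script in this post is that the substitution no longer removes commas, so a comma now splits a run of capitals.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): same approach,
# but commas are left in place so that 'L, F' is not glued into 'LF'.
sub acronyms_keeping_commas {
    my ($line) = @_;
    $line =~ s/\s+|\.//g;                   # remove spaces and periods only
    my @found = $line =~ m/([A-Z]{2,})/g;   # runs of 2 or more uppercase chars
    return @found;
}

print "Acronym $_\n" for acronyms_keeping_commas('L. F. and not L, F.');
```

With that input, only the first `L. F.` collapses to `LF`; the `L, F.` at the end stays split by the comma and is not reported.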
LF, HTML, XML, XHTML are acronyms. Something like X. H. T. M. L. is not.
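If that stricter reading is what is wanted, one possible sketch is to skip the "remove spaces and periods" step entirely and match only capitals that are already adjacent in the text. The sub name `contiguous_acronyms` and the use of `\b` word boundaries are assumptions for illustration, not something taken from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: nothing is stripped from the line, so spaced or
# dotted letters such as 'X. H. T. M. L.' never form a run of capitals.
sub contiguous_acronyms {
    my ($line) = @_;
    my @found = $line =~ m/\b([A-Z]{2,})\b/g;   # whole words of 2+ uppercase chars
    return @found;
}

print "Acronym $_\n"
    for contiguous_acronyms('some HTML, not X. H. T. M. L. or U and I');
```

Here only `HTML` is reported. The trade-off is that `L.F.` and `L F` are no longer recognized either, so this only fits if dotted and spaced forms really should be excluded.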