
Hi Corion,
I have already given the required input and output. See the code below for what I am trying to do. I need to identify years in a line, and they may appear as 2005, 2005a, a2005, May 31, 2005, or something similar. As a first step, I am trying to find the occurrences.
I need the following output:
1. Find the words that contain a four-digit number. This is my first requirement.
2. Then collect all the years and find the two words before and after each one, because a year may be part of a full date.

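    use strict;
    use warnings;

    my $var = 'Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. I.S. Government Printing Office, Washington, DC.';

    # Attempt to capture two words on each side of the year (commented out):
    #$var =~ s/(\w?) (\w?) ([0-9]{4}+[a-zA-Z]?) (\w?) (\w?)/&identify_year($1.$2.$3.$4.$5)/ge;

    # Print every run of four digits; identify_year() returns its argument,
    # so $var itself is left unchanged.
    $var =~ s/([0-9]{4})/&identify_year($1)/ge;

    sub identify_year {
        my ($input) = @_;
        print "$input\n";
        return $input;
    }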
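A minimal sketch of requirement 1, assuming that a "word" is a whitespace-separated token and that a year is a run of exactly four digits with no further digit directly before or after it (so 2005a., b1990 and 2000. qualify, but CENSR-24. does not). The variable names are illustrative, not taken from the original post.

```perl
use strict;
use warnings;

# Sample string from the post.
my $var = 'Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. I.S. Government Printing Office, Washington, DC.';

# Assumption: a "word" is any whitespace-separated token.
my @words = split ' ', $var;

# Keep the words containing exactly four consecutive digits
# (no digit directly before or after the run).
my @year_words = grep { /(?<!\d)\d{4}(?!\d)/ } @words;

print "$_\n" for @year_words;
```

Punctuation stays attached to each word (2005a. rather than 2005a), which can be stripped afterwards if needed.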

Re^3: find one by one occurances
by Corion (Patriarch) on Jul 23, 2010 at 06:26 UTC

    Why are you using s/.../identify_year($1)/ge? What is that supposed to do? I thought your objective was to identify a year and the surrounding words?

    If you want to know whether there is one or more occurrence of a regular expression, you can use the following idiom:

    my $var = "This is the year 2000.";
    my @matches = ($var =~ /([0-9]{4})/g);
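A small sketch of this idiom applied to part of the sample string from the thread. It also shows the scalar-context while loop, which visits the occurrences one at a time. Note that [0-9]{4} would also match the first four digits of a longer number; the names @one_by_one and $count are illustrative.

```perl
use strict;
use warnings;

my $var = 'Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000.';

# List context: m//g returns the captured group from every match.
my @matches = ($var =~ /([0-9]{4})/g);

# Assigning to the empty list () and then to a scalar yields the number of matches.
my $count = () = $var =~ /([0-9]{4})/g;

# Scalar context: each call resumes where the previous match ended (pos),
# so the occurrences can be handled one by one.
my @one_by_one;
while ($var =~ /([0-9]{4})/g) {
    push @one_by_one, $1;
    printf "found %s at offset %d\n", $1, $-[1];
}

print "matches: @matches (count $count)\n";
```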

    The regular expression I gave will only find four digits. You will need to modify that regular expression to also recognize two words before that year and two words after that year.
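A sketch of one way to extend the regular expression along these lines, assuming that a "word" is any run of non-whitespace characters (\S+), so punctuation such as "Composition:" or "2000." stays attached. The two words after the year sit inside a lookahead, so they are not consumed and a following year can still be found.

```perl
use strict;
use warnings;

my $text = 'Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000. U.S. Census Bureau';

my @windows;
while ($text =~ /
        (?<!\S)               # start at the beginning of a word
        ((?:\S+\s+){0,2}?)    # group 1: up to two words before, lazy so the nearest year wins
        (\S*\d{4}\S*)         # group 2: a word containing four consecutive digits
        (?=((?:\s+\S+){0,2})) # group 3: up to two words after, inside a lookahead
    /xg) {
    push @windows, "$1$2$3";
}

print "$_\n" for @windows;
```

Because m//g never reuses text it has already consumed, the year "2000." here gets only one preceding word ("and"): "b1990" was already used by the previous match. Splitting the line into a list of words and slicing around each index avoids that limitation.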

      That is the question I raised here: how do I modify the regular expression to find five consecutive words whose middle word contains a year? I tried something like this:

      #$var=~s/(\w?) (\w?) ([0-9]{4}+[a-zA-Z]?) (\w?) (\w?)/&identify_year($1.$2.$3.$4.$5)/ge;

      It does not give the output I need; if you try it, it fails on text such as "b1990 and 2000. U.S.". Please read all my questions carefully: the year can occur in any format, not only as four bare digits.
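A minimal sketch of the word-list approach, assuming that words are whitespace-separated tokens and that a "year word" is any word containing exactly four consecutive digits with non-digit characters allowed around them (2005, 2005a, a2005, b1990, 2000.). The regex and variable names are illustrative assumptions, not a definitive solution.

```perl
use strict;
use warnings;

my $var = 'Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, CENSR-24.';

# Assumption: words are whitespace-separated tokens, punctuation included.
my @words = split ' ', $var;

# Assumption: a "year word" contains exactly four consecutive digits, with
# anything non-numeric attached (2005, 2005a, a2005, b1990, 2000.).
my $year_re = qr/(?<!\d)\d{4}(?!\d)/;

my @windows;
for my $i (0 .. $#words) {
    next unless $words[$i] =~ $year_re;

    # Clamp the window of two words on each side to the ends of the list.
    my $from = $i >= 2 ? $i - 2 : 0;
    my $to   = $i + 2 <= $#words ? $i + 2 : $#words;

    push @windows, join ' ', @words[$from .. $to];
}

print "$_\n" for @windows;
```

For a full date such as "May 31, 2005", the two preceding words are "May" and "31,", so the whole date lands inside the window. Since every window is sliced from the word list, neighbouring years such as "b1990" and "2000." each get their full two words of context.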

        That is what you have to do. See perlretut and/or perlre.

        This is not a code writing service where we will write code for you.

        You might be interested to know that /\w?/ matches only zero or one "word character". If you want to match one or more, use \w+. Again, now is a good time to learn about regular expressions.
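A short illustration of the difference between \w? and \w+, using two toy strings and a fragment of the sample string. It also shows one detail that matters for this data: \w covers only letters, digits and underscore, so the colon in "Composition:" stops a \w+ match, whereas \S+ keeps the punctuation. The array names are illustrative.

```perl
use strict;
use warnings;

# \w? matches zero or one word character, so each group grabs at most one
# character: this finds "c d" across the space.
my @one = ('abc def' =~ /(\w?) (\w?)/);                  # ('c', 'd')

# \w+ matches one or more word characters, i.e. whole runs of letters,
# digits and underscores.
my @many = ('abc def' =~ /(\w+) (\w+)/);                 # ('abc', 'def')

# \w does not include punctuation, so a word such as "Composition:" can
# never be followed directly by the space: no match at all here.
my @w_words = ('Composition: b1990' =~ /(\w+) (\w+)/);   # ()

# \S+ ("non-whitespace") keeps punctuation attached to the word.
my @s_words = ('Composition: b1990' =~ /(\S+) (\S+)/);   # ('Composition:', 'b1990')

print "one: @one | many: @many | w: @w_words | s: @s_words\n";
```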