in reply to find one by one occurances

What have you tried?

What is your code?

Where do you have problems?

Please show a small, self-contained script (maybe 20 lines, no longer than 50 lines) that shows input data, actual output, desired output. Please describe where you encounter problems.

Maybe you want to see perlretut or perlre.

Re^2: find one by one occurances
by Selvakumar (Scribe) on Jul 23, 2010 at 06:14 UTC

    Hi Corion,
    I have already given the required input and output. See the code below for what I am trying to do. I need to identify years in a line, and they may appear as 2005, 2005a, a2005, May 31, 2005, or something like that. First I am trying to find each occurrence.
    I need the following output:
    1. I need to find the words that contain a 4-digit number. That is my first requirement.
    2. Then I should collect all the years and find the two words before and after each one, because full dates can appear.

    $var='Hobbs, F. 2005a. Examining American Household Composition: b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports, CENSR-24. I.S. Government Printing Office, Washington, DC.';
    #$var=~s/(\w?) (\w?) ([0-9]{4}+[a-zA-Z]?) (\w?) (\w?)/&identify_year($1.$2.$3.$4.$5)/ge;
    $var=~s/([0-9]{4})/&identify_year($1)/ge;

    sub identify_year {
        my ($input)=@_;
        print "$input\n";
        return ($input);
    }

      Why do you try s/.../identify_year($1)/ge? What is that supposed to do? I thought your objective was to identify a year and the surrounding words?

      If you want to know whether there are one or more occurrences of a regular expression, you can use the following idiom:

      my $var = "This is the year 2000.";
      my @matches = ($var =~ /([0-9]{4})/g);

      The regular expression I gave will only find four digits. You will need to modify that regular expression to also recognize two words before that year and two words after that year.
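One way such a modification might look is sketched below. It treats a "word" as any run of non-whitespace characters and a "year" as any such token containing four digits in a row, which are both assumptions rather than anything stated in the thread. The helper name year_contexts is hypothetical. Note that a token consumed as context for one year is not reported as a year itself, so closely spaced years can be missed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one [before, year_token, after] triple per match.
# Assumption: words are whitespace-separated tokens, punctuation included.
sub year_contexts {
    my ($text) = @_;
    my @found;
    while ($text =~ /
        (?<!\S)                  # start at a token boundary
        ((?:\S+\s+){0,2})        # capture 1: up to two preceding tokens
        (\S*[0-9]{4}\S*)         # capture 2: token containing four digits in a row
        ((?:\s+\S+){0,2})        # capture 3: up to two following tokens
    /gx) {
        my ($before, $year, $after) = ($1, $2, $3);
        s/^\s+|\s+$//g for $before, $after;   # trim surrounding whitespace
        push @found, [$before, $year, $after];
    }
    return @found;
}

my $var = 'Hobbs, F. 2005a. Examining American Household Composition: 1990 and 2000.';
printf "before=[%s] year=[%s] after=[%s]\n", @$_ for year_contexts($var);
# before=[Hobbs, F.] year=[2005a.] after=[Examining American]
# before=[Household Composition:] year=[1990] after=[and 2000.]
```

In this run "2000." only shows up as trailing context of "1990", because the /g match resumes after the text already consumed.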

        That's the question I raised here: how to modify the regular expression to find five consecutive words where the center word contains a year. If I use something like the line below

        #$var=~s/(\w?) (\w?) ([0-9]{4}+[a-zA-Z]?) (\w?) (\w?)/&identify_year($1.$2.$3.$4.$5)/ge;

        it does not give my required output, and it fails here: "b1990 and 2000." U.S. etc., if you try it. Please read all my questions carefully: the year can occur in any format, not only as 4 digits.
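A possible way around the "b1990 and 2000." case is to avoid a single five-word regular expression: split the line into tokens first, then look at the neighbours of each year token by index. That way one token can be a year and also serve as context for another year. The sketch below assumes whitespace-separated tokens and treats any token with four digits in a row as a year candidate (so it would also pick up longer numbers). The helper name years_with_neighbours is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one hash per year-like token, with up to two
# neighbouring tokens on each side.
sub years_with_neighbours {
    my ($text) = @_;
    my @tok = split ' ', $text;               # split on runs of whitespace
    my @found;
    for my $i (0 .. $#tok) {
        next unless $tok[$i] =~ /[0-9]{4}/;   # assumption: 4 digits in a row = year
        my $lo = $i >= 2 ? $i - 2 : 0;                  # clamp at start of line
        my $hi = $i + 2 <= $#tok ? $i + 2 : $#tok;      # clamp at end of line
        push @found, {
            year   => $tok[$i],
            before => [ @tok[$lo .. $i - 1] ],
            after  => [ @tok[$i + 1 .. $hi] ],
        };
    }
    return @found;
}

my $var = 'Composition: b1990 and 2000. U.S. Census Bureau, Census 2000 Special Reports';
for my $hit (years_with_neighbours($var)) {
    print "$hit->{year}: [@{ $hit->{before} }] [@{ $hit->{after} }]\n";
}
# b1990: [Composition:] [and 2000.]
# 2000.: [b1990 and] [U.S. Census]
# 2000: [Bureau, Census] [Special Reports]
```

For a full date such as "May 31, 2005", the two preceding tokens would be "May" and "31,", so a later step could inspect the before list to decide whether the year is part of a date.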