in reply to Regex matching

First of all, thanks for posting your code. It's good to see you're trying (and trying to learn). And good on you for use-ing strict and warnings.

So, I'm back asking your help.

Not a problem - "everyone is a newbie at something".

I hope I don't make too many mistakes posting this.

I guess you're referring to "Re: Perl starter with big problem." Don't take that personally; it's a criticism of your post title, not you as a person. Do take the criticism on board (i.e. learn from it). parv had a good reason for saying it: the Monastery is full of very skilled people with many demands on their time. If you make a post, the only way they can judge whether the post aligns with their interest and expertise is by the title. A vague title gives them nothing to go on: every other post in Seekers of Perl Wisdom is by a "Perl starter with a big problem". A better title would have been "How can I replace TAB chars with '/'".

(Another reason for not using vague titles is that it makes it hard for others to search the Monastery for solutions to the same problem you were having.)

It's still hard to figure out exactly what you're after. Can you please post (some of) the contents of the file 'Output' after the script runs, and tell us what's wrong with it?

Also, a few random points about your code:

HTH,
mtp


Indicators of geekdom:
  • You get a kick out of finishing sentences with domain names, because of the syntactic overlap of certain written languages and DNS notation.
  • You understood the previous sentence.

Re^2: Regex matching
by b_vulnerability (Novice) on Nov 04, 2008 at 08:49 UTC
    I'm very happy to learn. I did not take what parv said badly. I know the title was wrong and I understand the reason, because I've spent a lot of time searching the archives of this site to come up with a solution to my problem.
    I've searched a lot and found this: How do I extract all text between two keywords like start and end? which is exactly what I need to do. The only problem is: I need to extract only sentences that have just three or four words between said keywords, and not match the others.
    Let's say that my keywords are atomo and nucleo. I'd like to extract just this:
    atomo/atomo/S-MS  è/essereV-S3IP  composto/comporre/V-MSPR  da/da/E  un/un/RIMS  nucleo/nucleo/S
    but not this:
     atomo/atomo/S-MS  non/non/B  era/essereV-S3II  indivisibile/indivisibile/A-NS  ,/,/PU  bensì/bensì/C  a/a/E  sua/suo/A-FS  volta/volta/S-FS  composto/comporre/V-MSPR  da/da/E  particelle/particella/S-FP  più/più/B  piccole/piccolo/A-FP  (/(/PU  alle/a/E-FP  quali/quale/P-NP  ci/ci/PQNP  si/si/PQNN  riferisce/riferireV-S3IP  con/con/E  il/il/RDMS  termine/termine/S-MS  "/"/PU  subatomiche/subatomico/A-FP  "/"/PU  )/)/PU  ././PU  In/in/E  particolare/particolare/S-MS  ,/,/PU  l'/lo/RDNS /atomo/atomo/S-MS  è/essereV-S3IP  composto/comporre/V-MSPR  da/da/E  un/un/RIMS  nucleo/nucleo/S.
    I managed (more or less) to do what is suggested in the link I posted before, but I don't know how to tell the regex that I don't want every single sentence that starts with a keyword and ends with the other, but just sentences that have three or four words between the first and the second.
    I really don't know how to explain this in other words, and I'm sorry if my examples aren't clear enough. I'm a newbie and I'm Italian (so my English is far from perfect). Thanks again to everyone.
      but I don't know how to tell the regex that I don't want every single sentence that starts with a keyword and ends with the other, but just sentences that have three or four words between the first and the second.

      To get that result, you could change
      while ($text =~ / $key (.*)? $value /g)

      to

      while ($text =~ /\s$key\s+((?:\S+\s+){0,3}\S+)\s+$value\s/g)

      This captures 1 to 4 words. (Do you really want space before $key and after $value?)
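
      For what it's worth, here is a minimal sketch of that pattern wrapped in a small helper so the word limits can be changed. The sub name words_between, the keyword pair and the shortened second line are all made up for illustration (I don't know what is actually in %hash), and the \Q...\E quoting is an addition of mine so that characters like '.' in a keyword are matched literally.

```perl
use strict;
use warnings;

# Hypothetical keyword pair; the real pairs live in %hash in the original script.
my ($key, $value) = ('atomo/atomo/S-MS', 'nucleo/nucleo/S');

# Sample lines shaped like the tagged text in the question.
# The second one is shortened by hand; it has five words between the keywords.
my @lines = (
    " atomo/atomo/S-MS  è/essereV-S3IP  composto/comporre/V-MSPR  da/da/E  un/un/RIMS  nucleo/nucleo/S\n",
    " atomo/atomo/S-MS  non/non/B  era/essereV-S3II  indivisibile/indivisibile/A-NS  ,/,/PU  a/a/E  nucleo/nucleo/S\n",
);

# Return every run of $min..$max whitespace-separated words that sits
# between $key and $value in $text.
sub words_between {
    my ($text, $key, $value, $min, $max) = @_;
    # The trailing \S+ always supplies one word, so the repeat counts are one lower.
    my ($lo, $hi) = ($min - 1, $max - 1);
    my @found;
    while ($text =~ /\s\Q$key\E\s+((?:\S+\s+){$lo,$hi}\S+)\s+\Q$value\E\s/g) {
        push @found, $1;
    }
    return @found;
}

# Ask for exactly three or four words between the keywords.
for my $line (@lines) {
    print "$_\n" for words_between($line, $key, $value, 3, 4);
}
```

      With 3 and 4 as the limits only the first sample line matches. Written directly in the regex, that is {2,3} instead of {0,3}, because the trailing \S+ always accounts for one word.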

      A few questions more.

      • Can you match the same key and value more than once on the same string?
      • Can more than one set of key/value be matched on the same string?
      I ask because that is what your code is doing now in the section
      while (my $text = <$testo>) {
          for my $key (keys %hash) {
              my $value = $hash{$key};
              while ($text =~ / $key (.*)? $value /g) {
                  $arrayris[$indice] = $1;
                  $indice++;
              }
          }
      }

      If not, you would need to change the inner while loop (the one testing the regular expression, not the one reading the file) to an if statement.
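
      To show the difference, here is a small sketch. The line of text and the plain keywords 'atomo' and 'nucleo' are invented for the example, and the pattern is precompiled with qr// only to avoid typing it twice.

```perl
use strict;
use warnings;

# Hypothetical line in which the same key/value pair occurs twice.
my $text = " atomo ha un nucleo  atomo senza nucleo \n";
my ($key, $value) = ('atomo', 'nucleo');

# Same shape as the 1-to-4-word pattern, compiled once.
my $re = qr/\s\Q$key\E\s+((?:\S+\s+){0,3}\S+)\s+\Q$value\E\s/;

# while + /g: each pass resumes where the previous match ended,
# so every occurrence on the line is collected.
my @all;
while ($text =~ /$re/g) {
    push @all, $1;
}

# if without /g: only the first occurrence on the line is taken.
my @first;
if ($text =~ $re) {
    push @first, $1;
}

print "while: ", join(' | ', @all),   "\n";
print "if:    ", join(' | ', @first), "\n";
```

      The while version collects both "ha un" and "senza"; the if version stops after "ha un".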

      Just a few questions.
      Chris