PerlMonks  

getting next word or number after another

by bigup401 (Pilgrim)
on Dec 18, 2020 at 00:11 UTC ( [id://11125377]=perlquestion )

bigup401 has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.

Replies are listed 'Best First'.
Re: getting next word or number after another
by Paladin (Vicar) on Dec 18, 2020 at 00:20 UTC
    Always
    use strict; use warnings;
    It would have told you that $string_2 wasn't declared, and hence probably a mistake. It should be $string. Also, in your second regex, you probably want $next_word and not $string, and finally, $1, etc. get reset after each successful regex match. So you need to save them after each use. So:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $string = " info John 100 - 2000 Kent";
    my $word = "info";

    $string =~ /$word\s*?(\S+)/;
    my $next_word = $1;

    $string =~ /$next_word\s*?(\S+)/;
    my $next_word_2 = $1;

    print "The next word after $word is $next_word\n";
    print "The next word or number after $next_word is $next_word_2\n";
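    A further hedged sketch along the same lines, not part of the original reply: capturing in list context avoids reading a stale $1 when a match fails, and \Q...\E keeps any regex metacharacters in the search word literal. The helper name word_after is hypothetical, and the lookarounds assume that "word" means a whitespace-delimited token.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the token that follows
# $word in $string, or undef when $word is absent or is the last token.
sub word_after {
    my ($string, $word) = @_;
    # \Q...\E quotes metacharacters in $word; the lookarounds stop "info"
    # from matching inside a longer token such as "information".
    my ($next) = $string =~ /(?<!\S)\Q$word\E(?!\S)\s+(\S+)/;
    return $next;
}

my $string = " info John 100 - 2000 Kent";
my $first  = word_after($string, "info");
my $second = defined $first ? word_after($string, $first) : undef;

print "The next word after info is $first\n"    if defined $first;
print "The next word after $first is $second\n" if defined $second;
```

    Because the capture is assigned directly from the match, a failed match leaves the variable undefined instead of silently reusing the previous $1.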

      thanks so much

        You've been advised to do this many times (and have argued against it), yet you don't seem to be learning from your constant mistakes.

Re: getting next word or number after another
by davido (Cardinal) on Dec 18, 2020 at 03:47 UTC

    I would have kept the DOM around so that I wouldn't have to start parsing words out of unstructured text.


    Dave

      There are now at least 3 threads about this same problem. Using a DOM-aware parser has been suggested more than once, and dismissed without reason.

        Yeah, I've been following along. I jump in because I think, "Hm. How would I solve that?" My responses are mostly because I'm curious about what the big deal is that would prevent normal tools from being the best choice. I second-guess myself, thinking maybe Mojo::DOM isn't good for this, so I try it out, and prove to myself that it is a reasonable choice for what's being done. After that, I post my finding. And then mostly I just get sad that we have a bunch of good people trying to help someone who seems hell-bent on ignoring advice, ignoring the fact that HTML parsers were invented simultaneously with the advent of HTML (browsers have to understand HTML semantics, after all), and then getting cross with people when the answers that come back are so baffled by the unwillingness to use the right tool for the job.

        It's one thing to initially think that one could remove lug nuts from a wheel with a pair of pliers. Seems reasonable, especially if someone hasn't done it before and doesn't understand how hard it is to use that tool in this application. It's another thing to refuse the lug-wrench after again and again the pliers slip, foul up the nut, and bloody the knuckles of the person holding them. Particularly baffling when the lug wrench is free, available, and pretty easy to use. But even worse, the person with bloody knuckles holding the pliers is then asking us to show him how to change the tire with pliers, and gets upset when we pick up the lug wrench and say, "I wouldn't use pliers for this, I would use a lug wrench." Then the OP goes back and bloodies his knuckles a little more on another part of the tire change he hadn't thought through very well, and once again is upset that we can't help him make the pliers work better in the job for which they're not intended.

        I know, I shouldn't engage. But I keep thinking maybe a rational person, when presented with sufficient information, will make good decisions. I have a hard time accepting that not everyone is either rational, or capable of making good decisions given enough information. And yet irrationality and faulty decision making is part of being human; we're all guilty of it.


        Dave

Re: getting next word or number after another (updated x2)
by AnomalousMonk (Archbishop) on Dec 18, 2020 at 02:14 UTC

    Win8 Strawberry 5.8.9.5 (32) Thu 12/17/2020 21:09:41
    C:\@Work\Perl\monks
    >perl -Mstrict -Mwarnings -MData::Dump=dd
    my $s = " info John 100 - 2000 Kent";

    my $word = '';
    while ($s =~ m{ $word [^[:alnum:]]+ ([[:alnum:]]+) }xms) {
        print "next word after '$word' is '$1' \n";
        $word = $1;
        }

    print "another way \n";
    my @words = $s =~ m{ [[:alnum:]]+ }xmsg;
    dd \@words;
    ^Z
    next word after '' is 'info'
    next word after 'info' is 'John'
    next word after 'John' is '100'
    next word after '100' is '2000'
    next word after '2000' is 'Kent'
    another way
    ["info", "John", 100, 2000, "Kent"]

    Update 1: The first method above will fail to capture 'info' if there are no "non-word" (whitespace in this case) characters before the first word in the string. To capture the first word in this case, use
        $s =~ m{ $word [^[:alnum:]]* ([[:alnum:]]+) }xms
    (note * quantifier on [^[:alnum:]]* vice +).
    However, using $word as an anchor then fails if there is a repeated "word" in the string: try replacing "100" with another instance of "info" and see what happens. IMHO, "another way" is the better way to strip out "words" from a string.

    Update 2: Here's a way to loop through the string word-by-word regardless of leading/trailing whitespace or repeated words (but I still prefer stripping/extracting all words to an explicit or implicit array - the second method above):

    Win8 Strawberry 5.8.9.5 (32) Thu 12/17/2020 21:59:18
    C:\@Work\Perl\monks
    >perl -Mstrict -Mwarnings
    my $s = " info John info - 2000 Kent ";

    my $word;
    while ($s =~ m{ [^[:alnum:]]* ([[:alnum:]]+) }xmsg) {
        if (defined $word) {
            print "next word after '$word' is '$1' \n";
            }
        else {
            print "first word is '$1' \n";
            }
        $word = $1;
        }
    ^Z
    first word is 'info'
    next word after 'info' is 'John'
    next word after 'John' is 'info'
    next word after 'info' is '2000'
    next word after '2000' is 'Kent'
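    As a rough illustration of the array-based approach preferred above (a sketch, not code from the original post): once all the words are in an array, "the word after X" becomes an index lookup, and repeated words are no longer a problem. The helper name words_after is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return every word that immediately follows an
# occurrence of $target among the alphanumeric "words" of $s.
sub words_after {
    my ($s, $target) = @_;
    # List-context /g match with no capture groups returns all matches.
    my @words = $s =~ m{ [[:alnum:]]+ }xmsg;
    # Stop at the second-to-last index so $words[$_ + 1] always exists.
    return map  { $words[$_ + 1] }
           grep { $words[$_] eq $target } 0 .. $#words - 1;
}

my $s = " info John info - 2000 Kent ";
print join(", ", words_after($s, "info")), "\n";   # prints: John, 2000
```

    The comparison uses eq instead of interpolating the word into a regex, so the search word needs no quoting.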


    Give a man a fish:  <%-{-{-{-<

Re: getting next word or number after another
by BillKSmith (Monsignor) on Dec 18, 2020 at 03:25 UTC
    Refer to \G anchor in perlre
    $string =~ /$word\s*?(\S+)/g;
    my $next_word = $1;

    $string =~ /\G\s*?(\S+)/g;
    my $next_word_2 = $1;
    Bill

      The \G anchor does no harm, but is not necessary in the OPed example case (/g modifier causes match position to be preserved in void/boolean/scalar context):

      Win8 Strawberry 5.8.9.5 (32) Thu 12/17/2020 23:11:18
      C:\@Work\Perl\monks
      >perl -Mstrict -Mwarnings
      my $string = " info info 100 - 2000 Kent";
      my $word = "info";

      $string =~ /$word\s*?(\S+)/g;
      my $next_word = $1;
      my $nwo = $-[1];  # offset of start of capture

      $string =~ /\s*?(\S+)/g;
      my $next_word_2 = $1;
      my $nwo_2 = $-[1];

      print "The next word after $word is $next_word @ $nwo \n";
      print "The next word or number after $next_word is $next_word_2 @ $nwo_2 \n";
      ^Z
      The next word after info is info @ 27
      The next word or number after info is 100 @ 50
      (Repeated words are also handled properly with/without \G.)


      Give a man a fish:  <%-{-{-{-<

        Thanks for your comment. My interpretation of the document required the \G. I would still recommend using it because it calls attention to the /g on the first match which is there only for its effect on the second match.
        Bill
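        A hedged side note, not from either poster: the two forms agree in the example above because \s*?(\S+) matches right at the saved position anyway. They differ once the second pattern cannot match at that position. The sketch below uses a hypothetical helper, digits_after_info, and assumes the second token is expected to be a number.

```perl
use strict;
use warnings;

# Hypothetical demo: match "info <word>", then look for digits both
# without and with the \G anchor, starting from the same saved position.
sub digits_after_info {
    my ($string) = @_;

    # Scalar-context /g match: the end of the match is remembered in pos().
    $string =~ /info\s+(\S+)/g or return;
    my $resume = pos($string);

    # Without \G the engine may skip ahead to digits later in the string.
    my $floating = $string =~ /(\d+)/g ? $1 : undef;

    # Rewind, then insist that the digits start at the saved position
    # (after optional whitespace). /c keeps pos() intact on failure.
    pos($string) = $resume;
    my $anchored = $string =~ /\G\s*(\d+)/gc ? $1 : undef;

    return ($floating, $anchored);
}

for my $s ("info John 100 Kent", "info John - 2000 Kent") {
    my ($floating, $anchored) = digits_after_info($s);
    printf "%-24s floating=%s anchored=%s\n", "'$s'",
        map { defined $_ ? $_ : 'undef' } $floating, $anchored;
}
```

        For the second string the unanchored match skips over " - " and finds 2000, while the \G version reports no match because the token right after "John" is not a number.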
Re: getting next word or number after another
by Marshall (Canon) on Dec 18, 2020 at 19:59 UTC
    Below is an obvious solution to your current formulation of the problem. You want: first word, first number, second word, second number. So make a stack of the words and a stack of the numbers and then interleave them for printout. I suspect that you were closer to a flexible solution when you had an array of lines. BTW, I do not consider parsing $string twice to be a problem - this makes the code easier and it will run so fast that it won't matter.

    When you keep reformulating the problem without showing edits and starting new threads, this just confuses the issue. It makes it very tough for a guy like me who just stumbles across this thing to make heads or tails of it.

    use strict;
    use warnings;

    my $string = " info John 100 - 2000 Kent";

    (my @words) = $string =~ /([A-Za-z]+)/g;
    (my @nums)  = $string =~ /(\d+)/g;

    print "data:",shift @words," ";  #the "info" word
    print "uneven stack error!" if (@words != @nums);

    while (@words)
    {
        print "",shift @words,":",shift @nums," ";
    }
    print "\n";
    # prints: "data:info John:100 Kent:2000 " <= what you said you wanted
    Update: When you say things like "now my problem i cant get the next word or number after john", we've lost all context about what the overall objective is, because one would have to read one or more other threads to find out what end result you desire. So the responses become focused on answering your current question, which is not really what you need to know, or even pertinent to what you asked for in the first place!

Node Type: perlquestion [id://11125377]
Approved by philipbailey