character offset to word offset

newbio has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: character offset to word offset by kyle (Abbot) on Sep 08, 2008 at 19:09 UTC
Given a character offset, you can use substr to pick out the part of the string that comes before the word in question, and then count the words in that part, perhaps by splitting it. The substr that starts at the character offset has the word at its front, so you can use a regular expression to pull that word out. The word's length then tells you how much of the string comes after it. That should give you enough info to know where to put the asterisks before you print the result.
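The steps described above can be sketched roughly as follows. This is only an illustration, not kyle's code: the helper name `mark_word_at` is hypothetical, and it assumes a "word" is a run of `\w` characters and that the offset is 0-based and points at the first character of the word.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given a string and the
# 0-based character offset where a word starts, return the word's
# 1-based position, the word itself, and the string with that word
# wrapped in asterisks.
sub mark_word_at {
    my ($string, $offset) = @_;

    my $before = substr($string, 0, $offset);   # text before the word
    my $rest   = substr($string, $offset);      # the word is at the front

    # split ' ' skips leading whitespace and drops trailing empty fields,
    # so this counts the whitespace-separated words before the offset.
    my @preceding = split ' ', $before;

    # Assumption: a "word" is a run of \w characters.
    my ($word) = $rest =~ /^(\w+)/;
    return unless defined $word;

    my $after = substr($rest, length $word);    # what follows the word

    return (scalar(@preceding) + 1, $word, "$before*$word*$after");
}

my ($index, $word, $marked) =
    mark_word_at('Sam goes to school to play football.', 12);
print "$index $word: $marked\n";
# prints: 4 school: Sam goes to *school* to play football.
```

To mark several words in one string, it would probably be easiest to process the offsets from highest to lowest so that earlier insertions do not shift the later ones.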
Re: character offset to word offset by dwm042 (Priest) on Sep 08, 2008 at 20:46 UTC
I'm not going to solve this whole problem, as it feels a bit too much like homework. But I could not resist a totally over-engineered approach to the word count issue, using a closure:

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $sentence = "Sam goes to school to play football";

    # Pull each word out by its inclusive character span.
    my $word      = substr($sentence, 12, 17 - 12 + 1);
    my $next_word = substr($sentence, 27, 34 - 27 + 1);

    my $find_word = word_counter($sentence);
    print "$word: ", $find_word->($word), "\n";
    print "$next_word: ", $find_word->($next_word), "\n";

    # Returns a closure that reports the 1-based position of the first
    # matching word in the sentence, or 0 if the word is not found.
    sub word_counter {
        my $sentence = shift;
        my @words = split " ", $sentence;
        return sub {
            my $word  = shift;
            my $count = 0;
            for (@words) {
                $count++;
                return $count if ( $word eq $_ );
            }
            return 0;
        }
    }

The output is:

    C:\Code>perl stuff1.pl
    school: 4
    football: 7
Re: character offset to word offset by GrandFather (Saint) on Sep 08, 2008 at 20:51 UTC
There are a couple of things to think about here. You need to figure out how you are going to store the word spans (where do they come from, btw?). Deciding just what is a word can be tricky (e.g. for example). When editing changes the length of whatever is being edited, it is often better to work backwards so that you don't change the indexes for subsequent edits. The following somewhat golfed code (don't hand it in to your teacher) counts words by counting the spans of spaces between them:

    use strict;
    use warnings;

    my $sentence = 'Sam goes to school to play football.';
    my @subs = ([12, 17], [27, 34]);
    my @counts;

    @subs = sort {$b->[0] <=> $a->[0]} @subs; # Descending sort by first char pos

    for my $sub (@subs) {
        unshift @counts, 1 + (substr $sentence, 0, $sub->[0]) =~ s/(\s+)/$1/g;
        substr $sentence, $_, 0, '*' for $sub->[1] + 1, $sub->[0];
    }

    print "@counts $sentence\n";

Prints:

    4 7 Sam goes to *school* to play *football*.

Perl reduces RSI - it saves typing
Re: character offset to word offset by mr_mischief (Monsignor) on Sep 08, 2008 at 20:50 UTC
Getting the words from the character offsets is easy enough with substr. Here's an easily adaptable way to count the words before a particular word in a string for a sufficiently simple definition of 'word':

    sub words_before {
        my ( $word, $string ) = @_;
        my ( $before ) = $string =~ m/(.*?)\Q$word\E/g;
        my $words_before = () = $before =~ m/(\S+)/g;
        return $words_before;
    }

It can be used like this:

    print 'There are ' . words_before( $word, $string ) . ' words before "' . $word . '" in the string "' . $string . '"' . "\n";

The word index counting from 0 is the same as the number of words before the given word. Counting from 1, you'll need to add one to that amount.
Re: character offset to word offset by ww (Archbishop) on Sep 08, 2008 at 23:04 UTC
-- for failure to show even a hint that you've tried anything other than an appeal to others... and another appears likely to be deserved for homework not labeled as such. Please see How do I post a question effectively? -- specifically, RTFM (Show Some Effort), Do Your Own Work, and About Homework.
Re^2: character offset to word offset by AnomalousMonk (Archbishop) on Sep 09, 2008 at 00:08 UTC
I share the suspicions of and disdain for homework not labeled as such expressed by others, but since several code contributions have already been made, here's mine. As noted above, the tricky part is defining the word regex. This should also probably be defined separately and passed to the `emphasize()` function rather than being hard-coded.

    perl -wMstrict -le
    "{ my %emphasis;
       my $word = qr{ (\b \w+ \b) }xms;
       sub emphasize {
         my $string = shift;
         %emphasis = map { $_ => 0 } @_;
         my $intro = qq{@{[ sort { $a <=> $b } keys %emphasis ]}};
         my $words = 0;
         $string =~ s{ ($word) }
                     { exists $emphasis{++$words} ? qq{*$1*} : $1 }xmsge;
         return qq{$intro $string};
         }
     }
     my $string = 'Sam goes to school to play football.';
     print emphasize($string, 4, 7);
     print emphasize($string, 7, 2, 4);
     print emphasize($string);
     print emphasize('the cow jumped over the', 3, 4, 1);
    "
    4 7 Sam goes to *school* to play *football*.
    2 4 7 Sam *goes* to *school* to play *football*.
     Sam goes to school to play football.
    1 3 4 *the* cow *jumped* *over* the
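The suggestion to pass the word pattern in, rather than hard-code it, might look something like the sketch below. The name `emphasize_with` and its argument order are assumptions made for illustration; they are not part of the post above.

```perl
use strict;
use warnings;

# Hypothetical variant of emphasize(): the word pattern is the first
# argument instead of being hard-coded inside the function.
sub emphasize_with {
    my ($word_rx, $string, @positions) = @_;
    my %emphasis = map { $_ => 1 } @positions;
    my $words = 0;
    # Wrap the Nth match of $word_rx in asterisks when N was requested.
    $string =~ s{($word_rx)}{ $emphasis{++$words} ? "*$1*" : $1 }ge;
    return $string;
}

my $string = 'Sam goes to school to play football.';

print emphasize_with(qr{\b\w+\b}, $string, 4, 7), "\n";
# prints: Sam goes to *school* to play *football*.

# A different notion of "word": any run of non-whitespace characters,
# so the trailing period counts as part of the last word.
print emphasize_with(qr{\S+}, $string, 7), "\n";
# prints: Sam goes to school to play *football.*
```

Keeping the pattern as a parameter makes the choice of word definition visible at the call site, which seems to be the point of the remark about not hard-coding it.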
Re: character offset to word offset by betterworld (Curate) on Sep 08, 2008 at 19:09 UTC
I'm not sure I understand correctly what you want to do, but I'll try:

    my $sentence = 'Sam goes to school to play football.';
    substr($sentence, $_, 0) = '*' for 35, 27, 18, 12;
    print $sentence, "\n";

This will print `Sam goes to *school* to play *football*.`, and the asterisks have been put there using the character offsets. However, a more straightforward way to highlight school this way would be something like:

    $sentence =~ s/(school)/*$1*/;

Update: Sorry, I forgot the part about the word count. Well, here's a way to get to the "4" (not very nice, maybe others have a more beautiful solution):

    my $sentence = 'Sam goes to school to play football.';
    # Count the spaces and add 1
    my $count1 = substr($sentence, 0, 12) =~ y/ // + 1;
Re: character offset to word offset by newbio (Beadle) on Sep 09, 2008 at 03:05 UTC
Dear Monks,

Thanks a lot for your help. This is not a homework problem but part of a bigger project. Here is how I did it:

    substr($sentence, $_, 0) = '*' for @temp1;   # @temp1 contains the descending character positions used for marking.
    @temp2 = split(' ', $sentence);              # indices of the elements in @temp2 that match ^\* give their word count from the start.

The new thing I learned here is that substr() can also be used for assignment...:). Thank you all, beloved monks!
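For readers who want to run this approach end to end, here is a minimal self-contained sketch. It assumes the input arrives as inclusive [start, end] character spans, as in the other replies. The `@spans` and `@word_counts` names are introduced here for illustration, while `@temp1` and `@temp2` follow the post above.

```perl
use strict;
use warnings;

my $sentence = 'Sam goes to school to play football.';

# Assumed input: inclusive [start, end] character spans, as used
# elsewhere in the thread.
my @spans = ([12, 17], [27, 34]);

# Insertion points for the asterisks, in descending order so that
# earlier insertions do not shift the positions still to be used.
my @temp1 = sort { $b <=> $a } map { ($_->[0], $_->[1] + 1) } @spans;
substr($sentence, $_, 0) = '*' for @temp1;

# Split into words; those that now start with '*' are the marked ones.
my @temp2 = split ' ', $sentence;
my @word_counts = map { $_ + 1 } grep { $temp2[$_] =~ /^\*/ } 0 .. $#temp2;

print "@word_counts $sentence\n";
# prints: 4 7 Sam goes to *school* to play *football*.
```

Note that a span covering several words would be reported only by the position of its first word, since only that word starts with an asterisk.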