Re: Removing characters
by moritz (Cardinal) on Jan 22, 2008 at 14:30 UTC
|
tr modifies the string it works on, and returns the number of substitutions.
BTW, when you want to join two strings, use a dot (.), not a plus (+). This is called "concatenation".
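A minimal sketch of both points. The variable names ($text, $count, $joined) are made up for illustration and are not from the original question:

```perl
use strict;
use warnings;

my $text = "a,b,c";

# tr/// changes $text in place and returns how many characters it translated.
my $count = ($text =~ tr/,/;/);
print "count: $count\n";    # count: 2
print "text:  $text\n";     # text:  a;b;c

# The dot joins strings; a plus would try to add them as numbers.
my $joined = "abc" . "def";
print "joined: $joined\n";  # joined: abcdef
```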
|
|
Doh! Thanks. Back to the drawing board :(
|
|
What should I be using instead of TR?
|
|
You can use tr, but you have to discard its return value, and you need the /d modifier so that the listed characters are actually deleted.
my $str = "ab c,d.e";
$str =~ tr/ ,.//d;
# here $str is "abcde"
So you need to use the modified string after the tr. And don't try to stuff it all in one line ;-)
(Update: fixed match operator)
Re: Removing characters
by olus (Curate) on Jan 22, 2008 at 15:35 UTC
|
This may be close to what you want
use strict;
use warnings;
my $lines = 0;
my $mystring = '';
my $chars = 0;
my $word_cnt = 0;
while(<DATA>) {
    my $str = $_;
    $chars += length($str);
    $lines++ ;
    my @words = grep /\w/, ($str =~ /\b\w*\b/g);
    $word_cnt += $#words + 1;
    $str =~ s/( |,|\.|\n|\t)//g;
    $mystring .= lc($str);
}
print "Lines: $lines\n";
print "Words: $word_cnt\n";
print "Characters: $chars\n";
print "Total String: $mystring\n";
__DATA__
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
Outputs
Lines: 4
Words: 36
Characters: 180
Total String: thequickbrownfoxjumpsoverthelazydogthequickbrownfoxjumpsoverthelazydogthequickbrownfoxjumpsoverthelazydogthequickbrownfoxjumpsoverthelazydog
|
|
THANKYOU very much, I see where I was going wrong now.
Re: Removing characters
by hipowls (Curate) on Jan 22, 2008 at 14:56 UTC
|
my @word = split " ", $_;
could be rewritten as
my @words =
map { lc } # convert to lower case
split /[\s.]+/, $_; # split on white space and "."
This has the effect of removing all white space and periods and giving a list of lower-case words. You need to read it backwards: the split happens before the map.
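To make the right-to-left reading concrete, here is a small sketch that performs the same two steps separately. The sample line and the intermediate @pieces array are assumptions added for illustration:

```perl
use strict;
use warnings;

my $line = "The quick. Brown fox\n";   # hypothetical input line

# Step 1 (the rightmost expression): split on runs of white space and periods.
my @pieces = split /[\s.]+/, $line;

# Step 2 (the expression to its left): lower-case each piece.
my @words = map { lc } @pieces;

print join("|", @words), "\n";   # the|quick|brown|fox
```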
|
|
No I still can't get it to work. I just need to combine the contents of the file removing all white spaces, tabs, commas, return characters and periods.
E.g
Hey, diddle, diddle,
The cat and the fiddle,
heydiddlediddlethecatandthefiddle
|
|
#!/usr/bin/perl
use 5.010_000;
use warnings;
use strict;
my %number_of;
my @words;
while ( my $line = <DATA> ) {
    $number_of{lines}++;
    $number_of{chars} += length $line;
    push @words, map { lc } split /[\s.,]+/, $line;
}
$number_of{words} = scalar @words;
foreach my $item (qw(chars words lines)) {
    say "Number of $item: $number_of{$item}";
}
say @words;
__DATA__
Hey, diddle, diddle,
The cat and the fiddle,
The cow jumped over the moon.
and the output
Number of chars: 75
Number of words: 14
Number of lines: 3
heydiddlediddlethecatandthefiddlethecowjumpedoverthemoon
If other punctuation is to be removed, add it to the split. The difficulty is with apostrophes: you probably do not want to count don't as two words, so the split line becomes
push @words, map { lc } # change to lower case
map { tr/'//d; $_ } # remove apostrophes
split /[\s.,?!"]+/, # split on space and punctuation
$line;
Update: removed a useless use List::Util; from the example.
Re: Removing characters
by svenXY (Deacon) on Jan 22, 2008 at 15:10 UTC
|
# replace (s///) all (g) non word-characters \W with nothing
$mystring = $mystring += $_=~ s/\W//g;
print lc($mystring); # lowercase
Regards,
svenXY
update: as said a couple of times by now, please consider reading perlop on tr///, s/// and m///, as well as perlre on regular expressions. Otherwise you won't have much fun with Perl. And I can promise you: it IS fun once you get past the first few lessons...
update 2: $mystring = $mystring += $_=~ s/\W//g; is wrong. Please see my later post for a working example and forgive the error.
|
|
Thanks for the help so far everybody!
I tried this, $mystring seems to return 31 instead of the formatted string.
open (READFILE, "<poem.txt") || die "Couldn't open file: $!";
$buffer = "";
$lines = 0;
while(<READFILE>) {
    $chars += length($_);
    $lines++ ;
    my @word = split " ", $_;
    $word_cnt += @word;
    chomp($mystring);
    $mystring = $mystring += $_=~ s/\W//g;
}
print ("Lines: $lines\n");
print ("Words: $word_cnt\n");
print ("Characters: $chars\n");
print lc("$mystring\n");
|
|
Because, despite being told that s/// modifies the string it's working on in place, you persist in attempting to use its return value (the number of substitutions made) as something meaningful. $_ contains the converted string; you want to append the contents of $_ to your accumulated buffer instead. Not to mention you're still trying to use + as a concatenation operator rather than the correct . operator.
Update: Oops, you're using s/// instead of tr///, but the point's the same: it modifies the string in place and doesn't have a meaningful (in this instance) return value. Wording changed accordingly.
And it's just been pointed out to me that that line wasn't yours, which only partly excuses you in that you cargo cultly included it without understanding that it was doing the very thing you'd already been warned about.
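A short sketch of the difference between the return value of s/// and the string it modifies. The names $clean, $removed and $buffer are illustrative only:

```perl
use strict;
use warnings;

my $line = "Hey, diddle, diddle,\n";   # sample line

# Work on a copy so the original line stays available for counting.
my $clean   = $line;
my $removed = ($clean =~ s/\W//g);     # return value: number of substitutions

my $buffer = '';
$buffer .= lc $clean;                  # append the modified string, not the count

print "removed: $removed\n";           # removed: 6
print "buffer:  $buffer\n";            # buffer:  heydiddlediddle
```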
The cake is a lie.
The cake is a lie.
The cake is a lie.
|
|
|
|
#!/usr/bin/perl
use strict;
use warnings;
my ($chars, $lines, @word, $word_cnt, $mystring, $bla );
$lines = 0;
while(<DATA>) {
    $chars += length($_);
    $lines++ ;
    my @word = split " ", $_;
    $word_cnt += @word;
    ($bla = $_) =~ s/\W//g;   # assign the converted string of $_ to $bla
    $mystring .= lc($bla);    # concatenate lowercase $bla to $mystring
}
print ("Lines: $lines\n");
print ("Words: $word_cnt\n");
print ("Characters: $chars\n");
print("Total String: $mystring \n");
__DATA__
Hey, diddle, diddle,
The cat and the fiddle.
Regards,
svenXY
Re: Removing characters
by cdarke (Prior) on Jan 22, 2008 at 16:28 UTC
|
If you are using Windows this will not give the correct number of characters. You don't say which operating system, so forgive me if this is irrelevant.
Windows text file lines are terminated by two characters: "\r\n" (carriage return and newline). When reading in text mode, Perl on Windows translates that pair to a single "\n", so your count will be one short for each line in the file. You could just add 1 for each line, but better yet, tell perl that you want to read each and every character: use binmode READFILE; after the open.
The simplest way to get the size of the file, though, is to use -s (look it up), but I guess your tutors want the character count.
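A sketch of both suggestions, assuming a throwaway temporary file rather than the poem.txt from the question. The file contents and variable names are made up for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build a small file with Windows-style line endings (hypothetical content).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                        # write the bytes exactly as given
print {$out} "one\r\ntwo\r\n";
close $out or die "Couldn't close $path: $!";

open my $in, '<', $path or die "Couldn't open $path: $!";
binmode $in;                         # no "\r\n" to "\n" translation on Windows
my $chars = 0;
while (my $line = <$in>) {
    $chars += length $line;          # counts the "\r" as well
}
close $in;

my $size = -s $path;                 # file size in bytes

print "chars: $chars\n";             # chars: 10
print "size:  $size\n";              # size:  10
```

With binmode on the read handle, the character count and the -s size agree for this plain ASCII file.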
Re: Removing characters
by apl (Monsignor) on Jan 22, 2008 at 15:50 UTC
|
To expand on what moritz originally said, you should replace
$mystring = $mystring += $_=~ tr/ ,\.\t\n//; with
$mystring .= $_=~ tr/ ,\.\t\n//;
What the ".=" operator says is "concatenate whatever follows on the right to the end of whatever is specified to the left".
Revised: This is not an attempt to solve the original problem. moritz pointed out that "+" was not the concatenation operator, "." was. I tried to expand on this by showing that ".=" was more efficient still (from a golfer's perspective).
|
|
Actually, that wouldn't solve the problem.
Using tr was a good suggestion when Slug wanted to count the characters, but in this case he just wants to eliminate them.
From the documentation
If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.
So nothing would be replaced.
You are right, of course, on the '.=' operator.
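A small sketch of the behaviour described in that documentation excerpt, using the same search list as above on a sample line. The copies $counted and $deleted are names introduced here for illustration:

```perl
use strict;
use warnings;

my $line = "Hey, diddle, diddle,\n";   # sample line

# Empty replacement list, no /d: characters are counted, nothing changes.
(my $counted = $line) =~ tr/ ,.\t\n//;

# Same search list with /d: the matched characters are deleted.
(my $deleted = $line) =~ tr/ ,.\t\n//d;

my $mystring = '';
$mystring .= lc $deleted;              # append the cleaned-up copy with .=

print "unchanged: ", ($counted eq $line ? "yes" : "no"), "\n";   # unchanged: yes
print "result:    $mystring\n";                                  # result:    heydiddlediddle
```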