PerlMonks  

Doubt in code - Beginner in perl

by Perl_Programmer1992 (Sexton)
on Dec 27, 2018 at 05:50 UTC ( [id://1227730]=perlquestion )

Perl_Programmer1992 has asked for the wisdom of the Perl Monks concerning the following question:

I started learning Perl a few weeks ago, and I have a doubt about the code below. The program is supposed to count the number of times each word appears, but the output is not as expected. When I give this input :
Perl
Programming
Perl
java
C++
Perl
Expected output :
C++ has appeared 1 times
java has appeared 1 times
Perl has appeared 3 times
Programming has appeared 1 times
Actual output :
C++
has appeared 1 times
Perl has appeared 1 times
Perl
has appeared 2 times
Programming
has appeared 1 times
java
has appeared 1 times
I am a total beginner in Perl and in programming too, so kindly pardon me if I have asked a silly question, but do correct me so I can learn from my mistake. Thanks!

#! /usr/bin/perl
my (@words , %count , $words);
chomp(@words = <STDIN>);
foreach $words (@words){
$count{$words} += 1;
}
foreach $word (sort keys %count){
print "$word has appeared $count{$word} times\n";
}

Replies are listed 'Best First'.
Re: Doubt in code - Beginner in perl
by haukex (Archbishop) on Dec 27, 2018 at 07:55 UTC

    Welcome to Perl and the Monastery, Perl_Programmer1992!

    The way you've shown it here, some of the output is split onto two lines, and because of that I would guess that some of your strings have extra newlines at the end. You've used chomp, which is good, but it's possible that, for example, your input has CRLF line endings, and chomp is only removing the LF, not the CR. To check this, it's usually best to use a module like Data::Dumper, which normally comes installed with Perl. At the top of your program, add the lines use Data::Dumper; and then $Data::Dumper::Useqq=1; (the latter to make the output more helpful for this case), and then at the end of the program add print Dumper(\%count);. Then, if you see output where the strings look like, for example, "Perl\r", that means my theory was correct, as the \r represents the CR (\n is LF). If it's something else and you're not sure, feel free to post the output here, inside <code> tags. Another way to verify this, if you're on a *NIX system, is to use a program like hexdump or od to show the files, e.g. hexdump -C input.txt or od -tx1c input.txt - if you see 0d 0a, that's a CRLF and again, if you have doubts, show the output here, inside <code> tags.
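The check described above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual program: the @words list is made-up data standing in for <STDIN>, with CRLF endings written out explicitly, and $Data::Dumper::Sortkeys is added only to make the output order stable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Useqq    = 1;  # show control characters as escapes such as \r
$Data::Dumper::Sortkeys = 1;  # assumption: added only for a stable key order

# Hypothetical input lines with CRLF endings, standing in for <STDIN>
my @words = ("Perl\r\n", "java\r\n", "Perl");
chomp(@words);                # removes the trailing "\n" (LF), not the "\r" (CR)

my %count;
$count{$_}++ for @words;

my $dump = Dumper(\%count);
print $dump;
```

If the dump lists a key such as "Perl\r" next to a plain "Perl", the strings still carry a CR after chomp, which is the situation the theory describes.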

    Now, as for how to fix it, I'd suggest maybe converting the input to use LF line endings. I doubt that your terminal will use CRLF when you type the input into it directly*, so I suspect that you have an input file that you are piping to Perl (such as cat input.txt | perl script.pl or perl script.pl <input.txt), so it's best to edit the file. One way to do this is to open the file in a text editor that supports different line endings; often they will have an option when saving the file to change them. You'd have to tell us which editor you're using for more help on that. Another option is to use a program like Tofrodos, which often comes installed on *NIX systems or can be installed (for example on Debian or Ubuntu, sudo apt-get install tofrodos); then you can use the command fromdos input.txt to change CRLF line endings to LF.
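If neither an editor nor Tofrodos is at hand, the same conversion can be approximated in Perl itself. Note that this is a swapped-in alternative, not the tool recommended above, and crlf_to_lf is a hypothetical helper name invented for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): returns a copy of the text
# with every CRLF pair replaced by a single LF.
sub crlf_to_lf {
    my ($text) = @_;
    $text =~ s/\r\n/\n/g;
    return $text;
}

# Made-up sample content with DOS line endings
my $dos  = "Perl\r\nProgramming\r\nPerl\r\n";
my $unix = crlf_to_lf($dos);

print "CR characters before: ", ($dos  =~ tr/\r//), "\n";
print "CR characters after:  ", ($unix =~ tr/\r//), "\n";
```

The helper only touches CR characters that are directly followed by LF, so a lone CR elsewhere in the text is left alone.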

    One other option is to change the program itself, although I would really only recommend this if you know you'll be dealing with mixed line endings. You can remove extra whitespace from the end of the strings in @words by saying s/\s+\z// for @words; just after the chomp - this will loop over the strings, and the regular expression will remove any whitespace at the end of the string (for details on this, see perlsyn and perlretut). There is another alternative involving the special variable $/, but since that only applies when you know for sure that your input will always have CRLF line endings, we don't need to get into that here.
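Here is a small sketch of that in-program fix. The @words list is hypothetical data standing in for <STDIN>, with the CRLF endings spelled out so the effect is visible without any input file.

```perl
use strict;
use warnings;

# Hypothetical lines as they might arrive with CRLF endings
my @words = ("Perl\r\n", "Programming\r\n", "Perl\r\n",
             "java\r\n", "C++\r\n", "Perl");

chomp(@words);           # removes the trailing "\n" (LF) only
s/\s+\z// for @words;    # removes any remaining trailing whitespace, including "\r"

my %count;
$count{$_}++ for @words;

foreach my $word (sort keys %count) {
    print "$word has appeared $count{$word} times\n";
}
```

With the trailing CR gone, all three "Perl" entries collapse into a single hash key.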

    Here are some more tips:

    • When posting here, it's best to put <code> tags around both your example input and your code.
    • Always Use strict and warnings, since they will help you catch (potential) mistakes. If you're not sure what the messages that you get mean, you can also add use diagnostics; at the top of the program.
    • I would suggest that you don't declare all your variables at the top of the file, the best thing is to declare them only when they are needed, for example foreach my $words (@words).
    • It's best to get into the habit of indenting your code, in this case the lines within the loops. Text editors can often help with that, or, you can use a tool like perltidy to clean up your code.
    • Instead of <STDIN>, Perl has a "magic" operator <>. If there are files specified on the command line, Perl will open and read those, but if there are no files specified there, it will read from STDIN. This means that instead of cat input.txt | perl script.pl, you can just say perl script.pl input.txt.
    • Some more hopefully helpful reading: Basic debugging checklist, and SSCCE
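Several of the tips above (strict and warnings, declaring variables where they are needed, indentation, and the <> operator) can be combined in one short sketch. To keep it self-contained it writes a throwaway sample file with File::Temp and puts that file name into @ARGV, which imitates running perl script.pl input.txt; the sample words are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small sample input file (hypothetical data, only for this demo)
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "Perl\nProgramming\nPerl\n";
close $fh or die "Cannot close $filename: $!";

# Imitate a file name given on the command line
local @ARGV = ($filename);

my %count;
while (my $line = <>) {    # <> reads the files named in @ARGV, or STDIN if there are none
    chomp $line;
    $count{$line}++;
}

foreach my $word (sort keys %count) {
    print "$word has appeared $count{$word} times\n";
}
```

In a real script the tempfile and @ARGV lines would be dropped, and the file name would simply come from the command line.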

    * Update: To be more specific, on Windows, Perl will normally translate CRLF to LF on its own, so that your Perl strings should still only contain LF line endings, which is why I've been guessing that your input is coming from a file (on *NIX), but please correct me if I'm wrong.

      Thank you so much! You are spot on: the theory you mentioned in the first part of your reply is exactly what's happening with my code. I am actually using an online Perl compiler and feeding the input through that (not from any file). As you correctly said, every line contains CRLF at the end and chomp is only taking care of the LF, and that's why the hash %count is treating the "Perl" inputs in the array as separate keys instead of treating them as the same one. You provided so many valuable things in your reply, and it will really help me on my journey to become a good Perl programmer. Once again, thank you :-)

Re: Doubt in code - Beginner in perl
by Arik123 (Beadle) on Dec 27, 2018 at 07:24 UTC

    I couldn't reproduce your error. I've run your code, and the output is exactly what you expected.

    Are you sure you posted the code that generated the error?

    The error seems to be that the newline is chomp'ed only from the first element of @words (which is "Perl\n"). Note that your output contains the string "Perl" twice: once with a newline (which appears two times in the input list) and once without a newline (which appears once; that is, the newline was chomp'ed only once). All the rest of the elements of @words appear in the output with a newline.

    However, your code chomp(@words=<STDIN>) really does the work. It really chomp's the newline from all the elements. At least for me...

      Thank you for the reply. You are getting the correct output because you are probably providing the input to the program in a different way than I am. Your observation is correct: I am using the values in the array @words as keys in the hash %count, and keys in a hash are unique, but as you correctly mentioned there are 2 "Perl" values, one with a newline and one without, so it is treating them as separate keys. It seems like chomp is not working on all the values and is only removing the newline from the 1st input value. I now have the reason why my output is not correct: every line contains CRLF at the end, and chomp (in my case) is removing only the LF and keeping the CR, as correctly mentioned by @haukex.

Re: Doubt in code - Beginner in perl
by 1nickt (Canon) on Dec 27, 2018 at 12:53 UTC

    Hi, welcome to Perl, the One True Religion.

    One of the benefits of programming in Perl is "insignificant whitespace." This means you can indent your code as well as separate "paragraphs" with new lines, with no effect on the program.

    There are many style preferences, and if you can't pick one, you can delegate the responsibility to perltidy ... but the two most important things for any programming style are readability and consistency. Always be seeking to enhance those two qualities of your programs, as it makes them easier to extend and to maintain, and even to understand when you come back half a year later and would like to recall the flow of your thought with the least amount of deciphering of what the code simply says.

    It won't make a lot of difference in this snippet, but once you have a foreach loop inside each branch of an if ... elsif ... else conditional, inside a function, your indentation-free style will become a real impediment.

    Also recommended is not reusing variable names among variables of differing types, and declaring your variables as close as possible to the scope in which they are used.

    I would rewrite your code above something like:

    #! /usr/bin/perl
    use strict;
    use warnings;

    chomp( my @words = <STDIN> );

    my %count;

    foreach my $word (@words) {
        $count{$word}++;
    }

    foreach my $found (sort keys %count) {
        print "$found has appeared $count{$found} times\n";
    }

    __END__
    (Note: not addressing your original line-ending problem, which has been explained above.)

    Hope this helps!


    The way forward always starts with a minimal test.

      Thank you for your response. I will definitely keep in mind and adhere to all the coding tips you have mentioned.

Re: Doubt in code - Beginner in perl
by jimpudar (Pilgrim) on Dec 28, 2018 at 08:45 UTC

    Hi Perl_Programmer1992,

    Glad you got your question answered. Just wanted to reiterate, welcome to the wonderful world of Perl.

    You have truly chosen one of the most interesting and useful languages available!

    I encourage you to check out perlrun to learn about some of Perl's many command line options.

    These enable you to write very cool "one-liners" such as the following reproduction of your program:

    perl -wlnE '$w{$_}++; END { say "$_ has appeared $w{$_} times" for sort keys %w }'
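For readers new to these switches, here is a rough long-hand sketch of what the one-liner above does. It is an approximation, not the exact code Perl generates, and it reads from an in-memory string (made-up sample data) instead of STDIN so that it runs on its own.

```perl
use strict;
use warnings;        # -w turns warnings on
use feature 'say';   # -E enables optional features such as say

# Hypothetical sample input, standing in for STDIN
my $input = "Perl\nProgramming\nPerl\njava\nC++\nPerl\n";
open my $in, '<', \$input or die "Cannot open in-memory file: $!";

my %w;
while (my $line = <$in>) {   # -n wraps the code in a loop over the input lines
    chomp $line;             # -l chomps each line (and sets the output record separator)
    $w{$line}++;
}
close $in;

# The END block of the one-liner runs once, after all input has been read
my @report = map { "$_ has appeared $w{$_} times" } sort keys %w;
say for @report;
```

The one-liner uses the default variable $_ where this sketch uses the named variable $line.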

    Happy coding!

    πάντων χρημάτων μέτρον έστιν άνθρωπος.

      That was really cool! It's amazing to see how a simple program like that can be written in much more efficient ways. Thanks for your response, and I am really looking forward to more great learnings.

Re: Doubt in code - Beginner in perl
by BillKSmith (Monsignor) on Dec 27, 2018 at 14:53 UTC
    We seldom have line-ending problems except when we prepare our input on a different operating system than we use to run our perl. Unless we explicitly specify otherwise, perl assumes that our data uses the line endings of the system that it is running under. Perl calls the appropriate IO-Layer to translate those line endings into perl newlines. (The fact that they are the same as the newline character in UNIX files really does not make any difference). If we try to read windows (CRLF) data under UNIX, the CR will be 'translated' to a perl newline. The LF is not translated at all. It is stored as an ordinary character (the first character of the next line - chomp does not see it at all). On output, the perl newline is 'translated' into a UNIX newline (CR). The LF (when present) is output as the first character of the next line (when it is output). This problem can be solved by specifying IO-Layers. Unfortunately, this is not a beginners topic. For now, just be aware of what is happening.

    Sorry, My explanation (italic) is hopelessly confused.

    Bill

      Sorry, but:

      If we try to read windows (CRLF) data under UNIX, the CR will be 'translated' to a perl newline. The LF is not translated at all.

      This makes it sound like the CR gets translated into some other character that is then present in the Perl string alongside the LF, which is not the case. From PerlIO in regards to the :crlf layer:

      On read converts pairs of CR,LF to a single "\n" newline character. On write converts each "\n" to a CR,LF pair.

      It is stored as an ordinary character (the first character of the next line - chomp does not see it at all).

      This is incorrect, the "\n" is stored at the end of the current line and removed by chomp (because $/ defaults to "\n").

      On output, the perl newline is 'translated' into a UNIX newline (CR).

      The *NIX newline is LF, not CR.

      The LF (when present) is output as the first character of the next line (when it is output).

      This doesn't make sense to me. The newline is output whenever we tell Perl to output it, typically at the end of the current line, either explicitly or implicitly via $\.
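The behaviour of the :crlf layer quoted from PerlIO can be tried out with a short sketch. It writes a throwaway file with explicit CRLF endings (made-up sample data) and reads it back twice, once through :crlf and once through :raw; the variable names are chosen only for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a sample file with CRLF line endings, byte for byte
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "Perl\r\njava\r\n";
close $out or die "Cannot close $filename: $!";

# With the :crlf layer, each CR,LF pair arrives as a single "\n"
open my $in, '<:crlf', $filename or die "Cannot open $filename: $!";
chomp(my @words = <$in>);
close $in;

# With :raw, the CR stays in the string and chomp removes only the LF
open my $raw, '<:raw', $filename or die "Cannot open $filename: $!";
chomp(my @raw_words = <$raw>);
close $raw;

printf "crlf layer: %d characters, raw: %d characters\n",
    length $words[0], length $raw_words[0];
```

In both cases the "\n" sits at the end of the current line, where chomp finds it; the only difference is whether a CR is left in front of it.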

Node Type: perlquestion [id://1227730]
Approved by marto
Front-paged by Corion