PerlMonks  

How to Remove Commas ?

by allendevans (Initiate)
on Mar 27, 2012 at 12:15 UTC ( [id://961908]=perlquestion )

allendevans has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

I'm attempting to remove the commas from this directory listing (provided to me as a textfile).

Directory of E:\DOM\D2\HDD\IN11-135
04/04/2011 11:29 PM <DIR> .
04/04/2011 11:29 PM <DIR> ..
11/21/2010 05:02 AM 1,572,727,014 IN11-135.E01
11/21/2010 05:02 AM 1,223 IN11-135.E01.txt
11/21/2010 05:02 AM 1,572,694,696 IN11-135.E02
11/21/2010 05:02 AM 1,572,740,428 IN11-135.E03
11/21/2010 05:02 AM 1,572,785,002 IN11-135.E04
11/21/2010 05:02 AM 1,572,696,748 IN11-135.E05
11/21/2010 05:02 AM 1,572,745,871 IN11-135.E06
11/21/2010 05:02 AM 1,572,737,726 IN11-135.E07
11/21/2010 05:02 AM 1,572,785,459 IN11-135.E08
11/21/2010 05:02 AM 1,572,777,135 IN11-135.E09
11/21/2010 05:02 AM 1,572,751,922 IN11-135.E10
11/21/2010 05:02 AM 1,572,684,462 IN11-135.E11
11/21/2010 05:02 AM 1,556,456,660 IN11-135.E12
13 File(s) 18,856,584,346 bytes

My task is to parse out the file sizes and file names, determine the average file size, and provide a formatted output as shown below.

100 File name 1
1000 File name 2
100 File name 3
Total files: 3
Average file size: 400 bytes

I tried the following perl code (most recent script) without success. I've tried using both scalars and arrays. Reading the information is simple enough, it's the embedded commas that cause problems. I've not found a "search and replace" command to replace the commas with "nothing", allowing direct numeric manipulation of the scalar variables. I dabbled with arrays, but that got hairy fast.

Any assistance will be greatly appreciated.

#!/usr/bin/perl

# author:  Allen Evans
# title:   Lab3 assignment
# purpose: Calculate average file size (avfsz)

# open Lab3.txt file
# ==================
open(INPUT, "./Lab3.txt") or die "Can't open Lab3.txt\n";

# declare variables
# =================
# unnecessary in perl, but good coding habits are tough to make
$date = "";
$time = "";
$ampm = "";
$filesize = 0;
$filename = "";
$totalfilesize = 0;
$averagefilesize = 0;
$i = 0;

# while not eof
# =============
while ($line = <INPUT>) {
    ($date, $time, $ampm, $filesize, $filename) = split(" ", $line);

    # determine $filesize
    # ===================
    # print "filesize: $filesize\n";                  # debugging code
    $totalfilesize += $filesize;                      # create $totalfilesize variable
    # print "totalfilesize: $totalfilesize\n";        # debugging code

    # remove commas from $filesize string
    # ===================================
    # NSTR                                            # place holder

    # convert $filesize string to numeric
    # ===================================
    # NSTR                                            # place holder

    # calculate totalfilesize and average, increment i
    # ================================================
    $i += 1;                                          # increment $i (denom)
    # print "i: $i\n";                                # debugging code
    $averagefilesize = $totalfilesize / $i;           # calculate $averagefilesize
    # print "averagefilesize: $averagefilesize\n\n";  # debugging code
}

# print output
# ============
print "$filesize $filename\n";                        # print output to screen
print "Total Files: $i Avg Size: $averagefilesize\n\n";

Replies are listed 'Best First'.
Re: How to Remove Commas ?
by roboticus (Chancellor) on Mar 27, 2012 at 12:22 UTC

    allendevans:

    I've not found a "search and replace" command to replace the commas with "nothing", allowing direct numeric manipulation of the scalar variables.

    Really?

    $ cat t.pl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature ':5.10';

    my $t = "1,234,567.89";
    $t =~ s/,//g;
    say $t;

    $ perl t.pl
    1234567.89

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
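
To see why the commas matter for the arithmetic, here is a minimal sketch. The helper name `strip_commas` is made up for illustration and does not appear in the thread; the sample value is one of the sizes from the listing. When Perl converts a string to a number it stops at the first character that is not part of a number, so the un-stripped size counts as just 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# strip_commas is a hypothetical helper name, not something from the thread.
sub strip_commas {
    my ($text) = @_;
    (my $clean = $text) =~ s/,//g;    # copy first, then strip every comma
    return $clean;
}

my $raw = "1,572,727,014";

# Without stripping, numeric conversion stops at the first comma.
# The warning is silenced here only to keep the demonstration quiet.
my $naive = do { no warnings 'numeric'; $raw + 0 };

my $clean = strip_commas($raw);

print "naive: $naive\n";
print "clean: ", $clean + 0, "\n";
```

With `use warnings` left on, the naive addition would also report that the argument isn't numeric, which is a useful hint when a total comes out suspiciously small.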

Re: How to Remove Commas ?
by moritz (Cardinal) on Mar 27, 2012 at 12:20 UTC

      But it's way easier to get the list of files ...

      And fail the course for not doing the homework

Re: How to Remove Commas ?
by ramlight (Friar) on Mar 27, 2012 at 12:22 UTC
    The substitute command works really nicely here:
    # remove commas from $filesize string
    # ===================================
    $filesize =~ s/,//;
    Regular expressions like these are one of Perl's strengths.

      ramlight,

      Thanks for your post! I'm having the best success with your suggestion, some others caused perl to crash. :(

      I pasted the snippet into my script ...

      # while not eof
      # =============
      while ($line = <INPUT>) {
          ($date, $time, $ampm, $filesize, $filename) = split(" ", $line);

          # remove commas from $filesize string
          # ===================================
          $filesize =~ s/,//;

      and it successfully removed the first comma. (below)

      Results from gnome-terminal:

      1572,727,014 IN11-135.E01
      1223 IN11-135.E01.txt
      1572,694,696 IN11-135.E02
      1572,740,428 IN11-135.E03
      1572,785,002 IN11-135.E04
      1572,696,748 IN11-135.E05
      1572,745,871 IN11-135.E06
      1572,737,726 IN11-135.E07
      1572,785,459 IN11-135.E08
      1572,777,135 IN11-135.E09
      1572,751,922 IN11-135.E10
      1572,684,462 IN11-135.E11
      1556,456,660 IN11-135.E12
      Total Files: 13 Avg Size: 0

      The first comma was removed from each number. Now I'm off to perldoc.perl.org to see how I can remove the remaining commas.

      Thanks ... Allen.

        Have another read of the answer roboticus gave you, paying particular attention to the 'g' flag of the substitution.

        Cheers,

        JohnGG

        The first comma was removed from each number.

        The s/// operator by default removes only the first match. So:

        my $var = "foobarfoobaz";
        $var =~ s/foo//;
        say $var;   # says "barfoobaz"

        There are various flags you can include to alter its default behaviour though. One of the most useful is the "g" (global) flag...

        my $var = "foobarfoobaz";
        $var =~ s/foo//g;
        say $var;   # says "barbaz"

        Note that the slashes may be replaced with other characters, so you could equally write:

        my $var = "foobarfoobaz";
        $var =~ s@foo@@g;
        say $var;   # says "barbaz"

        Or even:

        my $var = "foobarfoobaz";
        $var =~ s{foo}{}g;
        say $var;   # says "barbaz"

        ... which some people might find more readable. Though note that there are a handful of characters (hash, question mark and single quote spring to mind) that trigger special behaviours here (perlop has more details).

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
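
A few related forms may be worth knowing alongside the "g" flag. This is a small sketch, not something taken from the replies above: the variable names are made up, and the `r` flag assumes Perl 5.14 or later. Both `s///g` and `tr///d` return a count, which can be handy for checking that something was actually removed.

```perl
use strict;
use warnings;

my $size = "1,572,727,014";

# s///g returns the number of substitutions it made.
my $copy1   = $size;
my $removed = ($copy1 =~ s/,//g);

# tr/,//d deletes every comma and returns how many characters it matched.
my $copy2   = $size;
my $deleted = ($copy2 =~ tr/,//d);

# s///r (Perl 5.14 and later) leaves the original alone and returns the result.
my $copy3 = $size =~ s/,//gr;

print "$copy1 ($removed removed)\n";
print "$copy2 ($deleted deleted)\n";
print "$copy3 (original is still $size)\n";
```

For deleting a single fixed character, `tr` avoids the regex engine entirely, while `s///g` is the form to reach for once the pattern is anything more than a literal character.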
Re: How to Remove Commas ?
by ww (Archbishop) on Mar 27, 2012 at 12:23 UTC
    See perlretut with special attention to substitution, s/,//;
Re: How to Remove Commas ?
by flexvault (Monsignor) on Mar 27, 2012 at 13:16 UTC

    Dear allendevans,

      # unnecessary in perl, but good coding habits are tough to make

    Well that jumps out at me, since there are some very good coding habits for Perl also!

    roboticus showed you the way you should write your code by scoping your variables with 'my' or 'local' or 'our'. If you had 'use strict' as roboticus had, your script would not compile and you would have the opportunity to write better Perl code. My objection is not that you pre-defined your variables, but to the idea that doing so somehow makes for better coding habits.

    Do you think this code:

    $date = "";
    $time = "";
    $ampm = "";
    $filesize = 0;
    $filename = "";
    is better or more clear than:
    my ($date, $time, $ampm, $filesize, $filename) = split(" ", $line);
    I like that you commented your code very well, but it is that comment above that needs to be changed/improved/removed. This may seem like a nit now, but if you continue to grow with Perl, then learning good Perl coding habits now will help you in the future.

    Good Luck!

    "Well done is better than well said." - Benjamin Franklin

      flexvault,

      Thanks for taking the time to read my post and script. The comment is a quip for the professor, who made an off-hand comment during the lecture about declaring variables.

      Which is the better way of declaring variables ... that depends on what the compiler (or professor, or employer) wants! I do feel more at ease declaring my variables beforehand, though. Legacy feelings of inadequacy left over from previous programming languages ... :)

      Thanks ... Allen.

        Were I your professor, I'd mark that quip down, since it's attached to a method of declaring variables that's legal, but falls far short of 'best practice.'

        Perl is not one of your "previous programming languages" and your response to your own rhetorical question about "the better way of declaring variables" isn't quite 'on-target' here. Perl (well, thru 5.10, IIRC) will accept the way you've done it, without even suggesting that there's a better way... which involves understanding scope... and the best practice of limiting a variable's scope as narrowly as possible.

        Best practice for declaring variables varies from programming language to programming language. But the declaration you used is very far from best practice in Perl.

        When declaring a variable in Perl, it is best practice to declare whether it's a lexical (my) or package (our) variable. e.g.:

        my $counter = 0;

        It's also best practice to declare the variable in the tightest possible scope. For example, if a variable is used inside a loop, and needs reinitialising each time round the loop, then declare it inside the loop, not before the loop.

        Find the bug here:

        use 5.010;
        use strict;

        my $gender = 'unknown';
        while (<DATA>) {
            chomp;
            given ($_) {
                when ("Alice")  { $gender = 'female' }
                when ("Annie")  { $gender = 'female' }
                when ("Andy")   { $gender = 'male' }
                when ("Arnold") { $gender = 'male' }
            }
            say "$_ is $gender.";
        }
        __DATA__
        Alice
        Andy
        Arnold
        Jennifer
        Annie
        Henry

        Output is:

        Alice is female.
        Andy is male.
        Arnold is male.
        Jennifer is male.
        Annie is female.
        Henry is female.
        

        Why is Jennifer male? Why is Henry female? It's because $gender is declared outside the loop, so is allowed to stay alive between loop iterations. Merely moving the one line where it's declared solves our subtle bug:

        use 5.010;
        use strict;

        while (<DATA>) {
            chomp;
            my $gender = 'unknown';
            given ($_) {
                when ("Alice")  { $gender = 'female' }
                when ("Annie")  { $gender = 'female' }
                when ("Andy")   { $gender = 'male' }
                when ("Arnold") { $gender = 'male' }
            }
            say "$_ is $gender.";
        }
        __DATA__
        Alice
        Andy
        Arnold
        Jennifer
        Annie
        Henry

        Alice is female.
        Andy is male.
        Arnold is male.
        Jennifer is unknown.
        Annie is female.
        Henry is unknown.
        

        Perl variables have some pretty cool features, but they can trip you up. The precaution of adding use strict near the top of each script is generally a wise one. This one line tells Perl to force you to declare all your variables. This doesn't stop you shooting yourself in the foot, but it makes it difficult to shoot yourself in the foot accidentally. (You're still able to shoot yourself in the foot if you put in a bit of effort.)

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
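
As a rough illustration of what "force you to declare all your variables" means in practice, the sketch below compiles a tiny snippet containing a misspelled variable name. The snippet and the names `$total_size` and `$totl_size` are invented for this example. A string eval is used only so that the compile-time error can be captured and printed rather than stopping the script.

```perl
use strict;
use warnings;

# The snippet is compiled with a string eval so that the compile-time
# error can be captured and printed instead of aborting this script.
my $snippet = <<'CODE';
use strict;
my $total_size = 0;
$totl_size += 100;    # deliberate typo: this name was never declared
1;
CODE

my $ok    = eval $snippet;
my $error = $@;

print $ok ? "compiled\n" : "strict refused to compile it:\n$error";
```

Without `use strict` in the snippet, Perl would quietly create a new global `$totl_size` and `$total_size` would stay at 0, which is the kind of bug that is hard to spot by eye.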

        allendevans,

        Glad it was from the professor!

        Have a good one!

        "Well done is better than well said." - Benjamin Franklin

Re: How to Remove Commas ?
by GrandFather (Saint) on Mar 27, 2012 at 20:24 UTC

    Don't comment what the code tells you it does, no matter how wonderful the code seems to be. You can almost guarantee that if you choose good variable names and write clean code your tutor will have no trouble understanding it without comments. It's even possible the tutor may actually be able to see the code instead of being overwhelmed by comments.

    Always use strictures (use strict; use warnings;). That on its own would pick up at least one set of bugs in your code. Declare your variables in the smallest scope that you can (other replies have already said that so it must be true) - that with strictures would pick up another bug for you.

    Use the three parameter version of open and lexical file handles (declared with my).

    Use indentation to reflect the structure of your code - indent one level for each nested block.

    The following code illustrates these points. Note that there are several bugs left in this code that will show up due to strictures when you run it. Do not put your name on this code and hand it in unless you want low marks!

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $file = <<'FILE';
    Directory of E:\DOM\D2\HDD\IN11-135
    04/04/2011 11:29 PM <DIR> .
    04/04/2011 11:29 PM <DIR> ..
    11/21/2010 05:02 AM 1,572,727,014 IN11-135.E01
    11/21/2010 05:02 AM 1,223 IN11-135.E01.txt
    11/21/2010 05:02 AM 1,572,694,696 IN11-135.E02
    11/21/2010 05:02 AM 1,572,740,428 IN11-135.E03
    11/21/2010 05:02 AM 1,572,785,002 IN11-135.E04
    11/21/2010 05:02 AM 1,572,696,748 IN11-135.E05
    11/21/2010 05:02 AM 1,572,745,871 IN11-135.E06
    11/21/2010 05:02 AM 1,572,737,726 IN11-135.E07
    11/21/2010 05:02 AM 1,572,785,459 IN11-135.E08
    11/21/2010 05:02 AM 1,572,777,135 IN11-135.E09
    11/21/2010 05:02 AM 1,572,751,922 IN11-135.E10
    11/21/2010 05:02 AM 1,572,684,462 IN11-135.E11
    11/21/2010 05:02 AM 1,556,456,660 IN11-135.E12
    13 File(s) 18,856,584,346 bytes
    FILE

    open my $in, '<', \$file or die "Can't open Lab3.txt\n";

    my $numFiles;
    my $totalFileSize = 0;

    while (my $line = <$in>) {
        my ($date, $time, $ampm, $filesize, $filename) = split(" ", $line);

        $filesize =~ y/,//d;
        $totalFileSize += $filesize;
        ++$numFiles;
    }

    #printf "%6d %s\n", $filesize, $filename;

    if ($numFiles) {
        printf "Total Files: %d, avg Size: %d\n\n", $numFiles, $totalFileSize / $numFiles;
    }
    True laziness is hard work
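
One possible way to put these points together is sketched below. It is not GrandFather's code and not necessarily what the assignment expects: `parse_listing` is a made-up helper name, the regular expression is only one guess at what a file line looks like, and the sample is a shortened copy of the listing from the question. A real script would open the text file by name instead of reading from a string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A shortened copy of the listing from the question, used as sample input.
my $listing = <<'LISTING';
 Directory of E:\DOM\D2\HDD\IN11-135
04/04/2011 11:29 PM <DIR> .
04/04/2011 11:29 PM <DIR> ..
11/21/2010 05:02 AM 1,572,727,014 IN11-135.E01
11/21/2010 05:02 AM 1,223 IN11-135.E01.txt
11/21/2010 05:02 AM 1,572,694,696 IN11-135.E02
 13 File(s) 18,856,584,346 bytes
LISTING

# parse_listing is a hypothetical helper: it returns a list of
# [size, name] pairs for the lines that look like file entries.
sub parse_listing {
    my ($text) = @_;
    my @files;

    # Three-argument open on a lexical handle; \$text reads from the string.
    open my $in, '<', \$text or die "Can't read listing: $!\n";
    while (my $line = <$in>) {
        # date, time, AM/PM, a size made of digits and commas, then the name.
        # Header, <DIR> and summary lines do not match and are skipped.
        next unless $line =~ m{^\s*\S+\s+\S+\s+[AP]M\s+([\d,]+)\s+(.+?)\s*$};
        my ($size, $name) = ($1, $2);
        $size =~ tr/,//d;    # strip every comma before doing arithmetic
        push @files, [$size, $name];
    }
    close $in;
    return @files;
}

my @files = parse_listing($listing);
my $total = 0;

for my $file (@files) {
    my ($size, $name) = @$file;
    $total += $size;
    printf "%d %s\n", $size, $name;
}

if (@files) {
    printf "Total files: %d\n", scalar @files;
    printf "Average file size: %d bytes\n", $total / @files;
}
```

Matching the whole line with one pattern, rather than splitting on whitespace, means the directory and summary lines never reach the arithmetic, and file names containing spaces are kept whole.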
Re: How to Remove Commas ?
by Anonymous Monk on Mar 27, 2012 at 12:32 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump qw/ dd /;

    my $line = q{11/21/2010 05:02 AM 1,572,696,748 IN11-135.E05};
    #11/21/2010 05:02 AM 1,572,696,748 IN11-135.E05
    #1234567890 12345678 12345678901234567
    # date A10 time A8 size A18 filename the rest
    # x2 to ignore spaces between date and time
    my( $date, $time, $sizewithcommas, $name ) = unpack "A10 x2 A8 x1 A18 A*", $line;
    dd( [ $date, $time, $sizewithcommas, $name ] );

    my $raw = q{
    03/19/2011 04:01 PM 1,451 filename
    09/07/2011 05:15 AM <DIR> dirname
    };
    open my($fh), '<', \$raw ;
    while(<$fh>){
        my( $datetime, $filesizeordirindicator, $filename ) = eval { unpack 'A20 A18 x1 A*', $_ };
        warn $@ if $@;
        dd([ $datetime, $filesizeordirindicator, $filename ]);
    }
    __END__
    ["11/21/2010", "05:02 AM", " 1,572,696,748", "IN11-135.E05"]
    'x' outside of string in unpack at crap line 22, <$fh> line 1.
    [undef, undef, undef]
    ["03/19/2011 04:01 PM", " 1,451", "filename"]
    ["09/07/2011 05:15 AM", " <DIR>", "dirname"]
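
The fixed-width idea can be tried on a record whose column widths are known for certain. In this sketch the record is built with `sprintf`, so the widths (10, 8 and 17) are assumptions chosen for the example and not the real column widths of the listing in the question. `parse_fixed` is a made-up helper name.

```perl
use strict;
use warnings;

# Build a fixed-width record so the column widths are known exactly.
# The widths (10, 8, 17) are assumptions for this sketch only.
my $line = sprintf "%-10s  %-8s %17s %s",
    "11/21/2010", "05:02 AM", "1,572,696,748", "IN11-135.E05";

# parse_fixed is a hypothetical helper name.
sub parse_fixed {
    my ($record) = @_;

    # unpack dies if an "x" skips past the end of a short line, so trap it.
    my @fields = eval { unpack "A10 x2 A8 x1 A17 x1 A*", $record };
    return if $@;

    my ($date, $time, $size, $name) = @fields;
    $size =~ s/[,\s]//g;    # drop the commas and the leading padding
    return ($date, $time, $size, $name);
}

my ($date, $time, $size, $name) = parse_fixed($line);
print "$name is $size bytes\n";

# A line shorter than the template is rejected instead of killing the script.
my @short = parse_fixed("too short");
print "short line skipped\n" unless @short;
```

The "A" format strips trailing spaces but not leading ones, which is why the right-aligned size still needs its padding removed along with the commas.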

Node Type: perlquestion [id://961908]
Approved by moritz