Trudge has asked for the wisdom of the Perl Monks concerning the following question:

I'm having trouble counting the number of characters in a DATA block in my script. Each line contains 6 fields separated by the pipe symbol '|'. To sanitize the data before entering it into a DB, I want to ensure each line contains 5 pipe symbols. But my script seems to read only the first line of the DATA block, showing me that it contains 5 pipes. What I have tried:
my $PipeCounter=0; # how many?
my $Lookfor="|";
while(<DATA>) {
    $PipeCounter = () = $_ =~ /\Q$Lookfor/g;
}
print "Found $PipeCounter '$Lookfor'\n";
which shows me
Running Sanitize on DATA ...
Found 5 '|'
Sample DATA:
2015|Art Of Computer Programming - Volume 4 Fascicle 6 Satisfiablility (The)|Art Of Computer Programming - Volume 4 Fascicle 6 Satisfiablility (The).pdf|Knuth,Donald E.|Programming;Reference|The never-ending story .. this book has been decades in the making. Three volumes were available for years but the master himself has added this addition to Chap. 7 'Combinatorial Searching'.
2010|Art Of Photography (The)|Art Of Photography (The).pdf|Barnbaum,Bruce|Photography|A successful photograph does one of several things. It allows, or forces, the viewer to see something that he has looked at many times without really seeing; it shows him something he has never previously encountered; or, it raises questions - perhaps ambiguous or unanswerable - that create mysteries, doubts, or uncertainties. In other words, it expands our vision and our thoughts. It extends our horizons. It evokes awe, wonder, amusement, compassion, horror, or any of a thousand responses. It sheds new light on our world, raises questions about our world, or creates its own world.
1994|Art Of Woodworking Sharpening And Tool Care (The)|Art Of Woodworking Sharpening And Tool Care (The).pdf|Time-Life Books|Woodworking Tools|Whether you're using a chisel or a router or a lathe, you know that a sharp tool is critical to doing a good job. It's also safer - a dull tool requires more force, and more force = less control. This book covers the proper techniques for sharpening all manner of your woodworking tools - hand tools, power tool blades and bits, portable power tools, and stationary power tools. Detailed photographs and illustrations with excellent descriptions and instructions.
2017|Art and Craft of Problem Solving 3E (The)|Art and Craft of Problem Solving 3E (The).pdf|Zeitz,Paul|Mathematics;Logic;Calculus|This is a book about mathematical problem solving for college-level novices. By this I mean bright people who know some mathematics (ideally, at least some calculus), who enjoy mathematics, who have at least a vague notion of proof, but who have spent most of their time doing exercises rather than problems.
Anyone shed some light on this please?

Re: Count number of characters in a DATA block
by choroba (Cardinal) on Sep 04, 2022 at 17:56 UTC
    No, it reads all the lines, but only reports the result for the last one.

    To report the number for each line, move the final print inside the loop.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Ah yes. And to get the total:
      $PipeCounterTotal+=$PipeCounter;
      Thank you for the quick response!
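For anyone following along, the two fixes in this sub-thread (print inside the loop, plus a running total) might be combined roughly as below. This is only a sketch: the `@lines` array is a made-up stand-in for the real DATA block, and initialising `$PipeCounterTotal` to 0 is an assumption about how the total would be declared.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the DATA block: two well-formed records.
my @lines = (
    "2015|Title A|a.pdf|Author A|Topic|Description A\n",
    "2010|Title B|b.pdf|Author B|Topic|Description B\n",
);

my $Lookfor          = "|";
my $PipeCounterTotal = 0;    # assumed declaration for the running total

for my $line (@lines) {
    # Same counting idiom as the original post, applied to each line.
    my $PipeCounter = () = $line =~ /\Q$Lookfor/g;
    $PipeCounterTotal += $PipeCounter;

    # The print now sits inside the loop, so every line is reported.
    print "Found $PipeCounter '$Lookfor'\n";
}

print "Total: $PipeCounterTotal '$Lookfor'\n";
```

With the print outside the loop, only the count from the last line read would survive, which is what the original script showed.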
Re: Count number of characters in a DATA block
by kcott (Archbishop) on Sep 05, 2022 at 04:16 UTC

    G'day Trudge,

    You've used unnecessarily complicated code. Many will not understand the perlsecret, Goatse; those that do may wish they didn't. All you really need is:

    $PipeCounter = y/|/|/;

    If it matters to you, that's also far more efficient.
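To make the comparison concrete, here is a small sketch that counts the pipes in one line both ways. The sample string and the variable names `$by_match` and `$by_tr` are invented for illustration; nothing here measures speed, it only shows that the two idioms agree.

```perl
use strict;
use warnings;

my $line = "a|b|c|d|e|f\n";    # hypothetical six-field record

# Countof idiom: assigning to the empty list puts the match in list
# context; the list assignment itself, evaluated in scalar context,
# yields the number of elements produced on its right-hand side.
my $by_match = () = $line =~ /\|/g;

# Transliteration: maps each '|' to itself and returns the number of
# characters it matched.
my $by_tr = ( $line =~ tr/|/|/ );

print "match: $by_match, tr: $by_tr\n";
```

The y/// form in the post above is simply another spelling of tr///, applied to `$_` by default.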

    Furthermore, that huge block of sample data is fairly useless to us. It might be your real data, but it's far too long for most to be bothered reading. It also doesn't check for too few or too many fields nor any edge cases (e.g. blank lines, zero-width fields, and so on). This would have been better:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    while (<DATA>) {
        print "Line $. has ", y/|/|/, "\n";
    }

    __DATA__

    |||||
    !|@|#|$|%|
    |1|2|3|4|
    |q|w|e|r|t
    y|u|i|o|p
    A|S|D|F|G|H
    Z|X|C|V|B|N|

    Output:

    Line 1 has 0
    Line 2 has 5
    Line 3 has 5
    Line 4 has 5
    Line 5 has 5
    Line 6 has 4
    Line 7 has 5
    Line 8 has 6

    From there, you can enhance your code; for instance, skip blank and comment lines, highlight lines with the incorrect number of fields, and whatever else is appropriate for you.

    — Ken
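One possible shape for the enhancements suggested above (skipping blank and comment lines, flagging lines with the wrong count) is sketched here. The helper name `find_bad_lines`, the constant `$EXPECTED_PIPES`, and the convention that a leading '#' marks a comment are all assumptions made for the example, not anything specified in the thread.

```perl
use strict;
use warnings;

my $EXPECTED_PIPES = 5;    # six fields per record, per the original question

# Returns [line_number, pipe_count] pairs for every line whose pipe count
# differs from $EXPECTED_PIPES. Blank lines and lines starting with '#'
# are skipped; the '#' comment marker is an assumed convention.
sub find_bad_lines {
    my @lines = @_;
    my @bad;
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        next if $line =~ /^\s*$/;    # blank line
        next if $line =~ /^\s*#/;    # comment line (assumption)
        my $count = ( $line =~ tr/|/|/ );
        push @bad, [ $line_no, $count ] if $count != $EXPECTED_PIPES;
    }
    return @bad;
}

# Hypothetical input mixing good, short, and long lines.
my @sample = (
    "\n",
    "# a comment\n",
    "|||||\n",
    "y|u|i|o|p\n",
    "Z|X|C|V|B|N|\n",
);

for my $bad ( find_bad_lines(@sample) ) {
    my ( $line_no, $count ) = @$bad;
    print "Line $line_no has $count pipes, expected $EXPECTED_PIPES\n";
}
```

Returning the offending line numbers, rather than printing inside the helper, leaves it up to the caller whether to warn, abort, or repair the record.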

        G'day eyepopslikeamosquito,

        I hadn't realised that was in a FAQ; perhaps it's not quite as secret as I'd thought.

        Regardless, it's not something you can unsee, and =()= always makes me inwardly cringe.

        — Ken

Re: Count number of characters in a DATA block
by GrandFather (Saint) on Sep 05, 2022 at 21:46 UTC

    This looks very much like a "Character Separated Values" (CSV) file. This is a very common file format (spreadsheets exported from Excel, for example) and Perl has great tools for dealing with CSV files. Take a look at Text::CSV and see if that makes your life easier.
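As a rough illustration of that suggestion, a field-count check with Text::CSV might look like the sketch below. It assumes Text::CSV is installed from CPAN, that the data never quotes or escapes a pipe (hence `quote_char` and `escape_char` are disabled), and it uses made-up records in place of the real data.

```perl
use strict;
use warnings;
use Text::CSV;

# Pipe-separated with quoting and escaping switched off, so every '|'
# is treated as a field separator (an assumption about the data).
my $csv = Text::CSV->new({
    sep_char    => '|',
    quote_char  => undef,
    escape_char => undef,
    binary      => 1,
}) or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Hypothetical records; the second is one field short.
my @records = (
    "2015|Title A|a.pdf|Author A|Topic|Description A",
    "2010|Title B|b.pdf|Author B|Topic",
);

my $EXPECTED_FIELDS = 6;
for my $i ( 0 .. $#records ) {
    if ( $csv->parse( $records[$i] ) ) {
        my @fields = $csv->fields;
        my $status = @fields == $EXPECTED_FIELDS ? "ok" : "BAD";
        printf "Record %d: %d fields (%s)\n", $i + 1, scalar @fields, $status;
    }
    else {
        warn "Record " . ( $i + 1 ) . " failed to parse: " . $csv->error_diag . "\n";
    }
}
```

Checking the number of parsed fields, rather than the number of separators, keeps working if the data later starts quoting fields that contain a pipe, provided the constructor options are adjusted to match.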

    If you are thinking of moving the data into a database you may also find DBD::CSV a useful tool. DBD::CSV allows you to access a CSV file using DBI to give a SQL database interface to the file.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Count number of characters in a DATA block
by Anonymous Monk on Sep 05, 2022 at 12:43 UTC

    OT: =()= can also be called the Saturn operator.

      Good point. I'd forgotten about that ... makes me feel prouder of creating Saturn long before that alternative name was proposed. :)

      I now have a list of references on this important topic: Perl Secret Operator References