PerlMonks  

Learning Perl by Doing

by raywood (Novice)
on Mar 24, 2017 at 03:04 UTC

raywood has asked for the wisdom of the Perl Monks concerning the following question:

I would like to learn Perl by working through specific cases where I need it. This is the first such case. I have a situation much like the one described in an earlier discussion (Extracting blocks of text). Specifically, I have a number of old WordStar files in plain text. Each such file contains multiple .pa-delimited documents (consisting of various numbers of lines and paragraphs of text) that should be broken out into separate files. For example, one of these WordStar files might contain something like this:

Text text text
.pa
Other text text text
.pa
In that example, resulting file no. 1 would contain "Text text text," and resulting file no. 2 would contain "Other text text text."

I assume, but am not certain, that every .pa appears at the left margin, and is followed by no other characters on the same line.

The earlier discussion suggested this solution, where the delimiter was the word "term" rather than ".pa":

#! perl -slw
use strict;

my @array = split 'term', do{ local $/; <DATA> };
shift @array; ## Discard leading null

print '---', "\n", $_, "\n" for @array;

__DATA__
term { yada yada 12345 () ... }
term only occurs here { could be 30 lines here but never that word again until another block starts yadada }
term, etc.
My questions, from that example:

1. That old discussion mentioned RAM concerns when slurping. My system has 16GB RAM. The files I am working on are small. But I may adapt the solution to other, larger files. When does RAM become an issue?

2. How would I adapt this solution to refer to a separate input file? In the suggested solution, the Perl code seems to be added to the start of the text file. I would rather have a separate Perl script and specify the target file at runtime.

3. What would be the best reference source, for purposes of interpreting the few Perl codes suggested in that solution?

4. Which version of Perl should I install, to run this code?

Many thanks.

Replies are listed 'Best First'.
Re: Learning Perl by Doing.. Learn by reading too
by Discipulus (Canon) on Mar 24, 2017 at 09:24 UTC
    Hello raywood, and welcome to the monastery and to the wonderful world of Perl!

    Well, learning by doing is a great thing, but first you must learn by reading.

    Reading solid, idiomatic examples similar to what you need is the first step toward fluent Perl.

    I always suggest reading the Perl Cookbook; the book is a little dated, but it contains perfectly valid Perl solutions to common problems, with clear explanations. To offset the book's age, get your free copy of the Modern Perl book: as already suggested, it is a great book that discusses Perl in terms of its principles and concepts.

    You took an example from a powerful monk: I'm sure it is from BrowserUk, because of the almost-signature perl -slw. It is wise to choose good examples, but it is even wiser to understand deeply what the code really does (for instance, the -s perl switch can be useful while testing a script but can introduce very strange behaviors.. are you aware of them? No? Then do not use it!).

    I say this because it is better to have your own simple code that you fully understand, and to submit it here for review, than to use someone else's code that you only half understand.

    Now answers:

    1) RAM is always involved, and in general it is not good to waste it. That said, I guess ancient WordStar files (oh, WordStar, I remember it: it was the first word processor I used, on my father's 286) will never eat your 16GB of RAM. Anyway, when you slurp a file, the whole file goes into the RAM used by your perl process (plus the small overhead of the variable that holds it). So it is generally wiser to process a file line by line than to slurp it.

    Try it yourself by watching the following two programs in the process manager.

    This one slurps the file:

    use strict;
    use warnings;
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "cannot open $file: $!";
    # pause a few seconds so you can inspect the process manager
    sleep 5;
    # the diamond op <> slurps the whole file when in list context
    # (the array on the left forces list context)
    my @array_of_whole_file = <$fh>;
    sleep 5;

    while this one iterates over it:

    use strict;
    use warnings;
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "cannot open $file: $!";
    # pause a few seconds so you can inspect the process manager
    sleep 5;
    # the diamond op is now in scalar context: one line per iteration
    while (my $line = <$fh>){
        # do nothing
        1;
    }
    sleep 5;

    Try these programs on 1MB, 100MB and 1GB text files and observe what happens.

    2) To adapt your program to process different files, the simplest solution is to pass them as arguments on the command line: they will fill the @ARGV array. See perlvar for it.

    So you can have something like:

    foreach my $file (@ARGV){
        # check it exists..
        die "$file does not exist!" unless -e $file;
        print "start processing $file\n";
        # send the file to your own sub for processing
        my_custom_process_sub($file);
        ...
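
    The loop above could be filled in roughly as follows for the .pa case. This is only a sketch under the assumption stated in the question (every .pa sits alone on its line); the sub name split_on_pa and the output naming scheme (input name plus .001.txt, .002.txt, ...) are made up for illustration. It copies lines straight to the current output file, so it never holds more than one line in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: copy $file into numbered pieces, starting a new
# output file after every line that holds only ".pa".
# Returns the list of files written.
sub split_on_pa {
    my ($file) = @_;
    open my $in, '<', $file or die "cannot open $file: $!";

    my ( $count, $out, @written ) = (0);
    while ( my $line = <$in> ) {
        if ( $line =~ /^\.pa\s*$/ ) {    # delimiter: close the current piece
            close $out if $out;
            undef $out;
            next;
        }
        if ( !$out ) {
            next unless $line =~ /\S/;   # skip blank lines between pieces
            my $name = sprintf '%s.%03d.txt', $file, ++$count;
            open $out, '>', $name or die "cannot write $name: $!";
            push @written, $name;
        }
        print {$out} $line;              # ordinary line: copy it through
    }
    close $out if $out;
    close $in;
    return @written;
}

# Every file named on the command line ends up in @ARGV.
foreach my $file (@ARGV) {
    die "$file does not exist!" unless -e $file;
    print "start processing $file\n";
    my @pieces = split_on_pa($file);
    print "  wrote $_\n" for @pieces;
}
```

    Saved under a name of your choice (say split_pa.pl), it would be run as: perl split_pa.pl first.ws second.ws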

    3) Perl 5 releases are, with rare exceptions, backward compatible, so pick the most recent one with an even minor version (the even ones are the stable releases), e.g. 5.24.

    HtH

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Learning Perl by Doing
by Anonymous Monk on Mar 24, 2017 at 03:27 UTC

    When does RAM become an issue?

    Sooner or later

    How would I adapt this solution to refer to a separate input file?

    Read perlintro; you can't skip that.

    What would be the best reference source, for purposes of interpreting the few Perl codes suggested in that solution?

    Combine ppi_dumper with Perl documentation / Modern Perl and super search/Tutorials...

    So: operators, __DATA__, local, $/, $_...
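
    To make those pieces concrete, here is the same slurp-and-split idiom with each part annotated. It is a sketch rather than the original code: the delimiter is changed from 'term' to a line holding only .pa, an in-memory filehandle stands in for __DATA__ (which is just a filehandle onto the text that follows the __DATA__ marker in the script), and the sub name split_documents and the sample text are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split one big string into documents at lines
# that consist of ".pa", optionally followed by spaces, tabs or a CR.
sub split_documents {
    my ($text) = @_;
    # /m lets ^ match at the start of every line, not only of the string.
    my @docs = split /^\.pa[ \t\r]*(?:\n|\z)/m, $text;
    # Drop pieces that hold nothing but whitespace.
    return grep { /\S/ } @docs;
}

my $sample = "Text text text\n.pa\nOther text text text\n.pa\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";

# Slurp idiom: inside the do-block, local makes $/ (the input record
# separator) temporarily undef, so one read returns the whole input
# instead of a single line. $/ is restored when the block ends.
my $whole = do { local $/; <$fh> };
close $fh;

my $n = 0;
for my $doc ( split_documents($whole) ) {   # named $doc instead of the implicit $_
    $n++;
    print "--- document $n ---\n", $doc;
}
```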

    Which version of Perl should I install, to run this code?

    It will run on any perl whose version starts with "5.", i.e. anything released in the last 20+ years.

Re: Learning Perl by Doing
by 1nickt (Canon) on Mar 24, 2017 at 12:40 UTC

    Hi raywood,

    Welcome to the Monastery and to Perl, the One True Religion.

    1. That old discussion mentioned RAM concerns when slurping. My system has 16GB RAM. The files I am working on are small. But I may adapt the solution to other, larger files. When does RAM become an issue?

    When the file is too big.

    The better way is to read the file line by line; then it doesn't matter how big it is or how much RAM you have. In that example, instead of slurping in all the data with:

    my $data = do { local $/; <DATA> };
    my @array = split 'on something', $data;

    You would rather do:

    while ( my $line = <DATA> ) {
        # handle the line here instead of as an element of an array
    }
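
    Applied to the .pa case, "handle the line" could mean collecting lines until a delimiter line shows up. The following is a sketch under the question's assumption that .pa stands alone on its line; the sub name each_document and its callback interface are hypothetical. Only the document currently being collected is held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh line by line and call $on_doc->($text)
# each time a ".pa" line closes a document.
sub each_document {
    my ( $fh, $on_doc ) = @_;
    my $current = '';
    while ( my $line = <$fh> ) {
        if ( $line =~ /^\.pa\s*$/ ) {        # delimiter line: emit and reset
            $on_doc->($current) if $current =~ /\S/;
            $current = '';
        }
        else {
            $current .= $line;               # ordinary line: keep collecting
        }
    }
    # Text after the last .pa (if any) counts as a document too.
    $on_doc->($current) if $current =~ /\S/;
    return;
}

# Demo on an in-memory filehandle; a handle from open() on a real file
# is read in exactly the same way.
my $sample = "Text text text\n.pa\nOther text text text\n.pa\n";
open my $fh, '<', \$sample or die "cannot open in-memory handle: $!";
my @docs;
each_document( $fh, sub { push @docs, $_[0] } );
close $fh;
print scalar(@docs), " documents found\n";
```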

    How would I adapt this solution to refer to a separate input file?

    See open, or use a module like Path::Tiny (which handles errors and encoding for you):

    use Path::Tiny 'path';

    my $file_path = '/foo/bar/baz.txt';

    # if no memory concern, slurp:
    my @lines = path( $file_path )->lines_utf8({chomp => 1});

    # or read from a filehandle line-by-line:
    my $fh = path( $file_path )->openr_utf8;
    while ( my $line = <$fh> ) {
        chomp $line;
        ...
    }

    Hope this helps!


    The way forward always starts with a minimal test.
Re: Learning Perl by Doing
by eyepopslikeamosquito (Archbishop) on Mar 25, 2017 at 01:22 UTC

    Given your approach to learning Perl, you might be interested in some of the nodes listed here.

Node Type: perlquestion [id://1185704]
Front-paged by Corion