PerlMonks  

Re^2: Regex Parsing Chars in a Line

by kel (Sexton)
on Nov 27, 2019 at 16:49 UTC ( [id://11109315]=note )


in reply to Re: Regex Parsing Chars in a Line
in thread Regex Parsing Chars in a Line

The real problem is that the data often contains extra hyphens *with* surrounding spaces. Title fields also frequently have a space on only one side of the hyphen, and which side varies, if a space is present at all. The pipe suggestion appeals to me as something to consider, though I would need to test it on both Win and Linux.

The fun part is hyphenated names. Since these are normally unspaced, I can capture them and replace the hyphen with a space. The method I currently use is to split on a hyphen and check the size of the first variable. I will try doing this with a regex check such as /\w{1,7}\-\w.+\-/x and see if that works.
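
As a rough sketch of the "replace unspaced hyphens, then split" idea (the sample line and variable names are made up for illustration, and this is not necessarily how my current script does it):

```perl
use strict;
use warnings;

# Hypothetical sample line: hyphen-separated fields, one hyphenated name.
my $line = 'Smith-Jones - Some Title - 2019';

# Turn an unspaced hyphen between two word characters into a space,
# so that only the spaced hyphens remain as field separators.
(my $clean = $line) =~ s/(?<=\w)-(?=\w)/ /g;

# Split on a hyphen with optional whitespace on either side.
my @fields = split /\s*-\s*/, $clean;

print join('|', @fields), "\n";
```

This assumes a real separator always has a space on at least one side; a separator with no spaces at all would be mistaken for part of a name.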

My main concern is the lack of infinite lookbehinds, though the newly discovered \K operator helps (thanks to this thread! It seems to work wonders on the scripts I have adapted to it). Are there any modules that add extra capability to the Perl regex? It seems Python has one, but I am far too much of a noob in that language to do any productive scripting there yet! (And I really do hate the idea of immutable strings...)

Replies are listed 'Best First'.
Re^3: Regex Parsing Chars in a Line
by AnomalousMonk (Archbishop) on Nov 27, 2019 at 19:15 UTC

    In the OP you wrote:

    I use hyphens as *field* separators in parsing.
    I still don't understand if the files you are processing are produced by someone else in an insane format over which you have no control, or if you are generating these files yourself. If the former, you have my deepest sympathy; been there, done that. If the latter, I beg you either to use a reasonable separator character or to use Tux's excellent (and fast!) Text::CSV_XS module, which can both parse and generate CSV files (since this is what you seem to be trying to do). (And CSV really means Character Separated Values, so don't get hung up on commas.) There's also a Pure Perl Text::CSV_PP non-XS module; see Text::CSV for details.

    ... pipe ... I would need to test it in both Win and Linux.

    The only thing to remember about pipe is that it is a regex metacharacter, so it must be suitably escaped in any split, qr//, m// or s/// pattern. I am aware of no differences between Windoze and *nix Perls as regards regex behavior or CSV file access, and such concerns are ameliorated if you use a module like Text::CSV_XS.
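
A minimal sketch of the escaping point, using a made-up pipe-delimited record (the variable names are only for illustration):

```perl
use strict;
use warnings;

my $line = 'alpha|beta|gamma';   # hypothetical pipe-delimited record

# Escaped pipe: splits on the literal character.
my @good = split /\|/, $line;

# \Q...\E (quotemeta) escapes a separator held in a variable.
my $sep = '|';
my @also_good = split /\Q$sep\E/, $line;

# An unescaped pipe is an alternation of two empty patterns; it matches
# the empty string, so the line is split into single characters.
my @bad = split /|/, $line;

print scalar(@good), " fields with the escaped pipe\n";
print scalar(@bad),  " pieces with the unescaped pipe\n";
```

The unescaped form fails quietly: it raises no error and simply returns one element per character.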

    My main concern is the lack of infinite lookbehinds ...

    I believe support for generalized variable-width (not infinite; nothing's infinite!) lookbehinds was added with Perl version 5.30 or thereabouts. You'll need to check this...
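
As far as I know, the variable-length lookbehind in 5.30 is experimental and limited to 255 characters. On older Perls, \K (available since 5.10) covers many of the same cases. A small sketch with made-up input:

```perl
use strict;
use warnings;

my $text = 'title:   Some Name';   # hypothetical input

# A fixed-width lookbehind cannot express "title:" followed by any
# amount of whitespace. \K discards everything matched so far from
# the match, so only the text after it is replaced.
(my $out = $text) =~ s/title:\s*\K\w+/Other/;

print "$out\n";
```

Unlike a true lookbehind, the part before \K is still consumed by the match, which matters for overlapping matches under /g.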

    Are there any modules that add extra capability to the Perl regex?

    I don't believe that regex operators can be overloaded as can general Perl operators. I have a vague recollection of having read somewhere on PM that it's possible to replace the entire Perl RE with another; this was described in terms of "It's possible, but..." and it was a big but!


    Give a man a fish:  <%-{-{-{-<
