Regex selection based upon position

JeanLaspost has asked for the wisdom of the Perl Monks concerning the following question:

Hello everybody,

I'm a newbie and I have a newbie question:

I have a system dump which I would like to convert to a CSV-file.

The dump looks like this:

1.1- 1.2 All rights and privileges

GRANTEE              GRANTED_ROLE         ADM DEF 
-------------------- -------------------- --- --- 
U1                   CONNECT              NO  YES 
U2                   RESOURCE ORA         NO  YES 
DBA1                 JAVA_ADMIN           NO  YES 
...

and the CSV should become like this:

1.1- 1.2 All grantees and privileges;
GRANTEE;GRANTED_ROLE;ADM;DEF 
U1;CONNECT;NO;YES 
U2;RESOURCE ORA;NO;YES 
DBA1;JAVA_ADMIN;NO;YES
...

In the original dump the columns are separated by fixed positions. However, I can't find a good way to incorporate that into the regex. Is a regex the right way to do it, or is there a better way?

Thanks in advance for all the help!

Jean


Replies are listed 'Best First'.
Re: Regex selection based upon position by Aristotle (Chancellor) on Nov 15, 2005 at 10:54 UTC
You can do this with a regex:
`my @field = m{ \A (.{20}) \s (.{20}) \s (.{3}) \s (.{3}) \z }msx`
But you shouldn’t. Use unpack instead:
`my @field = unpack "A20 x A20 x A3 x A3", $_;`
Not least because the `A` pack template will automatically trim the whitespace for you. To achieve this using a regex, you have to jump through hoops:
`my @field = m{ \A (.{1,20}?) \s+ (.{1,20}?) \s+ (.{1,3}?) \s+ (.{1,3}?) \z }msx`
Makeshifts last the longest.
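To make the unpack suggestion concrete, here is a minimal sketch of how one data row could be turned into a semicolon-separated line. The helper name `dump_line_to_csv` and the sample rows built with `sprintf` are made up for illustration; the column widths (20, 20, 3, 3, each followed by one space) are an assumption read off the sample dump, and the sketch only handles full-width rows, not the title or the dashed separator line.

```perl
use strict;
use warnings;

# Assumed layout, taken from the sample dump: four columns of
# 20, 20, 3 and 3 characters, each separated by a single space.
my $template = 'A20 x A20 x A3 x A3';

# Hypothetical helper: split one fixed-width row and rejoin it with ';'.
sub dump_line_to_csv {
    my ($line) = @_;
    chomp $line;
    # "A" extracts a space-padded field and strips the trailing padding;
    # "x" skips the one-character gap between columns.
    return join ';', unpack($template, $line);
}

# Sample rows rebuilt with sprintf so the padding is exact.
my @rows = map { sprintf "%-20s %-20s %-3s %-3s \n", @$_ } (
    [ 'GRANTEE', 'GRANTED_ROLE', 'ADM', 'DEF' ],
    [ 'U1',      'CONNECT',      'NO',  'YES' ],
    [ 'U2',      'RESOURCE ORA', 'NO',  'YES' ],
);

print dump_line_to_csv($_), "\n" for @rows;
```

Because the fields are cut by position, the embedded space in `RESOURCE ORA` survives and that row comes out as `U2;RESOURCE ORA;NO;YES`.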
Re: Regex selection based upon position by Perl Mouse (Chaplain) on Nov 15, 2005 at 10:55 UTC
While it's not impossible to extract substrings by position using a regular expression, it's much better to use `unpack` or `substr`. `Perl --((8:>*`
Re^2: Regex selection based upon position by Anonymous Monk on Nov 15, 2005 at 11:03 UTC
Hello all, "unpack" is not even mentioned in my "Learn Perl in 24 hours". I need to buy a second book. :-) THANKS !! Jean
Re^3: Regex selection based upon position by Aristotle (Chancellor) on Nov 15, 2005 at 11:10 UTC
You probably want to administer The Perl Book Litmus Test. Not that it matters, because Learn $FOO in 24 hours books are always bad. Makeshifts last the longest.
Re: Regex selection based upon position by jonadab (Parson) on Nov 15, 2005 at 11:33 UTC
"unpack" is not even mentioned in my "Learn Perl in 24 hours"

Wow, not mentioned at all? Sheesh, I knew the 24-hours series wasn't great, but wow. Granted, unpack is not one of the most commonly used functions, and I wouldn't expect a whole chapter on it or anything, but there should be at least a short description of it. It is, after all, a builtin.

With that said, I'd have used substr rather than unpack for this, personally. Surely the book at least has a discussion of substr.

Until you get around to buying a new book, you can get by with perldoc. It's not quite the same as having a book in your hands, but it's useful for reference.

If you're looking for a good Perl book, there are some quite excellent ones available. Generally you will not go far wrong with O'Reilly books, for instance. The one with the camel on the cover is great if you have a background in programming in other languages and just need to learn the things that are unique to Perl. If you have little prior programming experience, you might be better off with the llama book. I also quite liked Effective Perl Programming, but that one assumes you already know a bit of Perl, so it might not be a good starting point. There are, of course, other good choices as well.
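For comparison, the substr route mentioned above might look roughly like this. It is only a sketch: the `@columns` table of offsets and widths is an assumption derived from the sample dump, and `fields_by_substr` is a name invented for the example.

```perl
use strict;
use warnings;

# Hypothetical column table: [offset, width] pairs read off the sample dump
# (20, 20, 3 and 3 characters wide, with one space between columns).
my @columns = ( [ 0, 20 ], [ 21, 20 ], [ 42, 3 ], [ 46, 3 ] );

sub fields_by_substr {
    my ($line) = @_;
    chomp $line;
    my @fields;
    for my $col (@columns) {
        my ( $offset, $width ) = @$col;
        # A line that ends before this column yields an empty field
        # instead of a "substr outside of string" warning.
        my $field = $offset < length($line)
                  ? substr( $line, $offset, $width )
                  : '';
        $field =~ s/\s+\z//;    # trim the padding by hand
        push @fields, $field;
    }
    return @fields;
}

my $row = sprintf "%-20s %-20s %-3s %-3s \n", 'DBA1', 'JAVA_ADMIN', 'NO', 'YES';
print join( ';', fields_by_substr($row) ), "\n";
```

The explicit trim is the main difference from unpack, where the `A` template removes the padding on its own.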
Re^2: Regex selection based upon position by Perl Mouse (Chaplain) on Nov 15, 2005 at 12:38 UTC
Re: Regex selection based upon position by Moron (Curate) on Nov 15, 2005 at 11:08 UTC
If that's all you're doing to the file, sed is tailor-made for this kind of operation (from a *nix shell prompt): `sed -e 's/ /;/g' <YourFile.tsv >OutputFile.ssv` (where the whitespace in the regex is meant to be a tab). Update: This assumes consecutive tabs should be converted to the same number of consecutive semicolons. Otherwise I think I would be more inclined to use: `perl -e 'while( <> ){ s/(\t)+/\;/g; print $_; }' <YourFile.tsv >OutputFile.scsv` -M Free your mind
Re: Regex selection based upon position by Tanktalus (Canon) on Nov 15, 2005 at 17:00 UTC
Just a guess based on the sample data, but perhaps what you want to do is export the data using the db tools instead ;-) Or, select using DBI and the appropriate DBD, and then "insert" into a new table using DBD::CSV.
Re: Regex selection based upon position by Hena (Friar) on Nov 15, 2005 at 11:22 UTC
Perhaps split? `while (<FILE>) { chomp; print join(";",split (/\s+/,$_)),"\n"; }`
Re^2: Regex selection based upon position by Perl Mouse (Chaplain) on Nov 15, 2005 at 12:42 UTC
Using split to separate fixed-width data sounds like a terrible idea to me. While short data will be padded with whitespace, there's no guarantee all data is short - a 6 character entry in a 6 character wide field will not be padded. Furthermore, since the data is fixed-width, there's no need to escape whitespace - there might be whitespace as data. In fact, what you think is padding might actually be part of the data! `Perl --((8:>*`
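The objection can be seen on a single row. The following sketch, which assumes the same 20/20/3/3 layout as the sample dump and rebuilds the row with `sprintf`, compares what split and unpack each return for a value that contains a space.

```perl
use strict;
use warnings;

# One row from the sample, rebuilt with exact padding (assumed layout).
my $line = sprintf "%-20s %-20s %-3s %-3s", 'U2', 'RESOURCE ORA', 'NO', 'YES';

# Whitespace splitting cannot tell padding from data.
my @by_split = split /\s+/, $line;

# Positional extraction keeps the embedded space inside the field.
my @by_unpack = unpack 'A20 x A20 x A3 x A3', $line;

printf "split:  %d fields (%s)\n", scalar(@by_split),  join( ';', @by_split );
printf "unpack: %d fields (%s)\n", scalar(@by_unpack), join( ';', @by_unpack );
```

Here split returns five fields because `RESOURCE ORA` is cut in two, while unpack returns the expected four.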
Re^2: Regex selection based upon position by BUU (Prior) on Nov 15, 2005 at 11:34 UTC
"RESOURCE ORA"
Re^3: Regex selection based upon position by Hena (Friar) on Nov 15, 2005 at 13:47 UTC
Sod. Didn't notice that. So this is no go.