JeanLaspost has asked for the wisdom of the Perl Monks concerning the following question:

Hello everybody,

I'm a newbie and I have a newbie question:

I have a system dump which I would like to convert to a CSV-file.

The dump looks like this:

1.1- 1.2 All rights and privileges

GRANTEE              GRANTED_ROLE         ADM DEF
-------------------- -------------------- --- ---
U1                   CONNECT              NO  YES
U2                   RESOURCE ORA         NO  YES
DBA1                 JAVA_ADMIN           NO  YES
...

and the CSV should become like this:

1.1- 1.2 All grantees and privileges;
GRANTEE;GRANTED_ROLE;ADM;DEF
U1;CONNECT;NO;YES
U2;RESOURCE ORA;NO;YES
DBA1;JAVA_ADMIN;NO;YES
...

In the original dump the columns are separated by fixed positions. However, I cannot find a good way to incorporate that into the regex. Is a regex the way to do it, or is there a better way?

Thanks in advance for any help!

Jean

Replies are listed 'Best First'.
Re: Regex selection based upon position
by Aristotle (Chancellor) on Nov 15, 2005 at 10:54 UTC

    You can do this with a regex:

    my @field = m{ \A (.{20}) \s (.{20}) \s (.{3}) \s (.{3}) \z }msx;

    But you shouldn’t. Use unpack instead:

    my @field = unpack "A20 x A20 x A3 x A3", $_;

    Not least because the A pack template will automatically trim the whitespace for you. To achieve this using a regex, you have to jump through hoops:

    my @field = m{ \A (.{1,20}?) \s+ (.{1,20}?) \s+ (.{1,3}?) \s+ (.{1,3}?) \z }msx;
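
For the conversion asked about in the question, the unpack approach might look roughly like the sketch below. It assumes the 20/20/3/3 column widths suggested by the dashed ruler in the dump. The helper name `fixed_to_csv` and the sample records are made up for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one fixed-width record to a semicolon-separated line.
# The template assumes the 20/20/3/3 widths shown by the dashed ruler in the dump.
sub fixed_to_csv {
    my ($line) = @_;
    my @field = unpack "A20 x A20 x A3 x A3", $line;    # A strips trailing padding
    return join ";", @field;
}

# Build sample records with the same layout as the dump (illustrative data only).
my @dump = map { sprintf "%-20s %-20s %-3s %-3s", @$_ } (
    [ 'GRANTEE', 'GRANTED_ROLE', 'ADM', 'DEF' ],
    [ 'U1',      'CONNECT',      'NO',  'YES' ],
    [ 'U2',      'RESOURCE ORA', 'NO',  'YES' ],
);

print fixed_to_csv($_), "\n" for @dump;
```

In a real script the title line and the dashed ruler would need separate handling, and a line shorter than the template can make the `x` skip fail, so some input checking is probably wise.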

    Makeshifts last the longest.

Re: Regex selection based upon position
by Perl Mouse (Chaplain) on Nov 15, 2005 at 10:55 UTC
    While it's not impossible to extract substrings by position using a regular expression, it's much better to use unpack or substr.
    Perl --((8:>*
      Hello all,

      "unpack" is not even mentioned in my "Learn Perl in 24 hours". I need to buy a second book. :-)

      THANKS !!

      Jean

        "unpack" is not even mentioned in my "Learn Perl in 24 hours"

        Wow, not mentioned at all? Sheesh, I knew the 24-hours series wasn't great, but wow. Granted, unpack is not one of the most commonly-used functions, and I wouldn't expect a whole chapter on it or anything, but there should be at least a short description of it. It is, after all, a builtin.

        With that said, I'd have used substr rather than unpack for this, personally. *Surely* the book at least has a discussion of substr.
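
A substr version could be sketched as follows. The offsets assume the same 20/20/3/3 layout with single-space separators, and `@layout` and `fields_by_substr` are hypothetical names chosen for this example only.

```perl
use strict;
use warnings;

# Hypothetical column layout: [offset, width] pairs, assuming 20/20/3/3 fields
# separated by a single space, as in the dump's dashed ruler.
my @layout = ( [ 0, 20 ], [ 21, 20 ], [ 42, 3 ], [ 46, 3 ] );

sub fields_by_substr {
    my ($line) = @_;
    my @field;
    for my $col (@layout) {
        my ( $offset, $width ) = @$col;
        # Guard against short lines: substr past the end warns and returns undef.
        my $value = $offset < length($line) ? substr( $line, $offset, $width ) : '';
        $value =~ s/\s+\z//;    # unlike unpack's A, substr keeps the padding
        push @field, $value;
    }
    return @field;
}

my $line = sprintf "%-20s %-20s %-3s %-3s", 'DBA1', 'JAVA_ADMIN', 'NO', 'YES';
print join( ";", fields_by_substr($line) ), "\n";
```

Compared with unpack, the padding has to be trimmed by hand here, but the explicit offsets can be easier to read and adjust.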

        Until you get around to buying a new book, you can get by with perldoc. It's not quite the same as having a book in your hands, but it's useful for reference.

        If you're looking for a good Perl book, there are some quite excellent ones available. Generally you will not go far wrong with O'Reilly books, for instance. The one with the camel on the cover is great if you have a background in programming in other languages and just need to learn the things that are unique to Perl. If you have little prior programming experience, you might be better off with the llama book. I also quite liked Effective Perl Programming, but that one assumes you already know a bit of Perl, so it might not be a good starting point. There are, of course, other good choices as well.

Re: Regex selection based upon position
by Moron (Curate) on Nov 15, 2005 at 11:08 UTC
    If that's all you're doing to the file, sed is tailor-made for this kind of operation (from a *nix shell prompt):
    sed -e 's/ /;/g' <YourFile.tsv >OutputFile.ssv
    (where the whitespace in the regex is meant to be a tab)

    Update: This assumes consecutive tabs should be converted to the same number of consecutive semicolons. Otherwise I would be more inclined to use:

    perl -e 'while( <> ){ s/(\t)+/\;/g; print $_; }' <YourFile.tsv >OutputFile.scsv

    -M

    Free your mind

Re: Regex selection based upon position
by Tanktalus (Canon) on Nov 15, 2005 at 17:00 UTC

    Just a guess based on the sample data, but perhaps what you want to do is export the data using the db tools instead ;-) Or, select using DBI and the appropriate DBD, and then "insert" into a new table using DBD::CSV.

Re: Regex selection based upon position
by Hena (Friar) on Nov 15, 2005 at 11:22 UTC
    Perhaps split?
    while (<FILE>) { chomp; print join(";",split (/\s+/,$_)),"\n"; }
      Using split to separate fixed-width data sounds like a terrible idea to me. While short data will be padded with whitespace, there's no guarantee all data is short: a 6-character entry in a 6-character-wide field will not be padded. Furthermore, since the data is fixed-width, there's no need to escape whitespace, so there might be whitespace in the data. In fact, what you think is padding might actually be part of the data!
      Perl --((8:>*
      "RESOURCE ORA"
        Sod. Didn't notice that. So this is no go.
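
The problem with the "RESOURCE ORA" row can be reproduced with a small comparison. This is only a sketch: the sample line is rebuilt with sprintf under the assumed 20/20/3/3 layout rather than taken from the real dump.

```perl
use strict;
use warnings;

# Rebuild the problematic row with the assumed 20/20/3/3 fixed-width layout.
my $line = sprintf "%-20s %-20s %-3s %-3s", 'U2', 'RESOURCE ORA', 'NO', 'YES';

# split treats the space inside "RESOURCE ORA" as a field separator...
my @by_split = split /\s+/, $line;

# ...while unpack cuts purely by position and keeps the embedded space.
my @by_unpack = unpack "A20 x A20 x A3 x A3", $line;

printf "split:  %d fields (%s)\n", scalar @by_split,  join ";", @by_split;
printf "unpack: %d fields (%s)\n", scalar @by_unpack, join ";", @by_unpack;
```

Here split yields five fields and unpack yields four, which is why the positional approaches are the safer choice for this dump.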