Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Parsing a clunky file

by SamCG (Hermit)
on Mar 14, 2014 at 21:25 UTC ( #1078396=perlquestion: print w/replies, xml ) Need Help??

SamCG has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, So, I've got a file that was spit out of SQLPlus running in a command window in Windows. It seems to be breaking lines at 80 characters. The first set of lines looks like the header rows of the extract, and then the data comes. Nothing is lined up. It pretty much looks like the following (except I did a lot of replacement below to obscure pretty much everything -- but you can see where I have data and where it's header). I've been thinking going with pulling out by line numbers, though I'd have to get the pattern right -- I'd need lines 9-12, 23-26, 37-40, and so on... I guess add 11 and then get four lines? It just seems very clunky, and though the data looks pretty stable for at least 200 lines or so I just need one minor issue that would throw off everything. Adding to the ugliness is that none of these things are tab-delimited. Maybe I go re-run the query over the weekend and hard-code tabs in. And all this isn't even as ugly as what I actually need to do with this data (which would be okay, if my boss had any idea that if he asked anyone else to do this, they'd just tell him it was impossible >:|
IFDI_PXJPB_ID MJ_IFDI_CJYSP_HYJP_ +C -------------------------------------------------- ------------------- +- CJYSP_MP + ---------------------------------------------------------------------- +---------- CJYSP_SUI_MP + ---------------------------------------------------------------------- +---------- CJYSP_ID CJYSP_ID_FJIP_CD + PFB -------------------------------------------------- ------------------- +- --- XXXXXX99954499999-99999999_94334633_SPJS_334 XXXX + KHYSSPFKBUPP MFKPBIFJS FF IFC + KHYSSPFKBUPP MFKPBIFJS FF, IFC + SKS_3449436933_36559337_994 IF + US + FJIP_IFKBJ_ID MJ_FJIP_CJYSP_HYJP_ +C -------------------------------------------------- ------------------- +- CJYSP_MP + ---------------------------------------------------------------------- +---------- CJYSP_SUI_MP + ---------------------------------------------------------------------- +---------- CJYSP_ID CJYSP_ID_FJIP_CD + PFB -------------------------------------------------- ------------------- +- --- SKSJCK39954494444-36559366_94334633_FJPX_994 XXXX + KKP MFIFCY F/P + KKP-MFIFCY F/P + SKS_3447994645_36559366_994 IF + US + FJIP_IFKBJ_ID MJ_FJIP_CJYSP_HYJP_ +C -------------------------------------------------- ------------------- +- CJYSP_MP + ---------------------------------------------------------------------- +---------- CJYSP_SUI_MP + ---------------------------------------------------------------------- +---------- CJYSP_ID CJYSP_ID_FJIP_CD + PFB -------------------------------------------------- ------------------- +- --- SKSF43399543F9P49-36563753_94334633_FJPX_994 YBS + KDK PPC CYBP. YF CFFFDF + KDK CYBP YF FMPBICF PFYBYJJ + SKS_3443656944_36563753_994 IF + US




-----------------
s''limp';@p=split '!','n!h!p!';s,m,s,;$s=y;$c=slice @p1;so brutally;d;$n=reverse;$c=$s**$#p;print(''.$c^chop($n))while($c/=$#p)>=1;

Replies are listed 'Best First'.
Re: Parsing a clunky file
by runrig (Abbot) on Mar 14, 2014 at 22:17 UTC
    Any reason to not use DBI and DBD::Oracle ?? Would be a lot easier than parsing sqlplus output.
      Yes, unfortunately. I have permission to install pretty much nothing, including drivers/modules/etc. From past experience I know that while DBD::Oracle is wonderful, it requires some setup.



      -----------------
      s''limp';@p=split '!','n!h!p!';s,m,s,;$s=y;$c=slice @p1;so brutally;d;$n=reverse;$c=$s**$#p;print(''.$c^chop($n))while($c/=$#p)>=1;
        Perl apparently got installed somehow. If this sort of thing is likely to be needed again, I'd make a go at installing DBD::Oracle (and Oracle client tools if needed), or finding a pre-compiled (ppm) package...
Re: Parsing a clunky file
by Jim (Curate) on Mar 14, 2014 at 23:42 UTC

    This really strikes me as an XY Problem. You don't want to have to grapple with the output of a simple SQL SELECT query in this "clunky" format unless you absolutely have to. And it doesn't seem like you have to because you admitted you could rerun the query this weekend. So rerun it and change the output format to CSV (comma-separated-value) or TSV (tab-separated-value). Problem solved.

    Save your mad skillz for an occasion when your boss has gone to everyone else and been told "It's impossible." Querying data from an Oracle database in an immediately useable form is hardly impossible.

    Jim

      Thank you, I think you're right. (and FWIW, it's not the querying part that people are going to say is difficult, it's the next steps. but I already have ideas on how perl will help me with that.)



      -----------------
      s''limp';@p=split '!','n!h!p!';s,m,s,;$s=y;$c=slice @p1;so brutally;d;$n=reverse;$c=$s**$#p;print(''.$c^chop($n))while($c/=$#p)>=1;
Re: Parsing a clunky file
by roboticus (Chancellor) on Mar 15, 2014 at 13:21 UTC

    SamCG:

    In sqlplus, you can set the line size and column formats. Doing so would let you put each record on a single line, making it easier to parse. sqlplus can even put its output in HTML format, if that would be useful.

    It may not help you in this case, but if you have to do something similar in the future, insist that the person providing the report format it reasonably.

    (Note: I agree with the earlier suggestions to use DBI/DBD, but I know you can't always do that.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Parsing a clunky file
by graff (Chancellor) on Mar 15, 2014 at 01:37 UTC
    Based on the example, it looks like the lines you want are at the end of a blank-line-delimited record, so it would probably be easiest (and maybe even most reliable), if you read the file in "paragraph mode", and pop lines off the end of each record until you hit one that consists of just dashes. Something like this (updated to rearrange the conditions for the while loop, which could make it more reliable):
    #!/usr/bin/perl use strict; use warnings; $/ = ""; # set input record separator to paragraph mode my $recid = 0; while (<DATA>) { my @lines = split /\n/; my @wanted = (); while ( @lines and $lines[-1] !~ /^-{10}/ ) { unshift @wanted, pop @lines; } if ( @wanted ) { unshift @wanted, sprintf( "Record # %d", ++$recid ); print join( "\n", @wanted ), "\n"; } } __DATA__ IFDI_PXJPB_ID MJ_IFDI_CJYSP_HYJP_ +C -------------------------------------------------- ------------------- +- CJYSP_MP + ---------------------------------------------------------------------- +---------- CJYSP_SUI_MP + ---------------------------------------------------------------------- +---------- CJYSP_ID CJYSP_ID_FJIP_CD + PFB -------------------------------------------------- ------------------- +- --- XXXXXX99954499999-99999999_94334633_SPJS_334 XXXX + KHYSSPFKBUPP MFKPBIFJS FF IFC + KHYSSPFKBUPP MFKPBIFJS FF, IFC + SKS_3449436933_36559337_994 IF + US + FJIP_IFKBJ_ID MJ_FJIP_CJYSP_HYJP_ +C -------------------------------------------------- ------------------- +- CJYSP_MP + ---------------------------------------------------------------------- +---------- CJYSP_SUI_MP + ---------------------------------------------------------------------- +---------- CJYSP_ID CJYSP_ID_FJIP_CD + PFB -------------------------------------------------- ------------------- +- --- SKSJCK39954494444-36559366_94334633_FJPX_994 XXXX + KKP MFIFCY F/P + KKP-MFIFCY F/P + SKS_3447994645_36559366_994 IF + US + FJIP_IFKBJ_ID MJ_FJIP_CJYSP_HYJP_ +C -------------------------------------------------- ------------------- +- CJYSP_MP + ---------------------------------------------------------------------- +---------- CJYSP_SUI_MP + ---------------------------------------------------------------------- +---------- CJYSP_ID CJYSP_ID_FJIP_CD + PFB -------------------------------------------------- ------------------- +- --- SKSF43399543F9P49-36563753_94334633_FJPX_994 YBS + KDK PPC CYBP. YF CFFFDF + KDK CYBP YF FMPBICF PFYBYJJ + SKS_3443656944_36563753_994 IF + US
Re: Parsing a clunky file
by karlgoethebier (Abbot) on Mar 15, 2014 at 14:13 UTC
Re: Parsing a clunky file
by Lennotoecom (Pilgrim) on Mar 15, 2014 at 00:02 UTC
    please show the example of the desirable output
    for the first element of your input data

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1078396]
Approved by Laurent_R
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (4)
As of 2022-01-28 23:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    In 2022, my preferred method to securely store passwords is:












    Results (74 votes). Check out past polls.

    Notices?