Re: ascii manipulation in perl
by sauoq (Abbot) on Aug 12, 2003 at 21:27 UTC
-sauoq
"My two cents aren't worth a dime.";
| [reply] |
As much as I want to learn Perl, I need to make sure that I can do what is needed. This is for work, and unfortunately they are placing a tight deadline on doing this. I have read some documentation on Perl and got the idea that this language can work; I just need to be shown which way to look.
| [reply] |
I'm sorry, but that is like asking an auto dealer if a car
can drive from one coast to another, before you take
driving lessons...
Of course Perl can do the job; so can C and a host
of other languages. If you want help solving the problem,
then give us a better definition of what the problem is.
A sample of the different types of records would be
nice.
Peter @ Berghold . Net
Sieze the cow! Bite the day!
Nobody expects the Perl inquisition!
Test the code? We don't need to test no stinkin' code! All code posted here is as is where is unless otherwise stated.
Brewer of Belgian style Ales
| [reply] |
Re: ascii manipulation in perl
by allolex (Curate) on Aug 13, 2003 at 08:02 UTC
I'm sure you'll be able to do what you need with Perl. Go out and get davorg's book Data Munging with Perl; it describes what you need to know to do this sort of thing. You can pick up the book on-line as a PDF for USD 18.50 from the publisher, but the printed copy is nice to have.
You will still need to think about how the records in your text file can be isolated from one another. For us to help you in this forum, we'd need to know whether the information after the three-digit record type is also significant, i.e. whether it also needs to be divided up into fields. From what you have told us already, you could use a function called unpack to do this:
#!/usr/bin/perl
use strict;
use warnings;
my $template = 'A3A*'; # for unpack. it says you have some
                       # ASCII data of a fixed length of 3,
                       # and some more ASCII data until the
                       # end of the record (record means 'line' here)
while (<DATA>) {
    my ($recordtype,$datastring) = unpack($template,$_);
    print "Record type: $recordtype\n$datastring\n\n" unless $recordtype eq '';
}
__DATA__
090071905090405611071905001029842000281253P0223P0504011 80017
090071905090405611071905001029842000291253P0223P0504011 80007
03007190519912660000739900000026500
03007190519912660000839900000011500
040071905453901800xxxxxx x xxxxxxxx M4000
040071905453901806xxxxxx xxxxxxxx F4131
05007190545393434503187100
05007190545393963405187100
06007190545337000380199516129001399002650
06007190545337182980240356129999399013784
OUTPUT:
Record type: 090
071905090405611071905001029842000281253P0223P0504011 80017
Record type: 090
071905090405611071905001029842000291253P0223P0504011 80007
Record type: 030
07190519912660000739900000026500
Record type: 030
07190519912660000839900000011500
Record type: 040
071905453901800xxxxxx x xxxxxxxx M4000
Record type: 040
071905453901806xxxxxx xxxxxxxx F4131
Record type: 050
07190545393434503187100
Record type: 050
07190545393963405187100
Record type: 060
07190545337000380199516129001399002650
Record type: 060
07190545337182980240356129999399013784
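If the rest of each record does turn out to need dividing into fields, one approach is to keep an unpack template per record type in a hash and look it up by the three-character prefix. This is only a minimal sketch: the field widths and the field names in %layout (type, group, id, rest) are invented for illustration and are not taken from the real record layouts, and decode_record is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: these widths and field names are hypothetical placeholders.
# Replace them with the real layout of each record type.
my %layout = (
    '030' => { template => 'A3 A6 A*',    fields => [qw(type group rest)] },
    '050' => { template => 'A3 A6 A9 A*', fields => [qw(type group id rest)] },
);

# Decode one line into a hash of named fields.
# Returns undef when the record type has no layout defined yet.
sub decode_record {
    my ($line) = @_;
    my $type = unpack('A3', $line);
    my $spec = $layout{$type};
    return undef unless $spec;
    my %record;
    @record{ @{ $spec->{fields} } } = unpack($spec->{template}, $line);
    return \%record;
}

# Two of the sample lines from this thread.
my @sample = (
    '03007190519912660000739900000026500',
    '05007190545393434503187100',
);

for my $line (@sample) {
    my $record = decode_record($line) or next;
    print join(', ', map { "$_=$record->{$_}" } sort keys %$record), "\n";
}
```

Keeping the templates in one hash means a newly discovered record type only needs a new entry, not a new branch in the loop.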
--
Allolex
| [reply] |
Re: ascii manipulation in perl
by Zaxo (Archbishop) on Aug 12, 2003 at 21:52 UTC
Ok, from your clarification it sounds like you have a text database whose format was never really decided upon. I guess that a number of different ad hoc formats got appended.
Your best approach will probably be to try to characterize the different formats with regular expressions. Fixed length formats can be decoded with unpack, and delimited fields with split. It will take a certain amount of eyeball inspection to know you have detected all the formats. The trouble with ad hoc formats is that they usually are inadequate.
Once you can detect and decode the whole file, use one of the CSV modules to write it out in a single, capable, standard format.
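To cut down on the eyeball inspection, a first pass can simply tally which prefixes and line lengths occur. The sketch below assumes, as in the sample posted in this thread, that the first three characters identify the format; inventory is a hypothetical helper name, and the sample lines stand in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally how many lines each three-character prefix has and which line
# lengths occur for it. A prefix that shows several lengths may be
# variable-length or may be hiding more than one format.
sub inventory {
    my (@lines) = @_;
    my %seen;
    for my $line (@lines) {
        chomp $line;
        next if $line eq '';
        my $type = substr($line, 0, 3);
        my $len  = length $line;
        $seen{$type}{count}++;
        $seen{$type}{lengths}{$len}++;
    }
    return \%seen;
}

# Sample lines from this thread; a real run would pass the lines of the file.
my @sample = (
    '03007190519912660000739900000026500',
    '03007190519912660000839900000011500',
    '05007190545393434503187100',
);

my $seen = inventory(@sample);
for my $type (sort keys %$seen) {
    my @lengths = sort { $a <=> $b } keys %{ $seen->{$type}{lengths} };
    printf "%s: %d line(s), length(s) %s\n",
        $type, $seen->{$type}{count}, join(', ', @lengths);
}
```

A prefix that always has the same length is a good candidate for unpack; one with varying lengths deserves a closer look before a template is written for it.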
After Compline, Zaxo
| [reply] |
Re: ascii manipulation in perl
by CountZero (Bishop) on Aug 12, 2003 at 21:39 UTC
On a practical level, perhaps you can show us a relevant part of the input file and the way this has to be repaired, so we can at least have some understanding of what you are asking.
CountZero
"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] |
Yes, do something like "this is the data to begin with and this is what I want the data to look like when I'm done".
--
Allolex
| [reply] |
Re: ascii manipulation in perl
by brkstr (Novice) on Aug 12, 2003 at 22:33 UTC
I meant to post this on the main thread.
I grabbed a few lines of the file:
090071905090405611071905001029842000281253P0223P0504011 80017
090071905090405611071905001029842000291253P0223P0504011 80007
03007190519912660000739900000026500
03007190519912660000839900000011500
040071905453901800xxxxxx x xxxxxxxx M4000
040071905453901806xxxxxx xxxxxxxx F4131
05007190545393434503187100
05007190545393963405187100
06007190545337000380199516129001399002650
06007190545337182980240356129999399013784
There are many more record types. By record type, I mean the first three characters of each line. This is why I had meant to place each type into a separate file for manipulation.
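For reference, splitting the lines into one file per record type could be sketched roughly as follows. This is an illustration under assumptions, not a finished solution: split_by_type is a hypothetical helper, the type_NNN.txt naming is made up, and the demonstration reads from an in-memory string and writes to a temporary directory instead of the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Copy each line read from $in_fh to "<out_dir>/type_<prefix>.txt", where
# <prefix> is the first three characters of the line.
# ASSUMPTION: the output file naming is invented for this sketch.
sub split_by_type {
    my ($in_fh, $out_dir) = @_;
    my (%out, %count);
    while (my $line = <$in_fh>) {
        # Skip lines that do not start with three word characters.
        next unless $line =~ /^(\w{3})/;
        my $type = $1;
        unless ($out{$type}) {
            my $path = File::Spec->catfile($out_dir, "type_$type.txt");
            open $out{$type}, '>', $path or die "Cannot write $path: $!";
        }
        print { $out{$type} } $line;
        $count{$type}++;
    }
    for my $fh (values %out) {
        close $fh or die "Close failed: $!";
    }
    return \%count;
}

# Demonstration with three of the sample lines and a temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $data = join '', map { "$_\n" } (
    '03007190519912660000739900000026500',
    '05007190545393434503187100',
    '03007190519912660000839900000011500',
);
open my $in, '<', \$data or die "Cannot open in-memory input: $!";
my $count = split_by_type($in, $dir);
print "$_: $count->{$_} line(s)\n" for sort keys %$count;
```

For the real file, the in-memory handle would be replaced by an ordinary open on the input file and $dir by the directory where the per-type files should go.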
thanks
| [reply] |
You really should spend a few minutes reviewing some of the Tutorials. I know you're on a tight timeline, but it will most definitely take longer if you don't know what you're doing, and no amount of advice will suffice. So unless someone is willing to just do the work for you, I'd suggest reading the following:
The Basics
Basic Input and Output
File Input and Output
String matching and Regular Expressions
I know this seems like a lot to go through, but I'm sure you can get through it in less than an hour.
Then when you have enough understanding you can ask more specific questions that I'm certain the Monks here will be happy to help you with.
edited by ybiC: working links from textual URLs
| [reply] |