Seeking an Enlightened Path (Parsing, Translating, Translocating)

nanotasher has asked for the wisdom of the Perl Monks concerning the following question:

Re: Seeking an Enlightened Path (Parsing, Translating, Translocating) by BrowserUk (Patriarch) on Mar 10, 2008 at 20:21 UTC
A quick test indicates that this would take around 20 seconds to process 10 million records: `perl -ple"$_ = join'',(unpack 'A10 A4 A6 A6', $_)[ 2,1,0,3 ]" infile >outfile`
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. "Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
Re^2: Seeking an Enlightened Path (Parsing, Translating, Translocating) by starbolin (Hermit) on Mar 10, 2008 at 20:42 UTC
5 minutes on my 500 MHz Pentium III ;-<
Re^2: Seeking an Enlightened Path (Parsing, Translating, Translocating) by nanotasher (Novice) on Mar 10, 2008 at 20:45 UTC
Unpack and then join. Very nice solution. I get my field definitions before I actually look at the file, so this is a very viable solution. Is it possible to make this dynamic? In other words, would it be possible to define a string and feed that into unpack, then define another string and feed it into the second part of join? Also, if it is possible to make it dynamic in this way, would it hinder performance?
Re^3: Seeking an Enlightened Path (Parsing, Translating, Translocating) by NetWallah (Canon) on Mar 10, 2008 at 21:52 UTC
Perl is a "dynamic language", and generally encourages late, lazy, and dynamic programming. (Sorry - could not find a better introductory platitude.) Typically, built-in functions do not care (or are unaware) whether they are passed constants or variables. The only exception I can think of is re-evaluating regular expressions in a loop. So the advice is to go ahead and feed in constructed arguments. Benchmark it if you have doubts.
"As you get older three things happen. The first is your memory goes, and I can't remember the other two... " - Sir Norman Wisdom
Re^3: Seeking an Enlightened Path (Parsing, Translating, Translocating) by ack (Deacon) on Mar 10, 2008 at 21:55 UTC
Here is my stab at making it dynamic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

#----------- define field mappings ---------------------
# hash to map from field i in input to field j in output
# field i is the key to hash, field j is the value of hash
my %from_to = (
    1 => 4,
    2 => 2,
    3 => 1,
    4 => 3,
);

# field length hash...key is the field number, value of hash
# is the length in characters. Presumes the length of the
# input and output fields are the same.
my %field_len = (
    1 => 10,
    2 => 4,
    3 => 6,
    4 => 6,
);

#---------- setup decode string --------
# sort numerically so the template stays in field order with ten or more fields
my $decode_string = "";
foreach my $num (sort { $a <=> $b } keys %from_to) {
    $decode_string .= 'A' . $field_len{$num} . ' ';
}

#-------------- process files ----------
my @input;
my @output;
foreach my $in_record (<DATA>) {
    chomp($in_record);
    print $in_record . " ---> ";
    @input = (unpack $decode_string, $in_record);
    foreach my $index (sort { $a <=> $b } keys %from_to) {
        $output[ $from_to{$index} - 1 ] = $input[ $index - 1 ];
    }
    my $out_record = join "", @output;
    print $out_record . "\n";
}
exit(0);

__END__
AAAAAAAAAA1111BBBBBB222222
BBBBBBBBBB2222CCCCCC333333
CCCCCCCCCC3333DDDDDD444444
```

I used the two hashes to contain the mapping of the input field to the output field (`%from_to`) and the field lengths (`%field_len`). You, of course, could use whatever strategy you want. I also have prints in to see how things work. I had tried to make it more compact by using a dynamic strategy for specifying the array slice as part of the join..decode line, but I couldn't figure out (or remember) how to do that.
ack Albuquerque, NM
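One possible way to get the compact array-slice form mentioned above is to derive the slice indices from `%from_to` itself. This is only a sketch: the `reorder` sub and the `@order` and `$template` names are made up for illustration, and it assumes every output position appears exactly once in the mapping.

```perl
use strict;
use warnings;

# Same mappings as in the post above: input field => output position (1-based).
my %from_to   = (1 => 4, 2 => 2, 3 => 1, 4 => 3);
my %field_len = (1 => 10, 2 => 4, 3 => 6, 4 => 6);

# unpack template with the fields in numeric input order: "A10 A4 A6 A6"
my $template = join ' ', map { "A$field_len{$_}" } sort { $a <=> $b } keys %from_to;

# 0-based input field indices listed in output order: (2, 1, 3, 0)
my @order = map { $_ - 1 } sort { $from_to{$a} <=> $from_to{$b} } keys %from_to;

# Hypothetical helper: unpack once, then pick the fields with an array slice.
sub reorder {
    my ($record) = @_;
    return join '', (unpack $template, $record)[@order];
}

print reorder('AAAAAAAAAA1111BBBBBB222222'), "\n";   # prints BBBBBB1111222222AAAAAAAAAA
```

Sorting the keys by their mapped output position turns the hash into the index list that the slice needs, so the per-record work is a single `unpack` and `join`.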
Re^4: Seeking an Enlightened Path (Parsing, Translating, Translocating) by ack (Deacon) on Mar 11, 2008 at 04:50 UTC
Re^3: Seeking an Enlightened Path (Parsing, Translating, Translocating) by BrowserUk (Patriarch) on Mar 10, 2008 at 22:10 UTC
"In other words, would it be possible to define a string and feed that into unpack, then define another string and feed it into the second part of join?"
You'd have to explain that a little better, but something like this might be close, depending on where and how you want to obtain those strings. The following would take comma-separated arguments to construct the unpack template and field ordering: `perl -e"BEGIN{$T=join'',map{qq[A$_ ]}split',',shift;@F=split',',shift}" -ple"$_ = join'',(unpack $T,$_)[@F]" "10,4,6,6" "2,1,0,3" infile >outfile`
But note I've had to split the "one-liner" over several lines for posting. Once they start getting this long, writing a proper script is more convenient if you are going to reuse it.
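As a rough illustration of the "proper script" idea, the one-liner could be unrolled along these lines. The `make_reorderer` name and the usage convention are assumptions for this sketch, not something from the thread; the two arguments follow the same comma-separated format as the one-liner (field widths, then 0-based field order).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the unpack template and slice once,
# and return a closure that rearranges a single record.
sub make_reorderer {
    my ($widths, $order) = @_;               # e.g. "10,4,6,6" and "2,1,0,3"
    my $template = join ' ', map { "A$_" } split /,/, $widths;
    my @fields   = split /,/, $order;
    return sub { join '', (unpack $template, $_[0])[@fields] };
}

# Assumed usage: reorder.pl WIDTHS ORDER [infile] > outfile
# The loop only runs when both arguments were supplied.
if (@ARGV >= 2) {
    my $reorder = make_reorderer(shift @ARGV, shift @ARGV);
    while (my $line = <>) {
        chomp $line;
        print $reorder->($line), "\n";
    }
}
```

Building the template outside the read loop keeps the per-record cost the same as in the hard-coded version.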
Re: Seeking an Enlightened Path (Parsing, Translating, Translocating) by olus (Curate) on Mar 10, 2008 at 19:08 UTC
Something like this?

```perl
use strict;
use warnings;

while (<DATA>) {
    # swap the alternating runs of non-digits and digits
    $_ =~ s/(\D*)(\d*)(\D*)(\d*)/$3$2$4$1/g;
    print $_;
}

__DATA__
AAAAAAAAAA1111BBBBBB222222
BBBBBBBBBB2222CCCCCC333333
CCCCCCCCCC3333DDDDDD444444
```

outputs:

```
BBBBBB1111222222AAAAAAAAAA
CCCCCC2222333333BBBBBBBBBB
DDDDDD3333444444CCCCCCCCCC
```
Re^2: Seeking an Enlightened Path (Parsing, Translating, Translocating) by nanotasher (Novice) on Mar 10, 2008 at 20:52 UTC
I am no good with regex, but the solution is a viable one if I can configure the number of characters (instead of *). For instance, if I could do something like: `$_ =~ s/(\D[1-10])(\D[11-16]) ... /$3$4$1$2/g;` That would work for input. Could I configure the second part of the search dynamically?
Re^3: Seeking an Enlightened Path (Parsing, Translating, Translocating) by olus (Curate) on Mar 10, 2008 at 22:45 UTC
From what I understood from your example, you wanted to switch the places of sequences of letters and sequences of numbers. In the example solution I gave you, the regular expression looks for four of those sequences regardless of the number of characters in each sequence (provided those sequences alternate). That regexp is not looking at the positions of the characters in the line.
From your example I saw a sequence of letters, or non-digits, so I used the `\D` wildcard that matches non-digits. Then there is a sequence of digits, and the wildcard that matches digits is `\d`. Since the positions of those sequences need to be switched, they have to be captured with `()` for later use.
The example as explained above does not know the number of characters in each sequence, but if you do want to do the rearrangement based on particular positions in the line, there are alternatives that take that into account (besides the excellent one BrowserUk showed). If you say you have 10 characters, then 4, then 6 and finally 6 more, we can write such a regexp. For that we will use the `.` (match any character), `{n}` (match exactly n times) and the grouping `()`. The regexp would be: `$_ =~ s/(.{10})(.{4})(.{6})(.{6})/$3$2$4$1/g;`
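To make the fixed-width regexp configurable, the pattern can be assembled from a list of widths and compiled once with `qr//`. The sketch below is one way this might look; `@widths`, `@order` and `rearrange` are illustrative names, and unlike the substitution above it anchors the match at the start of the line and handles one record per call.

```perl
use strict;
use warnings;

my @widths = (10, 4, 6, 6);    # field widths, in input order
my @order  = (3, 2, 4, 1);     # capture-group numbers, in output order

# Compiled once; equivalent to /^(.{10})(.{4})(.{6})(.{6})/
my $pattern = join '', map { "(.{$_})" } @widths;
my $regex   = qr/^$pattern/;

# Hypothetical helper: rearrange one record, keeping whatever follows the
# matched fields (for example the newline) in place.
sub rearrange {
    my ($record) = @_;
    my @fields = $record =~ $regex;
    return $record unless @fields;    # leave short or malformed lines untouched
    return join('', @fields[ map { $_ - 1 } @order ]) . substr($record, $+[0]);
}

print rearrange("AAAAAAAAAA1111BBBBBB222222\n");   # prints BBBBBB1111222222AAAAAAAAAA
```

Because the pattern is built before the loop and compiled with `qr//`, it is not re-evaluated for every record.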
Re^4: Seeking an Enlightened Path (Parsing, Translating, Translocating) by nanotasher (Novice) on Mar 11, 2008 at 14:50 UTC
Re: Seeking an Enlightened Path (Parsing, Translating, Translocating) by igelkott (Priest) on Mar 10, 2008 at 20:14 UTC
"...under half an hour per day"
With the assumption that the pattern suggested by olus is applicable to your real record format, I ran a quick test for the other part of your question -- how long this might take. On my modest hardware (Intel Core 2 6600), 10 million records took about 37 seconds to process. Assuming a reasonable overhead from other operations, there should be no problem keeping within the time constraints.
Re: Seeking an Enlightened Path (Parsing, Translating, Translocating) by Roy Johnson (Monsignor) on Mar 10, 2008 at 19:05 UTC
You have given an example that does not fully describe the problem. Do you have four fixed-length columns of data that you want to rearrange? Changing the order from 1234 to 3214?
Caution: Contents may have been coded under pressure.
Re^2: Seeking an Enlightened Path (Parsing, Translating, Translocating) by nanotasher (Novice) on Mar 10, 2008 at 20:34 UTC
Fair enough. Let me describe a little more. I have a fixed-length file. The length of each record should not change. I have a field definition table within Oracle that I can configure if input or output requirements change. The data within each field is alphanumeric. My post showed data that was numeric for one field, then alpha the next. That was confusing. Sorry. The fields can consist of any combination of letters, numbers or spaces (since everything will be left-padded). Since I have the field definitions, I know that characters 1-10 will be field 1 in the input, and that it has to map to field 4 in the output (same length). Field 2 in the input (let's say 6 characters) has to map to field 1 in the output. Field 3 in the input will actually have to switch values. It may be 10 characters long in input, but I have to match this value against a list of values (the key of a hash table) and have it be 4 characters in the output (the value of the same hash table).
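The translate-and-move step described here might be sketched as follows. Everything specific is an assumption made for illustration: a three-field layout (10, 6 and 10 characters), the `%code_for` table, the `????` placeholder for unknown values, and fields padded with trailing spaces. Note that `unpack` with `A` strips trailing spaces (use `a` to keep them) and `pack` with `A` pads with trailing spaces; data padded on the left would need different handling.

```perl
use strict;
use warnings;

# Hypothetical lookup table: 10-character input value => 4-character output code.
my %code_for = (
    'WIDGET' => 'WDGT',
    'GADGET' => 'GDGT',
);
my $unknown = '????';    # hypothetical placeholder for values missing from the table

sub translate_record {
    my ($record) = @_;
    # 'A' strips trailing spaces, so padded values can be used directly as hash keys.
    my ($f1, $f2, $f3) = unpack 'A10 A6 A10', $record;
    my $code = exists $code_for{$f3} ? $code_for{$f3} : $unknown;
    # pack with 'A' pads each field with trailing spaces back to its fixed width.
    return pack 'A6 A4 A10', $f2, $code, $f1;
}

print translate_record('CUSTOMER01ABC123WIDGET    '), "\n";   # prints ABC123WDGTCUSTOMER01
```

Using `pack` for the output means a field whose width changes (10 characters in, 4 out) still yields a fixed-length record.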
Re: Seeking an Enlightened Path (Parsing, Translating, Translocating) by ack (Deacon) on Mar 10, 2008 at 19:21 UTC
I, too (as Roy noted), can't tell what you're trying to do. Looks like you're trying to ...well, upon second look, I can't tell what you're trying to do. A subsequent nodelet seems to have a regex that does, at least, what your example suggests. I don't know how to help without a better understanding of what the pattern transformation is trying to accomplish.
ack Albuquerque, NM