Re: To split with spaces
by Cristoforo (Curate) on Aug 04, 2013 at 20:13 UTC
1234 2321 0 45 1st
2122 sdsa 0 0 34
2313 dsad 43 2nd
1232 ffff 0 0 1st
3213 sadf 0 34
2133 dada 0 2nd
With fixed-width columns, you could use unpack or substr to parse it. If the columns are tab-separated, you could split on tab.
Chris
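A minimal sketch of both suggestions might look like this. The column widths (5, 5, 3 and 3 characters), the sample values and the helper names parse_fixed and parse_tabbed are assumptions made up for illustration, since the real layout of the file is not known from the post.

```perl
use strict;
use warnings;

# Hypothetical layout: four columns of width 5, 5, 3 and 3 characters.
# "A" fields drop trailing blanks, so an empty column comes back as ''.
sub parse_fixed {
    my ($line) = @_;
    return unpack 'A5 A5 A3 A3', $line;
}

# Tab-separated variant; the -1 limit keeps trailing empty fields.
sub parse_tabbed {
    my ($line) = @_;
    return split /\t/, $line, -1;
}

# Build the fixed-width sample with sprintf so the padding is explicit.
my @fixed  = parse_fixed(sprintf '%-5s%-5s%-3s%-3s', '1234', 'abcd', '', '45');
my @tabbed = parse_tabbed("1234\tabcd\t\t45");

print join('|', @fixed),  "\n";   # prints 1234|abcd||45
print join('|', @tabbed), "\n";   # prints 1234|abcd||45
```

In both cases the empty third column survives as an empty string, which is exactly what splitting on /\s+/ cannot give you.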
1234 2321 0 45 1st
2122 sdsa 0 0 34
2313 dsad 43 2nd
1232 ffff 0 0 1st
3213 sadf 0 34
2133 dada 0 2nd
As you can see, there is a different number of spaces between columns. So neither /\s/ nor /\s+/ works, because some columns contain whitespace characters. The substr function does not work either, for the same reason: it does not see the whitespace characters and moves on to the next column. I hope I have explained my problem clearly :)
Re: To split with spaces
by Laurent_R (Canon) on Aug 04, 2013 at 21:07 UTC
This is not really a Perl problem. Your problem is to define exactly what your input really looks like, in order to figure out whether the third column exists or is missing. In other words, the problem is to define the input format. Once we know that, writing the Perl program that can do what you need is probably very easy.
As Cristoforo said, perhaps you have fixed-length fields, in which case unpack or substr are likely candidates for the functions you want to use. If you have tab-separated fields, split is more likely to solve your problem. Or, maybe, the solution is a regular expression match. It could also be that splitting on a single space (rather than on multiple spaces with /\s+/), as suggested by 0day, is simply the solution. But we can't figure out exactly what your input file really looks like from your post, because it has probably been reformatted there. At the very least, please supply your input within code tags; we will then be more likely to understand your input file format.
It would be even better to have a link to a sample of your input file, because if you copy and paste a section of the file, it is quite possible that tabs get copied as groups of spaces, which makes it difficult to understand the real format of the original file.
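# Cut each line into six fixed-width fields, remove the blanks in each
# field, and print the fields between pipes.
printf "|%4s|%4s|%2s|%2s|%2s|%3s|\n",
    map { s/\s+//g; $_ } unpack "A11A5A3A3A3A*" for <DATA>;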
__DATA__
1234 2321 0 45 1st
2122 sdsa 0 0 34
2313 dsad 43 2nd
1232 ffff 0 0 1st
3213 sadf 0 34
2133 dada 0 2nd
Would print:
|1234|2321| 0| |45|1st|
|2122|sdsa| 0| 0|34| |
|2313|dsad| | |43|2nd|
|1232|ffff| 0| 0| |1st|
|3213|sadf| | 0|34| |
|2133|dada| 0| | |2nd|
Now that we have a format that makes sense, i.e. a fixed-column format, this definitely looks like a job for the substr or unpack function. The problem is to find the right parameters (offset and length) to retrieve your fields. I can't test anything right now, but I will come back to you when I can.
UPDATE: actually, I had not seen this when I posted the above 3 minutes ago, but Davido and others have already given a solution. There is probably no point in coming back to give the same one.
Re: To split with spaces
by ww (Archbishop) on Aug 04, 2013 at 21:13 UTC
As posted, your data fields are separated by one or more spaces, and Line 3 has "43" as its third field (e.g. $field[2]), so the result from your code is what you should expect. The same applies to Line 5. And I'm not absolutely clear about what you're trying to tell us in the last line of your post.
Your failure to use code tags (viz, the formatting instructions at the text entry box where you created your node) makes it difficult to tell exactly how you intended the data to be structured -- you used multiple non-breaking space entities, but did you do so to match the actual spaces (0x20) in your data or to make the rendered appearance like that of a table with tabs?
In short, more information from you and closer attention to the local formatting directions will make it easier for us to help you.
If I've misconstrued your question or the logic needed to answer it, I offer my apologies to all those electrons which were inconvenienced by the creation of this post.
Re: To split with spaces
by 0day (Sexton) on Aug 04, 2013 at 19:13 UTC
Try:
@fields = split(/\s/,$line);
Your data is in fixed-width fields. The third column always starts at the same character position one line after another. substr would work just fine for this. Given the example data you posted, you just need to start at the 15th position, and read two characters. In other words, my $third_col = substr $line, 15, 2;
In fact, it's possible that you could just start at the 16th position and read a single character, but I would need to know more about the input data before I could be sure.
Anyway, for the data you posted, this works fine:
use v5.14;
# Skip 15 characters, take the next two, and trim surrounding whitespace.
say unpack( 'x15A2' ) =~ s/^\s+|\s+$//gr while <DATA>;
__DATA__
1234 2321 0 45 1st
2122 sdsa 0 0 34
2313 dsad 43 2nd
1232 ffff 0 0 1st
3213 sadf 0 34
2133 dada 0 2nd
I used unpack instead of substr, but either one would work fine.
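For comparison, a substr version could be sketched as follows. The offset 15 and length 2 are the values quoted above. The helper name third_col and the sample lines are hypothetical, and the samples are built with sprintf using column widths assumed from the unpack template posted elsewhere in this thread.

```perl
use strict;
use warnings;

# Return the field found at offset 15 (two characters wide), trimmed.
# A line too short to reach that offset yields an empty string.
sub third_col {
    my ($line) = @_;
    return '' if length($line) <= 15;
    my $field = substr $line, 15, 2;
    $field =~ s/^\s+|\s+$//g;
    return $field;
}

# Hypothetical sample lines; sprintf makes the padding explicit.
my @lines = (
    sprintf('%-11s%-5s%-3s%s', '1234', '2321', '0', '45'),
    sprintf('%-11s%-5s%-3s%s', '2313', 'dsad', '',  '43'),
);

print '[', third_col($_), "]\n" for @lines;   # prints [0] then []
```

The length check is there because substr warns and returns undef when the offset lies beyond the end of the string.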
Re: To split with spaces
by ricDeez (Scribe) on Aug 04, 2013 at 22:15 UTC
Another option is to use pipe delimited text as it allows you to visually inspect the data in any text editor. You could then do something like this:
use v5.12;
use warnings;
use Data::Dump qw(ddx);
my @fields = map { ( split /\|/ )[2] } map { chomp; $_ } <DATA>;
ddx @fields;
# test.pl:5: (0, 0, "", 0, "", 0)
__DATA__
1234|2321|0|45|1st
2122|sdsa|0|0|34
2313|dsad||43|2nd
1232|ffff|0|0|1st
3213|sadf||0|34
2133|dada|0||2nd
While that is one way to do it, the OP doesn't have pipe-delimited data. They have the format shown, and that seems to be what they must work with.
It is possible to convert it into pipe-delimited, but then we'd be back where we are now. ;-)
~Thomas~
"Excuse me for butting in, but I'm interrupt-driven..."
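For what it is worth, such a conversion could be sketched like this. The field widths are borrowed from the unpack template posted earlier in the thread, and whether they match the real file is an assumption. The helper name to_pipe and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Field widths borrowed from the unpack template posted earlier in the
# thread; that they match the real file is an assumption.
my $template = 'A11 A5 A3 A3 A3 A*';

# Hypothetical helper: turn one fixed-width line into a pipe-delimited one.
sub to_pipe {
    my ($line) = @_;
    chomp $line;
    my @fields = unpack $template, $line;
    s/^\s+// for @fields;    # "A" already drops trailing blanks
    return join '|', @fields;
}

# Sample line built with sprintf so that the column positions are explicit.
my $layout = '%-11s%-5s%-3s%-3s%-3s%s';
print to_pipe(sprintf $layout, '1234', '2321', '0', '', '45', '1st'), "\n";
# prints 1234|2321|0||45|1st
```

Note that the conversion itself already requires knowing the column positions, which is the point made above.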
Re: To split with spaces
by locked_user sundialsvc4 (Abbot) on Aug 05, 2013 at 12:31 UTC
Re: To split with spaces
by Laurent_R (Canon) on Aug 05, 2013 at 22:01 UTC
"Thank you, but I tried and tested substr and unpack functions. These are not working. Because our input data is not a fixed-column format. Some of columns have whitespace characters and substr and unpack functions ignore these whitespace characters and pick up next columns..."
The unpack and substr functions don't ignore whitespace. But given that they work with positions within the string, they may have trouble with mixtures of spaces and tabs (because a tab takes only one position in a string, but usually several on the printed line). This is at least my hypothesis #1, by far the most likely in my eyes. But you could also have some other nasty invisible characters (backspace and whatnot), which we cannot guess from the copy and paste that you have provided so far.
We really need to know exactly and in detail what your raw (unformatted) file looks like. Either make the file available by some means so that we can download it and look at it, or possibly supply a hex dump of it (although this is less practical).
Meanwhile, you could also try to split your records on single tabs, rather than spaces, and see what you get. Changing your original code to something like this:
@fields = split /\t/, $line;
It might just be the solution.
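To check the tab hypothesis before changing any parsing code, you could print each line with its invisible characters made visible. This is only a sketch: the helper name show_invisibles and the sample line are invented for illustration.

```perl
use strict;
use warnings;

# Replace tabs with a literal \t and any other non-printable or
# non-ASCII character with a \x{..} escape, so that tabs, plain spaces
# and oddities such as non-breaking spaces can be told apart.
sub show_invisibles {
    my ($line) = @_;
    $line =~ s/\t/\\t/g;
    $line =~ s/([^\x20-\x7E])/sprintf '\\x{%02X}', ord $1/ge;
    return $line;
}

# Hypothetical line mixing tabs, a plain space and a non-breaking space.
print show_invisibles("1234\t2321 \t0\x{A0}45"), "\n";
# prints 1234\t2321 \t0\x{A0}45
```

Running the real file through something like this (chomp each line first) would show at once whether the gaps are tabs, runs of spaces, or something else.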
Re: To split with spaces
by Anonymous Monk on Aug 06, 2013 at 03:50 UTC
Hi,
The first thing to do is to go back to your boss and ask him for the file spec.
Any of those blanks might, on different lines, have a number or letter in it. Back in the distant past, when disk was expensive, we would put eight 1-bit flags into a 1-byte column in a fixed-width file to save space.
Once you have the file spec, you will know the format of the file and things will start to fall into place.
J.C.