Hello, I would like to ask how to parse a file in the format given below:
F001 1.2
F101 3.2
solvent1 0
solvent2 3

F001 2.2
F101 7.2
solvent1 5
solvent2 0
From this file I would like to create a hash that looks like this:
solvent1_F001 => 1.2
solvent1_F101 => 3.2
solvent2_F001 => 2.2
solvent2_F101 => 7.2
Each hash key is the name of the solvent whose value is zero in that block, followed by an underscore and the string starting with F. Each hash value is the number to the right of that F string. I am new to Perl and to programming, and I am struggling to figure this out. I have tried to write the code below, but I cannot figure out what the indices for the %solvent_face hash should be. I think I may not be using the right approach. Thank you in advance for your help.
my @temp;
my @face_ac;
my $size;
my %solvent_face;

open(my $fh,'<', $file) || die "Can not open file: $!";
while (my $row = <$fh>) {
    chomp $row;
    if ($row=~/\d+ F/) {
        @temp=split(' ',$row);
        push @face_ac, @temp;
    }
    if ($row=~/\d+ [a-z]/) {
        @temp=split(' ',$row);
        if ($temp[2]==0) {
            $size=@face_ac;
            for (my $i = 1; $i < $size+1; $i++) {
                $solvent_face{$temp[1]."_".$face_ac[???]}=$face_ac[???];
            }
            print "@temp\n";
        }
    }
}
close $fh;
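For the simple two-column example, one possible way to get the desired hash is to remember the F lines of the current block together with the solvent whose value is zero, and to fill the hash once the block ends. The following is only a minimal sketch under these assumptions: every line is "name value", blocks are separated by a blank line, and the data sits in a here-document instead of a file. The helper name flush_block and the variable $zero_solvent are made up for illustration.

```perl
use strict;
use warnings;

# Sample input copied from the question; blocks are separated by a blank line.
my $data = <<'END';
F001 1.2
F101 3.2
solvent1 0
solvent2 3

F001 2.2
F101 7.2
solvent1 5
solvent2 0
END

my %solvent_face;
my (@faces, $zero_solvent);    # state for the block currently being read

# Copy the finished block into %solvent_face, then reset the per-block state.
sub flush_block {
    if (defined $zero_solvent) {
        $solvent_face{"${zero_solvent}_$_->[0]"} = $_->[1] for @faces;
    }
    @faces = ();
    undef $zero_solvent;
}

for my $row (split /\n/, $data) {
    my ($name, $value) = split ' ', $row;
    if (!defined $name) {          # blank line: the block is complete
        flush_block();
    }
    elsif ($name =~ /^F/) {        # face line: remember name and value
        push @faces, [$name, $value];
    }
    elsif ($value == 0) {          # the solvent with value zero names the block
        $zero_solvent = $name;
    }
}
flush_block();                     # the last block has no trailing blank line

print "$_ => $solvent_face{$_}\n" for sort keys %solvent_face;
```

Storing each face as a [name, value] pair avoids the index arithmetic over a flat @face_ac array, which is where the ??? placeholders in the attempt above come from.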
Update

Thank you all for your help with this. Reading some of the replies made me realise that I did not describe the structure of the data file properly, hence this update. Sorry about that.

The example data above is only a representative example, not the actual full data. In the real file, the number of strings starting with F varies and is usually between 3 and 10. The number of solvents also varies, up to ca. 60. I used placeholders for the solvent names (solvent1, solvent2), but the actual data file contains real solvent names, which can start with either a letter or a number (e.g. hexane, 1-butanol, 1,3-dimethylbenzene, ch2cl2, n-methyl-2-pyrrolidinone). Also, for simplicity I put a blank line between the two blocks in the example, but in the actual file there are three lines of text there that are not important and can be skipped:
Property job 1 : Activity coefficients ln(gamma) ;
Settings job 1 : T= 298.15 K ; x(6)= 1.0000 ;
Units job 1 : Concentrations x : mole fraction ;
Nr Compound ln(gamma)
1 F002 4.66656083
2 F011 26.13597035
3 F101 32.47411476
4 F11-1 29.58963453
5 F111 30.24092207
6 h2o 0.00000000
7 acetonitrile 2.14102090
8 chlorobenzene 8.72282917
9 chcl3 6.98143674
10 cyclohexane 10.20251798
11 1,2-dichloroethane 6.32324557
12 ch2cl2 5.50767091
13 1,2-dimethoxyethane 2.56706253
14 n,n-dimethylacetamide -1.64673734
Property job 2 : Activity coefficients ln(gamma) ;
Settings job 2 : T= 298.15 K ; x(7)= 1.0000 ;
Units job 2 : Concentrations x : mole fraction ;
Nr Compound ln(gamma)
1 F002 1.69945785
2 F011 0.74578421
3 F101 2.67268035
4 F11-1 1.64808218
5 F111 1.95840198
6 h2o 2.08530828
7 acetonitrile 0.00000000
8 chlorobenzene 1.08379112
9 chcl3 0.46576330
10 cyclohexane 3.71606919
11 1,2-dichloroethane -0.02354847
12 ch2cl2 -0.23798262
13 1,2-dimethoxyethane 1.22044280
14 n,n-dimethylacetamide 0.44524110
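For the real file layout, a similar idea might work with a regular expression that only accepts the numbered data rows, so that the header lines are skipped without special handling. This is a hedged sketch, not a tested solution for the full file. It assumes that each job begins with a "Property job" line, that compound names contain no whitespace, that the face names are an "F" followed by a digit (so a solvent such as formamide would not be mistaken for a face), and that exactly one solvent per job has a value of zero. The function name parse_jobs is hypothetical, and the sample uses only a subset of the rows shown above.

```perl
use strict;
use warnings;

# A subset of the rows from the two jobs shown above, values copied unchanged.
my $data = <<'END';
Property job 1 : Activity coefficients ln(gamma) ;
Settings job 1 : T= 298.15 K ; x(6)= 1.0000 ;
Units job 1 : Concentrations x : mole fraction ;
Nr Compound ln(gamma)
1 F002 4.66656083
4 F11-1 29.58963453
6 h2o 0.00000000
7 acetonitrile 2.14102090
11 1,2-dichloroethane 6.32324557
Property job 2 : Activity coefficients ln(gamma) ;
Settings job 2 : T= 298.15 K ; x(7)= 1.0000 ;
Units job 2 : Concentrations x : mole fraction ;
Nr Compound ln(gamma)
1 F002 1.69945785
4 F11-1 1.64808218
6 h2o 2.08530828
7 acetonitrile 0.00000000
11 1,2-dichloroethane -0.02354847
END

# parse_jobs is a hypothetical helper name, not something from the original post.
sub parse_jobs {
    my ($text) = @_;
    my (%result, %face, $solvent);

    # Copy the faces of the finished job into %result, then reset the state.
    my $store = sub {
        if (defined $solvent) {
            $result{"${solvent}_$_"} = $face{$_} for keys %face;
        }
        %face = ();
        undef $solvent;
    };

    # Read the string through an in-memory filehandle, line by line.
    open my $fh, '<', \$text or die "Cannot read data: $!";
    while (my $row = <$fh>) {
        if ($row =~ /^\s*Property job/) {    # a new job starts here
            $store->();
            next;
        }
        # Data rows are "Nr Compound ln(gamma)"; every other line is skipped.
        next unless $row =~ /^\s*\d+\s+(\S+)\s+(\S+)\s*$/;
        my ($name, $value) = ($1, $2);
        if ($name =~ /^F\d/) {               # assumption: faces are "F" plus a digit
            $face{$name} = $value;
        }
        elsif ($value == 0) {                # the solvent with ln(gamma) equal to zero
            $solvent = $name;
        }
    }
    close $fh;
    $store->();                              # store the last job
    return %result;
}

my %solvent_face = parse_jobs($data);
print "$_ => $solvent_face{$_}\n" for sort keys %solvent_face;
```

To read from the actual file, the in-memory open could be replaced with an open on the file name. The values are stored as the strings found in the file, so the trailing digits are preserved.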
In reply to Help with parsing a file by Odar