Hello, I would like to ask how to parse the file format shown below:

F001 1.2
F101 3.2
solvent1 0
solvent2 3

F001 2.2
F101 7.2
solvent1 5
solvent2 0

From this file I would like to create a hash that looks like this:

solvent1_F001 => 1.2
solvent1_F101 => 3.2
solvent2_F001 => 2.2
solvent2_F101 => 7.2

The hash key is the name of the solvent whose value is zero in each block, followed by an underscore and the string starting with F. The hash value is the number to the right of the string starting with F. I am new to Perl and programming and I am struggling to figure this out. I have tried to write the code below, but I cannot work out what the indices for the %solvent_face hash should be. I think I may not be using the right approach. Thank you in advance for the help.

my @temp;
my @face_ac;
my $size;
my %solvent_face;

open(my $fh, '<', $file) || die "Can not open file: $!";
while (my $row = <$fh>) {
    chomp $row;
    if ($row =~ /\d+ F/) {
        @temp = split(' ', $row);
        push @face_ac, @temp;
    }
    if ($row =~ /\d+ [a-z]/) {
        @temp = split(' ', $row);
        if ($temp[2] == 0) {
            $size = @face_ac;
            for (my $i = 1; $i < $size+1; $i++) {
                $solvent_face{$temp[1]."_".$face_ac[???]} = $face_ac[???];
            }
            print "@temp\n";
        }
    }
}
close $fh;
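One possible way around the index problem, as a minimal sketch rather than a definitive answer: store each F row as a [label, number] pair instead of pushing the split fields flat into one array, then loop over the pairs when the zero-valued solvent is found. The sub name parse_faces is hypothetical, and the sketch assumes the simplified two-column sample above, with face labels starting with "F" followed by a digit and the F rows preceding the solvent rows in each block.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of "solvent_face" => number.
# Assumption: rows are "name value"; a face label starts with F plus a digit,
# and every other two-column row is a solvent.
sub parse_faces {
    my ($fh) = @_;
    my (%solvent_face, @faces);
    my $in_solvents = 0;    # true once the current block's solvent rows began

    while (my $row = <$fh>) {
        my ($name, $value) = split ' ', $row;
        next unless defined $value;        # skip blank or one-column lines

        if ($name =~ /^F\d/) {
            if ($in_solvents) {            # a face row after solvents: new block
                @faces = ();
                $in_solvents = 0;
            }
            push @faces, [$name, $value];  # keep label and number together
        }
        else {
            $in_solvents = 1;
            next unless $value == 0;       # only the zero-valued solvent is used
            for my $face (@faces) {
                my ($label, $number) = @$face;
                $solvent_face{"${name}_$label"} = $number;
            }
        }
    }
    return \%solvent_face;
}

# Usage with the sample data, read through an in-memory filehandle.
my $sample = <<'END';
F001 1.2
F101 3.2
solvent1 0
solvent2 3

F001 2.2
F101 7.2
solvent1 5
solvent2 0
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $result = parse_faces($fh);
close $fh;

print "$_ => $result->{$_}\n" for sort keys %$result;
```

Keeping each label next to its number removes the need to compute positions in a flat array, which is where the `???` indices came from.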

Update: Thank you all for the help with this. Reading some of the replies made me realise that I did not describe the file structure properly, hence this update. Sorry about that. The example data above is only a representative example rather than the actual full data. In the real file the number of strings starting with F varies, usually between 3 and 10. The number of solvents also varies, up to ca. 60. I am also using placeholders for the solvent names (solvent1, solvent2), but the actual data file contains real solvent names that can start with either a letter or a number (e.g. hexane, 1-butanol, 1,3-dimethylbenzene, ch2cl2, n-methyl-2-pyrrolidinone). Finally, for simplicity I added a blank line between the two blocks in this example, but in the actual file there are three lines of text that are not important and can be skipped.

Update2 - Here is a proper representative example of the data structure showing 2 blocks only (the actual file has 61).
Property job 1 : Activity coefficients ln(gamma) ;
Settings job 1 : T= 298.15 K ; x(6)= 1.0000 ;
Units job 1 : Concentrations x : mole fraction ;
Nr Compound ln(gamma)
1 F002 4.66656083
2 F011 26.13597035
3 F101 32.47411476
4 F11-1 29.58963453
5 F111 30.24092207
6 h2o 0.00000000
7 acetonitrile 2.14102090
8 chlorobenzene 8.72282917
9 chcl3 6.98143674
10 cyclohexane 10.20251798
11 1,2-dichloroethane 6.32324557
12 ch2cl2 5.50767091
13 1,2-dimethoxyethane 2.56706253
14 n,n-dimethylacetamide -1.64673734
Property job 2 : Activity coefficients ln(gamma) ;
Settings job 2 : T= 298.15 K ; x(7)= 1.0000 ;
Units job 2 : Concentrations x : mole fraction ;
Nr Compound ln(gamma)
1 F002 1.69945785
2 F011 0.74578421
3 F101 2.67268035
4 F11-1 1.64808218
5 F111 1.95840198
6 h2o 2.08530828
7 acetonitrile 0.00000000
8 chlorobenzene 1.08379112
9 chcl3 0.46576330
10 cyclohexane 3.71606919
11 1,2-dichloroethane -0.02354847
12 ch2cl2 -0.23798262
13 1,2-dimethoxyethane 1.22044280
14 n,n-dimethylacetamide 0.44524110
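For the three-column layout in this second example, a minimal sketch of one possible approach is shown here. The sub name parse_activity_blocks is hypothetical, and the sketch assumes that every block starts with a "Property job" line, that compound names contain no whitespace, that face labels start with "F" followed by a digit, and that the F rows come before the solvent rows. The usage data is a trimmed excerpt of the two blocks above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref of "solvent_face" => ln(gamma).
sub parse_activity_blocks {
    my ($fh) = @_;
    my (%solvent_face, @faces);

    while (my $row = <$fh>) {
        # A "Property job N" header marks the start of a new block.
        if ($row =~ /^\s*Property\s+job\b/) {
            @faces = ();
            next;
        }

        # Data rows look like "<Nr> <Compound> <ln(gamma)>"; the remaining
        # header lines do not match this pattern and are skipped.
        my ($name, $value) = $row =~ /^\s*\d+\s+(\S+)\s+(-?\d+(?:\.\d+)?)\s*$/
            or next;

        if ($name =~ /^F\d/) {
            push @faces, [$name, $value];    # remember this block's face rows
        }
        elsif ($value == 0) {
            # The zero-valued solvent names the block: one key per stored face.
            for my $face (@faces) {
                my ($label, $ln_gamma) = @$face;
                $solvent_face{"${name}_$label"} = $ln_gamma;
            }
        }
    }
    return \%solvent_face;
}

# Usage with a trimmed excerpt of the data, read from an in-memory filehandle.
my $excerpt = <<'END';
Property job 1 : Activity coefficients ln(gamma) ;
Settings job 1 : T= 298.15 K ; x(6)= 1.0000 ;
Units job 1 : Concentrations x : mole fraction ;
Nr Compound ln(gamma)
1 F002 4.66656083
2 F011 26.13597035
6 h2o 0.00000000
7 acetonitrile 2.14102090
14 n,n-dimethylacetamide -1.64673734
Property job 2 : Activity coefficients ln(gamma) ;
Settings job 2 : T= 298.15 K ; x(7)= 1.0000 ;
Units job 2 : Concentrations x : mole fraction ;
Nr Compound ln(gamma)
1 F002 1.69945785
2 F011 0.74578421
6 h2o 2.08530828
7 acetonitrile 0.00000000
11 1,2-dichloroethane -0.02354847
END

open my $fh, '<', \$excerpt or die "Cannot open in-memory file: $!";
my $result = parse_activity_blocks($fh);
close $fh;

print "$_ => $result->{$_}\n" for sort keys %$result;
```

The numeric test `$value == 0` follows the rule stated in the question. In the two blocks shown, the index in x(6) and x(7) on the Settings line also coincides with the Nr of the zero-valued row, so matching on that index might be an alternative if a solvent value could ever be zero by coincidence.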

In reply to Help with parsing a file by Odar
