
Hello, I would like to ask how to parse the file format given below:

F001            1.2
F101            3.2
solvent1      0
solvent2     3

F001            2.2
F101            7.2
solvent1      5
solvent2     0

From this file I would like to create a hash that looks like this:

solvent1_F001 => 1.2
solvent1_F101 => 3.2
solvent2_F001 => 2.2
solvent2_F101 => 7.2

The hash key is the name of the solvent whose value is zero, followed by an underscore and the string starting with F. The hash value is the number to the right of that F string. I am new to Perl and programming and am struggling to figure this out. I have tried to write the code below, but I cannot figure out what the indices marked ??? in the %solvent_face assignment should be. I think I may not be using the right approach. Thank you in advance for the help.


my @temp;
my @face_ac;
my $size;
my %solvent_face;

open(my $fh, '<', $file) || die "Cannot open file: $!";

while (my $row = <$fh>) {
  chomp $row;

  if ($row =~ /\d+ F/) {
    @temp = split(' ', $row);
    push @face_ac, @temp;
  }

  if ($row =~ /\d+ [a-z]/) {
    @temp = split(' ', $row);
    if ($temp[2] == 0) {
      $size = @face_ac;

      for (my $i = 1; $i < $size+1; $i++) {
        # ??? = the index I cannot work out
        $solvent_face{$temp[1]."_".$face_ac[???]} = $face_ac[???];
      }
      print "@temp\n";
    }
  }
}
close $fh;
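One way to fill in the ??? placeholders, sketched under the assumption that rows follow the Update 2 layout (row number, name, value), which is what the `\d+ F` and `\d+ [a-z]` tests in the code above expect. Each F row is pushed onto `@face_ac` as a `[name, value]` pair instead of flattening all three fields into the array, so the name is `$face_ac[$i][0]` and the value is `$face_ac[$i][1]`. `@face_ac` is emptied at the start of each block so that each block only sees its own F rows. The shortened sample in `$text` stands in for the real file, and treating the `Nr Compound` header line as the start of a block is an assumption.

```perl
use strict;
use warnings;

# Shortened sample in the Update 2 layout (row number, name, value),
# standing in for the real file.
my $text = <<'END';
  Nr Compound       ln(gamma)
   1 F002          4.66656083
   2 F011         26.13597035
   3 h2o           0.00000000
   4 acetonitrile  2.14102090
  Nr Compound       ln(gamma)
   1 F002          1.69945785
   2 F011          0.74578421
   3 h2o           2.08530828
   4 acetonitrile  0.00000000
END

my @face_ac;        # one [name, value] pair per F row of the current block
my %solvent_face;   # e.g. "h2o_F002" => 4.66656083

open(my $fh, '<', \$text) || die "Cannot open input: $!";
while (my $row = <$fh>) {
    chomp $row;
    my @temp = split(' ', $row);    # (Nr, Compound, ln(gamma))

    if ($row =~ /^\s*Nr\s+Compound/) {
        @face_ac = ();              # assumed start of a new block
    }
    elsif ($row =~ /\d+ F/) {
        push @face_ac, [ $temp[1], $temp[2] ];    # keep name and value together
    }
    elsif ($row =~ /\d+ [a-z]/ && $temp[2] == 0) {
        for my $i (0 .. $#face_ac) {
            # $face_ac[$i][0] is the F name, $face_ac[$i][1] its value
            $solvent_face{ $temp[1] . "_" . $face_ac[$i][0] } = $face_ac[$i][1];
        }
    }
}
close $fh;

print "$_ => $solvent_face{$_}\n" for sort keys %solvent_face;
```

Array indices start at 0, so the loop runs over `0 .. $#face_ac` rather than from 1 to `$size`. Note that the `\d+ [a-z]` test still misses solvent names that begin with a digit, such as 1,2-dichloroethane.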

Update: Thank you all for your help with this. Reading some of the replies made me realise I didn't do a good job of describing the file structure, hence this update; sorry about that. The example data above is just a representative example rather than the full data. In the real file the number of strings starting with F varies, usually between 3 and 10, and the number of solvents can also vary, up to ca. 60. I am also using placeholders for the solvent names (solvent1, solvent2), but the actual data file contains real solvent names that can start with either a letter or a number (e.g. hexane, 1-butanol, 1,3-dimethylbenzene, ch2cl2, n-methyl-2-pyrrolidinone). Finally, for simplicity I put a blank line between the two blocks in this example, but in the actual file there are three lines of text between blocks that are not important and can be skipped.
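A minimal sketch for the layout described in this update, assuming (as in the first example) that each data row is just a name and a number, that only the F names start with an uppercase F, and that the F rows of a block always come before its solvent rows. Instead of relying on a separator line, it starts a new block whenever an F row follows a solvent row, and it skips every line that is not exactly a name followed by a number, so the number of F rows, the number of solvents and the filler lines do not matter. The `$text` sample, including its three filler lines, is made up for illustration, and `$flush` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Made-up sample: first-example layout with real-looking solvent names and
# three unimportant lines between the blocks.
my $text = <<'END';
F001            1.2
F101            3.2
F111            0.9
hexane          0
1,3-dimethylbenzene   3
some header text that should be skipped
more text
and more
F001            2.2
F101            7.2
F111            4.1
hexane          5
1,3-dimethylbenzene   0
END

my %solvent_face;      # e.g. "hexane_F001" => 1.2
my @faces;             # [name, value] pairs of the current block
my $zero_solvent;      # solvent whose value is 0 in the current block
my $in_solvents = 0;   # true once the current block has reached its solvent rows

# Hypothetical helper: store the current block's keys, then reset for the next block.
my $flush = sub {
    if (defined $zero_solvent) {
        $solvent_face{"${zero_solvent}_$_->[0]"} = $_->[1] for @faces;
    }
    @faces        = ();
    $zero_solvent = undef;
    $in_solvents  = 0;
};

open my $fh, '<', \$text or die "Cannot open input: $!";
while (my $row = <$fh>) {
    # Keep only "name number" lines; names may contain digits, commas and hyphens.
    my ($name, $value) = $row =~ /^\s*(\S+)\s+(-?\d+(?:\.\d+)?)\s*$/
        or next;
    if ($name =~ /^F/) {
        $flush->() if $in_solvents;    # an F row after solvent rows starts a new block
        push @faces, [$name, $value];
    }
    else {
        $in_solvents  = 1;
        $zero_solvent = $name if $value == 0;
    }
}
$flush->();                            # store the last block
close $fh;

print "$_ => $solvent_face{$_}\n" for sort keys %solvent_face;
```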

Update 2: Here is a properly representative example of the data structure, showing only 2 blocks (the actual file has 61).

 
 Property  job 1 : Activity coefficients ln(gamma) ; 
 Settings  job 1 : T= 298.15 K ; x(6)= 1.0000 ;  
 Units     job 1 : Concentrations x : mole fraction ; 
  
  Nr Compound                                 ln(gamma) 
   1 F002                                    4.66656083 
   2 F011                                   26.13597035 
   3 F101                                   32.47411476 
   4 F11-1                                  29.58963453 
   5 F111                                   30.24092207 
   6 h2o                                     0.00000000 
   7 acetonitrile                            2.14102090 
   8 chlorobenzene                           8.72282917 
   9 chcl3                                   6.98143674 
  10 cyclohexane                            10.20251798 
  11 1,2-dichloroethane                      6.32324557 
  12 ch2cl2                                  5.50767091 
  13 1,2-dimethoxyethane                     2.56706253 
  14 n,n-dimethylacetamide                  -1.64673734 
 
 Property  job 2 : Activity coefficients ln(gamma) ; 
 Settings  job 2 : T= 298.15 K ; x(7)= 1.0000 ;  
 Units     job 2 : Concentrations x : mole fraction ; 
  
  Nr Compound                                 ln(gamma) 
   1 F002                                    1.69945785 
   2 F011                                    0.74578421 
   3 F101                                    2.67268035 
   4 F11-1                                   1.64808218 
   5 F111                                    1.95840198 
   6 h2o                                     2.08530828 
   7 acetonitrile                            0.00000000 
   8 chlorobenzene                           1.08379112 
   9 chcl3                                   0.46576330 
  10 cyclohexane                             3.71606919 
  11 1,2-dichloroethane                     -0.02354847 
  12 ch2cl2                                 -0.23798262 
  13 1,2-dimethoxyethane                     1.22044280 
  14 n,n-dimethylacetamide                   0.44524110
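A minimal sketch for the Update 2 layout, assuming that every block begins with a `Property` line, that every data row is a row number, a name without spaces and a (possibly negative) decimal number, that only the F names start with an uppercase F, and that the solvent to use for each block is the one whose ln(gamma) is exactly zero. Lines that do not fit the row pattern (the Property/Settings/Units lines, the `Nr Compound` header, blank lines) are skipped. `$text` is a shortened copy of the two blocks above standing in for the real file, and `$flush` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Shortened copy of the two Update 2 blocks, standing in for the real file.
my $text = <<'END';
 Property  job 1 : Activity coefficients ln(gamma) ;
 Settings  job 1 : T= 298.15 K ; x(6)= 1.0000 ;
 Units     job 1 : Concentrations x : mole fraction ;

  Nr Compound                                 ln(gamma)
   1 F002                                    4.66656083
   4 F11-1                                  29.58963453
   6 h2o                                     0.00000000
  11 1,2-dichloroethane                      6.32324557
  14 n,n-dimethylacetamide                  -1.64673734

 Property  job 2 : Activity coefficients ln(gamma) ;
 Settings  job 2 : T= 298.15 K ; x(7)= 1.0000 ;
 Units     job 2 : Concentrations x : mole fraction ;

  Nr Compound                                 ln(gamma)
   1 F002                                    1.69945785
   4 F11-1                                   1.64808218
   7 acetonitrile                            0.00000000
  11 1,2-dichloroethane                     -0.02354847
END

my %solvent_face;   # e.g. "h2o_F002" => 4.66656083
my @faces;          # [name, value] pairs of the current block
my $zero_solvent;   # solvent whose ln(gamma) is 0 in the current block

# Hypothetical helper: store the current block's keys, then reset.
my $flush = sub {
    if (defined $zero_solvent) {
        $solvent_face{"${zero_solvent}_$_->[0]"} = $_->[1] for @faces;
    }
    @faces        = ();
    $zero_solvent = undef;
};

open my $fh, '<', \$text or die "Cannot open input: $!";
while (my $row = <$fh>) {
    if ($row =~ /^\s*Property\b/) {     # first line of every block
        $flush->();
        next;
    }
    # Keep "Nr  name  value" rows only; headers and blank lines fail to match.
    my ($name, $value) = $row =~ /^\s*\d+\s+(\S+)\s+(-?\d+(?:\.\d+)?)\s*$/
        or next;
    if    ($name =~ /^F/) { push @faces, [$name, $value] }
    elsif ($value == 0)   { $zero_solvent = $name }
}
$flush->();                             # store the last block
close $fh;

print "$_ => $solvent_face{$_}\n" for sort keys %solvent_face;
```

To read the real file, pass its file name to `open` instead of `\$text`. Because the name is matched with `\S+`, names such as `1,2-dichloroethane`, `n,n-dimethylacetamide` and `F11-1` are captured whole, whatever character they start with.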

In reply to Help with parsing a file by Odar
