shabird has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks! I have a file that I want to store in a 2D array, and then print its contents separated by semicolons. The file contains three proteins, with their names stored in the column ProteinName. It also includes three molecular functions for each protein, stored in the columns MF1, MF2 and MF3. The data is tab separated. The content of the file should be read into the program and stored in a 2D array. From this 2D array, I want to print each protein and its molecular functions as one concatenated string, separated by semicolons. Here is the file:
ProteinName	MF1	MF2	MF3
GH1	Growth factor activity	Growth hormone receptor binding	Hormone activity
POMC	G protein-coupled receptor binding	Hormone activity	Signaling receptor binding
THRAP3	ATP binding Source	Nuclear receptor transcription coactivator activity	Phosphoprotein binding
I want the output to look like this:
GH1; Growth factor activity; Growth hormone receptor binding; Hormone activity
POMC; G protein-coupled receptor binding; Hormone activity; Signaling receptor binding
THRAP3; ATP binding Source; Nuclear receptor transcription coactivator activity; Phosphoprotein binding
Here is my code for this approach:
#!/usr/bin/perl -w
use strict;
open(FH, "/Users/Desktop/Gene.txt") or die;
my @content = (<FH>);
close(FH);
my @myArray;
for my $row (@content) {
    my @columns = split ' ', $row;
    push @myArray, \@columns;
}
my $title_row = shift @myArray;
for my $row (@myArray) {
    my $sum = 0;
    for my $col (1 .. $#$row) {
        $sum += $row->[$col];
    }
    print "$row->[0] is $sum\n";
}
But it gives me this error:
Argument "Growth" isn't numeric in addition (+) at task3.pl line 31.
Argument "factor" isn't numeric in addition (+) at task3.pl line 31.
Argument "activity" isn't numeric in addition (+) at task3.pl line 31.
Argument "Growth" isn't numeric in addition (+) at task3.pl line 31.
Please help me with this.
Replies are listed 'Best First'.
Re: storing a file in 2d array
by hippo (Archbishop) on May 01, 2020 at 08:36 UTC
If I had a TSV file to read into an AoA I would let a module do the heavy work:
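hippo's listing did not survive in this copy of the thread, and which module it used is not known. As a hedged sketch only, the following assumes Text::CSV (the module recommended elsewhere in this thread) and falls back to a plain split on tabs when that module is not installed. The sub name read_tsv and the in-memory sample are illustrative, not from the original reply.

```perl
use strict;
use warnings;

# Read tab-separated text from a filehandle into an array of arrays (AoA).
# Assumption: Text::CSV is used when available; otherwise a plain split is used.
sub read_tsv {
    my ($fh) = @_;
    if (eval { require Text::CSV; 1 }) {
        my $csv = Text::CSV->new({ sep_char => "\t", binary => 1, auto_diag => 1 });
        return $csv->getline_all($fh);    # array ref of array refs
    }
    my @rows;
    while (my $line = <$fh>) {
        chomp $line;
        push @rows, [ split /\t/, $line, -1 ];    # -1 keeps trailing empty fields
    }
    return \@rows;
}

# In-memory sample standing in for the real file.
my $sample = "ProteinName\tMF1\tMF2\tMF3\n"
           . "GH1\tGrowth factor activity\tGrowth hormone receptor binding\tHormone activity\n";

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $rows = read_tsv($fh);
close $fh;

shift @$rows;    # drop the header row
print join('; ', @$_), "\n" for @$rows;
```

The module earns its keep once fields contain quotes or embedded separators; for clean tab-separated data like the sample, both branches return the same structure.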
Re: storing a file in 2d array
by bliako (Abbot) on May 03, 2020 at 08:49 UTC
There may be a Perl interface for fetching and processing your data, if it's from a well known source, e.g. Ensembl, Entrez, KEGG, *Prot, etc. For example: The usage can be as simple as this:
bw, bliako
Re: storing a file in 2d array
by kcott (Archbishop) on May 02, 2020 at 16:56 UTC
G'day shabird, I notice most, if not all, of your posts relate to biological data which, as I'm sure you're aware, can be huge (often measured in gigabytes). I have also noticed that, in many cases, you've read entire file contents into a variable and then subsequently processed that variable's data; e.g.
I would recommend you look for ways to process the data as you read it from your input file. This will be more efficient and will use substantially less memory. It's not always possible to do this but in many cases it is. Where you can't do this, consider only storing a subset of the input data: you often won't need every piece of information for the task at hand.

For your current task, I would recommend Text::CSV for reading the input; if you also have Text::CSV_XS installed it will run faster. I've included two ways to do this: one with the 2D array you say you want; and one without that intermediary data structure (as I discussed above).

You've described the first part of your task well; however, the second part, with the counts, is a little sketchy. I've made two guesses regarding the counting: I don't know if either is what you want but you may, at least, get some ideas from them.

I copied your sample input from the [download] link (thanks for providing that). As I see some discussion, in a number of responses, regarding whether tabs are correctly represented, I've added &show_verbatim_input so you can see exactly what I'm working with. Here's the code:
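kcott's original listing (including &show_verbatim_input) is not reproduced in this copy. As a stand-in only, here is a minimal sketch of the "process as you read" idea: each line is converted and printed immediately, so no 2D array is ever built. To stay core-only it uses a plain split on tabs rather than Text::CSV; the sub name and the in-memory input are assumptions made for illustration.

```perl
use strict;
use warnings;

# Convert one tab-separated line into a "; "-separated string.
sub tsv_line_to_semicolons {
    my ($line) = @_;
    chomp $line;
    return join '; ', split /\t/, $line;
}

# In-memory sample standing in for a (possibly very large) input file.
my $input = "ProteinName\tMF1\tMF2\tMF3\n"
          . "GH1\tGrowth factor activity\tGrowth hormone receptor binding\tHormone activity\n"
          . "POMC\tG protein-coupled receptor binding\tHormone activity\tSignaling receptor binding\n";

open my $in, '<', \$input or die "Cannot open in-memory file: $!";
my $header = <$in>;    # read and discard the header row
while (my $line = <$in>) {
    # Only one line is held in memory at a time.
    print tsv_line_to_semicolons($line), "\n";
}
close $in;
```

Memory use stays flat regardless of file size, which is the point of the recommendation above.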
Here's the output:
-- Ken
Re: storing a file in 2d array
by jo37 (Curate) on May 01, 2020 at 08:30 UTC
Issues with your program: Without knowing what kind of sum you want, I cannot help at this point. As far as you described the task, this would do:
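jo37's list of issues and code are missing from this copy. Two problems can be read directly off the posted program and its warnings: split ' ' breaks on every run of whitespace, so "Growth factor activity" becomes three columns, and += performs numeric addition on text, which triggers the "isn't numeric" warnings. The following is a sketch of the task as described (2D array, then semicolon-joined output), not jo37's actual code; the sub name is illustrative.

```perl
use strict;
use warnings;

# Build a 2D array (list of array refs) from tab-separated lines.
sub lines_to_table {
    my @lines = @_;
    chomp @lines;
    return map { [ split /\t/, $_ ] } @lines;
}

my @lines = (
    "ProteinName\tMF1\tMF2\tMF3\n",
    "GH1\tGrowth factor activity\tGrowth hormone receptor binding\tHormone activity\n",
    "POMC\tG protein-coupled receptor binding\tHormone activity\tSignaling receptor binding\n",
);

my @table  = lines_to_table(@lines);
my $header = shift @table;    # remove the title row, as in the original program

# Join each row's fields with "; " instead of trying to add them up.
print join('; ', @$_), "\n" for @table;
```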
Greetings,
Re: storing a file in 2d array
by AnomalousMonk (Archbishop) on May 01, 2020 at 09:57 UTC
I, also, am confused about what is supposed to be summed while processing the file (I can't see anything numeric in your sample data). (Update: I also don't understand what you want to do with a 2D array, or why.) However, this code will produce exactly the output you specify from the given input. (Caution: The tabs that are supposed to be in the __DATA__ section may not survive the posting process. Check and restore them as needed.)
Update: I just round-tripped the code posted above and it looks like the tabs in the __DATA__ section survived intact!
Give a man a fish: <%-{-{-{-<
by jo37 (Curate) on May 01, 2020 at 11:35 UTC
(Caution: The tabs that are supposed to be in the __DATA__ section may not survive the posting process. Check and restore them as needed.)
Seems to depend on how you copy the text. When using the download link, everything looks fine. As I copied the data from the OP without altering anything, I didn't even think about a possible issue with tabs.
Greetings,
Re: storing a file in 2d array
by rnewsham (Curate) on May 01, 2020 at 08:07 UTC
If your data is tab separated you should split on a \t. If you want your sum loop to count the number of elements in a row a better way would be
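rnewsham's snippet is missing here. A minimal sketch of counting the elements of a row without an inner loop: an array reference dereferenced in scalar context gives the number of elements, and $#$row gives the last index (one less). The sample rows are taken from the question; the variable names are illustrative.

```perl
use strict;
use warnings;

my @rows = (
    [ 'GH1',  'Growth factor activity', 'Growth hormone receptor binding', 'Hormone activity' ],
    [ 'POMC', 'G protein-coupled receptor binding', 'Hormone activity', 'Signaling receptor binding' ],
);

for my $row (@rows) {
    my $fields    = scalar @$row;    # total number of columns in this row
    my $functions = $#$row;          # last index, i.e. columns after the protein name
    print "$row->[0] has $functions molecular functions ($fields fields)\n";
}
```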
If you want to print ';' separated you can just use a join. Although if you want this as input to some other program it may be safer to look at something like Text::CSV and set semicolon as the sep_char
A couple of best-practice notes: use warnings is preferred over -w, and use the 3-argument form of open. Putting it all together, I'm not sure it is exactly what you want, but it should help get you there.
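The combined listing is not preserved in this copy. The sketch below illustrates the two practices named above under some assumptions: it writes its own small sample file with File::Temp (a core module) so that it is self-contained, whereas the real program would open the poster's Gene.txt instead.

```perl
use strict;
use warnings;    # preferred over the -w switch on the shebang line
use File::Temp qw(tempfile);

# Write a small sample file so the sketch can run anywhere.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "ProteinName\tMF1\tMF2\tMF3\n",
    "THRAP3\tATP binding Source\tNuclear receptor transcription coactivator activity\tPhosphoprotein binding\n";
close $tmp_fh or die "Cannot close $path: $!";

# Three-argument open with a lexical filehandle and an error message that includes $!.
open my $fh, '<', $path or die "Cannot open $path: $!";
my @output;
while (my $line = <$fh>) {
    next if $. == 1;    # skip the header row
    chomp $line;
    push @output, join '; ', split /\t/, $line;
}
close $fh;

print "$_\n" for @output;
```

The three-argument form keeps the mode separate from the file name, so a name that starts with ">" or ends with "|" cannot change how the file is opened.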
by AnomalousMonk (Archbishop) on May 01, 2020 at 10:15 UTC
... count the number of elements in a row ...
To accumulate a sum of the number of elements in a referenced array, a better way IMHO would be
Give a man a fish: <%-{-{-{-<
by shabird (Sexton) on May 01, 2020 at 13:28 UTC
Works! It does what I want, thank you :)
Re: storing a file in 2d array
by johngg (Canon) on May 01, 2020 at 10:12 UTC
It might be that the fields are TAB separated but I see no evidence for that in the page source, just multiple SPACE characters. If that reflects reality then this code might do the trick.
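johngg's code is missing from this copy. As a hedged sketch of the idea described, the following splits on runs of two or more whitespace characters, so that the single spaces inside a field such as "Growth factor activity" are preserved. The sub name and sample line are illustrative.

```perl
use strict;
use warnings;

# Split a line on gaps of two or more whitespace characters.
sub split_on_gaps {
    my ($line) = @_;
    chomp $line;    # remove the newline first so it cannot become part of a gap
    return split m{\s{2,}}, $line;
}

# Fields separated by several spaces, as they appear in the rendered page.
my $line = "GH1    Growth factor activity    Growth hormone receptor binding    Hormone activity\n";
print join('; ', split_on_gaps($line)), "\n";
```

This only works when every separator is at least two characters wide; a field separated by a single tab would not be split.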
I hope this is helpful.
Update: I should have looked at the download link. They are TABs, so changing m{\s{2,}} to m{\t} would work.
Cheers, JohnGG
Re: storing a file in 2d array
by clueless newbie (Curate) on May 01, 2020 at 19:45 UTC
which (at least for me on Windows 10) yields
Re: storing a file in 2d array
by clueless newbie (Curate) on May 03, 2020 at 16:15 UTC
shabird has a number of posts (Read a file which has three columns and store the content in a hash, Query of multi dimentional array, storing a file in 2d array) that are somewhat similar. Hence "script.pl", which makes use of DBI, DBD::CSV, Getopt::Long::Descriptive, and Text::Table.
Yes, I'm guilty of heresy - I confess I'm on Windows.
Let us assume that we have stored the data from the nodes as x<node number>.txt in the local directory, giving "x11114659.txt", "x11115466.txt" and "x11116298.txt". Then, for Read a file which has three columns and store the content in a hash:
or as a nice table (when the select returns more than one field ... we get a table)
For Query of multi dimentional array:
or again as table
And finally for storing a file in 2d array:
Now the count function doesn't seem to be behaving itself so this "select regulation, count(*) from x11115466 group by regulation" throws an error. But there's a simple work-around. Supply a module that exports two subs "with_each_row" and "in_summary" - "with_each_row" is fed the reference to a hash of field names and their values, and "in_summary" is called once the select is exhausted.
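Neither the module nor script.pl is preserved in this copy, so the following is only a sketch of what such a module could look like. The sub names with_each_row and in_summary and the field name regulation come from the text above; the package name CountByRegulation, the sample values 'up' and 'down', and the way the subs are invoked here are assumptions for illustration.

```perl
package CountByRegulation;    # hypothetical module name
use strict;
use warnings;
use Exporter 'import';
our @EXPORT = qw(with_each_row in_summary);

my %count;    # regulation value => number of rows seen

# Called once per fetched row with a hash ref of field name => value.
sub with_each_row {
    my ($row) = @_;
    $count{ $row->{regulation} }++;
    return;
}

# Called once after the last row: prints one line per distinct value
# and returns a copy of the tallies.
sub in_summary {
    printf "%-12s %d\n", $_, $count{$_} for sort keys %count;
    my %copy = %count;
    return \%copy;
}

package main;

# Stand-in for the rows a "select regulation from x11115466" would yield.
# The values are invented; the real script would pass rows fetched via DBI.
CountByRegulation::with_each_row({ regulation => $_ }) for qw(up down up);
my $summary = CountByRegulation::in_summary();
```

Tallying in a hash keyed by the field value does the same job as "group by regulation" with count(*), without relying on the driver's aggregate support.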
Fortunately for us, there is no need to change any code in script.pl ... we simply make use of the -M option and get
Re: storing a file in 2d array
by perlfan (Parson) on May 12, 2020 at 03:31 UTC