
Hi! I tried the "sort" function for a different thing I wanted to do, but it doesn't work as expected.
my @confs = ();
push (@confs, $confs);
my @sortedconfs;
@sortedconfs = sort {$a cmp $b} @confs;
print @sortedconfs;
print "\n";

So I put all my conformations ($confs) in an array (@confs) and then I want to sort them numerically. However, the output I get is only one file (the one with the biggest number), not a row of numbers sorted numerically. What do you think might be going wrong?

I'm also concerned that "sort {$b cmp $a}" gives me the same result as "sort {$a cmp $b}"; shouldn't I get the reversed order? The variable $confs holds my numbered chemical conformations, which are always 4 characters long in the style of B001, B002, B003 and so on. This variable is part of my filenames along with other similar ones (giving a filename like MOLEC1-B001-OPT-FREQ2), so I want to sort my files numerically according to their chemical conformations. But if I do "sort {$a <=> $b}" it complains that "it's not a numerical value", which is why I tried cmp as seen here: http://www.perlmonks.org/?node_id=259465. Any thoughts?

edit: I read a bit more and found that for mixed strings I should extract the numerical part and sort on that substring, so I replaced the block in the previous "sort" line with "{ substr($a, 1) <=> substr($b, 1) }", but I get the same output :(

Re^4: create array of empty files and then match filenames
by kcott (Archbishop) on Jan 15, 2016 at 00:12 UTC

    The short answer is, with the code you've posted, you only populate @confs with one element (i.e. $confs) so there is nothing to sort!
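    To make that concrete, here is a minimal sketch. The value 'B003' is a made-up stand-in, since the real content of $confs was not shown. With a single element in the array, both sort directions necessarily give the same result:

```perl
use strict;
use warnings;

# Hypothetical value: the real content of $confs was never shown in the thread.
my $confs = 'B003';

my @confs;
push @confs, $confs;    # @confs now holds exactly one element

my @ascending  = sort { $a cmp $b } @confs;
my @descending = sort { $b cmp $a } @confs;

print scalar(@confs), " element(s)\n";
print "ascending:  @ascending\n";
print "descending: @descending\n";
```

    Printing scalar(@confs) before sorting is a quick way to check how many elements an array really contains.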

    When posting a question, you need to show us:

    • Enough of your code such that we can reproduce the problem.
    • Your input data (which should be a small, representative sample).
    • Your actual output.
    • Your expected output.
    • All error or warning messages you receive.

    Input and output (including any messages) need to be shown to us exactly as you see them, i.e. not prosaic descriptions of the same. All of this, and more, is explained in the "How do I post a question effectively?" guidelines: please read them.

    In Re^4: create array of empty files and then match filenames, poj asked "What is in $confs?". You didn't really answer this question. Is it a series of Bnnn substrings, a series of filenames, or something else? A simple:

    my $confs = '...';

    in your code would have probably told us all we need to know.

    From what you've posted here, and elsewhere in this thread, I'm wondering whether (as part of your learning) you haven't really understood how to populate arrays. Consider these two lines of code:

    my @array1 = (1, 2, 3);
    my @array2 = '1, 2, 3';

    @array1 contains three elements: the digits 1, 2 and 3. @array2 contains one element: the string '1, 2, 3'.

    This also works the same for push which you seem to have used (probably incorrectly) in a number of places. Consider this code which results in @array1 and @array2 being populated exactly the same as in the last example:

    my (@array1, @array2);
    my $elements_for_array = '1, 2, 3';

    push @array1, 1, 2, 3;                # @array1 has three elements
    push @array2, $elements_for_array;    # @array2 has one element

    The "push @array2, $elements_for_array;" example seems to be what you've done a few times and maybe has caused you problems. I'm really just guessing but perhaps that helps. See also "perlintro: Perl variable types" and "perldata: List value constructors".

    In an attempt to get you back on track with your code, here's my guess as to the type of thing you need:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my @filenames = qw{
        MOLEC8-B040-OPT-FREQ2.gout
        MOLEC1-B001-OPT-FREQ2.gout
        MOLEC2-B010-OPT-FREQ2.gout
        MOLEC10-B002-OPT-FREQ2.gout
    };

    my $re = qr{^[^-]+-([^-]+)};

    my @sorted = sort { ($b =~ /$re/)[0] cmp ($a =~ /$re/)[0] } @filenames;

    print 'Original filenames:';
    print for @filenames;
    print 'Sorted filenames:';
    print for @sorted;

    Possibly the only tricky bits in that code are the parts like "($x =~ /$re/)[0]". $re contains one capture, i.e. ([^-]+), which gets the Bnnn part of each filename. In list context, the regex match evaluates to a list of the captures; as there's only one, ($x =~ /$re/) evaluates to something like ('Bnnn'). The zeroth index into that list evaluates to 'Bnnn' which provides a string for the cmp operation. [Also note that the regex I've used allows MOLECn to go above MOLEC9; the MOLEC10 example successfully tests this.]
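    The context-dependent behaviour of the match can be seen in isolation with a small sketch like the following. The variable names @captures, $first and $matched are only illustrative; the regex and filename are taken from the script above:

```perl
use strict;
use warnings;

my $re       = qr{^[^-]+-([^-]+)};
my $filename = 'MOLEC10-B002-OPT-FREQ2.gout';

my @captures = $filename =~ /$re/;         # list context: the list of captures
my $first    = ($filename =~ /$re/)[0];    # element 0 of that list
my $matched  = $filename =~ /$re/;         # scalar context: true or false only

print "captures: @captures\n";
print "first:    $first\n";
print "matched:  ", ($matched ? 'yes' : 'no'), "\n";
```

    Here both @captures and $first hold 'B002', while $matched only records whether the match succeeded.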

    Here's the output from running that script:

    Original filenames:
    MOLEC8-B040-OPT-FREQ2.gout
    MOLEC1-B001-OPT-FREQ2.gout
    MOLEC2-B010-OPT-FREQ2.gout
    MOLEC10-B002-OPT-FREQ2.gout
    Sorted filenames:
    MOLEC8-B040-OPT-FREQ2.gout
    MOLEC2-B010-OPT-FREQ2.gout
    MOLEC10-B002-OPT-FREQ2.gout
    MOLEC1-B001-OPT-FREQ2.gout
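    If the order should instead be ascending and genuinely numeric, as the original question asked, one possible variation is sketched here. It assumes the conformation field is always 'B' followed by digits, which is a guess based on the sample names; the regex captures only the digits so that <=> receives something it can treat as a number:

```perl
use strict;
use warnings;

my @filenames = qw{
    MOLEC8-B040-OPT-FREQ2.gout
    MOLEC1-B001-OPT-FREQ2.gout
    MOLEC2-B010-OPT-FREQ2.gout
    MOLEC10-B002-OPT-FREQ2.gout
};

# Assumption: the second field is always 'B' followed by one or more digits.
my $re = qr{^[^-]+-B(\d+)-};

# Compare the captured digits numerically, smallest first.
my @sorted = sort { ($a =~ /$re/)[0] <=> ($b =~ /$re/)[0] } @filenames;

print "$_\n" for @sorted;
```

    With fixed-width, zero-padded values such as B001 and B040, a plain cmp on the Bnnn string gives the same order; the numeric comparison only starts to matter if the width ever varies.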

    — Ken

Re^4: create array of empty files and then match filenames
by poj (Abbot) on Jan 14, 2016 at 15:52 UTC

    What is in $confs ?

    I'm guessing you need to use split to create @confs

    #!perl
    use strict;

    my $confs = 'B001, B003, B002';
    my @confs = split /, /, $confs;
    my @sortedconfs = sort {$a cmp $b} @confs;
    print "@sortedconfs\n";

    poj

      Hi, thanks for the reply! So, the variable $confs holds my numbered chemical conformations, each 4 characters long, in the style of B001, B002, B003 and so on. They are in my filenames along with other similar ones. The result is something like MOLEC1-B001-OPT-FREQ2.gout or MOLEC2-B010-OPT-FREQ2.gout or MOLEC8-B040-OPT-FREQ2.gout and so on. Does that make sense? I don't know how many conformations I'll get in the end, as I'm optimising my molecules now. I don't know if I've managed to explain it at all... In my case I don't think split would be useful; on the other hand, I'm learning so I may be wrong.

      edit: You're right that the array @confs isn't populated correctly. I print its contents and there's only one file that it finds, when I have 5 of them in the directory. So I have to find a way to put all elements in the array and not only one.
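
      One way to get every conformation into the array is to push inside a loop over the filenames, once per file. The following is only a sketch: the hard-coded @files list is a hypothetical stand-in for the real directory contents (which could come from something like glob('*.gout')), and the regex assumes the conformation is the second dash-separated field.

```perl
use strict;
use warnings;

# Hypothetical file list; in a real script this could come from
# something like glob('*.gout') in the working directory.
my @files = qw{
    MOLEC8-B040-OPT-FREQ2.gout
    MOLEC1-B001-OPT-FREQ2.gout
    MOLEC2-B010-OPT-FREQ2.gout
};

my @confs;
for my $file (@files) {
    # Capture the second dash-separated field, e.g. 'B040'.
    if ($file =~ /^[^-]+-([^-]+)-/) {
        push @confs, $1;    # one push per matching file, inside the loop
    }
}

my @sortedconfs = sort { $a cmp $b } @confs;
print "@sortedconfs\n";
```

      Because the push happens on every pass through the loop, @confs ends up with one element per matching file rather than a single element.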