RFC: My first GitHub contribution

Hi there everyone, I made my first contribution to an open source project today. Well, to be honest, the project was just a list of links and the contribution was just a script that most of you could have hacked together in 10 minutes... But I felt really good about it :D.

Why did I make this script?

I am a frequent visitor of the programmer Discord server, which is a Discord server for programmers (surprise, surprise). This server was started by a couple of programming enthusiasts over at Reddit. It has grown into a huge community of 5000+ programmers, with at least 1000 of them online at any time of the day to discuss programming with and to get real-time support from when you get stuck. One of the channels on this server is titled Collaborations-and-projects, and someone posted there that he had made a list of resources for people who want to learn about programming/computer science. The only problem was that, now that he had made this list, it was going to be a huge pain to sort all of the resources alphabetically by name within each category. To which my response was, of course, "THIS IS A TASK FOR PERL". So I asked for the link to his GitHub repo, put this script together, and let it run through all of the files in the repo, after which I committed all the changes.

Links I would like to share

The link to join the programmer Discord: https://discord.gg/9zT7NHP At the moment there is no Perl channel there, and when I asked why, they told me there was no interest!! So please, if you feel like it, join the server so that there is yet another place to discuss Perl in this world :)
The link to the GitHub repository I contributed to: https://github.com/TumblrCommunity/coding-masterpost

Now, as I am pretty new to Perl (and just started programming this year), I'm sure that my code is pretty clunky, so I figured that it might be a good idea to ask for some comments on the first 30+ line program I have ever written in Perl. Are my comments on point, for example? Should I comment more/less? And could my code have been shorter? Please feel free to criticize every single character and white space in my script! It won't hurt me (much).

First I'll give you an idea of the way the files are formatted. The <> tags are not in the real files!

# <CATEGORY HEADER>

[<name of resource>] (www.hyperlink.com) some text about the resource

[<name of resource>] (www.hyperlink.com) some text about the resource

[<name of resource>] (www.hyperlink.com) some text about the resource

#<HEADER OF NEXT CATEGORY>

[<name of resource>] (www.hyperlink.com) some text about the resource

[<name of resource>] (www.hyperlink.com) some text about the resource
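To make the layout above concrete, here is a minimal sketch of how a single line in this format could be classified as a category header, a resource entry, or anything else. The sub name classify_line is made up for this illustration and is not part of the script below; the patterns only mirror the idea that headers start with # and entries start with a bracketed name.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not from the script):
# classify one line of a resource file.
# Returns ('header', $text), ('entry', $name) or ('other', $line).
sub classify_line {
    my ($line) = @_;
    return ( 'header', $1 ) if $line =~ /^#\s*(.*)/;    # category header
    return ( 'entry',  $1 ) if $line =~ /^\[(.*?)\]/;   # [name] (link) text
    return ( 'other',  $line );                         # blank lines etc.
}

# Try it on a few sample lines in the format shown above
for my $line ( '# Python', '[Learn X] (www.hyperlink.com) some text', '' ) {
    my ( $kind, $value ) = classify_line($line);
    print "$kind: $value\n";
}
```

The name captured for an entry is what the lines get sorted by within each category.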

Now my $code:

#!/usr/bin/env perl
# Sort all entries in a list of links in this repo (https://github.com/TumblrCommunity/coding-masterpost/tree/PythonNerd-patch-1)
# Author: redrock9 (https://github.com/redrock9)
# This script modifies files. The modifications are permanent, but a backup with a ".backup" extension will
# be made in the directory in which the file is located before any modifications take place.
# How to use: execute this script with the path to a file as argument, for example: ./sort.plx Resources/test.md
use strict;
use warnings;

use File::Copy qw(move);

# Take the path to the file as argument
my $file = $ARGV[0];

# Open file and put lines in an array
open (TEXT, "< $file") or die "Can't open $file for read: $!";
my @lines = <TEXT>;
map {s/\s+$//} @lines;
close TEXT or die "Cannot close $file: $!";

# get keys for each line and store each line in a hash
my @keys;
my %linesToSort;
my $i = -1; # Switches between categories (increments when a line starting with # is encountered)
my $a = 1; # in this loop: number to append to keys that are otherwise double in the hash

for (0..$#lines) {
    if ($lines[$_] =~ /^\#(?:.*)/){
        $i++
    }
    elsif ($lines[$_] =~ /(?:^\[)(.*?)(?:\])/){
        my($key) = $1;
        if ($linesToSort{$key}){
            $linesToSort{"${key}$a"} = "$lines[$_]";
            push @{$keys[$i]}, "${key}$a";
            $a++;
        }
        else {
            $linesToSort{$key} = "$lines[$_]";
            push @{$keys[$i]}, "$key";
        }
    }
}

# Print the number of lines in the file before modification
print "the number of lines before sorting is: " . scalar(@lines) . "\n";

# Sort keys lexicographically
map {@{$_} = sort @{$_}} @keys;

# Make a backup of the file
move("$file", "${file}.backup");

# Print the sorted lines into the file
$i = -1; # Switches between categories (increments when a line starting with # is encountered)
$a = 0; # In this loop: The amount of lines per category (is set to 0 every time a # is encountered)

open (NEWTEXT, "> $file") or die "Can't open $file for write: $!";
for (0..$#lines) {
    if ($lines[$_] =~ /^\#(?:.*)/){
        print NEWTEXT "$lines[$_]\n";
        $i++;
        $a = 0;
    }
    elsif  ($lines[$_] !~ /(?:^\[)(.*?)(?:\])/){
        print NEWTEXT "$lines[$_]\n"
    }
    elsif($keys[$i][$a]){
        print NEWTEXT "$linesToSort{$keys[$i][$a]}\n";
        print "$keys[$i][$a]\n";
        $a++;
    }
}
close NEWTEXT or die "Cannot close file $file: $!";
print "-----SORTING COMPLETED-----\n";

Don't mind me, just another IT student wandering around the monastery in awe of its wonders.


Replies are listed 'Best First'.

Re: RFC: My first GitHub contribution
by Your Mother (Archbishop) on Mar 05, 2017 at 23:45 UTC

Here is something to write test data because that was more fun and I am not helpful (sometimes, not always). :P

#!/usr/bin/env perl
use strict;
use warnings;
use Data::Random qw(:all);
use Getopt::Long;
use URI;

GetOptions( "categories=i"    => \my $cats,
            "max-resources=i" => \my $max,
            "min-resources=i" => \my $min )
    or die "Error in command line arguments\n";

$cats ||= 3;
$max  ||= 7;
$min  ||= 2;
die "min-resources cannot be higher than max-resources"
    if $min > $max;

for my $cat ( 1 .. $cats )
{
    printf "# <%s>\n\n", join " ", map uc, rand_words( min => 1, max => 8 );
    for ( 1 .. $min + rand($max-$min+1) )
    {
        my $name = join " ", map ucfirst, rand_words( min => 1, max => 5 );
        my @uri;
        push @uri, "www" if rand(1) > .5;
        push @uri, map lc, rand_words( min => 1, max => 3 );
        push @uri, [qw/ com org co uk jp de nl /]->[rand 7] for 1 .. rand(2);
        my @path = map lc, rand_words( min => 0, max => 3 );
        my @query = map lc, rand_words( min => 0, max => 6 );
        pop @query if @query % 2;
        my $uri = URI->new( join("/", join(".", @uri), @path) );
        $uri->query_form(@query);
        my $desc = ucfirst join " ", map lc, rand_words( min => 3, max => 9 );
        printf "[%s] (%s) %s.\n\n", $name, $uri, $desc;
    }
}

For actual feedback: POD (pod) is preferable to comments (usually, not always), lexical filehandles (open) are better than bareword, and real or at least mocked up test data is easier to work on so will elicit more helpful answers than mine… which in creating such might help elicit more helpful answers after all.
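As a rough illustration of the lexical filehandle and POD suggestions, here is a minimal sketch. The sub name read_lines and the throwaway test file are made up for the example and are not from the original script; open, close and File::Temp's tempfile are the only real pieces.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

=head1 NAME

read_lines - hypothetical helper, shown only to illustrate POD

=head1 SYNOPSIS

    my @lines = read_lines('Resources/test.md');

=cut

# Hypothetical helper (an assumption for illustration): read a file into
# a list of lines with trailing whitespace removed.
sub read_lines {
    my ($path) = @_;

    # Three-argument open with a lexical handle instead of a bareword
    open( my $fh, '<', $path ) or die "Can't open $path for reading: $!";
    my @lines = <$fh>;
    close $fh or die "Cannot close $path: $!";

    s/\s+$// for @lines;    # strip newline and other trailing whitespace
    return @lines;
}

# Demo on a throwaway file so the sketch runs on its own
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "# Header   \n[b] (www.hyperlink.com) text\n";
close $tmp_fh or die "Cannot close $tmp_path: $!";

my @lines = read_lines($tmp_path);
print scalar(@lines), " lines read\n";
```

A lexical handle such as $fh is scoped like any other my variable, so it cannot clash with a bareword of the same name elsewhere, and the three-argument form keeps the mode separate from the file name.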
