punkish has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I am doing some customization of John Gruber's Markdown, and in the process I am struggling with some regex work that I hope you can give me some guidance on.

I want to collect all the local links in my text. So, in the example below, I want only these:

Perl Monk
Regular Expressions
The stuff in []() is not local, and the stuff in ![] is an image, so I don't want either of those. I have not managed to write a good regex for this. Here is what I have:

    my $text = "Hello, I am a [Perl Monk], still not good at [Regular Expressions]. I have been helped immensely by the good monks at [Perl Monks](http://www.perlmonks.org). This is what I look like ![mymugshot]. Thank you.";

    sub listlink {
        my ($text) = @_;
        my @links = (
            $text =~ /
                (?<!\!\[)   # make sure the match is not preceded by a ![
                (?<=\[)     # but is in fact preceded by just a [
                .*?         # the match
                (?=\])      # followed by a ]
                (?!\()      # but not followed by a (
            /xg
        );
        foreach my $link (@links) {
            print "$link\n";
        }
    }

Update: anything can appear inside [] and it is still valid, so [Some (stuff) !Wow] would count as a local link. The only links that are not local are those of the form [link text](link).
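A minimal sketch of the rule in the update, assuming link text never contains a `]` and brackets do not nest: match a `[` that is not preceded by `!`, capture everything up to the next `]`, and reject the match if a `(` comes straight after that `]`. The sub name `local_links` is hypothetical, not part of Markdown.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of every [local link].
# Assumes link text contains no ']' and brackets do not nest.
sub local_links {
    my ($text) = @_;
    my @links = $text =~ /
        (?<!!)        # the [ must not be preceded by ! (that would be an image)
        \[            # opening bracket
        ( [^\]]* )    # capture the link text: anything except ]
        \]            # closing bracket
        (?!\()        # reject [text](url), i.e. a ( right after the ]
    /xg;
    return @links;
}

my $text = "Hello, I am a [Perl Monk], still not good at [Regular Expressions]. "
         . "I have been helped immensely by the good monks at "
         . "[Perl Monks](http://www.perlmonks.org). "
         . "This is what I look like ![mymugshot]. Thank you.";

print "$_\n" for local_links($text);
```

With the sample text this prints Perl Monk and Regular Expressions. Because the brackets are consumed rather than only asserted, `[^\]]*` cannot run past a `]`, so `[Perl Monks]` is dropped as soon as the `(` check fails.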

--

when small people start casting long shadows, it is time to go to bed

Re: regex to collect local links in Markdown
by GrandFather (Saint) on Oct 27, 2007 at 19:34 UTC

    Very nearly right. The assertion that '(' doesn't follow the link should be an assertion that '](' doesn't follow the link. Note that in the fixed code I've added a call to the sub so that the code runs as a complete sample:

    use strict;
    use warnings;

    my $text = "Hello, I am a [Perl Monk], still not good at [Regular Expressions]. I have been helped immensely by the good monks at [Perl Monks](http://www.perlmonks.org). This is what I look like ![mymugshot]. Thank you.";

    listlink ($text);

    sub listlink {
        my ($text) = @_;
        my @links = (
            $text =~ /
                (?<!\!\[)   # make sure the match is not preceded by a ![
                (?<=\[)     # but is in fact preceded by just a [
                .*?         # the match
                (?=\])      # followed by a ]
                (?!\]\()    # but not followed by a ](
            /xg
        );
        foreach my $link (@links) {
            print "$link\n";
        }
    }

    Prints:

    Perl Monk
    Regular Expressions

    Perl is environmentally friendly - it saves trees
      Hi, am I missing something? I actually tried that earlier, and I get:

      Perl Monk
      Regular Expressions
      Perl Monks](http://www.perlmonks.org). This is what I look like ![mymugshot
      

        Note that the final assertion has changed from (?! \( ) to (?! \]\( ).
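The long line in the follow-up comes from `.*?`: when `(?!\]\()` rejects the `]` after `Perl Monks`, the lazy quantifier simply keeps extending, through `(http://www.perlmonks.org). This is what I look like ![`, until it reaches the `]` after `mymugshot`, which is followed by `.` rather than `(`. A sketch of one possible fix keeps the same lookarounds but uses `[^\]]*` for the link text, so a match can never cross a `]`. The name `listlink_fixed` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variant of listlink(): same assertions as above, but the
# link text is [^\]]* instead of .*?, so a match can never run past a ].
sub listlink_fixed {
    my ($text) = @_;
    my @links = (
        $text =~ /
            (?<!\!\[)   # make sure the match is not preceded by a ![
            (?<=\[)     # but is in fact preceded by just a [
            [^\]]*      # the match: anything up to (not including) a ]
            (?=\])      # followed by a ]
            (?!\]\()    # but not followed by ](
        /xg
    );
    return @links;
}

my $text = "Hello, I am a [Perl Monk], still not good at [Regular Expressions]. "
         . "I have been helped immensely by the good monks at "
         . "[Perl Monks](http://www.perlmonks.org). "
         . "This is what I look like ![mymugshot]. Thank you.";

print "$_\n" for listlink_fixed($text);
```

At `[Perl Monks]` the greedy `[^\]]*` stops at the `]`, the `](` check fails, and backtracking to shorter prefixes cannot satisfy `(?=\])`, so that position yields no match at all.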


Re: regex to collect local links in Markdown
by graff (Chancellor) on Oct 27, 2007 at 18:03 UTC
    I think my first inclination for this sort of problem would be to start with something like
    my @chunks = split /([\[\]])/, $text;
    and build a "state-machine" style parser to go through the chunks in sequence.

    (Note that the split is capturing the square brackets, so that they show up as elements in the chunk list.) If your markdown has to do something special with adjacent square brackets, add a "+" just inside the close-paren of the split regex.
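A minimal sketch of the split-and-walk idea, assuming brackets do not nest and treating a bracket pair as an image when the chunk just before its `[` ends in `!`. The sub name `local_links_sm` and its state variables are illustrative, not graff's code.

```perl
use strict;
use warnings;

# Hypothetical state-machine version of the split suggestion.
# Assumes brackets do not nest.
sub local_links_sm {
    my ($text) = @_;
    # Capturing parens keep each [ and ] as its own element of @chunks.
    my @chunks = split /([\[\]])/, $text;
    my @links;
    my ($in_link, $is_image, $buf) = (0, 0, '');

    for my $i (0 .. $#chunks) {
        my $chunk = $chunks[$i];
        if (!$in_link) {
            next unless $chunk eq '[';
            # Opening bracket: start collecting link text.
            ($in_link, $buf) = (1, '');
            # A ! right before the [ marks an image, not a link.
            $is_image = $i > 0 && $chunks[$i - 1] =~ /!\z/;
        }
        elsif ($chunk eq ']') {
            # Closing bracket: keep the text unless it is an image
            # or is immediately followed by "(" as in [text](url).
            $in_link = 0;
            my $next = $i < $#chunks ? $chunks[$i + 1] : '';
            push @links, $buf unless $is_image || $next =~ /\A\(/;
        }
        else {
            $buf .= $chunk;
        }
    }
    return @links;
}

my $text = "Hello, I am a [Perl Monk], still not good at [Regular Expressions]. "
         . "I have been helped immensely by the good monks at "
         . "[Perl Monks](http://www.perlmonks.org). "
         . "This is what I look like ![mymugshot]. Thank you.";

print "$_\n" for local_links_sm($text);
```

The walk makes each decision explicit (image or not, followed by `(` or not), which is easier to extend than a single regex if the Markdown rules grow, at the cost of more code.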

Re: regex to collect local links in Markdown
by andyford (Curate) on Oct 27, 2007 at 19:03 UTC