regex is not working properly

manishrathi has asked for the wisdom of the Perl Monks concerning the following question:

<img class="left" alt="eway-logo" src="/images/global/logo-eway.gif"> </img>

When I create a new page "new", I want to replace alt="eway-logo" with alt="new-logo"

and src="/images/global/logo-eway.gif" with src="/images/global/new-eway.gif"

I have applied the following logic for that.

$file_content[$u] =~ s/logo-(.*)\.gif/logo-$cloneName\.gif/g;

The code at the beginning is replaced with

generic-logo" src="/images/global/logo-generic

While I should get the code

<img class="left" alt="generic-logo" src="/images/global/generic-eway.gif"> </img>

I am not sure why it is replacing

<img class="left" alt="

when I have specified the exact location where it should start.

It's also removing

.gif">

How can I get the correct output? Can someone please suggest what's wrong in this regex code?

Thanks

Replies are listed 'Best First'.
Re: regex is not working properly by wind (Priest) on Jun 24, 2011 at 08:33 UTC
Don't use the greedy match '.*'; use the non-greedy '.*?' instead. Also, I would suggest that you learn how to use an actual HTML parser instead of jury-rigging regexes.
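A minimal sketch of the greedy versus non-greedy difference, assuming a line that holds two image tags. The sample line and the `$clone_name` variable are made up for illustration and are not taken from the original file. With greedy `.*`, the match runs from the first `logo-` to the last `.gif` on the line, so everything in between is swallowed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the poster's $cloneName.
my $clone_name = 'generic';

# Hypothetical input: two image tags on one line.
my $line = '<img src="/images/global/logo-eway.gif"> '
         . '<img src="/images/global/logo-acme.gif">';

# Greedy: .* grabs as much as possible, then backtracks to the LAST ".gif".
(my $greedy = $line) =~ s/logo-(.*)\.gif/logo-$clone_name.gif/g;

# Non-greedy: .*? stops at the FIRST ".gif" after each "logo-".
(my $lazy = $line) =~ s/logo-(.*?)\.gif/logo-$clone_name.gif/g;

print "greedy: $greedy\n";   # the second tag has been eaten
print "lazy:   $lazy\n";     # both tags survive, each renamed
```

The non-greedy form is still not quote-aware, so it can misfire on other markup; it only shows why the greedy version eats more than intended.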
Re: regex is not working properly by moritz (Cardinal) on Jun 24, 2011 at 08:41 UTC
The old wisdom "don't parse HTML with regexes" applies -- use a module instead that does the work for you:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use 5.010;
    use Mojo::DOM;

    my $dom = Mojo::DOM->new('<img class="left" alt="eway-logo" src="/images/global/logo-eway.gif"> </img>');

    # attrs() and to_xml() are the method names as of this 2011 post;
    # current Mojolicious releases spell them attr() and to_string().
    $dom->at('img')->attrs({alt => 'new-logo', src => '/images/global/new-eway.gif'});
    say $dom->to_xml;
Re: regex is not working properly by bart (Canon) on Jun 24, 2011 at 08:58 UTC
`.` matches anything, including the quotes around your attributes. So, if you're sure the quotes are double quotes, you would do better to replace it with `[^"]`. You might also exclude slashes so you just match the basename.

`s/(?<=[\/"]logo-)([^\/"]*)(?=\.gif")/$cloneName/`

A more generic approach, for example if you're not absolutely sure the HTML is formatted the way you want it, or if you parse more than just a snippet, might be to properly parse the tag and replace individual attributes. That parser can be done with regexes; the HTML validator on PerlMonks works this way. What is not safe is just replacing text in HTML without any regard for the kind of syntactic construct (inside a tag, attribute, comment, or plain character data) it is in.

But getting it to work using HTML::TokeParser::Simple just might be easier. You just pass through the stuff you don't want to change, in a token parsing loop, using `$token->as_is()`.
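Both regex ideas above can be sketched in core Perl as follows. This is only an illustration under stated assumptions: attributes are double-quoted, the wanted file name is `logo-<name>.gif` (the question is not entirely clear on this), and `$clone_name` and `rewrite_img` are hypothetical names introduced here, not part of the original code.

```perl
use strict;
use warnings;

my $clone_name = 'generic';   # hypothetical stand-in for $cloneName
my $html = '<img class="left" alt="eway-logo" '
         . 'src="/images/global/logo-eway.gif"> </img>';

# (a) Character class plus lookarounds: the replaced part may contain
#     neither quotes nor slashes, so it cannot run past the attribute.
(my $by_class = $html) =~ s/(?<=[\/"]logo-)[^\/"]*(?=\.gif")/$clone_name/g;

# (b) Tag-scoped rewrite: isolate each <img ...> tag first, then edit
#     the alt and src attributes one by one inside that tag only.
sub rewrite_img {
    my ($tag, $name) = @_;
    $tag =~ s/\balt="[^"]*-logo"/alt="${name}-logo"/;
    $tag =~ s{(\bsrc="[^"]*/logo-)[^"/]*(\.gif")}{$1$name$2};
    return $tag;
}
(my $by_tag = $html) =~ s/(<img\b[^>]*>)/rewrite_img($1, $clone_name)/ge;

print "$by_class\n";   # only the src basename changes
print "$by_tag\n";     # alt and src both change
```

Variant (b) is still a regex-based approximation of a parser; it does not handle comments, single-quoted attributes, or a `>` inside an attribute value, which is where a real token parser earns its keep.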