The problem is that I need to go through the whole file, because it contains HTML markup and translation strings can appear anywhere in it. You are right about splitting on the __ symbol, but an issue arises when there is a translation string inside another translation string. I need to separate them somehow and make sure that a translation string, or anything within _(' .. '), doesn't contain __(' ... ') as well.
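
To make the nesting problem concrete, here is a rough sketch in Python (chosen only for illustration) of one possible approach: scan the whole file for `__(`, match the closing parenthesis by counting depth, and flag any call whose contents include another `__(`. The names `OPENER`, `find_translation_calls` and `sample` are made up for this example, and it assumes the parentheses inside each call are balanced, which may not hold for real translation text.

```python
import re

# Assumed marker for the start of a translation call (hypothetical choice).
OPENER = re.compile(r"__\(")


def find_translation_calls(text):
    """Return (start, end, has_nested) for each outermost __( ... ) call.

    Parentheses are matched by depth, so this assumes the parentheses
    inside each call are balanced.
    """
    calls = []
    pos = 0
    while True:
        match = OPENER.search(text, pos)
        if match is None:
            break
        depth = 1
        i = match.end()
        # Walk forward until the parenthesis opened by __( is closed.
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            # Unbalanced call: stop rather than guess where it ends.
            break
        inner = text[match.end():i - 1]
        calls.append((match.start(), i, OPENER.search(inner) is not None))
        pos = i
    return calls


sample = "<p>__('Hello')</p><div>__('Outer __('inner') text')</div>"
for start, end, has_nested in find_translation_calls(sample):
    print(sample[start:end], "-> nested" if has_nested else "-> ok")
```

Because the scan continues after each outermost call, the surrounding HTML does not need to be parsed; only the spans between `__(` and its matching `)` are inspected. A flagged span could then be split or reported, depending on what should happen to nested strings.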