Many thanks to all of you for your fast replies and for all this great advice!
My HTML is indeed very "lazy": a lot of markup that isn't strictly necessary, such as closing tags, seems to have been omitted.
There are about 10,000 *.html files containing company info, each with a few sections like the one shown, plus various other material. I'll be happy if, in the end, I manage to extract company name, telephone number, fax number, street number, street, ZIP code, and city name from each section into a CSV file, one record per line with the fields separated by commas.
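To make the goal concrete, here is a minimal sketch of the extraction step, written in Python purely for illustration. It rests on assumptions that are not confirmed by my files: that each company section is one `<tr>` row whose seven `<td>` cells hold the fields in the order listed above, and that the closing `</td>` and `</tr>` tags may be missing. The names `FIELDS`, `RowCollector`, `extract_rows`, and `write_csv` are hypothetical.

```python
import csv
from html.parser import HTMLParser

# Assumed column order; adjust to the real layout of the files.
FIELDS = ["company", "phone", "fax", "street_no", "street", "zip", "city"]


class RowCollector(HTMLParser):
    """Collect <td> texts per <tr>, tolerating omitted </td> and </tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def _end_cell(self):
        if self._cell is not None:
            # Join the chunks and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()          # a new row implicitly closes the old one
            self._row = []
        elif tag == "td":
            if self._row is None:    # <td> without a preceding <tr>
                self._row = []
            self._end_cell()         # a new cell implicitly closes the old one
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td":
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()              # flushes any buffered trailing text
        self._end_row()


def extract_rows(html):
    """Return only the rows that have exactly len(FIELDS) cells."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if len(row) == len(FIELDS)]


def write_csv(rows, out):
    """Write a header line plus the rows; csv quotes embedded commas."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    writer.writerows(rows)
```

For the real run, the idea would be to loop over the files, call `extract_rows` on the contents of each, and pass the combined rows to `write_csv` with an output file opened with `newline=""`. Using a CSV writer rather than joining the fields with commas by hand matters because a company name such as "Acme, Inc." contains a comma itself and has to be quoted.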
Now I'm going to do some testing with the help of your wonderful input. Many thanks!
In reply to Re: How to parse not closed HTML tags that don't have any attributes?
by Rantanplan
in thread How to parse not closed HTML tags that don't have any attributes?
by Rantanplan