in reply to Parsing an html file

I don't see anything wrong, but I don't know the contents of your data file. To find bugs in a regexp like yours, start with a simpler regexp and make it more complex step by step until it stops matching. Your problem lies in the difference between the last pattern that matched and the first one that failed.
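Here is a minimal sketch of that approach, written in Python rather than Perl for illustration. The sample input and the list of patterns are hypothetical because the original regexp is not shown here. Each pattern is a little stricter than the one before it, and the hypothetical helper `first_failing_step` reports the first pattern that no longer matches.

```python
import re

def first_failing_step(text, patterns):
    """Try each pattern in order; return (index, pattern) for the first one
    that does not match, or None if every pattern matches."""
    for i, pattern in enumerate(patterns):
        if re.search(pattern, text, re.IGNORECASE | re.DOTALL) is None:
            return i, pattern
    return None

# Hypothetical input and patterns, each one stricter than the previous one.
html = ('<AREA SHAPE="RECT" COORDS="267, 201, 361, 277" HREF="process53.htm" '
        'ALT="N3 - Activité Etablir les factures">')
steps = [
    r'<area',
    r'<area[^>]*href="([^"]+)"',
    r'<area[^>]*href="([^"]+)"[^>]*alt="([^"]+)"',
    r'<area[^>]*href="([^"]+)"[^>]*alt="([^"]+)"\s*>\s*<span',
]

result = first_failing_step(html, steps)
if result is None:
    print("all patterns matched")
else:
    index, pattern = result
    print(f"step {index} stopped matching: {pattern}")
```

In this sample, the fourth pattern (index 3) is the first to fail. The `<span` requirement added in that step is therefore the part to examine.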

You should also check that the file contents really are in $contenu by printing the variable after it has been filled.
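The check can be sketched in Python, assuming the file is ISO-8859-1 encoded. The helper name `read_html` is hypothetical. It reads the whole file, raises an error if nothing was read, and prints the length and the first characters so you can see what actually ended up in the variable. The demo writes a throwaway file so that the snippet runs on its own.

```python
import os
import tempfile

def read_html(path, encoding="iso-8859-1"):
    """Read the whole file and fail loudly if nothing was read (hypothetical helper)."""
    with open(path, encoding=encoding) as fh:
        contenu = fh.read()
    if not contenu:
        raise ValueError(f"{path} is empty or was not read")
    # Show the length and the start of the content so it can be checked by eye.
    print(f"read {len(contenu)} characters from {path}")
    print(repr(contenu[:80]))
    return contenu

# Demo with a throwaway file; use the path of your real HTML file instead.
with tempfile.NamedTemporaryFile("w", suffix=".htm", encoding="iso-8859-1",
                                 delete=False) as tmp:
    tmp.write("<HTML><BODY><H1>1-FIN01 - Facturer</H1></BODY></HTML>")
contenu = read_html(tmp.name)
os.remove(tmp.name)
```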

Re^2: Parsing an html file
by Matthieu14 (Initiate) on Apr 20, 2010 at 08:57 UTC
    Sorry. Here is the content of my HTML file as stored in $contenu (this is the output of print $contenu):

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//FR">
    <HTML lang="fr">
    <HEAD>
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <LINK HREF="CollapsibleList.css" REL="stylesheet" TYPE="text/css">
    <SCRIPT TYPE="text/javascript" SRC="CollapsibleList.js" ></SCRIPT>
    <link rel=stylesheet href=QimProcess.css>
    <TITLE>1-FIN01 - Facturer</TITLE>
    <META NAME="author" CONTENT="ADMIN">
    </HEAD>
    <BODY>
    <A NAME="topofpagediagram12htm"></A>
    <div id="entete"> </div>
    <div id="menu_h">
    <ul id="menu_horizontal">
    <li>
    <a href="#">Index</a>
    <ul class="sous_menu_horizontal">
    <li><a href="indexprocess.htm" target="PAGE">Processus</a></li>
    <li><a href="indexdocument.htm" target="PAGE">Documents</a></li>
    </ul>
    </li>
    <li>
    <a href="#">Aide</a>
    <ul class="sous_menu_horizontal">
    <li><a href="diagram7.htm" target="PAGE">Légende</a></li>
    </ul>
    </li>
    </ul>
    </div>
    <div id="menu_v">
    <object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" id="menutree" width="100%" height="100%" codebase="http://fpdownload.macromedia.com/get/flashplayer/current/swflash.cab">
    <param name="movie" value="menutree.swf" />
    <param name="quality" value="high" />
    <param name="wmode" value="transparent" />
    <param name="allowScriptAccess" value="sameDomain" />
    <embed src="menutree.swf" quality="high" wmode="transparent" width="100%" height="100%" name="menutree" align="middle" play="true" loop="false" quality="high" allowScriptAccess="sameDomain" type="application/x-shockwave-flash" pluginspage="http://www.adobe.com/go/getflashplayer">
    </embed>
    </object>
    </div>
    <div id="corps">
    <a href="mailto:is.methodes@xxxxx.com?Subject=[REFERENTIEL] 1-FIN01 - Facturer (12)">Send us a comment</a> | <a HREF="diagram6bca80b88e2411dea2910019b93c8ff0.htm">Home</a><br>Dernière mise à jour effectuée par Administrator (22.03.2010 10:54:59)
    <H1>1-FIN01 - Facturer</H1>
    <MAP NAME="COORDdiagram12htm">
    <AREA SHAPE="RECT" COORDS="356, 171, 375, 190" HREF="document5.htm" ALT="Mode Opératoire Préparer la facturation">
    <AREA SHAPE="RECT" COORDS="95, 201, 189, 277" ALT="Événement interne Projet à facturer">
    <AREA SHAPE="RECT" COORDS="95, 680, 189, 756" ALT="Résultat interne Projet facturé">
    <AREA SHAPE="RECT" COORDS="267, 503, 361, 579" HREF="process63.htm" ALT="N3 - Activité Suivre les opérations de crédit">
    <AREA SHAPE="RECT" COORDS="463, 201, 557, 277" HREF="process54.htm" ALT="N3 - Activité Valider les factures">
    <AREA SHAPE="RECT" COORDS="267, 401, 361, 477" HREF="process55.htm" ALT="N3 - Activité Recouvrer les factures">
    <AREA SHAPE="RECT" COORDS="420, 76, 609, 756" HREF="organization18.htm" ALT="Fonction Directeur">
    <AREA SHAPE="RECT" COORDS="267, 103, 361, 179" HREF="process52.htm" ALT="N3 - Activité Préparer la facturation">
    <AREA SHAPE="RECT" COORDS="267, 201, 361, 277" HREF="process53.htm" ALT="N3 - Activité Etablir les factures">
    <AREA SHAPE="RECT" COORDS="267, 299, 361, 375" HREF="process56.htm" ALT="N3 - Activité Envoyer les factures">
    <AREA SHAPE="RECT" COORDS="224, 76, 413, 756" HREF="organization4.htm" ALT="Fonction Assistante de direction">
    <AREA SHAPE="RECT" COORDS="95, 582, 189, 658" ALT="Résultat interne Factures payées">
    <AREA SHAPE="RECT" COORDS="95, 401, 189, 477" ALT="Événement externe Délai de paiement atteint">
    <AREA SHAPE="RECT" COORDS="95, 103, 189, 179" ALT="Événement externe Fin de mois">
    <AREA SHAPE="RECT" COORDS="356, 171, 375, 190" HREF="document5.htm" ALT="Mode Opératoire Préparer la facturation">
    </MAP>
    <P><IMG SRC="diagram12.jpg" USEMAP="#COORDdiagram12htm" ALT="1-FIN01 - Facturer" LONGDESC=""></P>
    <HR><P> </div>
    <BR><SMALL> Créé à partir du modèle QimProcess le 19.04.2010 à 16:16</SMALL></P>
    </BODY>
    </HTML>
      There is no span tag in this file, but your regexp expects one. Naturally it won't match.
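To illustrate, here is a Python sketch that runs two patterns over two AREA tags taken from the file above. Both patterns are hypothetical stand-ins for the regexp in question. They differ only in that the first one also requires a `<span` after the tag. Because the file contains no span tag, that pattern finds nothing, while the second one returns every ALT text.

```python
import re

# Two AREA tags copied from the file above (there is no <span> anywhere in it).
contenu = (
    '<AREA SHAPE="RECT" COORDS="267, 201, 361, 277" HREF="process53.htm" '
    'ALT="N3 - Activité Etablir les factures"> '
    '<AREA SHAPE="RECT" COORDS="95, 201, 189, 277" ALT="Événement interne Projet à facturer">'
)

# Hypothetical pattern that also demands a <span> right after the tag: it cannot match here.
with_span = re.compile(r'<area[^>]*alt="([^"]*)"[^>]*>\s*<span', re.IGNORECASE)
# The same pattern without the <span> requirement.
without_span = re.compile(r'<area[^>]*alt="([^"]*)"[^>]*>', re.IGNORECASE)

print(with_span.findall(contenu))     # []
print(without_span.findall(contenu))  # ['N3 - Activité Etablir les factures', 'Événement interne Projet à facturer']
```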
        Yes. Thanks a lot.

        In fact, I tried a lot of things and thought the problem was in my syntax.
        I forgot this part (the span) when I generated the HTML test file before parsing it ;-)

        Thanks a lot for your quick response.