Last updated on October 4th, 2022 at 01:57 pm

HTML and XML are closely related tag-based markup languages, so if you already know how to parse an XML file using PHP, this script will be easy to understand.

That being said, here we are going to parse a web page containing simple HTML tags such as table, tr and td.

Let us name the HTML page Simple_Webpage.html and add the code below.

<html>
  <body>
    <table>
      <tr>
        <td><b>Country</b></td>
        <td><b>Temp {F}</b></td>
        <td><b>Current Status</b></td>
      </tr>
      <tr>
        <td>United States</td>
        <td>74</td>
        <td>Sunny</td>
      </tr>
      <tr>
        <td>United Kingdom</td>
        <td>65</td>
        <td>Sunny</td>
      </tr>
      <tr>
        <td>India</td>
        <td>94</td>
        <td>Sunny</td>
      </tr>
    </table>
  </body>
</html>

The next step is to parse the above HTML page using the PHP code below. We create a new DOMDocument to represent the entire HTML document and load the file we created above into it with loadHTMLFile(). Note that loadHTMLFile() only returns true or false to indicate success; the parsed markup lives in the $dom object itself.

<?php
  // new dom object
  $dom = new DOMDocument();

  // discard white space; set this before loading, because it has no
  // effect on a document that is already loaded
  $dom->preserveWhiteSpace = false;

  // load the html; loadHTMLFile() returns true on success, false on failure
  if (!$dom->loadHTMLFile('Simple_Webpage.html')) {
      exit('Could not load Simple_Webpage.html');
  }

  // get the tables by tag name
  $tables = $dom->getElementsByTagName('table');

  // get all rows from the first table
  $rows = $tables->item(0)->getElementsByTagName('tr');

  // loop over the table rows (the header row is included)
  foreach ($rows as $row)
  {
      // get each column by tag name
      $cols = $row->getElementsByTagName('td');
      // echo the values
      echo $cols->item(0)->nodeValue.'<br />';
      echo $cols->item(1)->nodeValue.'<br />';
      echo $cols->item(2)->nodeValue.'<hr>';
  }

?>
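
If you would rather work with the table as data than echo it straight away, the same DOMDocument calls can collect the rows into an array. The following is only a minimal sketch under a few assumptions: the helper name tableToRows() is hypothetical (it is not part of PHP or of the script above), the HTML is passed in as a string through loadHTML() so the example runs without the file, and the first row of the table is treated as the header row.

```php
<?php
// Hypothetical helper (an assumption for illustration): converts the first
// <table> in an HTML string into an array of rows keyed by the header text.
function tableToRows(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) returns null when the document has no <table>
    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $headers = [];
    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        // Gather the trimmed text of every <td> in this row
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells === []) {
            continue; // skip rows without <td> cells
        }
        if ($headers === []) {
            $headers = $cells; // assumption: first non-empty row is the header
            continue;
        }
        // Only keep rows whose cell count matches the header
        if (count($cells) === count($headers)) {
            $rows[] = array_combine($headers, $cells);
        }
    }
    return $rows;
}

// A shortened copy of the sample table from Simple_Webpage.html
$html = '<html><body><table>'
      . '<tr><td><b>Country</b></td><td><b>Temp {F}</b></td><td><b>Current Status</b></td></tr>'
      . '<tr><td>United States</td><td>74</td><td>Sunny</td></tr>'
      . '<tr><td>India</td><td>94</td><td>Sunny</td></tr>'
      . '</table></body></html>';

$rows = tableToRows($html);
foreach ($rows as $row) {
    echo $row['Country'] . ': ' . $row['Temp {F}'] . "\n";
}
```

Keying each row by its header text means later code can refer to $row['Country'] instead of a column position, which keeps working if the columns are reordered.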

Demo

You might also be interested in parsing XML using PHP. If so, feel free to refer to this tutorial:

Simple XML Parsing Using PHP (with demo)
