Using DOMDocument in PHP to Process output HTML


PHP (PHP: Hypertext Preprocessor) is well known and very popular today. By most usage surveys, a large majority of websites whose server-side language is known run PHP. PHP is a dynamically typed scripting language, meaning that you don’t need to declare variable types (types are determined at runtime). However, PHP is often slower than other programming languages used for similar purposes, such as Java.

PHP is rich in APIs. It ships with many convenient functions, so we don’t need to reinvent the wheel. Regular expressions are a powerful tool for matching patterns in text. But suppose you want to change every HTML hyperlink so that any link without target=_blank gets target=_blank by default. Doing that with regular expressions isn’t straightforward, and the result is neither easy to write nor flexible.

Luckily, PHP provides the DOMDocument class, which parses an HTML string into a DOM (Document Object Model) tree of elements that we can traverse and modify.

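We use the following two lines to create the DOM object. Setting strictErrorChecking to false stops the DOM extension from throwing exceptions on errors. Warnings about malformed HTML or unknown tags (such as HTML5 elements or the Google Plus tag, g:plusone) are raised by libxml while the document loads; they can be silenced by calling libxml_use_internal_errors(true) before loading.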

$doc = new DOMDocument();
$doc->strictErrorChecking = false;

Then we load the HTML string:

$doc->loadHTML($html);
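
Real-world markup is rarely clean, so loading usually goes together with some handling of parser warnings. The following is a minimal sketch, assuming a made-up, deliberately sloppy input string; the variable names are illustrative only. It collects libxml's complaints internally instead of letting them surface as PHP warnings.

```php
<?php
// Hypothetical input: an unclosed <a> and a non-standard tag.
$html = '<p>Hello <g:plusone></g:plusone><a href="/about">About</p>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// The collected errors can be inspected or logged; then reset the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser repairs the markup, so the link is still reachable.
$anchorCount = $doc->getElementsByTagName('a')->length;
```

Restoring the previous setting keeps the change local, which matters if other code on the page relies on libxml's default behaviour.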

And we can get all the hyperlink elements (tag name ‘a’):

$links = $doc->getElementsByTagName('a');
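
To make the return value concrete, here is a small sketch on an assumed two-link snippet. getElementsByTagName() gives back a DOMNodeList rather than a plain PHP array; it exposes a length property and can be walked with foreach.

```php
<?php
// Hypothetical sample markup.
$html = '<p><a href="/one">One</a> and <a href="/two" target="_self">Two</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// A DOMNodeList: iterable and countable through ->length.
$links = $doc->getElementsByTagName('a');

$hrefs = [];
foreach ($links as $link) {
    // getAttribute() returns an empty string when the attribute is absent.
    $hrefs[] = $link->getAttribute('href');
}

printf("%d links: %s\n", $links->length, implode(', ', $hrefs));
```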

Then we can loop over the returned list (a DOMNodeList) and process the elements of interest.

foreach ($links as $item) {
      // getAttribute() returns '' when the attribute is missing, so this
      // covers both links without a target and links with another target.
      if ($item->getAttribute('target') !== '_blank') {
            $item->setAttribute('target', '_blank');
      }
}
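
Putting the steps together, the whole round trip from HTML string to modified HTML string might look like the sketch below. The function name forceBlankTargets and the sample page are assumptions for illustration; saveHTML() is the DOMDocument method that serialises the tree back to a string.

```php
<?php
// Hypothetical helper name; it simply bundles the steps described above.
function forceBlankTargets(string $html): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet

    $doc = new DOMDocument();
    $doc->strictErrorChecking = false;
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        if ($link->getAttribute('target') !== '_blank') {
            $link->setAttribute('target', '_blank');
        }
    }

    // Serialise the modified tree back into an HTML string.
    return $doc->saveHTML();
}

// Hypothetical sample page with one link per case.
$page = '<html><body>'
      . '<a href="/a">A</a>'
      . '<a href="/b" target="_self">B</a>'
      . '<a href="/c" target="_blank">C</a>'
      . '</body></html>';

$out = forceBlankTargets($page);
```

One caveat worth checking in your own setup: loadHTML() treats input as ISO-8859-1 unless the markup declares another charset, so UTF-8 pages should carry a charset declaration.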

The methods hasAttribute, getAttribute, and setAttribute are self-explanatory. The same approach makes it easy to process other HTML tags, for example to make sure every image carries title and alt attributes in order to improve the page’s SEO (Search Engine Optimisation) score.
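
The image case can be sketched the same way. The fallback used here, the file name taken from src, is only a placeholder assumption; real alt text should describe the image.

```php
<?php
// Hypothetical sample page: the first image lacks both attributes.
$html = '<html><body>'
      . '<img src="cat.png">'
      . '<img src="dog.png" alt="A dog" title="Dog">'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

foreach ($doc->getElementsByTagName('img') as $img) {
    // Assumption: use the file name as a stand-in description.
    $fallback = basename($img->getAttribute('src'));

    foreach (['alt', 'title'] as $name) {
        // Only fill in attributes that are missing; keep existing ones.
        if (!$img->hasAttribute($name)) {
            $img->setAttribute($name, $fallback);
        }
    }
}

$result = $doc->saveHTML();
```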

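You can wrap this processing in a function and register it as an output-buffer callback with ob_start, so that it runs on the page before PHP sends it to the browser. The callback receives the buffered HTML and returns the string to send, typically the result of $doc->saveHTML() once the DOM has been modified. For example,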

ob_start('ob_gzhandler'); // gzip compression
ob_start('process');

function process($html) {
 // process your DOM
 //
 // ... ...

 // return processed HTML
 return $html;
}
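
For a version of the callback that can be run end to end, the sketch below fills in process() with the link rewrite and captures the result in an outer buffer so it can be inspected. The plain outer ob_start() stands in for ob_gzhandler purely for demonstration, and the echoed page is made up.

```php
<?php
// Output callback: receives the buffered page, returns what should be sent.
function process($html)
{
    if (trim($html) === '') {
        return $html; // nothing to parse
    }

    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$ok) {
        return $html; // fall back to the untouched output
    }

    foreach ($doc->getElementsByTagName('a') as $link) {
        $link->setAttribute('target', '_blank');
    }

    return $doc->saveHTML();
}

ob_start();           // outer buffer, used here only to capture the result
ob_start('process');  // inner buffer, filtered through process()

echo '<html><body><a href="/home">Home</a></body></html>';

ob_end_flush();       // runs process() and hands its result to the outer buffer
$page = ob_get_clean();
```

Returning the original string when parsing fails is a defensive choice: a broken filter should not blank out the page.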

–EOF (The Ultimate Computing & Technology Blog) —
