Parsing and manipulating HTML efficiently is a crucial task that developers face regularly. With PHP 8.4’s introduction of native HTML5 support and WordPress 6.2’s new HTML Tag Processor, let’s dive into a performance comparison between these two approaches.
Background
WordPress 6.2 introduced the WP_HTML_Tag_Processor as a solution to a long-standing issue: the lack of reliable HTML5 parsing. Many developers were resorting to regex-based solutions (we’ve all read the famous Stack Overflow answers: “Don’t parse HTML with regex!”).
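PHP 8.4 brought native HTML5 support to its DOM implementation through the new Dom\HTMLDocument class, which also supports CSS selectors via querySelector() and querySelectorAll().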
The Test
I ran a benchmark comparing both approaches, processing a sample HTML structure 100,000 times.
The test operation selects the article carrying the special class, using the query selector main > article:last-of-type.
Here’s the sample HTML used in the test:
<main><article>First Article</article><article class="featured">Second Article</article><article class="featured special">Third Article</article><div class="container"><article class="nested featured">Nested Article</article></div></main>
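To make the selector concrete, here is a minimal sketch of how it could be run against the sample markup with PHP 8.4's DOM classes. The helper name last_article_text() is made up for illustration, and the LIBXML_NOERROR flag is my own choice to silence the parser warning about the missing doctype.

```php
<?php
declare(strict_types=1);

/**
 * Return the text of the element matched by `main > article:last-of-type`,
 * or null when nothing matches. Requires PHP 8.4+ (Dom\HTMLDocument).
 * Hypothetical helper, named for this example only.
 */
function last_article_text(string $html): ?string
{
    // LIBXML_NOERROR suppresses the warning raised for the missing doctype.
    $doc = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);

    // The child combinator limits candidates to direct children of <main>,
    // so the article nested inside div.container is never considered.
    $match = $doc->querySelector('main > article:last-of-type');

    return $match?->textContent;
}

// Sample markup from the test, unchanged.
$html = '<main><article>First Article</article><article class="featured">Second Article</article><article class="featured special">Third Article</article><div class="container"><article class="nested featured">Nested Article</article></div></main>';

// Only run on PHP versions that ship the new DOM classes.
if (class_exists(Dom\HTMLDocument::class)) {
    echo last_article_text($html), PHP_EOL; // Third Article
}
```

The third article is the last article among the direct children of main (the trailing div is a different element type), which is why the selector lands on the element with the special class.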
Benchmarks:
| Implementation | Total Time (s) | Avg Time per Operation (ms) |
|---|---|---|
| PHP 8.4 DOM | 0.6825 | 0.0068 |
| WP_HTML_Tag_Processor | 2.8700 | 0.0287 |
Finding: PHP 8.4’s DOM implementation took about 76.22% less time than WordPress’s HTML Tag Processor, which makes it roughly 4.2 times faster!
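For anyone who wants to check the arithmetic, the percentage can be derived from the two totals in the table. The snippet below is only a worked calculation on those published numbers; the variable names are mine.

```php
<?php
declare(strict_types=1);

// Totals from the table above, in seconds, for 100,000 iterations each.
$iterations = 100_000;
$domTotal = 0.6825;
$tagProcessorTotal = 2.8700;

// Share of the Tag Processor's run time that the DOM run saves.
$timeSavedPercent = ($tagProcessorTotal - $domTotal) / $tagProcessorTotal * 100;

// How many times faster the DOM run is, expressed as a ratio.
$speedup = $tagProcessorTotal / $domTotal;

// Per-operation cost in milliseconds, matching the table's last column.
$domPerOpMs = $domTotal / $iterations * 1000;
$tagProcessorPerOpMs = $tagProcessorTotal / $iterations * 1000;

printf("Time saved: %.2f%%\n", $timeSavedPercent); // Time saved: 76.22%
printf("Speedup: %.2fx\n", $speedup);              // Speedup: 4.21x
```

The two figures describe the same gap from different angles: 76.22% is the reduction in elapsed time, while 4.21x is the ratio between the two totals.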
What This Means
While WordPress’s HTML Tag Processor serves its purpose well, especially in maintaining backward compatibility, PHP 8.4’s native DOM implementation shows impressive performance gains. A run that takes roughly 76% less time suggests that WordPress could benefit from adopting native DOM operations when they are available, while keeping its current implementation for backward compatibility.
For developers starting new projects on PHP 8.4+, the choice remains difficult until WordPress core is updated to use the new DOM API. WordPress engineers, however, need to balance performance with compatibility requirements.
Benchmark code
Here is the code used for this benchmark:
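A harness along these lines should reproduce the setup described above. Treat it as a minimal sketch rather than the exact script behind the numbers: the bench() helper is a made-up name, WordPress is assumed to be loaded already for the Tag Processor part, and because next_tag() matches on tag name and class name rather than CSS selectors, the tag_name/class_name query is my assumed equivalent of the selector.

```php
<?php
declare(strict_types=1);

/** Run $operation $iterations times and return the elapsed seconds. */
function bench(callable $operation, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $operation();
    }
    return (hrtime(true) - $start) / 1e9;
}

$html = '<main><article>First Article</article><article class="featured">Second Article</article><article class="featured special">Third Article</article><div class="container"><article class="nested featured">Nested Article</article></div></main>';
$iterations = 100_000;
$candidates = [];

// PHP 8.4+ only: HTML5 parser plus CSS selector support.
if (class_exists(Dom\HTMLDocument::class)) {
    $candidates['PHP 8.4 DOM'] = static function () use ($html): bool {
        $doc = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
        return $doc->querySelector('main > article:last-of-type') !== null;
    };
}

// WordPress 6.2+ only; assumes WordPress has been bootstrapped beforehand.
// Assumed equivalent of the selector: first <article> with the "special" class.
if (class_exists('WP_HTML_Tag_Processor')) {
    $candidates['WP_HTML_Tag_Processor'] = static function () use ($html): bool {
        $processor = new WP_HTML_Tag_Processor($html);
        return $processor->next_tag(['tag_name' => 'article', 'class_name' => 'special']);
    };
}

// Time each available implementation and print totals in the table's units.
foreach ($candidates as $name => $operation) {
    $seconds = bench($operation, $iterations);
    printf("%-22s %.4f s total, %.4f ms per operation\n", $name, $seconds, $seconds / $iterations * 1000);
}
```

Each iteration parses the markup from scratch, so the timings cover parsing plus lookup. The class_exists() guards let the script run on setups where only one of the two implementations is present.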