xemlock/htmlpurifier-html5
HTML5 definitions and tidy/sanitization rules for HTML Purifier, aligned with the WHATWG spec. Purify and normalize dirty HTML5 into valid output with an HTML5-ready config, plus flexible directives (e.g., safely allow YouTube iframes).
composer require xemlock/htmlpurifier-html5
use HTMLPurifier;
use HTMLPurifier_HTML5Config;
public function sanitizeHtml(string $dirtyHtml): string
{
    $config = HTMLPurifier_HTML5Config::createDefault();
    $purifier = new HTMLPurifier($config);

    return $purifier->purify($dirtyHtml);
}
Sanitize user-submitted content that may contain HTML5 elements such as <article> or <section>:
$cleanHtml = $this->sanitizeHtml($request->input('content'));
Wrap the purify() call in a service class or form request to centralize sanitization logic.
Test HTML5-specific features (e.g., <figure> tags, datetime attributes) with PHPUnit:
$this->assertStringContainsString('<figure>', $purifier->purify('<figure><img src="x"></figure>'));
Create a dedicated service to encapsulate purification logic:
namespace App\Services;
use HTMLPurifier;
use HTMLPurifier_HTML5Config;
class HtmlSanitizer
{
    protected $purifier;

    public function __construct()
    {
        $config = HTMLPurifier_HTML5Config::create([
            'HTML.Allowed' => 'article,section,header,footer,nav',
            // Restrict links to our own host; URI.DisableExternal requires URI.Host
            'URI.Host' => 'example.com',
            'URI.DisableExternal' => true,
        ]);
        $this->purifier = new HTMLPurifier($config);
    }

    public function clean(string $html): string
    {
        return $this->purifier->purify($html);
    }
}
Usage in controllers:
$sanitizer = app(HtmlSanitizer::class);
$safeHtml = $sanitizer->clean($request->post('content'));
Override defaults based on context (e.g., admin vs. user content):
// For admin posts (trusted content)
$adminConfig = HTMLPurifier_HTML5Config::createDefault();
$adminConfig->set('HTML.Trusted', true);
// For user comments (restricted)
$userConfig = HTMLPurifier_HTML5Config::createDefault();
$userConfig->set('HTML.Allowed', 'p,b,strong,a[href|title]');
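Combine with Laravel’s validation to reject unsafe HTML before sanitization. The rule is strict: it fails on any input the purifier would change, including harmless normalization of otherwise safe markup:
public function rules()
{
    return [
        'content' => [
            'required',
            function ($attribute, $value, $fail) {
                $purifier = new HTMLPurifier(HTMLPurifier_HTML5Config::createDefault());
                // Fail when purification alters the submitted value
                if ($purifier->purify($value) !== $value) {
                    $fail('HTML contains disallowed tags.');
                }
            },
        ],
    ];
}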
Whitelist specific iframes dynamically:
$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^https?://(www\.)?youtube\.com/embed/%');
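Before relying on a whitelist pattern like the one above, it can help to check it against a few URLs in isolation. The sketch below uses only preg_match; the helper name matchesIframePattern and the sample URLs are illustrative assumptions, not part of the package.

```php
<?php
// Hypothetical helper: reports whether a URL matches an iframe whitelist pattern.
function matchesIframePattern(string $pattern, string $url): bool
{
    return preg_match($pattern, $url) === 1;
}

// Same pattern as passed to URI.SafeIframeRegexp above.
$pattern = '%^https?://(www\.)?youtube\.com/embed/%';

// Illustrative URLs: two legitimate embeds and two look-alikes.
$samples = [
    'https://www.youtube.com/embed/VIDEO_ID',
    'http://youtube.com/embed/VIDEO_ID',
    'https://youtube.com.evil.example/embed/VIDEO_ID',
    'https://evil.example/?next=https://www.youtube.com/embed/VIDEO_ID',
];

foreach ($samples as $url) {
    echo (matchesIframePattern($pattern, $url) ? 'allowed: ' : 'blocked: ') . $url . PHP_EOL;
}
```

The leading ^ anchor and the literal /embed/ after the host are what keep the two look-alike URLs from matching.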
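Laravel Facades: If you use a wrapper package that exposes a Purifier facade, point it at HTML5Config. The extend() hook below is an assumption about that wrapper, so check its documentation before relying on it:
// app/Providers/AppServiceProvider.php
use HTMLPurifier;
use HTMLPurifier_HTML5Config;

public function boot()
{
    Purifier::extend(function ($app) {
        return new HTMLPurifier(HTMLPurifier_HTML5Config::createDefault());
    });
}
If the hook is available, Purifier::clean($html) then works as usual.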
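Caching: Reuse a single purifier instance within a request (for example, by binding it as a singleton), since constructing one is comparatively expensive. PHP objects do not persist across requests, but HTML Purifier's serialized definition cache (Cache.SerializerPath) does:
$purifier = new HTMLPurifier(HTMLPurifier_HTML5Config::createDefault());
// Reuse $purifier for every purify() call in this request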
Testing: Mock the purifier in Laravel tests (this only takes effect where HTMLPurifier is resolved from the container):
$this->partialMock(HTMLPurifier::class, function ($mock) use ($dirtyHtml) {
    $mock->shouldReceive('purify')
        ->with($dirtyHtml)
        ->andReturn('<p>Cleaned</p>');
});
Broken <a> in v0.1.9:
0.1.9 incorrectly treated <a> as a block-level element. Always use >=0.1.10.
composer require xemlock/htmlpurifier-html5:^0.1.10
Empty <figure> Removal:
Empty <figure> tags are stripped by default (pre-HTML5 behavior).
$config->set('HTML.Figure', true); // Explicitly enable (verify this directive exists in your installed version)
Form Security Risks:
Enabling HTML.Forms (default: false) permits form elements, which opens the door to phishing via <form action="evil.com">. Enable it only for trusted content:
$config->set('HTML.Forms', true);
$config->set('HTML.Trusted', true); // Critical!
Regex Performance:
Complex URI.SafeIframeRegexp patterns can slow down purification. Keep them anchored and specific:
$config->set('URI.SafeIframeRegexp', '%^https://trusted\.com/embed/%');
Attribute Conflicts:
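HTML5 attributes (e.g., datetime on <time>) may conflict with custom definitions. Use HTML.AllowedAttributes to explicitly permit attributes. This directive is a whitelist in tag.attribute form, so list every attribute you need:
$config->set('HTML.AllowedAttributes', 'time.datetime');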
Inspect Sanitized Output: Compare dirty vs. clean HTML to identify stripped elements:
$dirty = '<div><script>alert(1)</script></div>';
$clean = $purifier->purify($dirty);
dd($dirty, $clean); // Check what was removed
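For longer documents, eyeballing the two strings gets tedious. As a rough aid, the following sketch lists tag names that appear in the dirty input but not in the cleaned output. The helper strippedTags is a hypothetical name, the matching is regex-based rather than a real HTML parse, and the $clean value here is a stand-in for whatever purify() actually returns.

```php
<?php
// Hypothetical helper: names of tags present in $dirty but absent from $clean.
// Regex-based and approximate; it only looks at opening tags.
function strippedTags(string $dirty, string $clean): array
{
    $tagNames = static function (string $html): array {
        preg_match_all('/<([a-zA-Z][a-zA-Z0-9-]*)/', $html, $matches);
        return array_unique(array_map('strtolower', $matches[1]));
    };

    return array_values(array_diff($tagNames($dirty), $tagNames($clean)));
}

$dirty = '<div><script>alert(1)</script></div>';
$clean = '<div></div>'; // assumed stand-in for $purifier->purify($dirty)

print_r(strippedTags($dirty, $clean));
```

With these two sample strings the helper reports script as the only removed tag.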
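Collect Purification Errors:
$config->set('Core.CollectErrors', true); // experimental HTML Purifier feature
$config->set('Cache.SerializerPath', storage_path('app/htmlpurifier')); // directory must exist and be writable
After purify() runs, read the collected messages with $purifier->context->get('ErrorCollector')->getRaw(). HTML Purifier does not write to storage/logs/laravel.log on its own.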
Validate Config:
Check directive names against the schema attached to the config before setting them ($directives is an array of directive name => value):
foreach (array_keys($directives) as $name) {
    if (!isset($config->def->info[$name])) {
        throw new \InvalidArgumentException("Invalid directive: {$name}");
    }
}
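Custom Elements/Attributes: Listing an unknown element in HTML.Allowed is not enough; register it in the raw HTML definition first:
$config->set('HTML.DefinitionID', 'app-html5-custom'); // unique ID so the customized definition is cached separately
if ($def = $config->maybeGetRawHTMLDefinition()) {     // null when a cached definition is reused
    $def->addElement('my-custom-element', 'Block', 'Flow', 'Common', ['data-custom' => 'Text']);
}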
Post-Processing:
Use Laravel’s Str::of() to manipulate sanitized output:
$cleanHtml = Str::of($purifier->purify($html))
->replace('class="old"', 'class="new"');
Event Listeners:
Hook purification into Eloquent model events (e.g., saving) so content is sanitized before it is stored. Post and its content attribute are placeholder names:
Post::saving(function (Post $post) use ($purifier) {
    $post->content = $purifier->purify((string) $post->content);
});
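Middleware: Sanitize request input globally:
use Closure;
use HTMLPurifier;

class SanitizeHtmlMiddleware
{
    protected $purifier;

    // Bind HTMLPurifier in the container with an HTML5 config so this receives it
    public function __construct(HTMLPurifier $purifier)
    {
        $this->purifier = $purifier;
    }

    public function handle($request, Closure $next)
    {
        // Only touch the field when it is actually present
        if ($request->has('content')) {
            $request->merge([
                'content' => $this->purifier->purify((string) $request->input('content')),
            ]);
        }

        return $next($request);
    }
}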