ezyang/htmlpurifier
HTML Purifier is a robust HTML filtering library that prevents XSS using strict whitelists and aggressive parsing, producing standards-compliant output. Ideal for richly formatted, untrusted HTML with configurable tag and CSS support.
HTML Purifier sanitizes untrusted HTML, especially output from rich-text editors, while preserving standards-compliant markup and blocking XSS. To begin:
Install:
composer require ezyang/htmlpurifier
Basic use in a controller:
use HTMLPurifier;
use HTMLPurifier_Config;
$purifier = new HTMLPurifier(HTMLPurifier_Config::createDefault());
$cleanHtml = $purifier->purify($request->input('body'));
First use case: Sanitize rich-text content (e.g., blog comments, article bodies) before storage or display.
Read first:
docs/README.html in the repo (or online) for architecture
docs/registry.html for understanding config/registry patterns
Leverage HTML Purifier proactively across the data pipeline for consistent security:
Laravel binding: Register as a singleton in AppServiceProvider:
$this->app->singleton(HTMLPurifier::class, fn() => HTMLPurifier::instance(
HTMLPurifier_Config::createDefault()
));
FormRequest sanitization: Strip/fix malicious content before validation:
protected function prepareForValidation()
{
$this->merge([
'content' => app(HTMLPurifier::class)->purify($this->content),
]);
}
Blade helper: Create a @purify directive:
Blade::directive('purify', fn($expr) => "<?php echo app(HTMLPurifier::class)->purify($expr); ?>");
→ Use: @purify($article->excerpt)
Context-specific configs: Tailor rules per input type (e.g., comments vs. bio):
$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,strong,a[href],ul,li'); // comments
$purifier = new HTMLPurifier($config);
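To make the per-context idea concrete, here is a minimal sketch that keeps one whitelist per input type and builds a purifier on demand. The function name purifierFor, the context names, and the tag lists are illustrative assumptions, not part of the library; the autoload path assumes a standard Composer layout.

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

/**
 * Build a purifier for a named input context.
 * purifierFor() and the context names are hypothetical helpers for this sketch.
 */
function purifierFor(string $context): HTMLPurifier
{
    // One HTML.Allowed string per input type (illustrative values).
    $allowed = [
        'comment' => 'p,strong,em,a[href],ul,ol,li',
        'bio'     => 'p,strong,em',
    ];

    $config = HTMLPurifier_Config::createDefault();
    // Unknown contexts fall back to the most restrictive list.
    $config->set('HTML.Allowed', $allowed[$context] ?? $allowed['bio']);

    return new HTMLPurifier($config);
}

// Links are not in the "bio" whitelist, so the <a> tag should not survive.
echo purifierFor('bio')->purify('<p>Hi <a href="https://example.com">there</a></p>');
```

Falling back to the strictest list means a typo in a context name fails closed rather than open.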
Caching for performance:
$config->set('Cache.SerializerPath', storage_path('app/htmlpurifier'));
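HTML Purifier generally expects the serializer base directory to exist and be writable, so it may help to create it before setting the path. The fragment below is a sketch under that assumption; storage_path() is the Laravel helper already used above, and the 0755 mode is an arbitrary choice.

```php
<?php
// Assumes a Laravel context where storage_path() is available.
$cachePath = storage_path('app/htmlpurifier');

// Create the directory (and any missing parents) if it is not there yet.
if (!is_dir($cachePath)) {
    mkdir($cachePath, 0755, true);
}

$config = HTMLPurifier_Config::createDefault();
$config->set('Cache.SerializerPath', $cachePath);
```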
Avoid pitfalls and unlock advanced behavior with these hard-won insights:
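Whitespace handling: HTML Purifier generally leaves whitespace inside text untouched and, with Core.NormalizeNewlines enabled by default, only normalizes line endings. To drop elements that end up empty after filtering, enable AutoFormat.RemoveEmpty (plus AutoFormat.RemoveEmpty.RemoveNbsp for elements that hold only non-breaking spaces). Directive names such as Core.RemoveInvalidNode, Core.RemovePreviewNode, and Output.NoScriptFallback are not part of the documented configuration set, so check every directive against the configuration reference before relying on it.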
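Protocol whitelist: javascript: and data: URIs are stripped by default because URI.AllowedSchemes, in recent releases, only permits http, https, mailto, ftp, nntp, news, and tel. Set URI.AllowedSchemes explicitly when you want to narrow that list (e.g., to https and mailto), and never add javascript: or data: unless absolutely necessary.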
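As a sketch of tightening the scheme list, the fragment below limits links to https and mailto. URI.AllowedSchemes is a lookup table keyed by scheme name; the two schemes chosen here are an assumption about what a typical comment field needs.

```php
<?php
$config = HTMLPurifier_Config::createDefault();

// Lookup form: scheme name => true. Anything not listed is removed from URIs.
$config->set('URI.AllowedSchemes', [
    'https'  => true,
    'mailto' => true,
]);

$purifier = new HTMLPurifier($config);
```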
Iframe safety (v4.19.0+): New URI.SafeIframeHosts option requires exact host matches (including www.). Configure like:
$config->set('URI.SafeIframeHosts', [
'youtube.com', 'www.youtube.com',
'player.vimeo.com'
]);
PHP 8.4/8.5 deprecations: Ensure ^4.19.0 to avoid preg_replace(null, ...) warnings. Verify composer.lock if issues persist post-upgrade.
Whitelist discipline: Overly broad HTML.Allowed rules (e.g., *) defeat the purpose. Start restrictive (e.g., p,a[href],strong,em,b) and expand only after threat modeling.
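CSS validation gaps: HTML Purifier only passes CSS properties that its own CSS definition knows, so newer properties (e.g., aspect-ratio) are dropped. CSS.AllowedProperties can narrow that set but cannot add properties to it. To see what the installed version supports, inspect the keys of $config->getCSSDefinition()->info.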
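One way to check which CSS properties the installed version recognizes is to read the keys of the CSS definition's info array, then restrict the set on a fresh config. This is a sketch: the three properties in the restricted list are illustrative, and a second config object is used because a config can no longer be changed once a definition has been requested from it.

```php
<?php
require_once __DIR__ . '/vendor/autoload.php';

// List every CSS property the current HTML Purifier build knows about.
$inspect = HTMLPurifier_Config::createDefault();
$supported = array_keys($inspect->getCSSDefinition()->info);
sort($supported);
echo implode("\n", $supported), "\n";

// Narrow the permitted properties on a separate, not yet finalized config.
$config = HTMLPurifier_Config::createDefault();
$config->set('CSS.AllowedProperties', ['color', 'text-align', 'font-weight']);
$purifier = new HTMLPurifier($config);
```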