xemlock/htmlpurifier-html5
HTML5 definitions and tidy/sanitization rules for HTML Purifier, aligned with the WHATWG spec. Purify and normalize dirty HTML5 into valid output with an HTML5-ready config, plus flexible directives (e.g., safely allow YouTube iframes).
This library provides HTML5 element definitions for HTML Purifier, compliant with the WHATWG spec.
Among the HTML5 solutions built on HTML Purifier, it is the most complete. Besides providing the most extensive set of element definitions, it also provides tidy/sanitization rules for transforming the input into valid HTML5 output.
Install with Composer by running the following command:
composer require xemlock/htmlpurifier-html5
Basic usage is similar to the original HTML Purifier. Create an HTML5-compatible config
using the HTMLPurifier_HTML5Config::createDefault() factory method, and then pass it to an HTMLPurifier instance:
$config = HTMLPurifier_HTML5Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html5 = $purifier->purify($dirty_html5);
To modify the config, you can either pass a configuration array to
HTMLPurifier_HTML5Config::create() when instantiating it, or call the set method on an existing config instance.
For example, to allow iframes with YouTube videos you can do the following:
$config = HTMLPurifier_HTML5Config::create(array(
'HTML.SafeIframe' => true,
'URI.SafeIframeRegexp' => '%^//www\.youtube\.com/embed/%',
));
or equivalently:
$config = HTMLPurifier_HTML5Config::createDefault();
$config->set('HTML.SafeIframe', true);
$config->set('URI.SafeIframeRegexp', '%^//www\.youtube\.com/embed/%');
Apart from HTML Purifier's built-in configuration directives, the following new directives are also supported:
Attr.AllowedInputTypes
Version added: 0.1.12
Type: Lookup (or null)
Default: null
List of allowed input types, chosen from the types defined in the spec. By default, the setting is null, meaning there is no restriction on allowed types. An empty array means that no explicit type attributes are allowed, effectively making all inputs text inputs.
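As a minimal sketch of the directive above, the following restricts inputs to a short list of types. The chosen types, the sample markup and the autoload path are assumptions for illustration; HTML.Forms is enabled here because input elements are form elements.

```php
<?php
// Assumes a standard Composer install; adjust the autoload path to your project.
require __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_HTML5Config::createDefault();
// Form elements must be permitted before input types matter.
$config->set('HTML.Forms', true);
// Illustrative choice of types; pick the ones your application needs.
$config->set('Attr.AllowedInputTypes', array('text', 'email'));

$purifier = new HTMLPurifier($config);

// Hypothetical input, for illustration only.
$dirty_html5 = '<form><input type="email" name="contact"><input type="file" name="upload"></form>';
$clean_html5 = $purifier->purify($dirty_html5);
```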
HTML.Forms
Version added: 0.1.12
Type: Boolean
Default: false
Whether or not to permit form elements in the user input, regardless of the %HTML.Trusted value. Be very careful when using this functionality, as enabling forms in untrusted documents may allow for phishing attacks.
HTML.IframeAllowFullscreen
Version added: 0.1.11
Type: Boolean
Default: false
Whether or not to permit the allowfullscreen attribute on iframe tags. It requires either
%HTML.SafeIframe or
%HTML.Trusted to be true.
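A sketch of how this directive might be combined with the YouTube iframe settings shown earlier. The sample markup, the VIDEO_ID placeholder and the autoload path are assumptions for illustration.

```php
<?php
// Assumes a standard Composer install; adjust the autoload path to your project.
require __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_HTML5Config::create(array(
    // HTML.IframeAllowFullscreen requires HTML.SafeIframe (or HTML.Trusted).
    'HTML.SafeIframe' => true,
    'URI.SafeIframeRegexp' => '%^//www\.youtube\.com/embed/%',
    'HTML.IframeAllowFullscreen' => true,
));
$purifier = new HTMLPurifier($config);

// Hypothetical input; VIDEO_ID is a placeholder, not a real video.
$dirty_html5 = '<iframe src="//www.youtube.com/embed/VIDEO_ID" allowfullscreen></iframe>';
$clean_html5 = $purifier->purify($dirty_html5);
```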
HTML.Link
Version added: 0.1.12
Type: Boolean
Default: false
Whether to permit link tags in the user input, regardless of the
%HTML.Trusted value.
This effectively allows link tags without allowing other untrusted elements.
If enabled, URIs in link tags will not be matched against the whitelist specified
in %URI.SafeLinkRegexp (unless %HTML.SafeLink is also enabled).
HTML.SafeLink
Version added: 0.1.12
Type: Boolean
Default: false
Whether to permit link tags in untrusted documents. This directive must
be accompanied by a whitelist of permitted URIs via %URI.SafeLinkRegexp,
otherwise no link tags will be allowed.
HTML.XHTML
Version added: 0.1.12
Type: Boolean
Default: false
While this directive is deprecated in the HTML 4.01 / XHTML 1.0 context, in HTML5 it is used to enable support for namespaced attributes and XML self-closing tags.
When enabled, it causes the xml:lang attribute to take precedence over lang
when both attributes are present on the same element.
URI.SafeLinkRegexp
Version added: 0.1.12
Type: String
Default: null
A PCRE regular expression that will be matched against a <link> URI. This directive
only has an effect if %HTML.SafeLink is enabled. Here is an example value:
%^https?://localhost/% - Allow localhost URIs
Use Attr.AllowedRel to control permitted link relationship types.
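Putting %HTML.SafeLink, %URI.SafeLinkRegexp and Attr.AllowedRel together might look like the sketch below. The localhost pattern is the example value given above; the rel value, the sample markup and the autoload path are assumptions for illustration.

```php
<?php
// Assumes a standard Composer install; adjust the autoload path to your project.
require __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_HTML5Config::createDefault();
// Permit link tags, but only for URIs matching the whitelist below.
$config->set('HTML.SafeLink', true);
$config->set('URI.SafeLinkRegexp', '%^https?://localhost/%');
// Illustrative choice: only allow stylesheet links.
$config->set('Attr.AllowedRel', array('stylesheet'));

$purifier = new HTMLPurifier($config);

// Hypothetical input, for illustration only.
$dirty_html5 = '<link rel="stylesheet" href="http://localhost/style.css">';
$clean_html5 = $purifier->purify($dirty_html5);
```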
Aside from HTML elements supported originally by HTML Purifier, this library adds support for the following HTML5 elements:
<article>, <aside>, <audio>, <bdi>, <data>, <details>, <dialog>, <figcaption>, <figure>, <footer>, <header>, <hgroup>, <main>, <mark>, <nav>, <picture>, <progress>, <section>, <source>, <summary>, <time>, <track>, <video>, <wbr>
as well as HTML5 attributes added to existing HTML elements, such as:
<a>, <del>, <fieldset>, <ins>, <script>
The MIT License (MIT). See the LICENSE file.