Parse HTML in JavaScript with DOMParser
I encountered a situation recently where I needed to perform transformations on a complicated HTML string in JavaScript. I also wanted to do this in vanilla JavaScript so that I did not have to install any dependencies.
Server-side JavaScript already has several HTML parsing libraries available, such as cheerio, htmlparser2 and jsdom. For client-side JavaScript, there is the native DOMParser interface.
DOMParser can parse an XML or HTML string into a DOM Document. All of the standard methods, like querySelector and getElementById, will work on the Document it returns, making it a reasonable alternative to third-party scripts.
Browser support is widespread, although Internet Explorer only supports HTML string parsing from version 10 onwards.
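Given the uneven support in older browsers, a quick capability check can be run before relying on DOMParser. The following is a minimal sketch rather than a definitive test; the function name supportsHtmlParsing is hypothetical and not part of any API.

```javascript
// Hypothetical helper: reports whether this environment exposes a DOMParser
// that can parse 'text/html' strings. Returns false where DOMParser does not
// exist at all (for example, plain Node.js).
function supportsHtmlParsing() {
  if (typeof DOMParser === 'undefined') {
    return false;
  }
  try {
    const doc = new DOMParser().parseFromString('<p>x</p>', 'text/html');
    // Assumption: engines without 'text/html' support may return null or throw.
    return Boolean(doc && doc.body);
  } catch (err) {
    return false;
  }
}

console.log(supportsHtmlParsing());
```

If the check returns false, falling back to one of the third-party libraries mentioned above is one option.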
Basic setup
Setting up DOMParser involves creating a new instance and calling the parseFromString method, passing the HTML string and specifying text/html as the content type:
const html = `<p>HTML</p>`;
const parser = new DOMParser();
const parsed = parser.parseFromString(html, 'text/html');
parsed will now act like the global document variable, with the same properties and methods available to it:
console.log(parsed.body.innerHTML); // logs "<p>HTML</p>"
console.log(parsed.body.innerText); // logs "HTML"
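Because the parsed result is an ordinary Document, the usual query methods can be wrapped in small helpers. The sketch below is illustrative only: parseHtml and firstText are hypothetical names, and the optional parser argument is an assumption added so that an existing DOMParser instance can be reused.

```javascript
// Hypothetical helper: parse an HTML string and return the resulting Document.
// A new DOMParser is created only when the caller does not supply one.
function parseHtml(html, parser = new DOMParser()) {
  return parser.parseFromString(html, 'text/html');
}

// Hypothetical helper: return the text of the first element matching the
// selector, or null when nothing matches.
function firstText(html, selector, parser) {
  const doc = parseHtml(html, parser);
  const el = doc.querySelector(selector);
  return el ? el.textContent : null;
}
```

In a browser, calling firstText('<h1 id="title">Hello</h1><p>World</p>', '#title') should give "Hello", while a selector that matches nothing gives null.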
A working example
HTML parsed with DOMParser can be modified and returned, making it handy for tasks where DOM manipulation is required.
To demonstrate this, a class will be added to every <li> element in the HTML snippet below:
<ul class="list">
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
  <li>Item 4</li>
  <li>Item 5</li>
</ul>
First, define the HTML snippet in JavaScript and load it via DOMParser:
const html = `
<ul class="list">
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
  <li>Item 4</li>
  <li>Item 5</li>
</ul>
`;
const parser = new DOMParser();
const parsed = parser.parseFromString(html, 'text/html');
Then use the querySelectorAll method to fetch the <li> elements in the list and loop over each one to add a class attribute:
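// Select every <li> inside the element with the "list" class.
const elements = parsed.querySelectorAll('.list li');
elements.forEach(el => {
  // Sets the class attribute, replacing any existing value.
  el.setAttribute('class', 'list-item');
});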
Once this is done, logging parsed.body.innerHTML will output:
<ul class="list">
  <li class="list-item">Item 1</li>
  <li class="list-item">Item 2</li>
  <li class="list-item">Item 3</li>
  <li class="list-item">Item 4</li>
  <li class="list-item">Item 5</li>
</ul>
If the transformed HTML needs to be added to the page, it can replace the contents of the page's body like so:
document.body.innerHTML = parsed.body.innerHTML;
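The steps in this example can also be collected into one reusable function. The version below is a sketch under a few assumptions: the name addClassToMatches and its parameters are hypothetical, the optional parser argument exists only so a DOMParser instance can be reused, and classList.add is swapped in for setAttribute so that any classes already present on an element are kept rather than overwritten.

```javascript
// Hypothetical helper: parse an HTML string, add a class to every element
// matching the selector, and return the transformed markup as a string.
function addClassToMatches(html, selector, className, parser = new DOMParser()) {
  const doc = parser.parseFromString(html, 'text/html');
  doc.querySelectorAll(selector).forEach((el) => {
    // classList.add appends the class and leaves existing classes in place.
    el.classList.add(className);
  });
  return doc.body.innerHTML;
}
```

With the list from this example, addClassToMatches(html, '.list li', 'list-item') would return the same markup shown above, and the result could be assigned to the innerHTML of any container element instead of the whole body.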