How to build a Content Diff View in vanilla JavaScript

Krishna Prasad
JavaScript in Plain English
5 min read · Aug 10, 2020

A diff’d version of HTML

Hello y’all, let’s talk about the “Diff View” today and see how we can build one in our project. You can get the source from tnwinc’s repo here. It is written in CoffeeScript, but you can use any online converter to get JavaScript out of it (I used JS2.Coffee), or you could just grab a copy of my file from this GitHub Gist.

So to start things off, here’s a little more information on what we are trying to achieve today.

Scenario

Let’s say we used an algorithm to generate a document in the backend and provided it to the frontend where we use a WYSIWYG editor to allow the user to modify the content based on their requirement. Now, a few years later they want to know what they changed (mostly improbable, yes, but you get the point). This is where they would expect to see the difference between two documents.

Gotchas

  1. This is pure vanilla JavaScript and does not have any dependencies — meaning you can take the file and slap it on to any project you want, load it up and BOOM! you’re good to go.
  2. The beauty of this is that it can be implemented in either frontend or backend code; that is totally up to you and your requirements. In my case, a lot of parameters could change the calculation, and the document was updated on the backend in a backend-intensive task, so we implemented the diff on the backend too. If your requirement is mostly static, with just a bunch of HTML coming in from the BE, you can choose to implement it on the FE itself.
  3. Certain parts of the implementation depend on CSS3 (nothing major, more like playing with visibility to create an empty-line feel).
  4. The CSS can be implemented in 2 ways (as far as I have seen) —
    a. When you use an editor like TinyMCE, it renders everything within an iframe, and it gives you an option to plug your custom CSS right into the editor so that it applies only to that editor. This is a nice little feature because both the document and its CSS can be sent from the backend.
    b. Using an editor like draft.js where you will need to provide a wrapper class and ensure that your custom CSS (applied in your project now) will take effect only on the desired editor.

Requirements

  1. Any backend: Node.js, Java, RoR. It’s up to you.
  2. Any frontend: Angular, React, Vue or even pure JS. Again, up to you.
  3. A copy of the htmldiff.js file. You can keep that in your project directory or make the changes and host it on your CDN.

Summary

We will take an HTML string and break, tear and rip it; more precisely, we will split it character by character into tokens and compare those with the tokens of its counterpart to show the difference.

With all that done, let’s dig into the code. Before we head into the JS, let us quickly create the layout and adjust the styles accordingly.

THE HTML

<div class="card">
  <div class="row">
    <div class="col">
      <h4>Document with changes</h4>
      <div class="card current-document" id="output"></div>
    </div>
    <div class="col">
      <h4>Original Document</h4>
      <div class="card system-generated" id="outputNew"></div>
    </div>
  </div>
</div>

This is your layout and I will let you style this the way it suits your project.

The creator has done a good job of naming the functions and variables, so they’re mostly self-explanatory, but I will walk you through the major pieces:

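// True when the character closes a tag.
is_end_of_tag = function (char) {
  return char === ">";
};
// True when the character opens a tag.
is_start_of_tag = function (char) {
  return char === "<";
};
// True when the string consists only of whitespace.
is_whitespace = function (char) {
  return /^\s+$/.test(char);
};
// True when the token is a complete tag, optionally padded with whitespace.
is_tag = function (token) {
  return /^\s*<[^>]+>\s*$/.test(token);
};
// True for everything that is not a complete tag (words, whitespace, punctuation).
isnt_tag = function (token) {
  return !is_tag(token);
};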

These functions determine the “type” of a character or token (tag start, tag end, whitespace, or a complete tag). This is very useful for knowing when and where to start wrapping a difference in <ins> or <del> tags.

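The html_to_tokens function is where the string butchering happens: it walks the HTML character by character and groups the characters into tokens, which are then matched against the tokens of the other document.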
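To make the idea of tokenizing concrete, here is a minimal sketch of this kind of tokenizer. It is not the library’s html_to_tokens; the name sketchTokenize is hypothetical, and the sketch assumes only three token kinds (tags, runs of whitespace, and runs of everything else), which is simpler than what a real implementation has to handle.

```javascript
// Hypothetical, simplified tokenizer for illustration only.
// It is NOT the library's html_to_tokens implementation.
function sketchTokenize(html) {
  const tokens = [];
  let current = "";
  let mode = "char"; // one of "char", "tag", "whitespace"

  // Push the token collected so far (if any) and start a new one.
  const flush = () => {
    if (current) tokens.push(current);
    current = "";
  };

  for (const ch of html) {
    if (mode === "tag") {
      // Inside a tag: keep collecting until the closing ">".
      current += ch;
      if (ch === ">") {
        flush();
        mode = "char";
      }
    } else if (ch === "<") {
      // A tag starts: close whatever token was open.
      flush();
      current = ch;
      mode = "tag";
    } else if (/\s/.test(ch)) {
      // Whitespace is grouped into its own token.
      if (mode !== "whitespace") flush();
      current += ch;
      mode = "whitespace";
    } else {
      // Any other character extends the current word.
      if (mode === "whitespace") flush();
      current += ch;
      mode = "char";
    }
  }
  flush();
  return tokens;
}

console.log(sketchTokenize("<p>Hello  world</p>"));
// [ '<p>', 'Hello', '  ', 'world', '</p>' ]
```

Keeping tags as single tokens is what lets a diff avoid placing an <ins> or <del> in the middle of a tag.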

The “change” happens in the op_map object, which maps each diff operation (equal, insert, delete, replace) to the markup that should be added to the output.

op_map = {
  equal: function (op, before_tokens, after_tokens) {
    return before_tokens
      .slice(op.start_in_before, +op.end_in_before + 1 || 9e9)
      .join("");
  },
  insert: function (op, before_tokens, after_tokens) {
    var val;
    val = after_tokens.slice(op.start_in_after, +op.end_in_after + 1 || 9e9);
    return wrap("ins", val);
  },
  delete: function (op, before_tokens, after_tokens) {
    var val;
    val = before_tokens.slice(op.start_in_before, +op.end_in_before + 1 || 9e9);
    return wrap("del", val);
  }
};
op_map.replace = function (op, before_tokens, after_tokens) {
  return (
    op_map.delete(op, before_tokens, after_tokens) +
    op_map.insert(op, before_tokens, after_tokens)
  );
};

If you look closely, depending on the operation (op), we wrap the affected tokens in either an <ins> tag or a <del> tag, indicating that the content was either inserted or deleted. For a replace, the return statement concatenates the “DELETE” string and the “INSERT” string, in that order. However, here’s the catch. You have two options:
1. Show the delete and then the insert, but write a bunch of JavaScript to remove the deleted part in the Original doc, OR
2. Show the insert and then the delete, so you can just hide the delete using CSS (visibility: hidden;).

I chose no. 2 and swapped the insert and delete lines like so:

op_map = {
  equal: function (op, before_tokens, after_tokens) {
    return before_tokens
      .slice(op.start_in_before, +op.end_in_before + 1 || 9e9)
      .join("");
  },
  insert: function (op, before_tokens, after_tokens) {
    var val;
    val = after_tokens.slice(op.start_in_after, +op.end_in_after + 1 || 9e9);
    return wrap("ins", val);
  },
  delete: function (op, before_tokens, after_tokens) {
    var val;
    val = before_tokens.slice(op.start_in_before, +op.end_in_before + 1 || 9e9);
    return wrap("del", val);
  }
};
op_map.replace = function (op, before_tokens, after_tokens) {
  return (
    op_map.insert(op, before_tokens, after_tokens) +
    op_map.delete(op, before_tokens, after_tokens)
  );
};
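To see what the swap does to the output, here is a small self-contained sketch that renders a hand-written list of operations. Everything in it is an assumption made for illustration: simpleWrap is a simplified stand-in for the library’s wrap (it assumes the tokens contain no HTML tags), pick, renderers and renderSketch are hypothetical names, and the shape of the op objects (an action plus inclusive start and end indices) is inferred from the fields used in op_map above.

```javascript
// Simplified stand-in for the library's wrap(); assumes plain-text tokens only.
function simpleWrap(tag, tokens) {
  return "<" + tag + ">" + tokens.join("") + "</" + tag + ">";
}

// Inclusive slice, mirroring the start/end indices used by op_map.
function pick(tokens, start, end) {
  return tokens.slice(start, end + 1);
}

// Hypothetical renderers that follow the same structure as op_map.
const renderers = {
  equal: (op, before) =>
    pick(before, op.start_in_before, op.end_in_before).join(""),
  insert: (op, before, after) =>
    simpleWrap("ins", pick(after, op.start_in_after, op.end_in_after)),
  delete: (op, before) =>
    simpleWrap("del", pick(before, op.start_in_before, op.end_in_before)),
};

// Swapped order: the insert comes first, the delete second.
renderers.replace = (op, before, after) =>
  renderers.insert(op, before, after) + renderers.delete(op, before, after);

function renderSketch(beforeTokens, afterTokens, operations) {
  return operations
    .map((op) => renderers[op.action](op, beforeTokens, afterTokens))
    .join("");
}

const before = ["Hello", " ", "world"];
const after = ["Hello", " ", "there"];
const ops = [
  { action: "equal", start_in_before: 0, end_in_before: 1, start_in_after: 0, end_in_after: 1 },
  { action: "replace", start_in_before: 2, end_in_before: 2, start_in_after: 2, end_in_after: 2 },
];

console.log(renderSketch(before, after, ops));
// Hello <ins>there</ins><del>world</del>
```

Because the <del> now always follows its <ins> inside a replace, a CSS rule can target “a del that comes after an ins” without any extra JavaScript.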

This gives us room to write the CSS to suit our needs. For the sake of simplicity, here’s the gist of the CSS:

.current-document ins {
  background: lightgreen;
  text-decoration: none;
}
.current-document del {
  background: pink;
}
.system-generated del {
  visibility: hidden;
}
.system-generated ins {
  text-decoration: none;
}
.system-generated ins ~ del {
  display: none;
}

I believe the CSS is fairly simple. On “current-document” (Document with changes), both ins and del are styled to show the “change”. On “system-generated” (Original Document), the ins loses its underline and the del is hidden, so that the panel looks like the “Original” document.

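Because of the swapping of lines we did above, the del of a replace now follows its ins, so we can use the sibling selector (~) with display: none to take that del out of the layout on our original document. It stays in the DOM but takes up no space. A del that stands on its own keeps visibility: hidden, which leaves the empty gap mentioned in the gotchas. Keep in mind that ~ matches any later del sibling inside the same parent, not only the one immediately after the ins.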

Now all that is left is to trigger the JS and output the results.

// Diff HTML strings in both directions
let output = htmldiff(originalHTML, newHTML); // original -> new, for "Document with changes"
let input = htmldiff(newHTML, originalHTML); // new -> original, for "Original Document"
// Show HTML diff output as HTML!
document.getElementById("output").innerHTML = output;
document.getElementById("outputNew").innerHTML = input;
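If you want the same wiring in a reusable and slightly more defensive form, something like the following sketch could work. The function name renderDiffViews and its parameters are hypothetical: diffFn stands in for the htmldiff function, and doc for the browser’s document, so that the helper can also be exercised outside a browser. It assumes the two element ids from the layout above.

```javascript
// Hypothetical helper; diffFn is expected to behave like htmldiff(before, after)
// and doc like the browser's document object.
function renderDiffViews(diffFn, originalHTML, newHTML, doc) {
  const targets = {
    output: diffFn(originalHTML, newHTML),    // "Document with changes" panel
    outputNew: diffFn(newHTML, originalHTML), // "Original Document" panel
  };

  // Collect ids that were not found instead of throwing on a null element.
  const missing = [];
  for (const [id, html] of Object.entries(targets)) {
    const el = doc.getElementById(id);
    if (el) {
      el.innerHTML = html;
    } else {
      missing.push(id);
    }
  }
  return missing;
}

// In the browser, assuming htmldiff is loaded:
// renderDiffViews(htmldiff, originalHTML, newHTML, document);
```

Since the result is assigned to innerHTML, both HTML strings should come from a source you trust.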

That’s the end of this story. Here’s a pen of it in action.

I hope you enjoyed this piece and it helped you in some way.

I am always open to improvements and changes so feel free to drop a comment down below. Up next I will be publishing an article on uploading large files to S3 using AWS multipart upload API and presigned URLs so stay tuned. Till then, “May the code be with you.”👾

ReactJS, NextJS, NodeJS and every other JS! It’s like a never ending learning journey. 😎