NodeJS Postgresql Bulk upsert.

Krishna Prasad
The Startup
Published in
4 min readAug 23, 2019

--

Insert or Update?

Hello people! Lately I have been working a lot with Postgresql and NodeJS. One of the latest requirements I have faced is a CHALLENGE! Bulk upsert of data read from a .xlsx file. I had to read through tons of questions and docs on Google to even parse what I needed to do.

So enough talk and let’s jump straight into some action. Here’s my setup —

  1. NodeJS version 11.11.0
  2. Postgresql version psql (PostgreSQL) 11.2

On a sidenote, I use Multer to upload the excel files and I will not be walking through that process as it is pretty straight forward. Skipping that, I will directly go through the actual excel file read and data dump. In my case, it is a dynamic mapping of headers so I have a customer mapped “headers” object. Here’s a gist of what I’m doing (You can find the complete code gist on a link below, I will explain only the reading of data part here) —

          let item = {};
// File parse operations here - works on each row basis
item['_id'] = `'${uuid()}'`;
for (let i = 0; i < headers.length; i++) {
if (typeof (row.values[i]) === 'string') {
item[headers[i][headerKeys[i]]] = `'${row.values[i]}'`;
} else {
item[headers[i][headerKeys[i]]] = row.values[i] ? row.values[i] : 'NULL';
}
}
item['createdAt'] = `'${new Date().toUTCString()}'`;
item['updatedAt'] = `'${new Date().toUTCString()}'`;
resp_row.push(`(${Object.values(item).join(', ')})`);

Ok, to start off with, we will initialise an empty object which at the end of day will be pushed into an array since we need the response to be an array of objects. Postgres Sequelize on writing a raw query requires the main pkey and the two created and updated times to be a part of the query always, so you can see that in the function, I am manually adding the “_id” (in my case is a uuidv4) and the two dates at the end. The server requires the dates to be in UTC format, hence the new “Date().toUTCString()”.

Note: If your pkey is an auto increment integer, you may not need to add this, but the created and updated is very much required.

With this, we get an array of objects, but for the query, we don’t need the data to be of the format [{}, {}, {}], we need it like [‘()’, ’()’, ’()’]. You will know why when we create the query.

Now let’s talk the query — What do we need? My requirement is that I have 2 excel files that I need to fit into 1 table in the database — hence the requirement for an upsert to begin with. The two file do not have any conflicting keys except for the unique key — in my case, let’s say “part_number”. So my full query will be broken into two parts, one for each file. Let’s look at the query (note that the second query is the same as this just that the keys will vary )—

CREATE OR REPLACE FUNCTION public.ppeUp(OUT response public."CommonAttributes", OUT sequelize_caught_exception text) RETURNS RECORD AS $func_83947583745$ BEGIN INSERT INTO public."CommonAttributes" as CommonAttribute ("_id", "key_1", "key_2", "key_3", "createdAt", "updatedAt") VALUES ${ppeArr.join(', ')} ON CONFLICT ("key_1") DO UPDATE SET 
"key_2"=excluded.key_2,
"key_3"=excluded.key_3
END $func_83947583745$ LANGUAGE plpgsql;select public.ppeUp();

I have broken the query providing a line space just so I can explain better.

First part of the query is to create a function that will try an INSERT and resort to an UPDATE on conflict. On a sidenote here, there are two things you can do ON CONFLICTDO NOTHING or UPDATE SET.

NOTE: We only UPDATE the keys that this INSERT uses.

The function I am creating here is public.ppeUp(). You can choose to create this in a schema of your choice. So if you have a schema called “internal”, the function name would be internal.ppeUp(). Once the function is initialized, it will move on to the next command — INSERT INTO

Before that, you can check if the function is created by using
your_connected_db_in_psql_cli=# \df public.*

Now that you know your function exists, we’ll move on to INSERT INTO. Now you know your keys but the values come from the excel file. We need the values in the format (“val_1”, “val_2”….). Remember we did the Object.values(obj) in the excel reader? This is why. So a simple Array.prototype.join(‘, ‘) will give us the values as we exactly need it for the query. So if there is a conflict with “key_1”, then the ON CONFLICT is triggered which will result in updating the row (NOTE that we do not explicitly mention the table name in the UPDATE block.) Then finally, the END which will end the function creation.

Once all this is done, we will need to trigger the function we just created by using select public.ppeUp();

Now we will pass on the data from the second excel file with keys (“key_4”, “key_5”, “key_6”). And I rest my case here!

I really hope this was helpful for you folks here. Please let me know if there can be any improvements here and I will edit and make the necessary updates. Here’s the gist link as promised :D Github gist

Enjoy and have a splendid time! Until next time B-)

--

--

Krishna Prasad
The Startup

ReactJS, NextJS, NodeJS and every other JS! It’s like a never ending learning journey. 😎