INSERT Queries - Advanced CRUD explanation part 1

CRUD queries are one of the cornerstones of every database – and creation (INSERT) queries are the first on the list. Keep reading and we will teach you everything there is to know about them!


Tools used in the tutorial

DbVisualizer - top rated database management tool and SQL client.


Preface

Who hasn’t heard of CRUD? Every developer – junior or senior – has come across the term. Juniors may not be well versed in it, but it stands for four words – Create, Read, Update, Delete. Four words defining four queries:

  1. INSERT – these are queries that help us insert data inside of our database.

  2. SELECT – these queries help us read (select) data from our database.

  3. UPDATE – these queries, as the name suggests, help us update data within our database.

  4. DELETE – these queries help us delete data from our database.

The concept is simple and, in most cases, self-explanatory – the queries themselves are not, and they need a lot of attention. In this series, we’re going to help you understand how these types of queries work, starting from the top and moving towards the bottom, basing all examples on relational database management systems – in particular, MySQL and its flavors.

INSERT Queries – the Basics

The basics of INSERT queries are simple – they let us insert data into our database, right? However, the internal workings of these queries are not as simple as they may seem at first glance. To answer why, we need to perform some profiling:

  • Connect to a database, then turn on profiling by running a query: SET profiling = 1;

  • Run a query, then figure out its ID by running a query SHOW PROFILES:

Profiling an INSERT Query

  • Finally, dig into the query execution plan by running SHOW PROFILE FOR QUERY with the ID of the query you’re profiling – the ID needs to be exact. Mine is 27 because I had to re-run the query; yours might be different:

Profiling a Specific Query
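Put together, the steps above look like this as a single session (demo_table and its name column are illustrative – substitute your own table, and replace the query ID with the one SHOW PROFILES reports for you):

```sql
-- Turn on profiling for the current session
SET profiling = 1;

-- Run the query we want to inspect (table and column are illustrative)
INSERT INTO demo_table (name) VALUES ('Demo');

-- List recently profiled queries together with their IDs and durations
SHOW PROFILES;

-- Show the execution stages of one specific query; replace 27 with
-- the ID that SHOW PROFILES reported for your INSERT
SHOW PROFILE FOR QUERY 27;
```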

Now you know what an INSERT query does from the inside! Woohoo! Well, not exactly – to know that, you need to dig into the query execution plan shown above. Let’s do that together.

  1. starting – the query starts; there’s nothing much to explain here.

  2. checking permissions – the database checks whether there are sufficient permissions for the query to be executed. If there aren’t, the query stops here and returns an error.

  3. opening tables – tables need to be opened before any query can have an effect on them.

  4. init – the query is initializing its processes.

  5. system lock – the database checks whether there are any locks on the table.

  6. update – the query updates data (in our case, inserts data.)

  7. end – the query is closing its processes and coming to an end.

  8. query end – the query itself stops here (end refers to the database closing the processes necessary for it to run, but not the query itself.)

  9. closing tables – the database closes all tables that were affected by the query.

  10. cleaning up – the database cleans up and prepares for the next query to be run.

As you can see, there are a lot of things that happen “under the hood” that need to be considered – each query, of course, works differently, so there are a couple of additional things we need to keep in mind.

For that, let’s come back to the basics – INSERT queries insert data, which means they inevitably come with a lot of overhead. As an ACID-compliant database storage engine, InnoDB has to support transaction isolation and Multiversion Concurrency Control (MVCC): one transaction cannot block another from running, indexes play a role, and configuration plays its part as well.

That means the following:

  1. The more indexes there are on a table, the slower an INSERT into it will execute. The same goes for partitions.

  2. If our database is misconfigured (or not configured at all), query execution will be slower than it could be.

Speeding Up INSERTs

To speed up INSERT queries, drop indexes on your table if you have many of them (a couple is nothing to worry about, but do keep in mind that all indexes need to be updated together with the data – that’s the reason they slow down INSERTs.) Also, avoid inserting data into partitioned tables (partitions speed up SELECTs at the expense of slowing down INSERTs.) If that doesn’t help, ditch INSERTs altogether and use LOAD DATA INFILE instead.
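As a sketch of the index advice above – assuming a table demo_table with a secondary index idx_name on a name column (both names are illustrative) – a bulk load could be wrapped like this:

```sql
-- Drop the secondary index before a large bulk insert so the database
-- doesn't have to maintain it row by row (idx_name and demo_table
-- are illustrative names)
ALTER TABLE demo_table DROP INDEX idx_name;

-- Perform the bulk insert; multi-row INSERTs are also faster than
-- issuing one statement per row
INSERT INTO demo_table (name) VALUES ('A'), ('B'), ('C');

-- Re-create the index once the data is in place – building an index
-- in one pass is cheaper than updating it on every inserted row
ALTER TABLE demo_table ADD INDEX idx_name (name);
```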

LOAD DATA INFILE is specifically designed to load bigger data sets into MySQL-based infrastructure: by reading data from text files it can make use of the buffer pool in InnoDB (one of MySQL’s primary storage engines), and it is capable of the following things, amongst others:

  1. Skipping a specified number of initial lines or columns.

  2. Specifying how fields are terminated or enclosed.

  3. Inserting data only into specific columns and, if needed, ignoring errors.

  4. Inserting data into a table after specifying the character set of the input file.

LOAD DATA INFILE can be used in the following fashion (of course, specify the proper path to the file and change the table name to your table name):

LOAD DATA INFILE 'D:/folder/file.txt' INTO TABLE demo_table
[FIELDS TERMINATED BY '|'];
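A fuller invocation exercising the capabilities listed above might look like this (the file path, table name, and column names are illustrative; note that MySQL is strict about the order of these clauses):

```sql
-- Load a pipe-delimited file, declaring its character set, skipping
-- a header line, and filling only specific columns
-- (path, table, and column names are illustrative)
LOAD DATA INFILE 'D:/folder/file.txt'
IGNORE                        -- skip rows that would raise errors
INTO TABLE demo_table
CHARACTER SET utf8mb4         -- character set of the input file
FIELDS TERMINATED BY '|' ENCLOSED BY '"'
IGNORE 1 LINES                -- skip the header row
(name, email);                -- only fill these columns
```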

However, do keep in mind that for the LOAD DATA INFILE query to be as quick as possible, you may need to tune the innodb_buffer_pool_size setting within my.cnf: the more RAM is allocated to the buffer pool (the memory area where InnoDB caches table and index data), the faster your queries will complete.
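For example, on a dedicated database server with 16 GB of RAM, the relevant my.cnf fragment might look like this (the 12G figure is an illustrative starting point – a common rule of thumb is to allocate 50–75% of available RAM on a server dedicated to MySQL):

```ini
# my.cnf – InnoDB buffer pool sizing (the value below is illustrative)
[mysqld]
innodb_buffer_pool_size = 12G
```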

Summary

INSERT queries are the cornerstone of CRUD operations in any database management system – and they’re no different in MySQL. They insert data into our databases, but everything’s not that simple: to optimize their performance, it’s not enough to simply drop indexes or partitions and insert data – we also have to know how these queries work internally.

Once we know how queries work internally, it’s crucial to use a proper SQL client to manage our databases and build our queries further – DbVisualizer is the client you want to use to ensure the performance, availability, and security of your databases.

About the author

Lukas Vileikis is an ethical hacker and a frequent conference speaker. He runs one of the biggest & fastest data breach search engines in the world - BreachDirectory.com, frequently speaks at conferences and blogs in multiple places including his blog over at lukasvileikis.com.