As we dive deeper into 2018, we indulge more in the world of data. More than 2.5 quintillions of data are generated every day with over 90% of the world’s data generated only in the last two years. It’s clear that our data-generation is not in anyway a linear one, as it seems to be exponentially accelerating. Let’s take youtube, for example, every minute there are over 300 hours of videos uploaded to youtube, not to mention the hundred thousand views. Now imagine this, what if youtube kept getting slower and slower in it search, after all with that amount of data added per minute, it has a larger set to search from.
This is why the issue of database optimization is crucial for today’s applications. In almost every aspect of the internet, you’d be attempting to access data stored in a database, among potential petabytes of data.
Unless the database is optimized for maximum performance on both, software and hardware level, it’s going to be impossible for a query to return output on a short notice.
Today, we’ll talk about optimizing Mysql, one of the most used database management systems in the world. We’ll focus on the link between the software and hardware level as we can witness a great change in performance by doing so.
What is Database optimization?
More scientifically speaking, database optimization is a set of strategies and techniques that one would follow in order to reduce the database’s response time to a minimum. It’s always of a great importance to increase your efficiency on the resources at hand, before attempting to expand your resources assets. Usually, the mediocre performance of a database is due to a lack of efficiency and bad management. Mysql and the servers are not to blame on this fault, then.
What is needed to optimize Mysql:
In order to optimize Mysql’s performance, we need to understand the four main resources that we have, which are CPU, Memory, Disk, and network. CPUs do a lot of heavy lifting when it comes to querying, not to mention how expensive they are. In other words, if they are not well optimized, it’s a complete waste of resources. There are many strategies to follow for optimization, among which we mention:
- Proper Indexing
- Eliminating correlated subqueries
- Rewriting queries/schemas
Today, we’ll expose one of the best solutions out there, that wouldn’t require as much effort as rewriting or eliminating queries.
Using Intel C++ Compilers:
Compilers are pieces of software used to transform a code from one language to another, usually machine language, so they manage the link between the software we write, and the hardware executed. By using the Intel c++ compiler, it was found that the Mysql optimization varied between 5% and 35%, which is a huge addition and bring about a lot of performance enhancement for Mysql developers. The current technologies offered by Intel C++ for optimization are:
1. Automatic vectorization:
The vectorizer which is one of the compiler’s components detects the operations that could be run in parallel and converts them from sequential to parallel.
2. Interprocedural optimization:
This is one of the key elements of the Intel C++ and a main contributor to the performance enhancement. Interprocedural optimizations are the set of optimizations that go beyond certain program subunits. The multistep process enables the compiler to analyze the code and find potential IPOs with a source file or even across many source files all at once. The optimizations include inlining, eliminating dead-functions and constant propagation.
3. Profile-guided optimization:
The ICC compiler analyzes the code while running and uses the profiled information during any subsequent compilations. This greatly improves the application’s performance as it decreases the code size. It also reduces the branching mispredictions which might be very costly in certain application. Finally, it also reorganizes the code for minimum caching.
4. Guided auto parallelism:
GAP is a very cool feature that offers advice to enhance the performance of a sequential application. It does so by suggesting changes that will leverage the compiler’s power of automatic vectorization and increase the efficiency behind data operations.
Stats in ICC’s favor:
Now, let’s compare the performance of GNU C++, a widely used C++ compiler, and Intel C++. Comparisons are held for the same number of threads and the same buffer_pool size. It was found that Intel C++ held an upper hand, and a drastically noticeable superiority in performance when it came to the queries handled per second. The average improvements are 15% with some reaching up to 30+%.
Queries per second| Image source
When it came to response time, Intel C++ still outperformed GNU with a massive lead on larger numbers of threads.
Time in milliseconds| Image source
In a nutshell, no one knows the Intel architecture better than Intel themselves. This is why Intel C++ might be the best option to run on an Intel CPU architecture. Intel C++ doesn’t produce efficient code for AMD and other architectures as it does for Intel architectures, but the optimizations it make on the Intel architectures are massive.
How to learn The Intel C++:
There’s not so much for a software developer in learning compilers, as they are not a certain language to learn, but a program to use. However, you can start by learning more about compilers in general, because you can’t just jump into the machinery of certain compilers without understanding how compilers fundamentally work. Structure and Interpretation of Computer Programs is a good place to start understanding the working of a compiler. Then, you can pick an easy language and learn more about how it’s compiled, then work your way up to the Intel C++ or whatever have you. Intel’s documentation is generally one of the best documentations out there and you can check their documentation for the Intel C++ to learn more about it. If you want to learn C++, you can find best online C++ tutorials and courses recommended by C++ community on Hackr.io.
The ICC is certainly a valuable asset for Mysql optimization and one way you can go around the tedious task of optimizing your database system. However, it’s not a free tool to use ( although they offer free, discounted, and open source editions for eligible entities). It’s definitely an option worth check out and considering, especially for the 5-35% improvements.