http://bit.ly/15fcYLt
I read an article recently, about how mobile apps will probably not get the hardware boost that people are expecting.
This is partly because CPU performance has hit a sort of (heat) wall, so clock speeds cannot keep improving. As Linley Gwennap said, “we’ve been falling behind Moore’s Law ever since Intel hit the power wall back in 2005”.
I myself have noticed on a few occasions that when a company bought an expensive new machine for its main server, it turned out to run queries slower than the older machine it was meant to replace.
In a recent example, a three-year-old server with 2.66GHz CPUs was almost twice as fast as a brand new machine with 2.3GHz CPUs. I'm not exactly sure, but the new machine probably has several times more CPU cache and probably a better instruction set, and the hosting company swore that it is several times faster than the old machine. However, our results, specifically for MySQL, have been discouraging.
After reading the article, I would like to suggest a thought exercise:
As DBAs, what would happen if CPUs never improved? As in, their clock speed never goes up.
Vendors can probably add more cores, fit in more cache, maybe even double the size of the CPU on the motherboard. However, the basic per-core performance for single-threaded applications would not improve.
What would you do?
How would you solve your current company's needs?
How would you solve your future company's needs in the face of issues such as Big Data?
In my opinion, MySQL will need to break up anything that is currently single-threaded as much as possible. This would probably not be easy. Adding a Map/Reduce layer to MySQL may help with this; it works for other commercial database vendors: Infobright, Greenplum, (I think also) Oracle.
(I am not sure if Oracle may be inclined to improve MySQL's processing of large amounts of data as it may hurt profitable parts of their business.)
Sharding can help, and has helped, companies solve this problem. It breaks up the problem so that each single thread processes only the data in its own shard. I am not sure which mature solutions are available if you need to group data across several shards.
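To make the "group data across several shards" case concrete, here is a minimal sketch of the scatter/gather idea, not any particular product's implementation. Everything in it is an assumption for illustration: the `orders` table, the sample rows, and the use of in-memory SQLite databases and threads as stand-ins for separate MySQL shard servers. Each shard runs the GROUP BY on its own data, and a coordinator merges the partial sums.

```python
import sqlite3
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for one shard's "orders" table.
SHARDS = [
    [("de", 10), ("us", 5), ("us", 7)],
    [("de", 3), ("fr", 8)],
    [("us", 1), ("fr", 2), ("fr", 4)],
]


def partial_totals(rows):
    """Run the GROUP BY on one shard and return its partial sums."""
    # An in-memory SQLite database stands in for a real shard server.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (country TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(
            "SELECT country, SUM(amount) FROM orders GROUP BY country"
        ).fetchall()
    finally:
        conn.close()


def sharded_totals(shards):
    """Scatter the query to every shard, then merge the partial results."""
    totals = Counter()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(partial_totals, shards):
            for country, amount in partial:
                totals[country] += amount
    return dict(totals)


if __name__ == "__main__":
    print(sharded_totals(SHARDS))
```

SUM merges trivially because partial sums can simply be added. An aggregate such as AVG cannot be merged that way; each shard would have to return a sum and a count, and the coordinator would divide at the end.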
Regarding hardware, there is certainly room for "SQL" chips (think Kickfire) and other FPGAs.
Hardware compression could help, especially compression that can spread across cores, but the actual processing of the data after decompression would still be single threaded.
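The limit described here is what Amdahl's law captures: if only a fraction of the work can be spread across cores, the serial remainder caps the overall speedup. The sketch below is a generic illustration, and the fraction used in the example is a made-up number, not a measurement of MySQL or of any compression hardware.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Overall speedup when only `parallel_fraction` of the work scales with cores."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: half the time is decompression (parallel),
    # half is single-threaded processing of the decompressed data.
    for cores in (1, 2, 8, 64):
        print(cores, round(amdahl_speedup(0.5, cores), 2))
```

With half of the work stuck on one thread, the speedup can never reach 2x, no matter how many cores handle the decompression.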
Summary tables could very well help for certain workloads, as they pre-process large amounts of data into more manageable sizes. Used alongside Hadoop, and with someone who can model the data properly, they can be a very long-term solution.
Perhaps pre-processing will be a much bigger thing in the future. As in, you speed up your queries by preparing the answers ahead of time and caching them.
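As a small illustration of the summary-table idea, here is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The `page_views` and `daily_summary` tables, the sample rows and the function names are all hypothetical. The expensive GROUP BY runs once during a refresh, and later queries read the much smaller pre-computed table.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with a hypothetical raw table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (day TEXT, page TEXT, hits INTEGER)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?, ?)",
        [
            ("2013-07-01", "/home", 120),
            ("2013-07-01", "/about", 30),
            ("2013-07-02", "/home", 95),
        ],
    )
    return conn


def refresh_daily_summary(conn):
    """Rebuild the summary table; in practice this would run on a schedule."""
    conn.execute("DROP TABLE IF EXISTS daily_summary")
    conn.execute(
        "CREATE TABLE daily_summary AS "
        "SELECT day, SUM(hits) AS total_hits FROM page_views GROUP BY day"
    )
    conn.commit()


def hits_for_day(conn, day):
    """Answer from the pre-computed table instead of scanning the raw rows."""
    row = conn.execute(
        "SELECT total_hits FROM daily_summary WHERE day = ?", (day,)
    ).fetchone()
    return row[0] if row else 0


if __name__ == "__main__":
    conn = build_demo()
    refresh_daily_summary(conn)
    print(hits_for_day(conn, "2013-07-01"))
```

The trade-off is freshness: answers are only as current as the last refresh, so this suits reporting-style workloads better than queries that need up-to-the-second data.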
I would like to hear more approaches to solve this problem, but I would prefer the solution to lean on the side of 'tried and tested'.
Although CPUs might not get the same performance boost as they have had in recent years, both storage (SSD) and RAM will become faster and cheaper. That might not help if you have a CPU-limited performance problem, but a lot of (My)SQL problems are not in fact CPU-bound, but rather memory-IO or disk-IO bound.
So, I wouldn't sweat it.
That said, parallelizing workload as much as possible is certainly a good idea. But it is also harder to accomplish and more prone to obscure bugs and race conditions.
I'm not confident it can really be done with MySQL proper. I'd love to see more Drizzle adoption.
https://launchpad.net/drizzle
But in the case where even Drizzle can't handle it, I think you'll end up taking the massively parallel box and running multiple MySQL/Drizzle instances on it, with a centralized out-of-band caching system and locking system to handle multi-table updates and concurrency issues.
Both http://flexvie.ws and http://code.google.com/p/shard-query have been in development for years and aim to solve the problems you have described.
Alenka, the GPU column store, will hopefully provide something open source similar to Kickfire. I'm talking with the author about Shard-Query integration.
What happened to the cloud doing all of the work?
The cloud still has servers that have CPUs and therefore the same problems we are talking about. However, the solution to these problems leverages the cloud very well.