How to become a storage billionaire?
posted on January 23, 2016 by Amit Golander
Ever thought of how much easier your work would be if you had virtually no constraints?
How much better would the outcome of your work be if you could just pour insane amounts of money into engineers, equipment, licenses, etc., with no time pressure?
If you haven’t thought about it – you can stop reading.
If you have, this post will teach you how to simplify things and become a Storage Billionaire. Because when you have more resources than you could ever consume – you don’t spend time and energy provisioning them.
This photo was taken at the NVM Summit earlier this week, where SMART Modular had a booth with a commodity server equipped with their NVDIMM-N cards and the SDM CE software. Those acronyms stand for the Community Edition (CE) of Plexistor Software-Defined Memory (SDM).
Which reminds me, SDM CE is available for free download on our website – so there is nothing stopping you from downloading it too.
Now that we’re done with the “how to” part, let’s take a closer look at the performance results. They were measured with the common FIO benchmarking tool, on the exact same server and Linux version.
The left-hand side shows the 2-3 orders of magnitude (!) improvement in the number of operations per second. Three FIO runs are presented to show that SDM delivers the goods for both single-threaded and multi-threaded applications, and for both normal and tiny write access sizes.
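For readers who want to run a similar experiment themselves, here is a minimal sketch of an fio job file in that spirit. The mount point, job names, and block sizes are illustrative assumptions, not the exact parameters behind the figure:

```
[global]
directory=/mnt/sdm   ; hypothetical SDM-backed mount point
rw=randwrite         ; random file-system writes
sync=1               ; open files with O_SYNC, i.e. persistent writes
size=1g
runtime=60
time_based

[single-thread-4k]    ; normal write size, one thread
bs=4k
numjobs=1

[single-thread-tiny]  ; tiny write size, one thread
stonewall
bs=64
numjobs=1

[multi-thread-4k]     ; normal write size, many threads
stonewall
bs=4k
numjobs=16
```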
The number of random file-system accesses that the FIO application was able to generate on this single-socket, midrange Xeon processor is by far the highest ever presented in the storage industry. If these millions of file-level accesses are not enough for you, the simplest thing to do is purchase a dual-socket server, which most people do anyway.
If you’re still hungry for more, just purchase a stronger processor. Storage performance in this brave new world of SDM scales with CPU power. The E5-2650 v3 processor used to generate these results sits exactly in the middle of the E5-26xx processor family, and Intel will gladly sell you a stronger E5-2699 processor.
“But wait, in the previous blog post you said that it is all about the latency”
What an excellent insight. I did say that… which leads me to the right-hand side of the figure. Latency is measured at the application level, including the context switch from user to kernel space and the copy of the data to persistent storage. Again, a 2-3 order-of-magnitude improvement. Most real-life applications will see a single-microsecond delay for writing data persistently. To put things in perspective, that is the minimal granularity that Linux on x86 can measure.
At these latency levels, using asynchronous access (e.g. libaio) is actually less effective than just writing persistently (opening the file with O_SYNC).
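To make that measurement concrete, here is a minimal sketch in C that times a single synchronous 4KB write at the application level, context switch and data copy included. The path /mnt/sdm/test is a hypothetical SDM-backed mount point, not something from the benchmark above:

```c
/* Minimal sketch: time one synchronous 4KB write at the application
 * level, including the user-to-kernel context switch and the data copy.
 * The path /mnt/sdm/test is a hypothetical SDM-backed mount point. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    memset(buf, 'x', sizeof(buf));

    /* O_SYNC makes write() return only after the data is persistent. */
    int fd = open("/mnt/sdm/test", O_CREAT | O_WRONLY | O_SYNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
        perror("write"); return 1;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L
            + (t1.tv_nsec - t0.tv_nsec);
    printf("persistent write latency: %ld ns\n", ns);

    close(fd);
    return 0;
}
```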
If you’re after even lower latency – purchase a processor that is faster than the 2.3GHz one we used. If you’re still unhappy after that (really?), rewrite your code to use the mmap system call instead of read and write. That will eliminate the context-switch latency.
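As a taste of that approach, here is a minimal sketch in C of writing through mmap instead of write. The path is again hypothetical, and msync is used here as the portable way to request persistence; a persistent-memory-aware file system may make that flush far cheaper than a block-storage sync:

```c
/* Minimal sketch: write persistently through mmap instead of write(),
 * removing the system call from the data path. The path and length
 * are hypothetical values for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN 4096

int main(void)
{
    int fd = open("/mnt/sdm/test", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, LEN) < 0) { perror("ftruncate"); return 1; }

    char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* A plain memory store replaces the write() system call. */
    memcpy(p, "hello, persistent memory", 25);

    /* Flush the dirty range to persistent storage. */
    if (msync(p, LEN, MS_SYNC) < 0) { perror("msync"); return 1; }

    munmap(p, LEN);
    close(fd);
    return 0;
}
```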
Welcome to a new era
Bye-bye storage bottlenecks. Hello other bottlenecks…