MultiValue Database and Framework Benchmarking

Benchmarking is a topic that has been floating around for several years, but the MultiValue community doesn't have much in the way of prepackaged tools. I've heard many different reasons for the lack of standard benchmarking tools, so I'll go through a few of them.

What are TPC.org Benchmarks?

There are many benchmarks out there for a wide variety of environments, but I'm going to focus on the most popular. For many years, the TPC.org benchmarks have been what most people think of when talking about benchmarking. You've probably heard of the TPC-C, TPC-H, and TPC-E versions. These have become the industry-standard benchmarks that people want to use. But, like any benchmark tool, they all have drawbacks as well.

The TPC-C benchmark is the most common, mainly due to how old it is. TPC-C has been around since 1992. It is designed to model an Online Transaction Processing (OLTP) wholesale application. Since it was designed over twenty years ago, it made some assumptions about the hardware that don't hold up today. Mostly, it expected that machines did not have a lot of RAM, and it expected inefficient Disk IO. The TPC-E benchmark was created as a more modern replacement for the aging TPC-C benchmark. TPC-E uses an OLTP stock exchange as its data model. The design is more sophisticated in its simulation of a real-world system. It takes into account the fact that modern servers use lots of RAM and other system resources to offset the slower Disk IO.

The TPC-H benchmark is designed to model ad hoc reporting. Its main purpose is to measure query/select results using a Data Warehouse as a database model.

Keep in mind, TPC.org has many more benchmarks that provide information for different purposes, but these three are the most common. While the TPC council offers all the details needed to run these benchmarks, you are not allowed to release your benchmark results to the general public without getting an official sign-off from them.

Is Benchmarking Dead?

Benchmarking used to be the central marketing piece when comparing databases, but it has fallen by the wayside in recent years. Many C-level managers still look at that component without realizing it, but there's a reason benchmarking isn't as critical nowadays: hardware.

When it used to cost the IT department their firstborn to throw hardware at a problem, speed was more about the software. Now RAM costs pennies, and readily available Solid State Drives (SSDs) push Disk IO much closer to the speed of RAM. On top of all of that, you can now create clustered systems which distribute the database processing across more than one machine.

Virtualization also adds a wrinkle into the mix because a Virtual Machine (VM) can look like a single machine, but have its physical resources allocated much differently to improve performance. And you can often tweak the VM's hardware by changing a few software settings, instead of physically replacing hardware components. Cloud-based virtualization often doesn't even require the machine to be rebooted for these hardware changes to be made.

Of course, the operating systems available to the databases have changed as well. A benchmark running on a Windows platform will generate different results than one running on Linux, even on the same hardware configuration running the same database version. You can even get different results running the same operating system and the same hardware, but tuning the operating system's kernel differently.

Now add the next variable: network infrastructure. As we know, in the real world, our network infrastructure affects how fast data can be sent to our servers. We no longer have dedicated connections to the database like in the old RS-232 days. Network issues, such as client connections, switch overhead, and network traffic, will all affect the numbers your benchmark produces.

After addressing the hardware configuration, infrastructure, and the operating system variables, you still have to look at the database version on which you are running the benchmark. Each release has gone through improvements in order to take advantage of modern server architecture and resources. Hardware keeps getting faster, and well-made software wants to leverage those advances.

There have also been improvements in database system internals, like indexing, caching, and other efficiencies, that the benchmarks don't account for. For example, the database may use an inverted index in place of a b-tree, or even a bitmap index, based on the type of data being stored. Or it may choose to load the whole file table into memory and keep it there instead of relying on Disk IO. These change the whole CRUD (Create, Read, Update, Delete) dynamics.

Now that you have considered all of the above, there is one last variable to consider: administrator-introduced performance features. When an administrator places an additional index on a secondary key, or uses a stored procedure (BASIC routine) that relies on a database-specific feature, the results become skewed even more.

All these things have changed the simple "Benchmark Marketing" concept of "I'm faster than you are" that used to go on in the early days of computing.

What Does This Mean?

Are benchmarks dead? Are they no longer useful? No, benchmarking still has its place, but it has ceased to be the apples-to-apples comparison it used to be. Marketing used benchmarks when it was reasonable to assume that the machine with the faster benchmark would be faster in production. Now it is hard to guarantee that a specific instance of a system will produce the same benchmark values twice, much less assume that a generic benchmark is an indicator of how your system will react to the daily workload.

Benchmark comparisons have moved from comparing different databases to measuring the performance of the hardware the database is installed on. If benchmarks are used to provide performance numbers for hardware configurations instead of database comparisons, you will find the value in them.

You should also be very skeptical about published benchmarks you may find. There is a lot of important information that is missing from them. Let's consider two benchmarks from HP, published on TPC.org, for the TPC-C Benchmark.

Both start with identical specifications [Figure 1].

Company:  HP
System:   HP ProLiant DL380 G7
OS:       Windows Server 2008 R2 Enterprise
Database: Microsoft SQL 2005 Enterprise SP3

But the reported benchmark results are not the same [Figure 2].

Submitted    Performance
05/04/11     1,024,380 tpmC
05/11/10     803,068 tpmC

Looking at this, you might be scratching your head. The specs seem to say they are the same system, right? You need to dig into the details to find the differences [Figure 3].

Performance:        1,024,380 tpmC           803,068 tpmC
Number of Clients:  839,690                  639,960
Memory:             384 GB                   16 GB
CPU:                Xeon 3.4 GHz             Xeon 3.3 GHz
                    2 processors / 6 cores   2 processors / 8 cores
                    12 threads               8 threads

Looking at this information, they are pretty close, other than the amount of memory and the total number of clients used. A natural assumption is "oh, they just threw more memory at the problem," but let's look at something else they did that really makes a difference [Figure 4].

Controller:           8Gb Fiber       8GB Dual Channel
Disk (Data Storage):  81 SSDs         128 SSDs
Disk (Logs Only):     12 SAS 7.2K     54 SAS 7.2K
Disk (OS):            2 SAS 15K       2 SAS 15K

The first thing to point out is that they spread the OS, log files, and database storage onto three different disk volumes. That's not a bad thing; it should be general practice for most database servers. But it will affect your comparisons if you don't currently have your systems set up this way.

The next thing to point out is that the database storage is on SSDs, not on spinning disks. While both are on SSDs, which is nice, you will notice that the quantities of SSDs are different. That means the data is being split across the disks differently, and the information sheet provided does not state whether there is a RAID setup, how the RAID is configured, or how the data is allocated across the disks. Regardless, more disks mean more seeks per second when the database looks for data to satisfy a query.

As you can see, with only a slight change in hardware, at least from a cost standpoint, you can get very different performance numbers. The other very telling thing to note in the performance numbers is the "Number of Clients." This variable matters, especially with the TPC-C benchmark.

As you have probably guessed, the number of clients tells you how many connections you have to the server. Also, the 'tpmC' value is a cumulative average based on all the clients working together. That's right: the more clients working on a system, the better the performance number looks.

If you do some simple math, taking the performance number and dividing it by the number of clients, you will notice that the performance per client is really about the same [Figure 5].

Performance:        1,024,380 tpmC    803,068 tpmC
Number of Clients:  839,690           639,960
Per Client:         1.21              1.25
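
That division is easy to check for yourself. Here is a quick sketch of it in MultiValue BASIC, using the figures straight from Figure 3; both results land at roughly 1.2 tpmC per client:

    * Per-client throughput for the two published HP results
    PERF.A = 1024380     ;* tpmC, 05/04/11 submission
    CLIENTS.A = 839690
    PERF.B = 803068      ;* tpmC, 05/11/10 submission
    CLIENTS.B = 639960
    CRT "Per client A: ":PERF.A / CLIENTS.A
    CRT "Per client B: ":PERF.B / CLIENTS.B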

The published benchmarks look good, but they are misleading. Not only do hardware changes affect the performance numbers, you also have to consider how the benchmark was run, such as the number of clients and how long it ran, to get a good idea of the true numbers.

This also means it is really hard to compare numbers between vendors and databases, because each one is run differently. Many vendors and database providers have their own implementation of the benchmark software as well, so there can be major variations between performance numbers based on how, when, and how long a benchmark was run.

Using Benchmarks in MultiValue

Another major drawback to these benchmarks is that they are built around First Normal Form (FNF) data and SQL syntax. That isn't really a big deal for MultiValue platforms, since we can do both, as well as non-SQL access, but it does mean you can't use the standard benchmarking tools, because they can't test all of the ways we access data.

So where does that leave us? Doing it ourselves.

As a proof of concept to see what it would take, I decided to create a TPC-C-like benchmark. I chose this benchmark over TPC-E and TPC-H because it was the easiest. There are more examples floating around for how to implement TPC-C than there are for TPC-E.

In addition, TPC-E requires the use of C code provided by TPC.org, so I didn't want to try to implement or translate that code into MultiValue BASIC or some other client. Several MultiValue versions play well with C and I invite you to implement TPC-E if you are so inclined. If you do, please send me a copy so it can be shared. Another thing to consider about TPC-E is that the only vendor that has done any work with it is Microsoft. While it was designed by Microsoft, SAP, and Oracle, only Microsoft has jumped on the bandwagon to officially produce benchmarks for it.

TPC-H requires the use of SQL queries. While I could translate those SQL queries into MultiValue retrieval statements, or use an SQL parser, that would have required a lot more setup and database configuration than I had time for, since this is just a proof of concept.

With all that in mind, I settled on the TPC-C benchmark. This has the added benefit of mimicking the model of many of our day-to-day systems anyway. It also fits the Port/PIB/Telnet connection style that many of our MultiValue applications currently use.

The last benefit, since TPC-C is also a Disk IO intensive benchmark, is that it tests a key advantage of the MultiValue Database Framework.

The TPC-C programs I wrote are in pure MultiValue BASIC and are designed to adhere as closely to the TPC-C spec as possible. Since each client runs on its own port, just like any other user would, my implementation follows the TPC-C spec more closely than most TPC-C benchmark programs do. To decrease the cost of benchmarking, it is not uncommon for vendors to use a multi-threaded client on a single PC, instead of one PC per connection, to mimic the one-client-per-connection model that the TPC-C spec calls for.
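
To give a feel for the shape of such a client, here is a minimal sketch, not the actual benchmark source, of what one New-Order style transaction might look like in MultiValue BASIC. The file names (ORDER.LINES, STOCK), key format, and attribute positions are all assumptions made for illustration:

    * Hypothetical New-Order style transaction, data kept in first normal form
    OPEN 'ORDER.LINES' TO LINE.FILE ELSE STOP 201,'ORDER.LINES'
    OPEN 'STOCK' TO STOCK.FILE ELSE STOP 201,'STOCK'
    *
    AM = CHAR(254)                       ;* attribute mark
    ORDER.ID = 1001                      ;* hypothetical order key
    ITEM.LIST = 101:AM:205:AM:318        ;* hypothetical item ids
    QTY.LIST = 2:AM:1:AM:5               ;* quantities for each line
    NUM.LINES = DCOUNT(ITEM.LIST,AM)
    *
    FOR LINE.NO = 1 TO NUM.LINES
       ITEM.ID = ITEM.LIST<LINE.NO>
       QTY = QTY.LIST<LINE.NO>
       * Lock the stock record, decrement on-hand quantity, and rewrite it
       READU STOCK.REC FROM STOCK.FILE, ITEM.ID ELSE STOCK.REC = ''
       STOCK.REC<3> = STOCK.REC<3> - QTY ;* attribute 3 assumed to hold on-hand qty
       WRITE STOCK.REC ON STOCK.FILE, ITEM.ID
       * FNF layout: one record per order line, keyed ORDER.ID*LINE.NO
       LINE.REC = ''
       LINE.REC<1> = ITEM.ID
       LINE.REC<2> = QTY
       WRITE LINE.REC ON LINE.FILE, ORDER.ID:'*':LINE.NO
    NEXT LINE.NO

Each client port would sit in a loop issuing transactions like this, which is what keeps the test honest about per-connection overhead.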

Since my programs are designed to stay as close to the classic benchmark as possible, they do not take into account any performance improvements that are commonly used in our day-to-day systems. For example, I did not implement b-tree indices or version-specific SELECT statements. This version of the benchmark does not take into account any data structure optimization that could be done, either. All the test data is still stored in FNF, even though a huge speed improvement would be possible by changing it to use MultiValues and nested tables.

Again, the idea wasn't to optimize the benchmark to prove that MultiValue data structures are better (they certainly are for this sort of test), but to approximate how a MultiValue database would handle data when limited to the same format as an SQL-style database.
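
For contrast, here is a sketch of what the same hypothetical order lines from the earlier example might look like stored the MultiValue way, nested inside a single ORDERS item instead of spread across separate FNF records. The file name and attribute positions are, again, illustrative only:

    * Hypothetical nested layout: all order lines live in one ORDERS item
    OPEN 'ORDERS' TO ORDER.FILE ELSE STOP 201,'ORDERS'
    ORDER.REC = ''
    ORDER.REC<1> = 500                            ;* attribute 1: customer id (made up)
    FOR LINE.NO = 1 TO NUM.LINES
       ORDER.REC<2,LINE.NO> = ITEM.LIST<LINE.NO>  ;* multivalued item ids
       ORDER.REC<3,LINE.NO> = QTY.LIST<LINE.NO>   ;* multivalued quantities
    NEXT LINE.NO
    WRITE ORDER.REC ON ORDER.FILE, ORDER.ID
    * One WRITE replaces one write per order line, and the whole
    * order comes back with a single READ as well.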

Here is my test system and what I did [Figure 6].

System:             Dell PowerEdge T610
OS:                 VMware / RedHat EL3
Database:           D3/Linux
Disk:               SSD
RAM:                80 GB
Duration:           2 hours
Performance:        1,179 tpm
Number of Clients:  10
Per Client:         117.9

Now, I would love to scale this up to sixty thousand or ninety thousand clients, but I don't have a server or the licenses to run that kind of test. If we were to scale this up statistically, even with, say, a forty percent decrease in per-client performance due to increased Disk IO and memory pressure, that would still put the performance at 6.6 million tpm. And that's the plain-Jane, done-in-a-rush, not-very-optimized version.
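
The estimate is nothing more than arithmetic on the per-client rate from Figure 6. A sketch of it follows, where the client count and the forty percent penalty are stated assumptions rather than measurements:

    * Rough scale-up estimate from the measured per-client rate
    PER.CLIENT = 117.9                   ;* tpm per client, from Figure 6
    PENALTY = 0.60                       ;* assume performance drops 40% under load
    NUM.CLIENTS = 90000                  ;* hypothetical large client count
    CRT "Estimated tpm: ":PER.CLIENT * PENALTY * NUM.CLIENTS

At ninety thousand clients this lands in the same ballpark as the 6.6 million tpm quoted above.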

As I've stated above, there are many different variables that would affect these numbers, so it really isn't fair to compare performance numbers from one database type to another; or to compare a purely statistical number to an actual number.

The point is that MultiValue systems are very speed-competitive with other systems, not slow or behind the curve. This exercise was focused on recreating a standard benchmark, and it was very informative to see the kind of results I could generate. There are still many different variations on this benchmark to explore, and I would be very interested in people's feedback. And if you happen to have a big system to run this version on, or want it as a jumping-off point for your own, more tuned-up version, let me know.

NATHAN RECTOR
