There are many considerations to take into account when implementing a Microsoft SQL Server in a private cloud environment. Today’s SQL dependent applications have different performance and high availability (HA) requirements. As a solutions architect, my goal is to provide our customers the best performing, highly available designs while managing budgetary concerns, scalability, supportability and total cost of ownership. Like so many tasks in IT infrastructure strategy, success is all about planning. There are many moving pieces and balancing everything to reach our goal becomes a challenge if we don’t ask the right questions up front.
In this two-part series, I’ll share my approach to scoping, sizing and designing private cloud infrastructure capable of migrating or standing up new Microsoft SQL Server environments. In part one, I’ll identify performance considerations and provide real world examples to make sure that your SQL Server environment is ready to meet application, growth and DR requirements of your organization. In part two, I’ll focus on SQL Server deployment options with high availability.
If you need a review on SQL server basics before we dive into the private cloud design, you can brush up here.
Here’s what we’ll cover in this post, with links if you’d prefer to jump ahead:
What follows are the basic performance considerations to take into account when designing a Microsoft SQL Server environment.
RAM—These requirements are based on database size and developer recommendations. Ideally, you’ll have enough RAM to put the entire database into RAM. However, that’s not always possible with large DB sizes. RAM is delicious to SQL and the server will eat it up, so be generous if your budget allows. Leave 20 percent of RAM reserved on the server for OS and other services.
IOPS—The SQL Server is mostly a read/write machine, and its performance is dependent on disk IO and storage latency. With SSDs becoming more affordable, many SQL DBAs now prefer to run on SSDs due to high IOPS provided by SSD and NVMe drives. In the past, many 15K SAS disks in RAID10 were the norm for data volumes.
Low latency is essential to SQL performance. By today’s standards, keeping latency below 5ms per IO is the norm. Sub 1ms latency is very common with local SSD storage. However, 10ms is still a good response time for most medium performance applications.
CPU—Bare metal CPU allocation is easy. Your server has a number of CPUs and all of them can be allocated to SQL with no negative considerations, other than licensing costs. However, allocating CPU to a SQL Server in a VMWare private cloud environment should be done with caution. Licensing is based on CPU cores. Too much assigned, but unused, vCPU GHz in a VMWare VM can negatively impact performance. Rightsizing is key.
The SQL Server is not normally a CPU hog. Unexpected and prolonged high CPU usage during production time means something is not right with the database or SQL code. Adding CPU in those cases may not solve the issue. These unexpected conditions should be checked by SQL DBAs and developers. Higher than normal CPU utilization is to be expected during maintenance time. In instances where a database may require high CPU utilization as a norm, a developer or SQL DBA should check the database for missing indexes or other issues before adding more vCPUs to virtual servers.
Based on the above considerations, when designing a SQL Server environment, budgetary and licensing considerations may start leaning your design toward SQL as a VM on a private cloud with a restricted number of assigned vCPUs. Running SQL on a private cloud is a great way to save on licensing costs. It’s also a good performer for most application loads, and because SQL is more of an IO-dependent system, its performance will greatly depend on the latency and IO availability of your storage system.
SAN storage will normally provide great amounts of IO based on the amount of disks and disk types on the SAN. The norm to expect from many SAN systems is 5ms to 15ms of latency per IO. When your application desires sub 1ms latency, such as gaming, financial or ad-tech systems, a local SSD RAID10 disk array will save the day. These local disk arrays are easy to set up and use with both VMs and bare metal server implementations. You may ask, “What about the HA and extra redundancy features of using a SAN instead of local disk?” I will be discussing High Availability in a performance-demanding environment in Part 2: Deploying Microsoft SQL Servers in a Private Cloud with High Availability. Check back soon.
In the end, your application’s performance requirements and budget will drive your decision whether to run a private cloud or a bare metal SQL deployment. Both VM and bare metal deployments can be configured with high availability and will easily integrate into your private cloud environments.
With these performance considerations in mind, let’s discuss other scoping and sizing matrices that we’ll need to design our high-performing and future-proofed SQL deployment.
As a Solution Engineer at INAP, I collect numerous measurements during the SQL scoping process. Before delving into numbers, I start by evaluating pain points clients may have with their current SQL Servers. I want to know what hurts.
Asking these and other questions helps me understand how to resolve issues and future proof your next SQL deployment:
Once I know the exact problem, or if I’m designing for a new application, I start by collecting specific technical details. The most common specific measurements and requirements collected during the scoping phase are:
In environments requiring high performance database response times, we utilize native Windows Server tools, such as perfmon, to collect very detailed performance matrixes from exiting SQL Servers to identify performance bottlenecks and other considerations that help us resolve these issues in current or future database deployments. Because a good SQL Server’s performance is heavily dependent on the disk sub-systems, we will go deeper in disk design recommendations for private clouds, VMs and bare metal deployments.
Separating database files into different disks is a best practice. It helps performance and helps your DBA easily identify performance issues when troubleshooting. For example, you could have a runaway query beating up your TempDB. If your TempDB is on a separate disk from your prod database files, you can easily identify that the TempDB disk is being thrashed and that same TempDB issue is not stepping on other workloads in your environments.
For high performance requirements, SSDs are recommended. The following is a basic disk layout for performance:
In a SQL deployment that requires DR, the above disk layout will allow you to granularly replicate just the SQL data and the OS that needs to be replicated, leaving TempDB and pagefile out of the replication design since those files are reset during reboot. TempDB and pagefile also produce lots of noisy IO. Replicating those files will result in unnecessarily heavy bandwidth and disk utilization.
Read Part 2: Deploying Microsoft SQL Servers in a Private Cloud with High Availability
Download the full white paper below: