Database performance improvement - How-to in 4 easy steps
During development, it’s important to think about how your application will work with your databases. You can run into performance issues all the way through the stack, so you’ll want to be mindful of how the data is being accessed. From the client to the database, there are layers where trouble may creep in.
Here are database performance issues you might encounter:
- Object relational mapping frameworks (ORMs)
- SQL code
- Database design
- Indexes
- Database maintenance
- Middle-tier/front-end code (posing as a database performance issue)
In this post, we’ll go through the steps to resolving database performance issues. Basically, the steps are identify, proactively respond, measure, and resolve. But as you’ll see, you can be smarter and more efficient by measuring first.
Let’s take a journey through this process and see how improving it can make all the difference.
Step 1: Identify
The first step to resolving any performance issue is identifying that a problem exists. You can resolve only what you know about. Then you can narrow down the source of the issue. From there, it’s usually a matter of isolating the problem area in order to fix it. For example, for a SQL stored procedure, this may mean decomposing parts of it to isolate a slow query.
Identifying the problem can be fairly complex, especially with asynchronous programming, because performance issues can hide in callbacks. But let’s back up and think about how you first learn about performance issues.
Finding performance issues
You might first learn about a performance issue in a few different ways. From best to worst, here they are:
- During design—that’s right, the best and lowest cost way is early!
- During implementation—we can’t catch everything in design.
- While load testing—assuming you load test.
- In QA—after all, performance is a quality concern!
- Beta testing/UAT—how embarrassing! Well, this is what beta testing is all about anyway, or is it about valuable user feedback?
- Past the point of embarrassing—you probably know what this means. FIRE DRILL!
Your users are the last ones who should be reporting performance issues, especially through one-star reviews! Actually, let me up the ante…the CEO who heard it from the media would be much worse. This is why database performance is important. Consequences include hits to your business’s finances and reputation.
The best way to stay on top of performance issues is to constantly monitor performance. Doing this allows you to see how performance changes over time. When the rate of change gets faster, or when you reach a certain threshold, this might mean taking action!
Step 2: Proactively responding
Monitoring performance is how you be proactive. But before we talk about that, let’s talk about the trouble with simply reacting when there’s a problem.
The Downside of reacting
Why avoid reacting? This is a baseline scenario where you don’t have any sort of application performance monitoring (APM) system in place. When you’re prepared, you can respond rather than react. And responding is much better for everyone.
A reactive environment can lead to serious business problems, stress, and health problems. Here’s what the researcher Vicki LeBlanc reported all the way back in 2009 about stress: Elevated stress levels can impede performance on tasks that require divided attention, working memory, retrieval of information from memory, and decision making.
Sounds a lot like the kind of thinking required for effective, efficient software development!
Responding proactively
Instead of reacting, you can be proactive by being prepared. When you’re prepared, you might respond in the following measured ways:
- Review the performance trends your APM tools have tracked and presented
- Dig into the performance logs to find specific details, comparing any differences over time
- Narrow down the problem. Is it a database performance issue, or is it something else?
- Resolve the problem without panic or misery
Sometimes issues can linger between layers. These issues might be related to network problems, hardware failure, other software, virus scanners, resource contention, or even software updates. Fortunately, most organizations have systems for dealing with these issues so they don’t come up often.
However, code at the boundaries between layers can present as network issues. I’ve seen trouble arise because of cross-global connections (from China to the US to Europe). In that case, it was interplay between the code and the latency.
Once you’ve narrowed down the problem to the source of the apparent database performance issue, you’re ready to start remediation. But we can’t talk about those steps until it’s clear why I say “apparent.”
Apparent database performance issues
Sometimes, bordering on often, a problem presents itself as a database performance issue when it isn’t. Also, some issues involve the database and will affect performance, but those issues don’t originate in the database.
So we can best understand what database performance is by first talking about what it isn’t. Here are problems that can affect performance but that might look like a database issue:
- Locks—locks on rows, pages, extents, and tables can stop queries dead in their tracks.
- Transactions—transactions, especially distributed transactions, that originate in the application layer can kill performance by holding locks.
- Batches—the size of a transaction affects what type and how many locks are held.
- Blocking queries—queries can hold locks too.
- Deadlocks—contention between two operations can cause both to wait on each other. Usually, these will time out eventually, but it will take a while.
- Row by row operations—set-based operations generally have better performance. Often, you can do an RBAR operation as a set-based operation.
- OR(BAR)Ms—often, ORMs perform operations such as loading related entities one by one. It’s easy to avoid this lazy-loading by configuring the ORM properly or even pushing queries and commands into stored procedures. Don’t get me wrong here: ORMs are great for productivity, but they have their drawbacks too. Some ORMs use local caches of the data for performance improvements because having local caches reduces round-trips to the database. Fewer trips to the DB is a good thing!
- API calls—sometimes an API call from the presentation layer to a middle tier can present like a database performance issue.
As you can see, many problems can present like database performance issues even though they are code or design issues.
Let’s face it—it’s all the same to the user. If the user sees performance issues, there’s a problem. And if the user thinks of the app as a database (believe me, it happens!), they might think there’s a database performance issue.
Now that we’ve cleared that up, let’s see what you can do when you actually determine the problem. First, you need to measure before you can fix.
Step 3: Measuring
When, where, and how you measure is important—not only in recognizing a problem but also in resolving the issue. If you’ve been following this post from the beginning, you’ll notice that measuring came after reacting.
Do you think this is the best way? Or would it be better if we had measured up front?
You’ll see from the next section that step 3 and step 4 go hand in hand. So let’s talk about this very important step—measuring—in conjunction with the last step—resolving.
Step 4: Resolving
Let’s see how we might resolve issues in the ideal state.
Troubleshooting in the ideal state
Ideally, you’re measuring everything using something like the Raygun platform so you can get the total performance picture.
Here’s an example:
Let’s say you receive an alert that a custom marketing report in one of your web apps is taking a relatively long time to run. It usually takes 40 seconds, but in the last month, it’s edged up to 65 seconds. It’s a fairly significant change—and if the trend continues, it’ll soon take minutes to run. You need to deal with it, but it isn’t an emergency. According to Steven Covey, important but not urgent is the sweet spot for productivity.
You’ll spend a little time analyzing the problem to see where the main issue lies. This is easy because you’re measuring everything, and it’s all visible in one sweet, sweet dashboard.
It turns out that a query is slowing down as the data set gets larger. How you make the database faster is obvious because you’ve measured from the beginning: you do a database refactor and eliminate the offending string concatenation in the query. You’ve saved the day, and the market reports are running in record time, even as the data continues to grow!
In this scenario, you’ve enabled yourself to effectively do step 2. You’ve responded rather than reacted. Responding is less stressful, more proactive, and just plain better for everyone!
Troubleshooting RBAR
Now we get to a more common problem. Programming to process one entity at a time is much easier than programming for sets. And, frankly, this is fine unless you find that it’s slowing things down. Performance optimization is important, but it can also be costly. You should consider whether the benefit is worth the cost before doing any eager optimizations. To do that, you need to measure.
Look for a lot of the same calls to the database or back end. This situation is a key signature of side-loading calls in an ORM or RBAR code. Unless you have some high-volume service or application (in which caching might help reduce the load), consider this an opportunity for improvement.
I’ve listened to many complaints about ORMs from a DBA friend of mine. Every time, I thought, “This is avoidable. It isn’t the ORM—it’s how they’re using it!” And why not partner with the developer to find out how to do that? Teams are always more effective working together.
Troubleshooting without a net
You can avoid many problems if you validate often. For instance, you can validate the design on paper, in your head, or on a whiteboard. Or you can use TDD, do load testing, use realistic data, and so forth. But often, we can only do so much. Without some kind of safety net, you’re stuck reacting instead of responding. And in this scenario, you’re limited in how you find out about performance issues. Even worse, you’re limited in how you resolve them.
You have to start with a baseline measurement before you can resolve an issue. So, assuming you can pinpoint the issue, you’ve still got to reproduce it. I’ve heard and uttered the words too many times: “I can’t fix the problem if I can’t reproduce it!”
So, what do you do?
- Bring down a copy of production data to a development database
- Run the code against that
- Hope it presents itself in that environment
If you can reproduce the issue, you just might be able to resolve it.
- Measure performance to get a baseline. You need to know your target performance measure
- Do some debugging to locate the exact source of the problem. It’s a terrible feeling to guess wrong, push to production, and still see the same issue
- Analyze queries, indexes, shard keys, and so on
- Fix the problem
Once your fix is in prod, hope the universe is merciful and doesn’t throw you another emergency so you can get back to the already late project you were supposed to be working on all along. We can do better, people!
Improving
Now that you’ve journeyed through the identify, respond, measure, and resolve process, think about how you can improve that process.
The best thing you can do is to move the measure to the front of the process. First, you measure. Then you identify (automatically). Then you respond rather than reacting. Finally, you can resolve the issue before it becomes a real problem.
Relax and iterate.