Let’s take five minutes and get our entire enterprise in order.
Get your repo server, your Minion Enterprise download, and your license key (trial or permanent). Install and config take just five minutes.
1. Install
Extract the MinionEnterprise2.2Setup.exe and run it on your repository server. When the installer asks for the instance name, enter “localhost”.
2. Configure email for alerts
Connect to the repo server and insert your alert email:
INSERT INTO dbo.EmailNotification ( EmailAddress, Comment )
SELECT 'Your@Email.com' AS EmailAddress, 'DBA' AS Comment;
3. Configure servers
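Insert (or bulk insert) your servers into the repo:
INSERT INTO dbo.Servers
( ServerName, ServiceLevel, Port, IsSQL, IsActive ) -- column names assumed here; confirm them against your dbo.Servers table
VALUES
( 'Prod01', 'Gold', 1433, 1, 1),
( 'Prod02', 'Gold', 1433, 1, 1),
( 'Prod03', 'Silver', 1433, 1, 1),
( 'QA01', 'Silver', 1433, 1, 1),
( 'Dev01', 'Bronze', 1433, 1, 1),
( 'Dev02', 'Bronze', 1433, 1, 1);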
4. You’re done!
Jobs will begin kicking off to collect data within the next hour. Some jobs run hourly, some run daily or weekly or monthly. You’ll start getting tables full of useful data, and email alerts that actually mean something.
If you’re impatient to start getting some of the good stuff right away, kick off the CollectorServerInfo% jobs manually. These populate data in the dbo.Servers table, which other jobs need to run. You’ll start noticing data in the Collector.DBProperties, Collector.ServiceProperties, and Collector.DriveSpace tables first, as these jobs run most frequently.
While ME is taking care of your shop for you, you can get to know it better with the online documentation!
Download our FREE maintenance modules below:
- Minion Backup: Free SQL Server Backup utility with world-class enterprise features and full lifecycle management.
- Minion Reindex: Free SQL Server Reindex utility with unmatched configurability.
- Minion CheckDB: Free SQL Server CHECKDB that solves all of your biggest CHECKDB problems.
When I saw the topic for T-SQL Tuesday this time I just had to get in. Maybe I’ve never mentioned it, but backups are one of my big things. Today I’d like to talk about two topics that get overlooked quite often: the “backups” to the backup, so to speak. First up: proper backup alerting. And second, missing backup recovery.
Traditional alerting falls short
Well, let’s begin with a story from my days as senior DBA. Years ago, one of the application groups messed something up in their database, and they needed a restore. “Sure thing,” I said. No problem. So I went to the backup drive, and there wasn’t anything that could even be vaguely considered a fresh backup. The last backup file on the drive was from about three months ago.
OOPS. Oh crap…so what do I tell the app team?
First, a little investigation. I had to find out why the backup alert didn’t kick off. Every box was set up to alert us when a backup job failed. I found the problem right away. The SQL Agent was turned off. And from the looks of things, it had been turned off for quite some time. And as you may realize, there’s just no way to alert on missing backups if the Agent is off and can’t fire the alert.
But that was just the first part of the problem. The SQL Agent couldn’t send the email, of course. But the job never actually failed, because it didn’t start in the first place.
This is the crux of the issue: jobs that don’t start, can’t fail. Alerting on failed backup jobs isn’t the way to go.
“But it’s okay, we have…”
Hold on, I know what you’re thinking. You have service alerts through some other monitoring tool, so that could never happen to you! To a degree, you’re right. But let’s see what else can go wrong along those same lines:
- The database in question isn’t included in the backup job.
- The network monitor agent was turned off, or not deployed to that server.
- SMTP on the server has stopped working.
- The backup job has the actual backup step commented out.
- Someone deleted that backup job.
- Someone disabled the backup job, or just disabled the job’s schedule.
Service alerts won’t help you in any of these circumstances.
Proper backup alerting
I’ve run into every one of those scenarios, many times. And there are only two ways to mitigate every one of them (and any other situation you come across) with proper backup alerting.
Number one: Move to a centralized alerting system. You can’t put alerts on each of your servers. When you do that, you’re at the mercy of the conditions on that box, and those conditions can be whimsical at best.
Move the backup alerts from the server level to the enterprise level. Then, when there’s an issue with SMTP or something, you only have one place to check. It’s much easier to keep track of whether one enterprise-level alerting system is working than to keep track of dozens, hundreds, or even thousands of servers. After all, if you haven’t heard from a server in a long time, how do you know whether it’s because there’s nothing to hear, or because the alerting mechanism is down?
Number two: Stop alerting on failed backups. Alert on missing backups. When you alert on missing backups, it doesn’t matter if the job didn’t kick off, if the database wasn’t part of the job, or if the job was deleted. The only thing that matters is that it’s been 24 hours since your last backup. Then when you get the alert, you can look into what the problem is.
The important point is that the backup may or may not have failed, but your enterprise alert will fire no matter what. This is a very effective method for alerting on backups, because it’s incredibly resilient to all types of issues…not only in the backups, but also in the alerting process. If you do it right, it’s just about foolproof.
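To make the idea concrete, here is a minimal sketch of a missing-backup check run from one central place. It uses Python with an in-memory SQLite database so it runs anywhere. The table names (databases, backup_history), the sample rows, and the 24-hour threshold are assumptions for illustration; this is not the schema of Minion Enterprise or of SQL Server's msdb.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical central repository tables; names are illustrative only.
SCHEMA = """
CREATE TABLE databases (server_name TEXT, db_name TEXT);
CREATE TABLE backup_history (server_name TEXT, db_name TEXT, finished_at TEXT);
"""

# Start from the list of databases that SHOULD be backed up, not from job
# history, so a database with no backup rows at all is still reported.
MISSING_BACKUPS_SQL = """
SELECT d.server_name, d.db_name, MAX(b.finished_at) AS last_backup
FROM databases AS d
LEFT JOIN backup_history AS b
  ON b.server_name = d.server_name AND b.db_name = d.db_name
GROUP BY d.server_name, d.db_name
HAVING MAX(b.finished_at) IS NULL OR MAX(b.finished_at) < ?
ORDER BY d.server_name, d.db_name;
"""


def find_missing_backups(conn, now, max_age=timedelta(hours=24)):
    """Return (server, database, last_backup) for every stale or absent backup."""
    cutoff = (now - max_age).isoformat(sep=" ")
    return conn.execute(MISSING_BACKUPS_SQL, (cutoff,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO databases VALUES (?, ?)", [
    ("Prod01", "Sales"), ("Prod01", "HR"), ("Prod02", "Orders"),
])
conn.executemany("INSERT INTO backup_history VALUES (?, ?, ?)", [
    ("Prod01", "Sales", "2016-10-01 00:30:00"),   # backed up last night
    ("Prod02", "Orders", "2016-07-01 00:30:00"),  # three months stale
    # Prod01.HR has never been backed up, so it has no rows here.
])

now = datetime(2016, 10, 1, 12, 0, 0)  # fixed "current time" for the example
missing = find_missing_backups(conn, now)
for server, db, last_backup in missing:
    print(f"ALERT: {server}.{db} last backup: {last_backup or 'never'}")
```

Notice that nothing here depends on a job failing. A database that was never in the job looks exactly like one whose job was disabled or deleted: the backup simply isn't there, and that is what fires the alert.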
Part 2: Missing Backups
Handling missed backups is not the same as alerting on missing backups (like we talked about above). What we want to do is avoid the need for the alert to begin with.
Minion Backup (which is free, so we get to talk about it all we want, ha!) includes a feature called “Missing Backups”, which allows you to run any backups that failed during the last run.
Here’s what this looks like: You set your backups to run at midnight, and they’re usually done by around 2:00 AM. However, occasionally they fail for one reason or another. Then you get an alert in the middle of the night, and you have to get up to deal with it.
Missing Backups lets you set Minion Backup to run again at, say, 2:30 or 3:00 AM with the @Include = 'Missing' parameter. This will look at the last run and see if there were any backups that failed; if there were, then MB will retry them. This will prevent the need for alerts in the first place.
We use this feature in many shops we consult in because we see databases that fail from time to time for weird reasons, but they always pass the second time. So Minion Backup helps improve your backups simply by giving you a second chance at your backups.
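The second-chance idea is simple enough to sketch. This is not Minion Backup's code, just an illustration of the logic in Python; the status strings, the backup_fn callback, and the function names are all hypothetical.

```python
def databases_to_retry(last_run):
    """Return the databases whose backup did not complete in the last run."""
    return sorted(db for db, status in last_run.items() if status != "Complete")


def run_missing_pass(last_run, backup_fn):
    """Retry only the failed databases; leave the successful ones untouched."""
    results = dict(last_run)
    for db in databases_to_retry(last_run):
        results[db] = "Complete" if backup_fn(db) else "Failed"
    return results


# Midnight run: two databases failed for some transient reason.
midnight_run = {"Sales": "Complete", "HR": "Failed", "Orders": "Failed"}


def fake_backup(db):
    """Stand-in for a real backup call; Orders keeps failing in this example."""
    return db != "Orders"


after_retry = run_missing_pass(midnight_run, fake_backup)
still_failing = databases_to_retry(after_retry)
print(after_retry)
print(still_failing)
```

Only what is still failing after the second pass needs a human, and in our experience that list is usually empty.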
Now we mention Minion Enterprise
We’ve got you covered for enterprise-level alerting, too. Our flagship product, Minion Enterprise, was made for just that purpose and it comes with many enterprise-level features; not just backup alerting. I invite you to take a look at it if you like.
But if you don’t, then by all means, write yourself an enterprise-level alerting system and stop relying on alerts that only fire on failed backups.
And, improve your situation in general by switching to the free Minion Backup.
Every IT shop has its problems with performance: some localized, and some that span a server, or even multiple servers. Technologists tend to treat these problems as isolated incidents – solving one, then another, and then another. This happens especially when a problem is recurring but intermittent. When a slowdown or error happens every so often, it’s far too easy to lose the big picture.
Some shops suffer from these issues for years without ever getting to the bottom of it all. So, how can you determine what really causes performance problems?
First, a story
A developer in your shop creates an SSIS package to move data from one server to another. He makes the decision to pull the data from production using SELECT * FROM dbo.CustomerOrders. This works just fine in his development environment, and it works fine in QA, and it works fine when he pushes it into production. The package runs on an hourly schedule, and all is well.
What he doesn’t realize is that there’s a VARCHAR(MAX) column in that table that holds 2GB of data in almost every row…in production.
Things run just fine for a couple months. Then without warning, one day things in production start to slow down. It’s subtle at first, but then it gets worse and worse. The team opens a downtime bridge, and a dozen IT guys get on to look at the problem. And they find it! An important query is getting the wrong execution plan from time to time. They naturally conclude that they need to manage statistics, or put in a plan guide, or whatever other avenue they decide to take to solve the problem. All is well again.
A couple of days later, it happens again. And then again and then again. Then it stops. And a couple weeks later they start seeing a lot of blocking. They put together another bridge, and diagnose and fix the issue. Then they start seeing performance issues on another server that’s completely unrelated to that production server. There’s another bridge line, and another run through the process again.
What’s missing here?
The team has been finding and fixing individual problems, but they haven’t gotten to the root of the issue: the SSIS package data pull is very expensive. It ran fine for a while, but once the data grew (or more processes or more users came onto the server), the system was no longer able to keep up with demand. The symptoms manifested differently every time. While they’re busy blaming conditions on the server, or blaming the way the app was written, the real cause of the issues is that original data pull.
Now multiply this situation by several dozen, and you’ll get a true representation of what happens in IT shops all over the world, all the time.
What nobody saw is that the original developer should never have had access to pull that much data from production to begin with. He didn’t need to pull all of the columns in that table, especially the VARCHAR(MAX) column. By giving him access to prod – by not limiting his data access in any way – they allowed this situation to occur.
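If you want to see how much difference the column list makes, here is a small sketch in Python with an in-memory SQLite table. Only the table name comes from the story; the columns (OrderID, CustomerID, Total, Notes), the row count, and the 100,000-character filler are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerOrders ("
    "OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, Total REAL, Notes TEXT)"
)

# Every row carries a large text value, standing in for the VARCHAR(MAX) column.
big_note = "x" * 100_000
conn.executemany(
    "INSERT INTO CustomerOrders (CustomerID, Total, Notes) VALUES (?, ?, ?)",
    [(i, 9.99, big_note) for i in range(50)],
)


def payload_chars(sql):
    """Rough size of a result set: total characters across all returned values."""
    return sum(len(str(value)) for row in conn.execute(sql) for value in row)


wide = payload_chars("SELECT * FROM CustomerOrders")
narrow = payload_chars("SELECT OrderID, CustomerID, Total FROM CustomerOrders")
print(f"SELECT *      : {wide} characters")
print(f"named columns : {narrow} characters")
```

The two queries return the same number of rows, but the first one drags the big column along on every hourly run whether the package needs it or not.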
What Really Causes Performance Problems?
Just as too many cooks spoil the broth, too many people with access to production will cause instability. Instability is probably the biggest performance killer. But IT shops are now in the habit of letting almost anyone make changes as needed, and then treating the resulting chaos one CPU spike at a time.
This is why performance issues go undiagnosed in so many shops. The people in the trenches need the ability to stand back and see the real root cause of issues past the singular event they’re in the middle of, and it’s not an easy skill to develop. It takes a lot of experience and it takes wisdom, and not everyone has both. So, these issues can be very difficult to ferret out.
Even when someone does have this experience, they’re likely only one person in a company of others who aren’t able to make the leap. Management quite often doesn’t understand enough about IT to see how these issues can build on each other and cause problems, so they’ll often refuse to make the necessary changes to policy.
So really, the problem is environmental, from a people point of view:
- Too many people in production makes for an unstable shop.
- It takes someone with vision to see that this is the problem, as opposed to troubleshooting the symptoms.
- Most of the time, they’re overridden by others who only see the one issue.
What’s the ultimate solution?
In short: seriously limit the access people have in production. It’s absolutely critical to keep your production environments free from extra processes.
Security is one of those areas that must be constantly managed and audited, because it’s quite easy to inadvertently escalate permissions without realizing it. This is where Minion Enterprise comes in: I spent 20 years in different shops, working out the best way to manage these permissions, and even harder, working out how to make sure permissions didn’t get out of control.
Minion Enterprise gives you a complete view of your entire shop to make it effortless to audit management conditions on all your servers.
That’s the difference between performance monitoring and management monitoring. The entire industry thinks of performance as a single event, when in reality, performance is multi-layered. It’s made up of many events: management-level events where important decisions have been ignored or pushed aside. And these decisions build on each other. One bad decision – giving developers full access to production – can have drastic consequences that nobody will realize for a long time.
Sign up for your trial of Minion Enterprise today.
Set-based enterprise (n.) – An environment in which you can administer multiple servers as if they were one server.
Step by step vs. set-based operations
Quite a lot of both the real world and the world of programming is made up of step-by-step operations. Wash this plate, then wash that glass, then wash that spoon. First present the form to the user, then wait for feedback, then verify their login information, then present the welcome screen. This kind of step-by-step activity feels like the natural order of things.
The big paradigm shift when someone begins working with databases is learning the idea of set-based programming. No, it’s not a good idea to update row 1, then update row 2, then update row 3 and so on; this is aptly called “Row by Agonizing Row” (RBAR). It is far faster to update all the rows in one operation. That, in a nutshell, is set-based programming: handling a set of multiple records in a single operation.
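Here is the difference in miniature, sketched in Python against an in-memory SQLite table. The Orders table and its Status column are hypothetical, and SQLite is used only so the example runs anywhere; the same contrast applies to any relational database.

```python
import sqlite3


def make_db(row_count=1000):
    """Build a throwaway table where every order starts out as 'Open'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Status TEXT)")
    conn.executemany("INSERT INTO Orders (Status) VALUES (?)", [("Open",)] * row_count)
    return conn


def update_rbar(conn):
    """Row by agonizing row: one UPDATE statement per order."""
    ids = [row[0] for row in conn.execute("SELECT OrderID FROM Orders WHERE Status = 'Open'")]
    for order_id in ids:
        conn.execute("UPDATE Orders SET Status = 'Closed' WHERE OrderID = ?", (order_id,))
    return len(ids)  # number of UPDATE statements issued


def update_set_based(conn):
    """Set-based: one UPDATE statement handles every matching row."""
    cursor = conn.execute("UPDATE Orders SET Status = 'Closed' WHERE Status = 'Open'")
    return cursor.rowcount  # rows changed by that single statement


statements_issued = update_rbar(make_db())
rows_changed = update_set_based(make_db())
print(f"RBAR issued {statements_issued} statements; set-based changed {rows_changed} rows in 1")
```

Both approaches leave the table in the same state. The set-based version just tells the database what you want and lets it work out how to get there.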
Administering servers one by one
Nearly all of the IT administrative world is also made up of step-by-step operations. First check the patch level of Windows on Svr1. Then, look through the Svr2 logs for errors. Then maybe update backup schedules on Svr3, then on Svr4, then on Svr5. This is part of what makes server administration tedious, and prone to errors.
A small handful of solutions have tried to provide a way to manage multiple servers, but they have fallen far short of the mark. What we are left with is not row by agonizing row, but server by agonizing server (SBAS).
The set-based enterprise
Just as a relational database system provides a way to handle multiple records at once, an enterprise management system provides a way to handle multiple servers at once. Once you add your, say, 30 servers to Minion Enterprise*, you can perform a huge number of operations on any or all of those servers from the central repository:
- Check the patch level of Windows on all servers
- Check the version of SQL Server on all instances
- Set up error log searches to alert you for all production instances.
- Update backup schedules on all instances related to an application.
- Audit SQL security and AD security on all instances.
- Research which databases have the most urgent missing index needs.
And on, and on.
Minion Enterprise creates a set-based enterprise, which becomes a force multiplier for the DBA team. Every action can now apply to a group of servers, or all servers. Audits become a matter of minutes, instead of days or weeks. The odds of catching a hole in security, a critical error, or missed maintenance go up dramatically.
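As a rough illustration of what "set-based" means at the server level, here is a sketch of a central inventory answering a question about a whole group of servers in one query. It uses Python with in-memory SQLite; the servers and instance_versions tables and their contents are invented for the example and are not the Minion Enterprise schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE servers (server_name TEXT PRIMARY KEY, service_level TEXT);
CREATE TABLE instance_versions (server_name TEXT, sql_version TEXT);
""")
conn.executemany("INSERT INTO servers VALUES (?, ?)", [
    ("Prod01", "Gold"), ("Prod02", "Gold"), ("Prod03", "Silver"), ("QA01", "Silver"),
])
conn.executemany("INSERT INTO instance_versions VALUES (?, ?)", [
    ("Prod01", "2012"), ("Prod02", "2008 R2"), ("Prod03", "2012"), ("QA01", "2012"),
])

# One query covers every server at a given service level; no server-by-server visits.
VERSIONS_BY_LEVEL_SQL = """
SELECT v.sql_version, COUNT(*) AS instance_count
FROM servers AS s
JOIN instance_versions AS v ON v.server_name = s.server_name
WHERE s.service_level = ?
GROUP BY v.sql_version
ORDER BY v.sql_version;
"""


def versions_for_level(service_level):
    """Return (version, instance count) pairs for all servers at one service level."""
    return conn.execute(VERSIONS_BY_LEVEL_SQL, (service_level,)).fetchall()


gold_versions = versions_for_level("Gold")
for version, count in gold_versions:
    print(f"Gold servers on SQL Server {version}: {count}")
```

Adding a thirtieth or a three-hundredth server changes the data, not the query.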
The set-based enterprise transforms overwhelming into overseen, and missed into managed. And you’re going to love it.
*How to add a server to ME, so it can be managed: insert a record into the dbo.Servers table! (See online documentation, “Configure initial servers and settings” section.) Note that Minion Enterprise can easily manage hundreds or thousands of instances.
Update: This offer is over for now, but you can always get a free trial of Minion Enterprise!
I want free licenses of Minion Enterprise!
Yes, you do! We’re at the PASS Summit, and that’s as good a reason as any to give away more licenses of Minion Enterprise. We want EVERYONE to enjoy the benefits of set-based enterprise management!
What is Minion Enterprise? Here are a few places to start:
- Sign up for free licenses by November 1, 2016.
- One per customer, and one per company; if you’ve gotten free licenses from us before, you’re not eligible for this offer. Sorry!
- This offer is available for the current version only; you’re eligible for patches and service releases of the current version, but not upgrades.
- It comes with 3 months of support.
- Licenses are not transferable to any other companies.
How about some free licenses?
A DBA’s first day on the job
Newbie DBA: Okay, I’m ready to start. Tell me all about your environment!
Current DBA: Well, we’ve got a couple of test boxes, a few QA, and something like 40 production instances. I think most of the production instances are on SQL 2008 R2, but I’m not sure.
Newb: Ah. Okay, so which prod boxes are the important ones?
Curr: Oh, I guess Server3 and Server8 are some of our bigger ones. I’ll have to dig out my list. I put it together like a year or two ago, and a couple of things have changed since then…
Newb: Oh. Well, what do you guys use for backups around here? Third party, or home grown?
Curr: It’s a mix. Some boxes are still on maintenance plans, I’m pretty sure. Some have those free scripts from that one guy, you know, that free set. And we’ve got a couple of different versions of home grown scripts. I’m pretty sure most of that stuff works okay, but not all of it has alerts.
Newb: Not to criticize, but have you ever thought about changing all of that to one plan?
Curr: Yeah, but we don’t really have the time. We’ve got a lot of fires that spring up around here. Keeping up with disk space requests alone must take up several hours every month.
Newb: Gotcha. Well I know you wanted me to do some general index evaluations on the important prod boxes. When will my credentials be ironed out so I can get started?
Curr: Yeah, Bill’s got that on his plate. It usually takes him a few days to get a new hire’s SQL permissions worked out. Like I said, we have several dozen boxes.
A DBA’s first day on the job, with Minion Enterprise
Newbie DBA: Okay, I’m ready to start. Tell me all about your environment!
Current DBA: Here, I’ve got the list of servers in the dbo.Servers table on the repository instance. Look, we have 4 test boxes, 14 QA, and 47 production. Each one of those is ranked for Gold, Silver, or Bronze level support. I’ll give you that support doc in a minute, but you get it: the gold ones are super important.
Newb: Looks like most of those servers are on SQL Server 2012.
Curr: Yeah, this software checks in on all the instances, so we always know what instances have been patched and which haven’t yet.
Newb: Awesome. What do you guys use for backups around here?
Curr: It’s a mix, but we’re moving to Minion Backup. It’s free, and ties in with this piece – Minion Enterprise. But even for the instances that haven’t moved over yet, we still get alerts from ME if a database hasn’t been backed up recently.
Newb: Okay, that’s pretty cool.
Curr: Oh yeah, you haven’t even seen the start of it. Minion Enterprise does so much that saves us heartache. Like, security audits are easy, we get reports and alerts on disk space, the thing scripts out database objects and jobs. Last month one of the managers decided to drop all of the synonyms on Server4; we didn’t even have to restore a backup. We just pulled all the synonym code from the DBObjectScripts table and ran it.
Newb: I really could have used that at the last job. Junior DBA altered a bunch of views. Okay, so I know you wanted me to do some general index evaluations on the important prod boxes. When will my credentials be ironed out so I can get started?
Curr: Oh it’s done. That’s another thing you can do with ME; I just cloned all of my permissions to your new account, across all the servers. You’re good to go. Oh, and you should really start your index evaluations inside the ME repository; it gathers a ton of information about objects, usage, and indexes for ALL the boxes. There are stored procedures for index research, like the “clustered GUIDs” SP. You’ll see them when you get in there.
This could be you
Download Minion Enterprise and enjoy it free for 90 days. Your current team and newbies will thank you.