Hey, you! Don’t Forget to Enable DB Mail in Agent

Raise your hand if you’ve been there: You set up a new SQL Server instance, configure Database Mail (test it), and then set up a nice Agent job to back up your databases. You configure it to send mail on job completion (so you can keep an eye on it not matter what), but it’s not sending mail. You test DB mail again, and it’s working. What gives?

This is kind of a gimme, but a few weeks ago I configured a new maintenance job (NOT a Maintenance Plan 😉 ) on a new-ish non-production server that didn’t have any other jobs on it yet. When it wasn’t sending mail, I stood around for a lot longer than I’d like to admit before I figured out what was going on.

The kicker is that you have to enable the use of database mail within the SQL Server Agent–this isn’t on by default.

As with most things in SQL Server, there’s a couple ways to turn this on. First is the GUI. The Alert System page of the SQL Agent’s properties dialog is shown here, and you can see right on the top where the main “Enable” checkbox is, along with dropdowns for the DB Mail settings you want to use. Flip that on, pick your desired mail settings (probably only have one setup), restart the Agent service, and your agent jobs will start sending mail as-expected.

Agent Properties dialog showing Mail Settings

There’s also the T-SQL route, which is useful for adding this configuration to a general “initial” script (such as ours: https://github.com/DC-AC/SQL2016_Scripted_Install) so you don’t have to worry about this on new instances that you install/setup. It’s a quick SP call to enable mail:

EXEC msdb.dbo.sp_set_sqlagent_properties @databasemail_profile=N’Main Profile’

Assign the mail profile you want to use, and go. The UI by default and greyed out (at least on a few 2016 instances that I’ve checked recently) checks the “Save copies of the sent messages in the Sent Items Folder” option. This option can be driven with the email_save_in_sent_folder parameter on the proc. Set it to 1 to turn on that option. True story: I have no idea where that mail gets saved on a SQL Server; I assume it goes to the “Sent Items” folder in the mailbox the profile is configured to use, but I’ve never actually configured this with a mailbox that I have access to to see.

This T-SQL step assumes SQL Server on Windows. If you’re doing this on Linux…well, it’s different. I’m not going to reproduce that work here, because it may change since SQL 2017 is still in RC at this point. So, if you’re doing this on Linux, check out the official docs for that process.

Moral of the story here: Don’t be a dumbass like me; turn on DB Mail in the Agent!

Maintenance Plans: They don’t like it when DBAs leave

Discovered some…interesting…information about Maintenance Plans a few weeks ago when we had a DBA leave. I don’t think it warrants being called a “bug”, but it is behavior that I don’t think is ideal, and it definitely can cause issues for us and others. The good news here is that I did get a workaround nailed down (thanks to the help of Random Internet Guy), although we haven’t decided whether or not we’re going to use it.

How this all started

Recently, we had one of our DBAs leave. (Incidentally, this is the first time ever that I’ve had this happen to me and also the first time one has left this company.) After he left at the end of his last day, I pulled all of his access to the Production systems. We knew that any Agent jobs that he happened to own would start failing, and sure enough, there were a few failures that evening and over the course of the weekend.

One of these failures turned into a little more of an involved situation: The Maintenance Plan on the only production SQL 2008 box he set up failed. With us being used to SQL 2000, this was expected to be an easy fix and life would go on. Except, it wasn’t.

Toto, I’ve a feeling this isn’t SQL 2000 any more

First attempt to fix this was to change the owner of the SQL Agent Job to ‘sa’. He’s our standard job owner for consistency and because that makes the job impervious to people leaving. The job still failed, although the error was different this time. Obviously the Agent Job owner had some role in all of this, but it wasn’t the whole story, as things weren’t fixed.

After changing the Job owner, this is the error that occurred:

Executed as user: <SQL Service acct>, [blah blah blah]  Started:  11:25:38 AM  Error: 2010-10-06 11:25:39.76     Code: 0xC002F210     Source: <not sure if this GUID is secure or not> Execute SQL Task     Description: Executing the query “DECLARE @Guid UNIQUEIDENTIFIER      EXECUTE msdb..sp…” failed with the following error: “The EXECUTE permission was denied on the object ‘sp_maintplan_open_logentry’, database ‘msdb’, schema ‘dbo’.“. Possible failure reasons: Problems with the query, “ResultSet” property not set correctly, parameters not set correctly, or connection not established correctly.  End Error  Warning: 2010-10-06 11:25:39.79     Code: 0x80019002     Source: OnPreExecute      Description: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED.  [blah blah blah] Error: 2010-10-06 11:25:40.58     Code: 0xC0024104     Source: Check Database Integrity Task      Description: The Execute method on the task returned error code 0x80131501 (An exception occurred while executing a Transact-SQL statement or batch.). The Execute method must succeed, and indicate the result using an “out” parameter.  End Error  Error: 2010-10-06 11:25:40.64     Code: 0xC0024104     Source: <another GUID>      Description: The Execute method on the task returned error code 0x80131501 (An exception occurred while executing a Transact-SQL statement or batch.). The Execute method must succeed, and indicate the result using an “out” parameter.  End Error  Warning: 2010-10-06 11:25:40.64     Code: 0x80019002     Source: OnPostExecute      Description: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. [blah blah blah] End Warning  DTExec: The package execution returned DTSER_FAILURE (1).  Started:  11:25:38 AM  Finished: 11:25:40 AM  Elapsed:  2.64 seconds.  The package execution failed.  The step failed.

A Permissions error in MSDB? Huh?

I dug a little, and the short of it is that the Maintenance Plan’s connection string was still set to the departed Admin’s sysadmin account (a SQL login). All of the work that the MP was trying to do, it was still attempting as the other DBA. This, of course, isn’t going to get very far.

Maintenance Plans have connection strings?

This seems like a dumb question to ask, but it’s one of those things that I had glossed over in my head and not actually thought about. The answer: Sure they do! MPs are basically SSIS packages these days, and of course those have connection strings/settings. MPs are no different. Seems like a “DURRR” now, but hindsight, yadda, yadda.

This connection string is controlled by the “Manage Connections” button in Maintenance Plan toolbar. If you never mess with this when you create an MP, it uses an auto-generated connection to the Local Server, using whatever name you used to connect to it (there’s another topic right there, but someday I’ll talk about some quirks with Aliases I’ve run into, and I’ll address that then). This auto-gen’d connection also uses the same auth credentials you are currently using. If you only use Windows Auth, you are stuck with that; if in Mixed Mode, a SQL auth account is an option, and you can set it to another account. Of course, you would need to know that account’s password.

To fix our MP’s connection string, I could change it to my own credentials, but that of course doesn’t actually fix anything, and will leave an even bigger mess for whoever’s left if/when I leave or responsibilities change. I can’t set it to Windows Auth, because when you select that, it validates the auth right then and there. Since we don’t have Windows Auth SA accounts, that won’t work for us: the AD account that I’m logged on with doesn’t have enough rights. Something else needed to be done.

At this point I realized that this is a fair-sized problem and we can’t be the first shop to run into this; I mean, SQL 2005/2008 have been out for a while, and I’m sure lots of DBAs have left shops that heavily utilize MPs. Time to ask the Senior DBA.

Going nowhere fast

For as common as I expected this to be, I wasn’t finding much helpful information. There were some references to this error message, and even someone in the same situation I was in, but most of what I found were dead-end threads.

At one point, I thought for sure that @SQLSarg had it. This blog post contains the revelation that MPs have Owners! His main point of changing the MP owner is to change the owner that the associated Agent Job(s) are set to when you save the MP. Even though I had already changed the job owner to a valid account, I hoped that the MP’s owner had something to do with the credentials that were used in the Plan’s connection string.

Of course, it didn’t. No behavior change.

Aside: Even though it didn’t help with what I was trying to fix at the time, I believe that changing an MP’s owner to a shared or central service account is a good idea. Every time an MP gets modified, the Agent Job’s owner is changed to the same account as the Plan’s Owner. This might not even be you, and when that person leaves the company, you are guaranteed to have execution problems. There’s a Connect item open asking to make it easier to change the owner, so if you like Maintenance Plans and don’t hate yourself, it’s probably a good idea to log in over there and mash the green arrowhead.

The more I dug, the more dead-end threads I read, the more almost-but-not-quite blog posts I ran across, the more I became reserved to one simple fact:

It is impossible to set the Connection String Credentials in a Maintenance Plan to an account that you don’t know the password for.

As that started to sink in, I asked #SQLHelp if that were, in fact, true. I got a couple responses that yes, that was the case (I would be specific, but this was a number of weeks ago, and since Twitter sorta sucks for finding things like that and I can’t remember who it was, we’re out of luck). Although not the news I was looking for, I was glad to know that the deal was pretty much sealed—we were going to need to figure out something else to do.

One last gasp

Since I don’t feel like I’m doing everything I can while troubleshooting a problem if I don’t grab for at least one straw, I decided to investigate something that was described in one of the dead-end threads that I had found while searching.

This thread on eggheadcafe is a conversation between “howardwnospamnospa”* and Aaron Bertrand, who we all know and love. This one “howardw” was in a pretty similar boat to what I was in, and was really close to getting it to work. He was only thwarted because of an apparent quirk when an Instance is in Windows Auth-only mode.

In the last post in the thread, he reports that when he assigns the Local System account (“NT AUTHORITY\SYSTEM”) as owner of the MP, and sets the connection properties of the plan to use Windows Auth, the plan runs. To attempt to duplicate this, I started with an existing MP on a Dev instance that I had set up (therefore my SQL auth account was the MP owner and the user listed in the Connection Properties) and proceeded with the following steps:

  1. Using the statements in SQLSarg’s post, I changed the owner of the MP to Local System
  2. I added my Domain account to the Instance with SA rights
  3. I opened the MP in question with SSMS, switched its connection to use Windows Auth (while logged in with the Domain account used in Step 2), and saved it.
  4. Checked the MP’s auto-gen’d Agent Job to confirm that its owner was NT AUTHORITY\SYSTEM.
  5. Using a connection with my Domain account, I pulled my SQL account’s SA rights. This would previously make the MP sub-plan fail.

I ran the job and, holy cow, it worked! 😀

That’s good, right?

Ehhhh, maybe. Setting the owner to Local System seems pretty dirty. I haven’t thought of a way that this would be a security problem, but it still feels like a stupid hoop to jump through. For us, it has the extra hoop of us having to add our Domain accounts to the instances every time we want to set up or change connection info on an MP (once the Windows Auth is set, it doesn’t re-validate it when you save the plan, so you only have to do it that one time), then take it out when we’re done.

Those bits aside, this workaround (if you will let me get by without calling this a “hack”) does allow setting up Maintenance Plans that will live through the DBAs that set them up leaving the company. By technicality, I achieved my goal.

We’ve talked about this setup very briefly at the office, but haven’t made a final decision on whether or not to go down this road or to abandon MPs for DB maintenance altogether. Hopefully we will get that decision made before someone else leaves.

But at what cost?

The main cost here is a lot of destruction to my affinity for Maintenance Plans.

Using MPs in general, love them or hate them, is pretty much a religious debate. Usually I fall on the proponent side of those arguments. I think they’re easy to set up, they’re pretty flexible, they can dynamically pick up new DBs that get added to the system (not always an advantage), you can cook up some really crazy dependencies within them, and I like pictures & flowchart-y things :-).

After this whole ordeal? Pretty sure I’ve changed my mind. This whole default behavior of how they will stop working when the account that set them up initially is no longer an SA is way too much of a price to pay. What happens when you’re a single DBA shop and you get honked and bail one day? Near as I can tell, if your account gets restricted as it should, backups will stop working, but there won’t be anyone left to notice!

In lieu of MPs, there’s at least one good option out there that has most all of the desired functionality. While this was going on, @TheSQLGuru posted a link to this set of SPs. I haven’t had time yet to really dive into these, but on the surface, they look pretty good.

Final Comment(s)

This entire thing bugs me. I’m almost certain I’m missing something. This should be a huge problem for lots of people. I feel like I’m missing a detail or it’s because we use SQL logins for the most part or some other thing. But… in all of the testing that I’ve done, bad things happen when the initial creating user gets SA pulled. Period. I can’t get past the feeling that I’m doing something wrong, but I haven’t been able to figure it out yet.

In the meantime… I think Maintenance Plans are evil after all…

* Apparently not to be confused with howardwSOLODRAKBANSOLODRAKBANSO[LODRA] (Eve-Online thing; it’s honestly better if you don’t ask)