Microsoft Kinda, Sorta Broke RDP: That CredSSP RDP Error

If you’re running into this, it probably started on May Patch Tuesday, which was last week (the 8th). You are getting this error when you try to RDP to a/some servers:

An authentication error has occurred.

The function requested is not supported.

This could be due to CredSSP encryption oracle remediation.

Here’s what happened

An authentication error has occurred. The function requested is not supported. … This could be due to CredSSP encryption oracle remediation.

Errors make me sad

In March, a vulnerability in CredSSP (Credential Security Support Provider) was patched, which would affect authentication via RDP (this is outlined in advisory CVE-2018-0886).  However, it was implemented in such a way that the behavior change didn’t have to be “honored” by either the server or the client involved in an RDP session.

 

The intent was that this would be controlled by GPO in enterprise environments, and a new GPO setting to activate or deactivate this behavior was released at the same time.

GPO settings have a default value, which they will use when nothing has been explicitly set for a particular setting. In this case, the GPO has three possible values: Force Updated Clients (for servers to only take connections from patched clients), Mitigated (for both, and on a workstation means that it won’t fall back to old/insecure behavior when attaching to unpatched servers), and Vulnerable (for both, and means what it sounds like–anything goes!).

In March, the default behavior was set to “Vulnerable”, which means everything kept working for everyone. But in the May security rollup, the default setting for that GPO was flipped to “Mitigated” if there was not an explicit setting for it… end result being the core problem some are running into: Clients that have received the May update are no longer able to connect to RPD servers that have not received the March vulnerability fix.

Welp.

(For a bit more background on all of this, see this Microsoft blog post: https://blogs.technet.microsoft.com/askpfeplat/2018/05/07/credssp-rdp-and-raven/)

Good News: Easy Workaround

Fortunately, there’s an easy workaround that can be applied to any Windows workstation facing this behavior, with a couple caveats.

If you are getting the above error trying to RDP to a server, all you have to do is set the corresponding GPO on your local workstation to Vulnerable.

To set this, run “gpedit.msc” on your machine. When the Local Group Policy Editor launches, navigate to Computer Configuration\Administrative Templates\System\Credentials Delegation on the left side, and then find Encryption Oracle Remediation on the right. Open that up, flip it to Enabled, and then choose “Vulnerable” for Protection Level. Hit OK, close GPEdit & you’re done; the change will take effect immediately.

There are a couple caveats: First is, this means you’re choosing to operate in an unpatched situation, which I don’t recommend. The second is that you can only apply this GPO setting on your local workstation if you’re not in an AD environment where it’s been set at the domain level and it’s getting applied to your machine. If that’s the case, the AD-level GPO will stomp on your local setting if it’s different.

Again: This should only be a temporary measure. The real fix is to get the March updates on your servers so you can set your workstation back to at least Mitigated (really should be Force Updated Clients). It’s not going to be my fault if you leave things unpatched and in Vulnerable and then something bad happens!

Commentary

Some have been referring to this as a “bug” and…This isn’t a bug; I mean, the “breaking RDP” part isn’t a bug (the original vuln obviously is). This is 100% “system functions as designed.” There’s a vulnerability in a widely-used feature of Windows, and MS pulled the “better to be on the ground wishing you were in the air, than being in the air wishing you were on the ground” card here. Being a patch hard-liner (I saw too much shit in the early 2000s), I think this is fine. If you don’t like it, there’s a workaround. But, my attitude on this is tempered by the fact that it’s only listed as an “Important” update, and the exploitability seems a little bit out there. Maybe give us all a few more months to notice?

Regardless, I DO think there was a communication failure here, though. Since few people read patch notes on a regular basis (I don’t even, anymore), relying on those to get the message to people isn’t going to work. Even that PFE blog post–which is great–is still a little bit of shouting into the void until someone runs into the problem and goes looking for a solution.

I don’t know what to do about this part, because there’s really just not a mechanism to deal with it. And really, do we need another thing to watch for alerts and stuff? Plus, breaking changes happen on a regular basis… where do you draw the line? And what, should they have made the RDP client throw a pop-up message about this? That seems like an awful big hammer.

I guess I’m going to have to go back to reviewing KB articles for patches again :-/

Azure Infrastructure pre-con ahead of #SQLSatCleveland

Microsoft Azure logoSQL Saturday in Cleveland, Ohio is next week, on February 3rd. If you’re in the area or can easily make it there, I hope that you can come out for a great day of free SQL Server training. I enjoy presenting at SQL Saturdays; they’re fun and educational days for speakers and attendees, alike. Last time we were in Cleveland it had snowed overnight when it was time to leave town on Sunday morning. I’ve lived even longer in the south now, so if that happens again, it’ll be even more fun this time.

In addition to my session on Saturday, where I will talk about using database projects in SSDT/Visual Studio, I’ll also be presenting an all-day session Friday on Azure Infrastructure. Planning and designing your infrastructure is just as important in the cloud as it is when building new systems on-premises. As Azure continues to grow and expand around the world, more companies will be choosing to migrate (or deploy new) services to the public cloud. Understanding the underlying components is imperative to maximum-performance and highly-successful Azure deployments and hybrid migrations. In this session, we’ll cover infrastructure fundamentals with a bit of a focus on deploying and running SQL Server in Azure; however, there will be plenty of general background discussion that can be used for any workload.

Registration for this precon is available here, on EventBrite: https://www.eventbrite.com/e/azure-infrastructure-presented-by-kerry-tyler-tickets-41688096218, with information about the overall SQL Saturday event available here: www.sqlsaturday.com/708

Saturday is free, but tickets for the full-day precon are $150.

I hope to see you next weekend!

Let’s Talk About Group Managed Service Accounts

SQL Server service dialog, Log On tabWe all know the best practices for SQL Server service accounts–domain account (if you’re using Active Directory), non local admin, different one for each service (and server/instance), etc, etc. These are, of course, good best practices and they should be followed as closely as possible in Production and on servers/instances that house Production data.

A problem arises if you have more than just a couple-few servers or run some of the BI components, however. The number of service accounts involved in your SQL Server plant could be very large, necessitating an incredible amount of overhead when it comes to managing those accounts. This goes beyond simply creating and assigning them–chances are good that there are policies in place that require changing passwords. User accounts, service accounts, and other automation accounts likely all fall under this umbrella. If you’re lucky, maybe non-user accounts have a longer change interval, but it’s still something that is going to need to be done on a regular basis. In large environments, this could take an excruciatingly awful amount of time to do.

All of this is not to mention the human factor involved here. One of the recurring themes in a couple of my presentations is making an effort to automate as many things as possible to remove the human from the process. Not that we’re bad, but there are some things, especially tedious and repetitious tasks, where dumb things go wrong simply because of the nature of the work. Changing a bunch of service account passwords is definitely one of them. There used to be two types of sysadmins: those that have changed a service account’s password but forgot to update and restart the service itself, and those who will.

Enter Group Managed Service Accounts

Group Managed Service accounts (gMSAs) are a way to avoid most of the above work. They are special accounts that are created in Active Directory and can then be assigned as service accounts. They are completely managed by Active Directory, including their passwords. This means no more manual work to meet the password-changing policy–the machine takes care of that for you.

There are other security-related controls that can be gained with them, but this is the major feature.

I’ll also note that you–the DBA–are likely to need some help from your AD admin to get these set up. They’re going to need to actually create the accounts for you in the system, and there may be some changes needed to their AD configuration in order to support them. They’ll also need to have a Windows Server 2012 (or R2) domain controller in their domain, but I’d hope today that’s not going to be a hurdle to overcome.

Since I’m mostly here to talk about SQL Server, I’ll note a couple of different support situations. gMSAs are supported from SQL Server 2014 and on running on Windows Server 2012 R2 and on for everything you can do with SQL Server–standalone instances, Failover Clusters, Availability Groups. Just plain Managed Service Accounts (MSAs) can go back a little further, but they only support standalone instances of SQL Server.

From a non-SQL Server perspective, one of the major disadvantages of gMSAs is that one can’t just use them everywhere. Services have to be specifically designed to support the use of these accounts, and that’s not going to be the case everywhere.

Since this isn’t exactly a new feature, there’s plenty of documentation and blog posts out there to read about this feature and the various requirements to implement. There’s a great overview and setup blog post on MSDN here: https://blogs.msdn.microsoft.com/markweberblog/2016/05/25/group-managed-service-accounts-gmsa-and-sql-server-2016/

That post links to this old TechNet article, which still is a pretty good resource for understanding what these things are and a little more detail on what is going on in the back-end: https://technet.microsoft.com/en-us/library/hh831782.aspx

Finally, my coworker Joey has a slightly older writeup here, https://joeydantoni.com/2012/12/14/group-managed-service-accounts/, that walks through the process of setting this up. Note that some of the requirements have changed since that was written, but the general process remains the same.

gMSAs are a nice feature that aren’t too onerous to setup, but go a long way to make your life easier and your data far more secure.

PSA: SQL Server 2016 SP1 CU7 is a Security Update

This week (Thursday the 4th), SQL 2016 SP1 CU7 was released. Right, fine, nothing out of the ordinary.

Except there is something out of the ordinary here. CU7 includes the patches for SQL Server related to the Spectre and Meltdown vulnerabilities (for more info on these as they relate to SQL Server, see Joey’s post here and Allen’s post here), and as a result, it is being flagged and published as a security update from Microsoft as KB 4058561.

So What?

Since this is a security update, that means it will be pushed down by/via Windows Update like other normal security patches. 4058561 says that, “This update will be provided via Microsoft Update at a future date”, but it’s probably safe to assume that this “future date” will be next Tuesday, January 9th, as that is the normal Patch Tuesday this month.

All that said, then, if you are or support a shop that likes to test their SQL CUs before they get deployed, if there aren’t also positive controls on what Windows Update does, it will be a good idea to start testing this update now. Sure, the Venn diagram of “people who test CUs” and “people who let Windows Update go to town on their SQL Servers, whether it be right off the bat Tuesday night or on the next weekend” probably don’t overlap much, but in case you fall into that category: Heads up.

Remember that CUs include all patches since their “baseline” (in this case, SP1), so you’re not going to just be getting the most recent updates–you’re going to be getting an additional six CUs’ worth, too.

Again…

Once again, as everyone else is (or should be) saying, this is a big-deal security vulnerability, and you (all of you, yes, even YOU) really need to apply this patch. For SQL Servers, see guidance from MS here: https://support.microsoft.com/en-us/help/4073225/guidance-for-sql-server

Seriously… patch your stuff.

Don’t Forget About DR for Your DR

Here’s a scenario:

Let’s say you’re a small- or medium-sized company with either an on-premises data center in your office/building or in a “regular” co-lo nearby in the same metro area. You’ve got a mission-critical online presence, so in order to handle either a large-scale disaster for your geographic area or one just in your server room, you’ve written, implemented, and tested a disaster-recovery plan. Another co-lo a couple of states over is set up to be able to step in if needed, and this process can even be completed by non-technical resources in a couple of hours.Need a Plan C

This is a fairly-sound plan. However, what’s Step 2 after Something Bad™ happens to the primary data center and everything fails over to the DR site? What if Something Bad™ is long-term? You’re back to square one, with a single data center. Or where do you put the quorum file share for your AG?

Or, another situation: What if something happens to your DR site? Then what?

Almost Been There, Done That

One of our clients–who has a really good DR plan similar to the one described above–had a brush with this scenario earlier in the year. Their DR data center is in the Houston area, and in the aftermath of Hurricane Harvey, there were some concerns about the status of the DC. The DC itself was fine, but key support personnel would not have been able to get to the site for a number of days if there were such a need.

This situation did a good job of spurning conversations centered around what to do in this situation and what Plan C might look like.

Now What?

The point of this post is mostly to get you thinking about this scenario. Getting DR in place can be enough of a battle itself (I know), but ensuring that what happens next after a potential disaster is considered and planned for is another important step.

What this plan may look like is likely dependent upon what the “first stage” DR plan looks like. Not everyone can afford an additional site, especially if it’s a smaller company. And, let’s be honest: we could sit here all day and what-if burning data centers, but at some point, the return on this investment will become very questionable.

Although this looks/smells like a shameless plug for cloud/Azure, the public cloud is an excellent option to consider here. Even if your company is 100% on-premises with a classic hardware/virtualization platform, keeping a copy of critical systems’ backups up-to-date and available in the cloud is relatively inexpensive.  This “cold DR” process is a very easy-to-implement step to safeguard against a multi-phase or long-term disaster. In the event that these backups are needed, there’s the option of spinning up a group of VMs in the cloud to restore to. At the very least, this cold backup solution will be more-accessible than your current offsite backups if new on-prem servers are stood up somewhere to get the lights back on.