Authentication Failure in vSphere 6.0

After recently deploying a new vSphere 6.0 environment, one of my colleagues was unable to log into vCenter.

vCenter Single Sign-On login screen with "Authentication failure" error.
The SSO login screen simply returned “Authentication failure.” when attempting to log in.

Access to both the Web Client and C# client was failing. No one else in our organization was having this problem.

This environment was set up with the vCenter Server Appliance (vCSA) 6.0 and an external Platform Services Controller (PSC) cluster. We were using an Active Directory identity provider for logins.

After reviewing the SSO logs and doing some searching, I came across this VMware KB:

Unable to add Active Directory users or groups to vCenter Server Appliance or vRealize Automation permissions (2127213)

This KB sorta, kinda matched up with what I was seeing in the logs, it applies to vCSA 6.0, and it has some actual steps to try, so it seemed like a good lead. Essentially, it recommends searching the logs for connection problems to specific domain controllers and making sure that both forward and reverse DNS resolution are functioning properly. This is good advice, and I confirmed that DNS was fine in both directions.
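If you want to script that DNS sanity check, here is a minimal sketch in Python. It is not from the KB; the function name `check_dns` and the hostname in the usage comment are my own placeholders. It resolves a name to an address, resolves the address back, and reports whether the two agree.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Return (ip, reverse_name, consistent) for one host.

    `forward` and `reverse` default to the standard library resolvers and are
    only parameters so the function can be exercised without a network.
    """
    ip = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, ip_list).
    reverse_name = reverse(ip)[0]
    # Compare case-insensitively and ignore a trailing root dot.
    consistent = reverse_name.rstrip(".").lower() == hostname.rstrip(".").lower()
    return ip, reverse_name, consistent


# Usage (hypothetical domain controller name, substitute your own):
#   try:
#       print(check_dns("dc01.example.com"))
#   except OSError as exc:
#       print("lookup failed:", exc)
```

Note that this only compares the primary reverse name against the name you asked for, so a host with several PTR records or aliases may be flagged even when DNS is healthy.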

At this point, with no other leads, I decided to try a packet capture. As it happens, tcpdump is not installed by default on vCSA. You need to follow the instructions in this KB to enable it (along with netcat):

Using tcpdump on vCenter Server Appliance (2084896)

Basically, run this from a privileged shell:

# /etc/vmware/gss-support/install.sh

Finally, after capturing packets while reproducing the error, and also during a successful login using another account, I did some analysis.

When a user logs in using an AD account, the Platform Services Controller (via Likewise, if you are on vCSA) performs the Kerberos dance with a domain controller. Kerberos is complicated and there are other resources that explain it much better than I ever could, but the gist is this:

  1. PSC sends an AS-REQ to the DC over UDP
  2. DC responds with KRB5KDC_ERR_PREAUTH_REQUIRED over UDP
  3. PSC sends a new AS-REQ to the DC over UDP, this time with a padata value (pre-authentication data)
  4. DC responds with KRB5KRB_ERR_RESPONSE_TOO_BIG over UDP
  5. PSC sends yet another AS-REQ to the DC, this time over TCP
  6. DC responds with an AS-REP containing the TGT (ticket-granting ticket) over TCP
  7. PSC sends a TGS-REQ to the DC over TCP to get a service ticket
  8. DC responds with the service ticket in a TGS-REP, which also includes Privilege Attribute Certificate (PAC) data; this is a Microsoft-specific extension that carries details about the user, including their AD group membership

There are more steps after 8, but they are not relevant here.

The critical thing that I noticed was that, for the user having trouble, there was never a KRB5KDC_ERR_PREAUTH_REQUIRED challenge returned from the DC, the conversation never switched over to TCP, and the TGS-REP was missing the PAC data. Without this data, Likewise cannot determine the user’s group membership, and thus cannot decide whether the user is authorized.
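To make the comparison concrete, here is a rough Python sketch of the three checks I did by eye. It assumes you have already transcribed the Kerberos messages from the capture by hand; the `KrbMessage` type, the `diagnose` function, and the sample data are all made up for illustration and are not produced by tcpdump or any VMware tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KrbMessage:
    name: str              # e.g. "AS-REQ", "TGS-REP", or a KRB error name
    transport: str         # "udp" or "tcp"
    has_pac: bool = False  # only meaningful for a TGS-REP


def diagnose(messages: List[KrbMessage]) -> List[str]:
    """List the symptoms described above that appear in one login attempt."""
    findings = []
    if not any(m.name == "KRB5KDC_ERR_PREAUTH_REQUIRED" for m in messages):
        findings.append("no pre-authentication challenge from the DC")
    if not any(m.transport == "tcp" for m in messages):
        findings.append("exchange never switched to TCP")
    tgs_replies = [m for m in messages if m.name == "TGS-REP"]
    if tgs_replies and not any(m.has_pac for m in tgs_replies):
        findings.append("TGS-REP carries no PAC data")
    return findings


# Hand-transcribed stand-in for the failing login (illustrative only).
failing = [
    KrbMessage("AS-REQ", "udp"),
    KrbMessage("AS-REP", "udp"),
    KrbMessage("TGS-REQ", "udp"),
    KrbMessage("TGS-REP", "udp", has_pac=False),
]
for finding in diagnose(failing):
    print(finding)
```

A login that follows the eight steps above should come back with an empty list.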

So, what controls whether an account requires Kerberos pre-authentication? It’s the DONT_REQ_PREAUTH flag (0x400000) in the userAccountControl attribute, which you can read about here:

How to use the UserAccountControl flags to manipulate user account properties

For some reason, that flag was set on his account but on no one else’s. We removed it and, after that, problem solved.
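For reference, userAccountControl is just a bitmask, and Microsoft's documentation lists this bit as DONT_REQ_PREAUTH with the value 0x400000 (4194304). Here is a small Python sketch of testing and clearing the bit on a raw value. The helper names are mine, and the sample value is an assumed example (a normal account, 0x200, with the flag added). Writing the result back to AD is not shown; you would do that with whatever directory tooling you normally use.

```python
DONT_REQ_PREAUTH = 0x400000  # 4194304, per Microsoft's userAccountControl flag table
NORMAL_ACCOUNT = 0x200       # 512


def preauth_disabled(uac: int) -> bool:
    """True if the account is allowed to skip Kerberos pre-authentication."""
    return bool(uac & DONT_REQ_PREAUTH)


def clear_preauth_flag(uac: int) -> int:
    """Return the value with DONT_REQ_PREAUTH cleared and all other bits untouched."""
    return uac & ~DONT_REQ_PREAUTH


# Assumed example value, not taken from the affected account.
sample = NORMAL_ACCOUNT | DONT_REQ_PREAUTH
print(sample, preauth_disabled(sample), clear_preauth_flag(sample))
```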