Shortly after we deployed a new vSphere 6.0 environment, one of my colleagues was unable to log in to vCenter.
Access to both the Web Client and C# client was failing. No one else in our organization was having this problem.
This environment was set up with the vCenter Server Appliance 6.0, and has an external Platform Services Controller cluster. We were using an Active Directory identity provider for logins.
After reviewing the SSO logs and doing some searching, I came across this VMware KB:
This KB sorta, kinda matched up with what I was seeing in the logs, and it applies to vCSA 6.0, and has some actual steps to try, so it seemed like a good lead. Essentially, the recommendations there are to search the logs for connection problems to specific domain controllers, and to make sure that both forward and reverse DNS resolution are functioning properly. This is good advice, and I confirmed that things were all good with DNS.
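For anyone who wants to script that DNS sanity check, here is a minimal sketch in Python. It is not what the KB prescribes, just one way to confirm that a name resolves to an address and that the address resolves back to the same name. The function name and the example hostname are hypothetical; the resolver functions are passed in as parameters so the logic can be exercised without touching a real DNS server.

```python
import socket


def check_dns_round_trip(hostname,
                         forward=socket.gethostbyname,
                         reverse=socket.gethostbyaddr):
    """Resolve hostname -> IP -> hostname and report whether the names agree.

    `forward` and `reverse` default to the standard library resolvers; they
    are parameters only so the check can be tested with fake resolvers.
    """
    ip = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, ip_address_list)
    reverse_name, aliases, _addresses = reverse(ip)

    def normalize(name):
        # DNS names are case-insensitive; ignore a trailing root dot.
        return name.lower().rstrip(".")

    candidates = {normalize(reverse_name)} | {normalize(a) for a in aliases}
    return {
        "hostname": hostname,
        "ip": ip,
        "reverse_name": reverse_name,
        "consistent": normalize(hostname) in candidates,
    }


# Example (hypothetical domain controller name; needs working DNS):
#   print(check_dns_round_trip("dc01.example.com"))
```

You would run this from a machine that uses the same DNS servers as the PSC, once per domain controller, and look for any result where `consistent` is false.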
At this point, since there was nothing else to do, I decided to try a packet capture. As it happens,
tcpdump is not installed by default on vCSA. You need to follow the instructions here to enable it (and also
Basically, run this from a privileged shell:
Finally, after capturing packets while reproducing the error, and also during a successful login using another account, I did some analysis.
When a user logs in using an AD account, the Platform Services Controller (via Likewise, if you are on vCSA) performs the Kerberos dance with a domain controller. Kerberos is complicated and there are other resources that explain it much better than I ever could, but the gist is this:
1. PSC sends an AS-REQ request to the DC over UDP
2. DC responds with KRB5KDC_ERR_PREAUTH_REQUIRED
3. PSC sends a new AS-REQ request to the DC, this time with a padata value (preauthentication data), over UDP
4. DC responds with KRB5KRB_ERR_RESPONSE_TOO_BIG
5. PSC sends yet another AS-REQ to the DC, this time over TCP
6. DC responds with an AS-REP containing the TGT (ticket-granting ticket), over TCP
7. PSC sends a TGS-REQ to the DC to get a service ticket, over TCP
8. DC responds with the service ticket in a TGS-REP, which also includes Privilege Attribute Certificate (PAC) data; this is a Microsoft-specific extension that basically includes details about the user, including their AD group membership
There are more steps after 8, but they are not relevant here.
The critical thing that I noticed was that, for the user having trouble, there was never a KRB5KDC_ERR_PREAUTH_REQUIRED challenge returned from the DC, the conversation never switched over to TCP, and the TGS-REP was missing the PAC data. Without this data, Likewise cannot determine the user’s group membership, and thus cannot decide whether the user is authorized.
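To make that comparison concrete, here is a rough sketch of the three checks as Python. Everything in it is illustrative: the `KerberosMessage` type, the `diagnose_exchange` function, and the two sample sequences are hypothetical and hand-written from the description above, not exported from my actual captures. The `KRB5KRB_ERR_RESPONSE_TOO_BIG` entry is an assumption on my part; it is the standard Kerberos error that makes a client retry over TCP.

```python
from dataclasses import dataclass


@dataclass
class KerberosMessage:
    sender: str            # "PSC" or "DC"
    kind: str              # message or error name as a packet analyzer shows it
    transport: str         # "udp" or "tcp"
    has_pac: bool = False  # only meaningful for replies that carry a ticket


def diagnose_exchange(messages):
    """Return a list of findings for one login's Kerberos conversation."""
    findings = []
    kinds = [m.kind for m in messages]
    if "KRB5KDC_ERR_PREAUTH_REQUIRED" not in kinds:
        findings.append("DC never sent a preauthentication challenge")
    if all(m.transport == "udp" for m in messages):
        findings.append("conversation never switched to TCP")
    tgs_replies = [m for m in messages if m.kind == "TGS-REP"]
    if tgs_replies and not any(m.has_pac for m in tgs_replies):
        findings.append("TGS-REP carries no PAC data")
    return findings


# Illustrative sequence for a working login (the eight steps listed earlier).
HEALTHY_LOGIN = [
    KerberosMessage("PSC", "AS-REQ", "udp"),
    KerberosMessage("DC", "KRB5KDC_ERR_PREAUTH_REQUIRED", "udp"),
    KerberosMessage("PSC", "AS-REQ", "udp"),
    KerberosMessage("DC", "KRB5KRB_ERR_RESPONSE_TOO_BIG", "udp"),  # assumed
    KerberosMessage("PSC", "AS-REQ", "tcp"),
    KerberosMessage("DC", "AS-REP", "tcp"),
    KerberosMessage("PSC", "TGS-REQ", "tcp"),
    KerberosMessage("DC", "TGS-REP", "tcp", has_pac=True),
]

# Illustrative sequence for the failing login: no challenge, UDP only, no PAC.
FAILING_LOGIN = [
    KerberosMessage("PSC", "AS-REQ", "udp"),
    KerberosMessage("DC", "AS-REP", "udp"),
    KerberosMessage("PSC", "TGS-REQ", "udp"),
    KerberosMessage("DC", "TGS-REP", "udp", has_pac=False),
]

for finding in diagnose_exchange(FAILING_LOGIN):
    print(finding)
```

In practice I did this by eye in a packet analyzer, but writing the checks down like this makes it obvious which three things separated the broken login from the working one.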
So, what controls whether an account requires Kerberos preauthentication? It’s the DONT_REQUIRE_PREAUTH flag in the userAccountControl attribute, which you can read about here:
For some reason, that flag was set on his account but on no one else’s. We removed it, and after that, problem solved.
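If you want to check an account's userAccountControl value yourself, the attribute is just a bitmask, so decoding it is simple arithmetic. The sketch below is only an illustration of that arithmetic in Python, not how we made the change (any AD tooling will do that). The table is deliberately partial, with a handful of well-known bit values, and the helper names are hypothetical.

```python
# Partial table of userAccountControl bits (not the full list).
UAC_FLAGS = {
    "ACCOUNTDISABLE": 0x0002,
    "NORMAL_ACCOUNT": 0x0200,
    "DONT_EXPIRE_PASSWORD": 0x10000,
    "DONT_REQUIRE_PREAUTH": 0x400000,  # "Do not require Kerberos preauthentication"
}


def decode_uac(value):
    """Return the names of the known flags that are set in `value`."""
    return [name for name, bit in UAC_FLAGS.items() if value & bit]


def clear_flag(value, name):
    """Return `value` with the named flag cleared, leaving other bits alone."""
    return value & ~UAC_FLAGS[name]


# A normal account that also has preauthentication disabled:
problem_value = 0x0200 | 0x400000
print(decode_uac(problem_value))
print(clear_flag(problem_value, "DONT_REQUIRE_PREAUTH"))
```

A value of 512 (0x200) is a plain, enabled user account, so that is what you would expect to see once the flag is gone.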