I was working at a customer to deploy a vRealize Automation 7.2 environment. The customer already had an MS SQL 2014 cluster on Windows 2008, and the plan was to use it for the IaaS database as well. When the initial environment was up and running I started to set up the fabric groups. To my surprise there were no resources available. I then noticed that data collection from the vSphere endpoint did not occur, and when I tried to start a new data collection it would not start. Funny stuff was going on, and at that point I was unaware of the underlying MSDTC problem.
When I opened the infrastructure log I found this error message:
Diving even deeper, I looked at the Manager Service log on the IaaS server, which showed that the issue had to do with MSDTC.
Snippet of C:\Program Files (x86)\VMware\vCAC\Server\Logs\All.log
System.ApplicationException: Error executing query usp_SelectAgent ---> System.ApplicationException: Error executing query usp_SelectAgentCapabilities ---> System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. ---> System.Runtime.InteropServices.COMException: The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems.
To start with the basics: MSDTC issues are often related to one of the following.
DNS and Network
Make sure all components are resolvable through DNS and that all components can reach each other. Take special care to test this if a SQL/DTC node has multiple network interfaces.
Make sure that firewalls are set up correctly to allow the traffic for DTC to pass. There are some predefined firewall rules that you need to enable.
Besides these rules, also make sure TCP port 135 is open and that the high ports 1024 – 65535 can be used as well.
This applies to the local Windows firewalls and/or any firewalls that might be between the components on the network.
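As a rough illustration of the checks above, here is a minimal Python sketch that resolves a host name and tries a TCP connection to port 135. The function names are my own, and the script is not part of vRA or MSDTC. It only probes the single port you give it, so it cannot prove that the dynamic high ports are usable.

```python
import socket


def resolves(hostname):
    """Return the sorted IP addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_hosts(hostnames, port=135):
    """Summarize DNS resolution and port reachability per host name."""
    return {
        name: {"addresses": resolves(name), "port_open": tcp_port_open(name, port)}
        for name in hostnames
    }
```

Run it from each server against the others, for example `check_hosts(["sql01.example.local"])` from the IaaS server (the host name is a placeholder), because DTC traffic has to flow in both directions.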
If you are using clones of the same template in your virtual environment for creating the IaaS and SQL servers make sure their CID/SID are unique. Usually the Windows SID is fine because there is the option to update this in the customization. But the CID used by MSDTC is not automatically changed.
To check whether the CID is unique, look at the ‘HKEY_CLASSES_ROOT\CID’ registry key on all involved servers.
As you can see there may be multiple CIDs, but the ‘Description’ key will show you which one belongs to the MSDTC service.
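Once you have written down the MSDTC CID of each server, comparing them is simple. The sketch below is only an illustration: the function name and the sample values are made up, and collecting the CIDs from the registry is left out.

```python
from collections import defaultdict


def find_duplicate_cids(cids_by_server):
    """Map each CID used by more than one server to the servers sharing it.

    cids_by_server: dict of server name -> MSDTC CID string, as read
    manually from the registry. Braces and letter case are ignored.
    """
    servers_by_cid = defaultdict(list)
    for server, cid in cids_by_server.items():
        normalized = cid.strip().strip("{}").lower()
        servers_by_cid[normalized].append(server)
    return {
        cid: sorted(servers)
        for cid, servers in servers_by_cid.items()
        if len(servers) > 1
    }
```

An empty result means every server has its own CID. Any entry in the result lists servers that were most likely cloned from the same template.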
If these CIDs are the same on both servers, one of them needs to be reset. To do this you can follow this procedure:
- Open a command prompt with admin privileges
- Run the command ‘msdtc -uninstall’
- Run the command ‘msdtc -install’
- Reconfigure MSDTC according to the best practices
VMware best practices
VMware has several KB articles describing the MSDTC setup that is needed for it to work with vRA but if you look closely there might be some differences in how the ‘Client and Administration’ section is configured. In general that is not the key part and things tend to go wrong when the section ‘Authentication’ is misconfigured.
This table describes which authentication type is needed in different scenarios:
| |Non-Clustered DTC|Clustered DTC|
|Non-Clustered SQL|Mutual Authentication Required|NA|
|Clustered SQL|Mutual Authentication Required|Incoming Caller Authentication|
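If you want to bake this table into a deployment checklist or script, it boils down to a small lookup. This is just a sketch with names of my own choosing. It encodes the table above and nothing more.

```python
# (sql_clustered, dtc_clustered) -> required MSDTC authentication setting.
# The non-clustered SQL / clustered DTC combination is marked NA in the
# table, so it is deliberately missing here.
AUTHENTICATION_BY_SCENARIO = {
    (False, False): "Mutual Authentication Required",
    (True, False): "Mutual Authentication Required",
    (True, True): "Incoming Caller Authentication",
}


def required_authentication(sql_clustered, dtc_clustered):
    """Return the authentication setting for a SQL/DTC combination."""
    key = (bool(sql_clustered), bool(dtc_clustered))
    if key not in AUTHENTICATION_BY_SCENARIO:
        raise ValueError("Not applicable: clustered DTC with non-clustered SQL")
    return AUTHENTICATION_BY_SCENARIO[key]
```

For example, `required_authentication(sql_clustered=True, dtc_clustered=False)` returns `"Mutual Authentication Required"`.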
To set up MSDTC, go to Start, type ‘Component Services’ and browse to ‘Component Services\Computers\My Computer\Distributed Transaction Coordinator\Local DTC’. Right-click it, choose ‘Properties’ and open the ‘Security’ tab. Set the security settings accordingly.
If you are using a clustered instance of MSDTC make sure you open the properties of the corresponding clustered DTC instance.
VMware supports both local and clustered MSDTC instances, but there is some additional information you need in order to set up clustered instances correctly. This is described in more detail in the next section.
One other thing that came up in my search for answers is a Microsoft security policy called ‘System cryptography: Use FIPS 140 compliant cryptographic algorithms, including encryption, hashing and signing algorithms’. The ‘Federal Information Processing Standard 140’ is mainly used by the US government (or other institutions requiring this level of security) and is not supported for vRA and its IaaS server.
MSDTC clustering according to Microsoft
Looking at Microsoft's best practices in this matter also gave me some nice insights. Apparently MSDTC clustering was required when using Windows Failover Clusters with Windows Server 2003.
This was due to the way MSDTC was designed. From Windows 2008 onward Microsoft has redesigned MSDTC, so this is no longer the case. I think this quote sums it up:
‘In Windows 2008 and later you either Create a clustered instance of the MSDTC resource for EVERY SQL Server instance/Group that requires its functionality or DO NOT CLUSTER MSDTC at all.’
Microsoft has a lot of information on how to set up MSDTC clustering in the correct way. For more information on this topic check out this Microsoft site.
I would like to point out that Microsoft strongly leans toward the use of local MSDTC instances.
In this particular case the solution was to change the way the customer was using the cluster resource for MSDTC. The customer did not have a clustered MSDTC instance for every SQL Server instance, and they had not set up mappings to the MSDTC instance either. Because of this, MSDTC was not working as expected.
The customer chose to simplify the whole setup and revert to local MSDTC resources, which is in line with Microsoft’s best practices.
But as you can see there is more than meets the eye when it comes to setting up MSDTC correctly. Hopefully with this blog post I was able to shed some light on the subject.
During troubleshooting I also used these testing tools: