Today's post is about the latest VCF 4.5. We were upgrading the management (MGMT) domain from VCF 4.4 to 4.5, and the upgrade completed. After every component upgrade, the best practice is to run the precheck. When we ran it, we got an NSX-T Audit error. The error is generic: it basically tells you to check the connection between SDDC Manager and NSX-T, or whether a password has expired. We checked and troubleshot, got some help finding what was causing the issue, and fixed it. Let's dive into the issue and the workaround.
When you expand the NSX-T Audit Status, it shows the following details and remediation:
End Time: Jan 11, 2023, 5:16:11 PM
Health Status: Red
Error Description: NSX-T Manager Audit for xxxxxxxxx.vmware.com failed with unknown exception
Impact: High: Do not perform upgrade without addressing this issue unless the available upgrade is for NSX-T.
Remediation: Audit check failed for NSX-T. Check if the SDDC Manager is able to communicate with NSX-T Manager. If not, login to NSX-T and check if upgrade is running and wait for the completion. Also please ensure that credential of type API for NSX-T manager is not expired.
We went through all the steps: no password had expired on any of the three NSX-T Managers, communication between NSX-T and SDDC Manager was good, and we could log in to the NSX-T GUI without any problems.
We ran the curl command below against all three NSX-T nodes to check communication between SDDC Manager and NSX-T, and it was good.
[ ~ ]# curl -k -u 'admin' -H 'Accept:application/json' -H 'Content-Type:application/json' -X GET https://xxxxxxxxxx.vmware.com/api/v1/configs/inventory
Enter host password for user 'admin':
{
"compute_managers_soft_limit" : 16
}
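If you want to repeat that check against all three managers in one go, the following is a minimal Python sketch of the same GET request. It assumes basic authentication (what `curl -u` sends) and skips certificate verification to mirror `curl -k`. The manager FQDNs and the function names `parse_inventory_config`, `query_inventory_config` and `check_all_managers` are hypothetical placeholders; only the `/api/v1/configs/inventory` path comes from the command above.

```python
import base64
import json
import ssl
import urllib.request
from typing import Dict, List

# Hypothetical FQDNs: replace with your three NSX-T Manager nodes.
MANAGERS = ["nsx-mgr-01.example.com", "nsx-mgr-02.example.com", "nsx-mgr-03.example.com"]


def parse_inventory_config(body: str) -> int:
    """Return compute_managers_soft_limit from the /api/v1/configs/inventory JSON body."""
    data = json.loads(body)
    return int(data["compute_managers_soft_limit"])


def query_inventory_config(host: str, user: str, password: str, timeout: float = 10.0) -> int:
    """Send the same GET as the curl command; certificate checks are skipped like curl -k."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before turning verification off
    ctx.verify_mode = ssl.CERT_NONE
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"https://{host}/api/v1/configs/inventory",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(req, context=ctx, timeout=timeout) as resp:
        return parse_inventory_config(resp.read().decode())


def check_all_managers(hosts: List[str], user: str, password: str) -> Dict[str, str]:
    """Query every manager and record OK or the error text per host."""
    results = {}
    for host in hosts:
        try:
            limit = query_inventory_config(host, user, password)
            results[host] = f"OK (compute_managers_soft_limit={limit})"
        except Exception as exc:  # connection, TLS, authentication or JSON errors
            results[host] = f"FAILED: {exc}"
    return results
```

Usage, from a machine that can reach the managers: `check_all_managers(MANAGERS, "admin", getpass.getpass())` after `import getpass`. As in our case, every host returning OK only proves connectivity and credentials; it does not rule out the inventory problem described next.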
Next, we checked the log /var/log/vmware/vcf/lcm-debug.log:
2023-01-12T19:19:56.436+0000 DEBUG [vcf_lcm,50ec64373cb1d6a1,decb,auditId=e088fc4b-9819-4971-9002-ccd596e3a53c,resourceType=NSX_T_MANAGER,resourceId=xxxxxxxxxx.vmware.com,name=xxxxxxxxxx.vmware.com] [c.v.e.s.l.p.i.n.NsxtInventoryLoader,Scheduled-9] Overall Upgrade Status SUCCESS
2023-01-12T19:19:56.436+0000 ERROR [vcf_lcm,50ec64373cb1d6a1,decb,auditId=e088fc4b-9819-4971-9002-ccd596e3a53c,resourceType=NSX_T_MANAGER,resourceId=xxxxxxxxxx.vmware.com,name=xxxxxxxxxx.vmware.com] [c.v.e.s.l.p.impl.nsxt.NsxtAuditImpl,Scheduled-9] Error auditing NSX-T Cluster xxxxxxxxxx.vmware.com with exception {}
com.vmware.evo.sddc.lcm.model.error.LcmException: Failed to load NSX-T Cluster from the Inventory
at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.NsxtInventoryLoader.loadNsxtInventory(NsxtInventoryLoader.java:85)
at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.NsxtAuditImpl.doAudit(NsxtAuditImpl.java:68)
at com.vmware.evo.sddc.lcm.audit.NsxtAuditService.doAudit(NsxtAuditService.java:121)
at com.vmware.evo.sddc.lcm.audit.NsxtInventoryAuditScheduler.auditNsxtInventory(NsxtInventoryAuditScheduler.java:84)
at com.vmware.evo.sddc.lcm.audit.NsxtInventoryAuditScheduler$$FastClassBySpringCGLIB$$2ae06d0f.invoke(<generated>)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.NoSuchElementException: No value present
at java.base/java.util.Optional.get(Optional.java:148)
at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.NsxtInventoryLoader.markUpgradeAvailability(NsxtInventoryLoader.java:523)
at com.vmware.evo.sddc.lcm.primitive.impl.nsxt.NsxtInventoryLoader.loadNsxtInventory(NsxtInventoryLoader.java:83)
2023-01-12T19:19:56.437+0000 INFO [vcf_lcm,50ec64373cb1d6a1,decb,auditId=e088fc4b-9819-4971-9002-ccd596e3a53c,resourceType=NSX_T_MANAGER,resourceId=xxxxxxxxxx.vmware.com,name=xxxxxxxxxx.vmware.com] [c.v.e.s.lcm.audit.NsxtAuditService,Scheduled-9] NSX-T Audit returned FAILED for resource type NSX_T_MANAGER with name xxxxxxxxxx.vmware.com
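The last line is the one that ties the failure to a specific audit run. To pull every failed NSX-T audit out of a large lcm-debug.log, here is a minimal, read-only Python sketch. It assumes the line layout shown above (an `auditId=` field in the bracketed context followed by the "NSX-T Audit returned FAILED" message); the function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the final audit line shown above:
# "... auditId=<uuid>,... NSX-T Audit returned FAILED for resource type <type> with name <fqdn>"
FAILED_AUDIT = re.compile(
    r"auditId=(?P<audit_id>[0-9a-fA-F-]+).*?"
    r"NSX-T Audit returned FAILED for resource type (?P<rtype>\S+) with name (?P<name>\S+)"
)


def find_failed_audits(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (auditId, resource type, resource name) for each failed NSX-T audit line."""
    results = []
    for line in lines:
        match = FAILED_AUDIT.search(line)
        if match:
            results.append((match.group("audit_id"), match.group("rtype"), match.group("name")))
    return results


def failed_audits_in_file(path: str = "/var/log/vmware/vcf/lcm-debug.log") -> List[Tuple[str, str, str]]:
    """Read the log line by line; undecodable bytes are replaced rather than raising."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_failed_audits(fh)
```

With the auditId in hand, you can search the log for that ID to see every line belonging to the same audit run, including the stack trace above.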
After digging deeper, we found a repeated message saying that a transport node "is an Edge Node or does not exist in VCF inventory":
[vcf_lcm,d3cfaeb310f8bd07,d083,auditId=ae740b5a-36b0-45e7-9758-41c97036e40f,resourceType=NSX_T_MANAGER,resourceId=xxxxxxxxxx.vmware.com,name=xxxxxxxxxx.vmware.com] [c.v.e.s.l.p.i.n.NsxtInventoryLoader,Scheduled-8] Transport Node xxxxxxxx is an Edge Node or does not exist in VCF inventory
We were seeing this for multiple hosts, and it was causing the failure to load the NSX-T Cluster inventory.
From the logs, we narrowed down the list of affected node names.
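Narrowing the list by hand is tedious when the message repeats on every audit cycle. A minimal Python sketch that collects the unique node names, assuming the message text exactly as it appears in the log line above (the function name is hypothetical):

```python
import re
from typing import Iterable, List

# Matches the repeated message shown above:
# "Transport Node <name> is an Edge Node or does not exist in VCF inventory"
MISSING_NODE = re.compile(
    r"Transport Node (?P<node>\S+) is an Edge Node or does not exist in VCF inventory"
)


def missing_transport_nodes(lines: Iterable[str]) -> List[str]:
    """Return each reported transport node name once, sorted for easier review."""
    nodes = set()
    for line in lines:
        match = MISSING_NODE.search(line)
        if match:
            nodes.add(match.group("node"))
    return sorted(nodes)


def missing_nodes_in_file(path: str = "/var/log/vmware/vcf/lcm-debug.log") -> List[str]:
    with open(path, encoding="utf-8", errors="replace") as fh:
        return missing_transport_nodes(fh)
```

Because the message covers both cases ("is an Edge Node or does not exist"), genuine Edge Nodes may also appear in the result; remove those before comparing the remaining hosts with the SDDC Manager inventory.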
We checked each affected host and compared its hostname with the corresponding entry in the SDDC Manager database.
Example: nsxvmwarelabvcf -> nsxvmwarelabvcf.vmware.com
The logs were reporting the NSX node names as short names instead of full FQDNs, as shown in the example above.
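The comparison itself is a simple matching rule: an NSX name without a domain suffix whose first label matches a host FQDN in the SDDC Manager inventory is a rename candidate. Below is a minimal Python sketch of that rule. It assumes you already have both lists (for example, the names from the log sketch above and the host FQDNs exported from SDDC Manager); it does not fetch either list, and the function name is hypothetical.

```python
from typing import Dict, Iterable


def find_short_names(nsx_names: Iterable[str], sddc_fqdns: Iterable[str]) -> Dict[str, str]:
    """Map each NSX name that lacks a domain to the SDDC FQDN with the same first label."""
    by_short: Dict[str, str] = {}
    for fqdn in sddc_fqdns:
        # Key on the first DNS label, case-insensitively; the first FQDN wins on duplicates.
        by_short.setdefault(fqdn.split(".", 1)[0].lower(), fqdn)

    mismatches: Dict[str, str] = {}
    for name in nsx_names:
        if "." in name:
            continue  # already looks like a full FQDN
        match = by_short.get(name.lower())
        if match is not None:
            mismatches[name] = match
    return mismatches
```

For the example above, `find_short_names(["nsxvmwarelabvcf"], ["nsxvmwarelabvcf.vmware.com"])` returns `{"nsxvmwarelabvcf": "nsxvmwarelabvcf.vmware.com"}`, which is the display-name change to make in the workaround.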
Workaround:
Log in to the NSX-T UI -> System -> Fabric, and edit the display names of all affected nodes, changing each from the short name to the full FQDN.
Once the renaming is done, perform a rolling reboot of all three NSX-T Managers, one at a time, so the local inventory syncs.
After the rolling reboots, run the precheck again. It was all green.
This article is really helpful. I didn't find this solution in any VMware KB or other technical blog. Thanks, Viquar, for the useful post.