In today's post I want to share an interesting issue we hit when adding nodes after a VCF 3.x to 4.x migration. It was blocking us from adding nodes on three VCF sites, all of which had been migrated from VCF 3.x to 4.x. Let's dive into the issue and how we fixed it.
On VCF on VxRail, adding the cluster was failing: the validation sat on the multi-VDS check for ages and never actually reported a failure. The screenshot below shows how it looks.
Since we are adding a cluster, any cluster-related issue needs to be checked in /var/log/vmware/vcf/domainmanager/domainmanager.log and in operationsmanager.log.
Checking domainmanager.log, I could see the request was timing out and reporting the error below.
2023-01-27T17:52:57.167+0000 ERROR [vcf_dm,18e4c30160b37a69,c13f] [c.v.e.s.e.h.LocalizableRuntimeExceptionHandler,http-nio-127.0.0.1-7200-exec-6] [EP47QM] PUBLIC_INTERNAL_SERVER_ERROR InternalServerError
com.vmware.evo.sddc.common.services.error.SddcManagerServicesIsException: InternalServerError
at com.vmware.vcf.clustermanager.controller.v1.ClusterController.getVdses(ClusterController.java:1947)
at com.vmware.vcf.clustermanager.controller.v1.ClusterController$$FastClassBySpringCGLIB$$8e4c657c.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
2023-01-27T17:52:56.473+0000 INFO [vcf_dm,9361b908e0d57041,bafd] [c.v.v.d.rest.DomainManagerAbout,http-nio-127.0.0.1-7200-exec-8] Getting domainmanager service info
2023-01-27T17:52:56.856+0000 INFO [vcf_dm,7af2b6e8f3f68395,bc79] [c.v.v.d.rest.DomainManagerAbout,http-nio-127.0.0.1-7200-exec-2] Getting domainmanager service info
2023-01-27T17:52:57.163+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = MANAGEMENT
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EPHEMERAL
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = VMOTION
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EARLY_BINDING
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = VSAN
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EARLY_BINDING
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = EXTERNAL
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EARLY_BINDING
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = EXTERNAL
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EARLY_BINDING
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = EXTERNAL
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EPHEMERAL
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = EXTERNAL
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EPHEMERAL
2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] nioc {"network":"Management Traffic Type","level":"custom","value":0}
2023-01-27T17:52:57.165+0000 ERROR [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterController,http-nio-127.0.0.1-7200-exec-6] Failed to get VDSes
java.lang.IllegalArgumentException: Invalid trafficType Management Traffic Type
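When domainmanager.log is busy, it can help to pull out only the ERROR entries together with the exception lines that follow them. The snippet below is a minimal Python sketch of that idea. It assumes the line layout seen in the excerpt above (timestamp, level, [service,trace id,span id], [logger,thread], message). The function name errors_with_context and the regular expression are illustrative and are not part of any VMware tooling.

```python
import re

# Assumed layout, taken from the log excerpt in this post:
# <timestamp> <LEVEL> [<service>,<trace id>,<span id>] [<logger>,<thread>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"\[(?P<service>[^,\]]+),(?P<trace>[^,\]]+),(?P<span>[^\]]+)\]\s+"
    r"\[(?P<logger>[^,\]]+),(?P<thread>[^\]]+)\]\s+(?P<msg>.*)$"
)


def errors_with_context(lines):
    """Collect each ERROR entry plus the non-log lines (stack trace) after it."""
    results = []
    current = None  # the ERROR entry we are currently attaching detail lines to
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            # A new log entry starts here, so any previous stack trace has ended.
            current = None
            if match.group("level") == "ERROR":
                current = {
                    "ts": match.group("ts"),
                    "trace": match.group("trace"),
                    "logger": match.group("logger"),
                    "msg": match.group("msg"),
                    "detail": [],
                }
                results.append(current)
        elif current is not None and line.strip():
            # Lines that are not log entries belong to the last ERROR seen.
            current["detail"].append(line.strip())
    return results
```

Passing an open file handle for domainmanager.log to errors_with_context works, since it only iterates over lines. The trace id in each result (18e4c30160b37a69 in the excerpt above) can then be used to find the DEBUG lines of the same request.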
The log reports an invalid traffic type, "Management Traffic Type": SDDC Manager expects a specific value for the traffic type and cannot parse the one it fetched. So I checked vCenter, listed every port group name on all the VDSes, and compared them with the entries in the SDDC Manager database. The names were all correct and identical. Use the curl command below to get the VDS information from SDDC Manager.
curl http://localhost/inventory/vds | json_pp
We then ran the API query for the VDSes of that cluster ID. The API query also fails, with exactly the same error we see in the logs.
{
  "errorCode": "PUBLIC_INTERNAL_SERVER_ERROR",
  "arguments": [],
  "message": "InternalServerError",
  "causes": [
    {
      "type": "java.lang.IllegalArgumentException",
      "message": "Invalid trafficType Management Traffic Type"
    }
  ],
  "referenceToken": "E3SROA"
}
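If the same query has to be repeated on several sites, the offending value can be read out of the error response programmatically. This is a small Python sketch, assuming the response body has the shape shown above and that the cause message keeps the form "Invalid trafficType <name>". The function name invalid_traffic_types is made up for this example.

```python
import json
import re


def invalid_traffic_types(response_text):
    """Return the traffic type names reported as invalid in an error response."""
    body = json.loads(response_text)
    names = []
    for cause in body.get("causes", []):
        # Assumed message form, copied from the response in this post.
        match = re.match(r"Invalid trafficType (.+)$", cause.get("message", ""))
        if match:
            names.append(match.group(1))
    return names
```

For the response above, this would return a list with the single entry "Management Traffic Type", which is the string to look for in the database.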
So SDDC Manager expects a syntax that we no longer have after the conversion. I manually compared all the VDS entries to find out what was expected and what was not showing up.
I ran the command below to get the VDS information from the database.
psql -U postgres -h localhost -d platform -c "select * from vds;"|cat
The VDS row with id 91f30546-5ad2-48a2-8aff-8e98705af7ae had the error, and its niocs entry is what SDDC Manager was complaining about. We opened a GSS ticket to find out what is expected here, and they helped us fix the database with the right information.
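The manual comparison of VDS entries can be partly automated. The following is a rough Python sketch, assuming each row exported from the vds table has an id and a niocs column holding a JSON array of objects shaped like the nioc entry in the log ({"network": ..., "level": ..., "value": ...}). That layout, the function name unexpected_nioc_networks, and the idea of using a known-good VDS as the reference are all assumptions for illustration. The code only reads data and does not touch the database.

```python
import json


def unexpected_nioc_networks(rows, reference_id):
    """Map VDS id -> nioc network names that the reference VDS does not use.

    rows: iterable of dicts with "id" and "niocs" keys (assumed export format).
    reference_id: id of a VDS whose niocs entry is known to be accepted.
    """
    networks_by_id = {}
    for row in rows:
        entries = json.loads(row["niocs"])
        networks_by_id[row["id"]] = {entry["network"] for entry in entries}

    reference = networks_by_id[reference_id]
    report = {}
    for vds_id, names in networks_by_id.items():
        extra = names - reference
        if extra:
            report[vds_id] = sorted(extra)
    return report
```

Any name that shows up in the report is only a candidate for review. What the correct value should be is a question for GSS, as in our case.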
Disclaimer: Do not try to edit the SDDC Manager database without VMware GSS.
Fix:
Take a snapshot of the SDDC Manager.
We edited the database row for the specific id that was incorrect. In our case it was the id in the query below, which we used to view the entry: select niocs from vds where id='1208afe0-cca6-4d6c-a3bf-d19c13400e26';
We then updated that row with the right information: update vds set niocs='<correct information>' where id='1208afe0-cca6-4d6c-a3bf-d19c13400e26';
After that, we restarted the SDDC Manager services.
The cluster validation then completed.