viquarmca

VCF on VxRail 4.x: Failed Add Cluster, Stuck on the Multi-VDS Check

In today's blog I want to share an interesting node addition issue I ran into after a VCF 3.x to 4.x migration. It was blocking us from adding nodes on three VCF sites, all of which had been migrated from VCF 3.x to 4.x. Let's dive into the issue and how we fixed it.


On VCF on VxRail, adding the cluster never completed: it sat on the multi-VDS check for ages and never failed outright. The screenshot below shows how it looks.



Since we were adding a cluster, the places to check for any cluster issue are /var/log/vmware/vcf/domainmanager/domainmanager.log and operationmanager.log.

Upon checking domainmanager.log, I could see the request was timing out and reporting the error below.


2023-01-27T17:52:57.167+0000 ERROR [vcf_dm,18e4c30160b37a69,c13f] [c.v.e.s.e.h.LocalizableRuntimeExceptionHandler,http-nio-127.0.0.1-7200-exec-6] [EP47QM] PUBLIC_INTERNAL_SERVER_ERROR InternalServerError

com.vmware.evo.sddc.common.services.error.SddcManagerServicesIsException: InternalServerError

at com.vmware.vcf.clustermanager.controller.v1.ClusterController.getVdses(ClusterController.java:1947)

at com.vmware.vcf.clustermanager.controller.v1.ClusterController$$FastClassBySpringCGLIB$$8e4c657c.invoke(<generated>)

at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)

2023-01-27T17:52:56.473+0000 INFO [vcf_dm,9361b908e0d57041,bafd] [c.v.v.d.rest.DomainManagerAbout,http-nio-127.0.0.1-7200-exec-8] Getting domainmanager service info

2023-01-27T17:52:56.856+0000 INFO [vcf_dm,7af2b6e8f3f68395,bc79] [c.v.v.d.rest.DomainManagerAbout,http-nio-127.0.0.1-7200-exec-2] Getting domainmanager service info

2023-01-27T17:52:57.163+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = MANAGEMENT

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EPHEMERAL

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = VMOTION

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EARLY_BINDING

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = VSAN

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EARLY_BINDING

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = EXTERNAL

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EARLY_BINDING

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = EXTERNAL

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EARLY_BINDING

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = EXTERNAL

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EPHEMERAL

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] transportType = EXTERNAL

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] Type = EPHEMERAL

2023-01-27T17:52:57.164+0000 DEBUG [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterDisassembler,http-nio-127.0.0.1-7200-exec-6] nioc {"network":"Management Traffic Type","level":"custom","value":0}

2023-01-27T17:52:57.165+0000 ERROR [vcf_dm,18e4c30160b37a69,c13f] [c.v.v.c.c.v1.ClusterController,http-nio-127.0.0.1-7200-exec-6] Failed to get VDSes

java.lang.IllegalArgumentException: Invalid trafficType Management Traffic Type
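domainmanager.log is noisy, so it helps to pull out just the failing request. Below is a minimal sketch using plain grep. The function names dm_errors and dm_trace are made up for this post, and the trace ID is assumed to be the second field inside the [vcf_dm,...] bracket, as it appears in the excerpt above.

```shell
#!/bin/sh
# Sketch only: dm_errors and dm_trace are hypothetical helper names.
DM_LOG="/var/log/vmware/vcf/domainmanager/domainmanager.log"

# Print ERROR entries plus the exception line that follows them.
# $1: log file to read (defaults to the domainmanager.log path above).
dm_errors() {
    grep -E ' ERROR |^[a-z]+(\.[A-Za-z0-9_$]+)+(Exception|Error):' "${1:-$DM_LOG}"
}

# Print every entry that carries one trace ID, so the DEBUG lines
# leading up to a failure stay together.
# $1: trace ID, e.g. 18e4c30160b37a69   $2: log file (optional)
dm_trace() {
    grep -F "[vcf_dm,$1," "${2:-$DM_LOG}"
}

# Example (run on the SDDC Manager):
#   dm_errors
#   dm_trace 18e4c30160b37a69
```

With the excerpt above, dm_trace 18e4c30160b37a69 would show the transportType lines, the nioc line and the "Failed to get VDSes" error for that one request, without the unrelated "Getting domainmanager service info" entries.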


The log reports "Invalid trafficType Management Traffic Type": SDDC Manager expects the traffic type in a particular form and cannot parse the value it fetched. So I checked vCenter, listed every port group name on all the VDSes, and compared them with the SDDC DB entries. The names were all correct and matched. Use the curl command below to get the VDS information from SDDC Manager.


curl http://localhost/inventory/vds | json_pp


We then tried the API query for the VDSes of that cluster ID. When I ran it, the API query also failed with exactly the same error we saw in the logs.



{
  "errorCode": "PUBLIC_INTERNAL_SERVER_ERROR",
  "arguments": [],
  "message": "InternalServerError",
  "causes": [
    {
      "type": "java.lang.IllegalArgumentException",
      "message": "Invalid trafficType Management Traffic Type"
    }
  ],
  "referenceToken": "E3SROA"
}
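When the same call has to be repeated on several sites, it is handy to print only the message fields of such an error response. The sketch below does that with grep and sed. The helper name api_error_messages is hypothetical, and it assumes the response looks like the one above, with no escaped quotes inside the messages.

```shell
#!/bin/sh
# Sketch only: api_error_messages is a hypothetical helper name.
# Reads an error response on stdin and prints each "message" value on its own line.
api_error_messages() {
    grep -o '"message" *: *"[^"]*"' \
        | sed -e 's/^"message" *: *"//' -e 's/"$//'
}

# Example (run on the SDDC Manager, same URL as the curl command in this post):
#   curl -s http://localhost/inventory/vds | api_error_messages
```

For the response above this prints the top-level message first and the cause message second, which is the part that names the invalid traffic type.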


So SDDC Manager expects a syntax that we no longer had after the conversion. I manually compared all the VDS entries to find out what was expected and what was not showing up.


I ran the command below to get the VDS information from the DB.


psql -U postgres -h localhost -d platform -c "select * from vds;"|cat



The vds DB entry with id 91f30546-5ad2-48a2-8aff-8e98705af7ae had the error, and its niocs entry is what SDDC Manager was complaining about. We opened a GSS ticket to find out what is expected here, and they helped us fix the DB with the right information.
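To compare the niocs entry of a failing row with a working one, something like the following can help. It is a rough, read-only sketch: nioc_networks is a made-up helper name, and it assumes the niocs column holds JSON-like text with "network" keys, as the nioc line in the log suggests.

```shell
#!/bin/sh
# Sketch only: nioc_networks is a hypothetical helper name.
# Reads niocs text on stdin and prints the distinct "network" names, sorted.
nioc_networks() {
    grep -o '"network" *: *"[^"]*"' \
        | sed -e 's/^"network" *: *"//' -e 's/"$//' \
        | sort -u
}

# Example (run on the SDDC Manager; -A and -t make psql print the bare value):
#   psql -U postgres -h localhost -d platform -At \
#     -c "select niocs from vds where id='91f30546-5ad2-48a2-8aff-8e98705af7ae';" \
#     | nioc_networks
```

Running it once for the failing id and once for a VDS that validates fine gives two short lists that are easy to diff by eye.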


Disclaimer: Don't try to edit the SDDC DB without VMware GSS.


Fix:


  1. Take a snapshot of the SDDC Manager.

  2. We looked up the DB entry for the specific id that was incorrect. In our case it was the id in the query below, which shows the current value: select niocs from vds where id='1208afe0-cca6-4d6c-a3bf-d19c13400e26';

  3. We then ran the update query for that id with the right information: update vds set niocs='<correct information>' where id='1208afe0-cca6-4d6c-a3bf-d19c13400e26';

  4. After that, we restarted the SDDC Manager services.

  5. Cluster validation then completed.
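Steps 2 and 3 are easy to get wrong by hand, so here is a small sketch that only prints the two statements for one row and refuses anything that does not look like a vds id. The helper names is_uuid and vds_niocs_fix_sql are made up for this post, and the niocs value itself still has to come from GSS. Nothing here touches the database.

```shell
#!/bin/sh
# Sketch only: is_uuid and vds_niocs_fix_sql are hypothetical helper names.

# Succeed only if $1 looks like a UUID, such as the vds ids in this post.
is_uuid() {
    printf '%s\n' "$1" \
        | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Print (not run) a select to review the current value, then an update
# limited to that single row.
# $1: vds id   $2: niocs value supplied by support
vds_niocs_fix_sql() {
    is_uuid "$1" || { echo "not a vds id: $1" >&2; return 1; }
    printf "select niocs from vds where id='%s';\n" "$1"
    # Double any single quote so the value stays one SQL string literal.
    printf "update vds set niocs='%s' where id='%s';\n" \
        "$(printf '%s' "$2" | sed "s/'/''/g")" "$1"
}

# Example: review the output before pasting anything into psql.
#   vds_niocs_fix_sql 1208afe0-cca6-4d6c-a3bf-d19c13400e26 '<correct information>'
```

The where clause matters here: without it the update would overwrite the niocs column of every row in the vds table.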


