Introduction
This document describes how to troubleshoot the HyperFlex plugin issue that can occur after a new installation, deployment, or upgrade of a HyperFlex cluster to version 3.0(1c).
Prerequisites
Requirements
Cisco recommends that you have knowledge of these topics:
- Cisco HyperFlex
- VMware vCenter
Components Used
The information in this document is based on these software and hardware versions:
- HyperFlex version 3.0(1c)
- UCS C240M5
- VMware vCenter 6.0 or 6.5
The information in this document was created from the devices in a specific lab environment. All of the devices used in this document started with a cleared (default) configuration. If your network is live, ensure that you understand the potential impact of any command.
Troubleshooting Steps
Step 1. Confirm that the HyperFlex plugin is missing from the vCenter Web Client. Navigate to Home > Global Inventory List and check whether the plugin is visible in vCenter. When present, the plugin appears just below Distributed Switches, as shown in the image.
Step 2. Ensure that the vCenter login user has full administrator privileges.
Step 3. Check whether a ping from vCenter to the HX Cluster Management IP (CMIP) succeeds.
Step 4. Check whether the ping to the CMIP is intermittent, in order to isolate a duplicate IP address issue.
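Steps 3 and 4 can be scripted from the vCenter shell. This is a minimal sketch, not part of the original procedure: it assumes the Linux iputils ping summary format ("N% packet loss"), and the function names ping_loss_pct and classify_loss are hypothetical helpers. Partial loss is only a hint of a duplicate IP address; other network faults can produce the same symptom.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original document).

# Read ping output on stdin and print the packet-loss percentage
# found in the summary line, for example "40" for "40% packet loss".
ping_loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Turn a loss percentage into a rough verdict.
classify_loss() {
    case "$1" in
        "")  echo "unknown" ;;       # no summary line was found
        0)   echo "stable" ;;        # every probe was answered
        100) echo "unreachable" ;;   # no probe was answered
        *)   echo "intermittent" ;;  # partial loss, investigate further
    esac
}
```

For example, `classify_loss "$(ping -c 20 <CMIP> | ping_loss_pct)"` sends 20 probes to the CMIP and prints the verdict.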
Step 5. Verify that the plugin was installed via the vCenter Managed Object Browser (MOB). Find out the cluster domain ID before you perform this step. In order to collect the cluster domain ID, Secure Shell (SSH) to the CMIP and run the command "stcli cluster info | grep -i domain" as shown in the image.
Step 6. In this case, as you can see, the domain ID is c122. Now, navigate to the vCenter MOB and check if the extension for this plugin is present. In order to do so, log in to https://<vCenter IP or FQDN>/mob.
Navigate to content > extensionManager under the properties section and select (more..). At the bottom of the list you see two springpath extensions, one of which includes the domain ID collected before.
Step 7. In order to validate further that the HyperFlex plugin was installed on the vCenter Web Client, navigate to Home > Administration > Solutions > Client Plug-Ins.
If you do not see the HyperFlex (Springpath Plugin) listed in the table, click Check for New Plug-ins. This populates the Springpath Plugin if it is present, and takes a couple of minutes.
Before you check for new plug-ins:
After you check for new plug-ins:
Step 8. Restart the vSphere Web Client Service (the vsphere-client service).
vCenter Server on Windows
- Open Server Manager on the Windows system on which vCenter Server runs.
- Navigate to Configuration > Services.
- Select VMware vSphere Web Client and click Restart.
vCenter Server Appliance
- Use SSH to log in to the vCenter Server Appliance as root.
- Stop the vSphere Web Client service with one of these commands.
- Restart the vSphere Web Client service with the help of these commands.
Command outputs from lab vCenter Server Appliance:
Step 9. Delete the extensionList["com.springpath.sysmgmt"] MOB entry from vCenter.
Note: Ensure that you delete the unused domain ID only. If you delete the incorrect domain ID, the cluster goes offline. Collect the correct domain ID in Step 5. Alternatively, you can delete the cluster from the vCenter Hosts & Clusters view, delete the MOB entry, recreate the cluster in vCenter, and finally re-register the cluster. If in doubt, open a TAC SR before you proceed.
Step 10. Re-register the HX cluster to the same vCenter.
Run these commands to re-register the HX to vCenter.
root@ucs-stctlvm-116-1:~# stcli cluster reregister --vcenter-datacenter <DATACENTER> --vcenter-cluster <CLUSTER> --vcenter-url <vCenterIP> --vcenter-user <USER>
Step 11. Wait for the vSphere Web Client services to come online before you log back in. This takes about 5 to 10 minutes.
After you log in, you should be able to see the Cisco HX Data Platform under Cisco HyperFlex Systems.
Step 12. If this does not work, check whether the plugin can be downloaded from the VCSA SSH console. This test helps to isolate any firewall, port, or certificate issue.
With the use of wget:
sup-ucs-vc:~ # wget https://<CMIP>/plugins/stGui-1.0.zip --no-check-certificate
With the use of curl:
sup-ucs-vc:~ # curl -v https://<CMIP>/plugins/stGui-1.0.zip
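The download test in Step 12 can be reduced to a single HTTP status code. This is a minimal sketch under stated assumptions: the function names plugin_http_code and interpret_code are hypothetical, the curl options used are -k (skip certificate validation, comparable to --no-check-certificate with wget), -s, -o, and -w, and the wording of the verdicts is illustrative only.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original document).

# Print the HTTP status code returned for the plugin archive.
# $1 is the cluster management IP (CMIP). curl prints 000 when
# it receives no HTTP response at all.
plugin_http_code() {
    curl -k -s -o /dev/null -w '%{http_code}' \
        "https://$1/plugins/stGui-1.0.zip"
}

# Map a status code to a rough diagnosis.
interpret_code() {
    case "$1" in
        200) echo "plugin downloadable" ;;
        000) echo "no HTTP response (connection, port, or TLS problem)" ;;
        *)   echo "HTTP $1 (reachable, but the file was not served)" ;;
    esac
}
```

For example, `interpret_code "$(plugin_http_code <CMIP>)"` runs the check from the VCSA. Remove -k if the test should also fail on an untrusted certificate.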
Step 13. Browse to https://vCenterIPaddress/mob, then log in as administrator@vsphere.local.
Navigate to Content > Extension > ExtensionManager > extensionList["com.springpath.sysmgmt"] > Server.
This is how it should look. The ExtensionServerInfo entries show the same URL/IP for a given HX cluster, and this should match the CMIP.
Step 14. If the server MOB output does not show the same URL DNS name, for example, if [0]ExtensionServerInfo and [1]ExtensionServerInfo show two separate URL DNS names, the mismatch can cause problems with the plugin.
Find the URL that corresponds to the cluster management IP. Verify the DNS in the CtrlVM and follow these steps:
- Disable DNS on the HX cluster. SSH to any storage CtrlVM.
- Verify the DNS server: # stcli services dns show (get the IP of the DNS server)
- Stop DNS: # stcli services dns remove --dns <DNS server IP>
- Verify that DNS stopped: # stcli services dns show
- Delete the extensionList["com.springpath.sysmgmt"] MOB entry from vCenter (as covered in Step 9).
- Re-register the HX cluster to vCenter (as covered in Step 10).
- Log out of the Web Client and log back in, then verify that the plugin shows up.
- Add the DNS server back to the HX cluster: # stcli services dns add --dns <DNS server IP>
- Verify that the DNS server is running: # stcli services dns show
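The comparison described in Step 14 amounts to checking that both ExtensionServerInfo URLs point at the same host. This is a minimal sketch, assuming the URLs have been copied by hand from the MOB page; the function names url_host and same_host are hypothetical, the addresses in the usage note are placeholders, and bracketed IPv6 literals are not handled.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original document).

# Print the host part of a URL: drop the scheme, then drop
# everything from the first "/" or ":" (path or port) onward.
url_host() {
    printf '%s\n' "$1" |
        sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' -e 's|[/:].*$||'
}

# Succeed (exit status 0) only when both URLs share one host.
same_host() {
    [ "$(url_host "$1")" = "$(url_host "$2")" ]
}
```

For example, `same_host "https://192.0.2.10/plugins/stGui-1.0.zip" "https://hx.example.com/plugins/stGui-1.0.zip" || echo "mismatch"` prints "mismatch", which is the situation Step 14 describes.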
Log Analysis
Log Collection
1. vCenter logs - https://kb.vmware.com/s/article/1011641
2. HyperFlex storfs bundle - https://www.cisco.com/c/en/us/support/docs/hyperconverged-infrastructure/hyperflex-hx-data-platform/210831-Visual-guide-to-collect-Tech-Support-fil.html
Example Error Messages
1. Check the vCenter virgo logs for messages that indicate vCenter has communication issues with the HX cluster stMgr.
2. VCSA virgo log location: /var/log/vmware/vsphere-client/logs/vsphere_client_virgo.log
3. Check the stMgr log at /var/log/springpath/stMgr.log for error or failure messages that correspond to the HyperFlex cluster or the vCenter plugin.
Example logs in a problem situation:
stMgr failed to return a simple cluster name,
[2016-11-15T19:48:40.542Z] [WARN ] pool-9-thread-1 70000096 100001 200001 com.storvisor.sysmgmt.service.ThriftServiceAccess Failed to get cluster name when checking for cluster access. org.apache.thrift.transport.TTransportException: java.net.UnknownHostException: cisco-storage-cluster.com
at org.apache.thrift.transport.THttpClient.flush(THttpClient.java:356)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:65)
at com.storvisor.sysmgmt.StMgr$Client.send_getName(StMgr.java:1308)
at com.storvisor.sysmgmt.StMgr$Client.getName(StMgr.java:1301)
at com.storvisor.sysmgmt.service.ThriftServiceAccess.hasValidAccess(ThriftServiceAccess.java:228)
at com.storvisor.sysmgmt.service.util.StorvisorServerCacheForceUpdaterThread.call(StorvisorServerCacheForceUpdaterThread.java:28)
at com.storvisor.sysmgmt.service.util.StorvisorServerCacheForceUpdaterThread.call(StorvisorServerCacheForceUpdaterThread.java:12)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.net.UnknownHostException: cisco-storage-cluster.com
at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
at java.net.SocksSocketImpl.connect(Unknown Source)
at java.net.Socket.connect(Unknown Source)
at sun.security.ssl.SSLSocketImpl.connect(Unknown Source)
4. Open a Cisco TAC SR if this does not help: https://mycase.cloudapps.cisco.com/case
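The log check in items 1 through 3 can be narrowed to the failure shown in the example, an UnknownHostException for the cluster name. This is a minimal sketch, not a complete log analysis: the function name unresolved_hosts is hypothetical, and it only looks for this one exception type.

```shell
#!/bin/sh
# Hypothetical helper (not from the original document).

# List each hostname that appears after "UnknownHostException: "
# in the given log files (or on stdin when no file is given),
# one hostname per line, without duplicates.
unresolved_hosts() {
    grep -o 'UnknownHostException: [^ ]*' "$@" |
        sed 's/.*UnknownHostException: //' |
        sort -u
}
```

For example, `unresolved_hosts /var/log/vmware/vsphere-client/logs/vsphere_client_virgo.log` on the VCSA lists the names that the Web Client could not resolve. An empty result means this particular error is not present in the log.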