NodeGroup Troubleshooting

Common NodeGroup issues and step-by-step solutions.

NodeGroup Creation Issues

Creation Button Disabled

Symptoms: "Create NodeGroup" button grayed out or unresponsive

Causes:

  • Cluster not in active state
  • NodeGroup creation is not available after the cluster has been deployed
  • Insufficient permissions
  • vCloud quota limitations

Solutions:

  1. Verify that the cluster status is "Active"
  2. Contact support for NodeGroup creation assistance
  3. Consider recreating the cluster with the desired NodeGroups

NodeGroup Creation Failures

Symptoms: NodeGroup stuck in "Creating" status or creation timeouts

Solutions:

  1. Allow 10-15 minutes for creation to complete
  2. Check vCloud resource quotas
  3. Verify network and security settings
  4. Try different instance types or zones
  5. Contact support for persistent failures

NodeGroup Scaling Issues

Scaling Operations Not Working

Symptoms: NodeGroup stuck in "Scaling" status or desired count not reached

Scale-Up Issues:

  • Check vCloud compute quota
  • Verify availability zone capacity
  • Try different instance types
  • Scale in smaller increments

Scale-Down Issues:

  • Manually drain the nodes that will be removed first
  • Check that pod disruption budgets are not blocking pod eviction
  • Ensure attached storage volumes can be detached
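
The scale-down checks above can be sketched with standard kubectl commands. This is a minimal sketch, assuming kubectl access to the cluster; the function names blocking_pdbs and drain_node are illustrative helpers and not part of the platform, and the 300-second drain timeout is an arbitrary choice.

```shell
# List PodDisruptionBudgets that currently allow zero disruptions.
# These can block a drain until more replicas become available.
blocking_pdbs() {
  kubectl get pdb -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed |
    awk '$3 == 0 { print $1 "/" $2 }'
}

# Cordon and then drain a single node. The node name is passed as $1.
# DaemonSet pods are left in place; emptyDir data on the node is discarded.
drain_node() {
  node="$1"
  kubectl cordon "$node" &&
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data --timeout=300s
}
```

For example, run blocking_pdbs first, then drain_node <node-name> for each node you expect the scale-down to remove. If a drain hangs, a budget listed by blocking_pdbs is a likely cause.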

Unexpected Node Counts

Symptoms: Actual nodes don't match desired count

Solutions:

  1. Allow time for automatic reconciliation
  2. Check individual node health in Kubernetes
  3. Contact support for persistent discrepancies
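
One way to check node health from the Kubernetes side is to compare the number of Ready nodes with the desired count shown in the console. The sketch below assumes kubectl access; the helper names are hypothetical, and it counts all nodes in the cluster, not only those of one NodeGroup.

```shell
# Count nodes whose STATUS column starts with "Ready".
# This includes cordoned nodes shown as "Ready,SchedulingDisabled".
ready_node_count() {
  kubectl get nodes --no-headers | awk '$2 ~ /^Ready/ { n++ } END { print n + 0 }'
}

# Print the names of nodes that are not Ready, for follow-up with
# "kubectl describe node <name>".
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ { print $1 }'
}

# Compare the Ready count with the desired count passed as $1.
check_node_count() {
  desired="$1"
  actual="$(ready_node_count)"
  if [ "$actual" -eq "$desired" ]; then
    echo "OK: $actual of $desired nodes Ready"
  else
    echo "MISMATCH: $actual of $desired nodes Ready"
    return 1
  fi
}
```

If check_node_count reports a mismatch, not_ready_nodes shows which nodes to inspect before contacting support.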

NodeGroup Deletion Issues

Delete Button Disabled

Protection Analysis:

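  1. Last NodeGroup Check: Is this the last remaining NodeGroup? If so, deletion is blocked.
  2. Master NodeGroup Check: Is this a master/control-plane NodeGroup? If so, deletion is blocked.
  3. Status Check: Is the NodeGroup status "Ready"? If not, deletion is blocked.
  4. Role Check: Is this a worker NodeGroup? Only worker NodeGroups can be deleted.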

Solutions by Protection Type:

  • Last NodeGroup: Create an additional NodeGroup before deleting this one
  • Master NodeGroup: Master NodeGroups cannot be deleted (permanent protection)
  • Status: Wait for the status to become "Ready"

Deletion Process Failures

Symptoms: Deletion starts but fails to complete

Solutions:

  1. Manually remove dependencies (pods, volumes, load balancers)
  2. Force pod evacuation if needed
  3. Contact support for stuck deletions
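
The dependencies named above can be located with kubectl before they are removed. This is a sketch under the assumption that kubectl access is available; the function names are illustrative, and the forced drain is shown only as one possible way to evacuate pods.

```shell
# Pods still scheduled on a given node. The node name is passed as $1.
pods_on_node() {
  kubectl get pods -A -o wide --field-selector "spec.nodeName=$1"
}

# Services of type LoadBalancer, printed as namespace/name.
loadbalancer_services() {
  kubectl get svc -A --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type |
    awk '$3 == "LoadBalancer" { print $1 "/" $2 }'
}

# PersistentVolumeClaims in all namespaces, to review attached storage.
list_pvcs() {
  kubectl get pvc -A
}

# Forced evacuation of one node. --force also deletes pods that are not
# managed by a controller, and those pods are not recreated elsewhere.
force_evacuate() {
  kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data --force
}
```

Review the output of pods_on_node, loadbalancer_services, and list_pvcs before running force_evacuate, since a forced drain can cause data loss for unmanaged pods.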

NodeGroup Status Issues

Status Not Updating

Symptoms: NodeGroup status appears outdated or inconsistent

Solutions:

  1. Use refresh button or reload page
  2. Clear browser cache and cookies
  3. Allow 30-60 seconds for status propagation
  4. Try different browser or incognito mode

Stuck Status Conditions

Common Stuck States:

  • Creating: Allow 15-20 minutes; contact support if the NodeGroup is still stuck
  • Scaling: Allow 10-15 minutes; check resource availability
  • Error: Check the error details; the NodeGroup may need to be recreated
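
While waiting out a Creating or Scaling state, the Kubernetes side can be polled to see whether the expected nodes have joined. The helper below is a hypothetical sketch: it watches node readiness through kubectl only and does not read the NodeGroup status shown in the console. The default of 900 seconds mirrors the 15-minute figure above.

```shell
# Poll until at least EXPECTED nodes report Ready or TIMEOUT seconds pass.
# Usage: wait_for_ready_nodes EXPECTED [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_ready_nodes() {
  expected="$1"
  timeout="${2:-900}"   # 15 minutes by default
  interval="${3:-30}"
  elapsed=0
  while :; do
    ready="$(kubectl get nodes --no-headers 2>/dev/null |
      awk '$2 ~ /^Ready/ { n++ } END { print n + 0 }')"
    if [ "$ready" -ge "$expected" ]; then
      echo "ready: $ready/$expected after ${elapsed}s"
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out: $ready/$expected after ${elapsed}s"
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

For example, wait_for_ready_nodes 5 1200 waits up to 20 minutes for five Ready nodes. A timeout here is a reasonable point to gather diagnostics and contact support.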

Performance Issues

Poor NodeGroup Performance

Analysis Areas:

  • Resource utilization (CPU/memory usage)
  • Instance types and hardware selection
  • Network latency and throughput
  • Storage performance characteristics

Solutions:

  1. Upgrade to higher performance instance types
  2. Distribute workloads across multiple NodeGroups
  3. Optimize pod resource requests and limits
  4. Use anti-affinity to spread workloads
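
Before changing instance types, it can help to see which nodes and pods are actually under load. The sketch below assumes the cluster exposes the metrics API (for example through metrics-server), which kubectl top requires; the helper names and the default threshold of 80 percent are illustrative choices.

```shell
# Print nodes whose CPU% or MEMORY% is at or above a threshold (default 80).
# Nodes without metrics (shown as "<unknown>") are skipped.
hot_nodes() {
  threshold="${1:-80}"
  kubectl top nodes --no-headers |
    awk -v t="$threshold" '{
      cpu = $3; mem = $5
      sub(/%/, "", cpu); sub(/%/, "", mem)
      if (cpu ~ /^[0-9]+$/ && mem ~ /^[0-9]+$/ &&
          (cpu + 0 >= t + 0 || mem + 0 >= t + 0))
        print $1, "cpu=" cpu "%", "mem=" mem "%"
    }'
}

# Show the ten pods with the highest CPU usage (plus the header line).
top_pods_by_cpu() {
  kubectl top pods -A --sort-by=cpu | head -n 11
}
```

Nodes listed by hot_nodes are candidates for rebalancing or a larger instance type, and top_pods_by_cpu helps identify workloads whose resource requests and limits may need adjusting.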

Getting Support

Information to Gather

NodeGroup Information:

  • NodeGroup name and ID
  • Cluster name and ID
  • Current status and error messages
  • NodeGroup configuration details

Diagnostic Commands (if kubectl access is available):

# Check node status
kubectl get nodes -o wide

# Check pod distribution
kubectl get pods -A -o wide

# Check resource usage (requires the metrics API, e.g. metrics-server)
kubectl top nodes

# Check recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
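
To attach this output to a support ticket, the same commands can be captured into files. This is a minimal sketch; the function name collect_diagnostics and the default directory name nodegroup-diagnostics are assumptions made for illustration.

```shell
# Save the diagnostic command output to files for a support ticket.
# Usage: collect_diagnostics [OUTPUT_DIR]
collect_diagnostics() {
  outdir="${1:-nodegroup-diagnostics}"
  mkdir -p "$outdir" || return 1
  kubectl get nodes -o wide > "$outdir/nodes.txt" 2>&1
  kubectl get pods -A -o wide > "$outdir/pods.txt" 2>&1
  # "kubectl top" needs the metrics API; any error message is captured too.
  kubectl top nodes > "$outdir/top-nodes.txt" 2>&1
  kubectl get events -A --sort-by='.lastTimestamp' > "$outdir/events.txt" 2>&1
  echo "$outdir"
}
```

Running collect_diagnostics prints the directory it wrote to. Review the files before sharing them, since pod and node names may be sensitive in some environments.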

Support Escalation

  1. Level 1: Self-service (documentation, basic solutions)
  2. Level 2: Support ticket with detailed information
  3. Level 3: Emergency escalation for production-critical issues