
Network Troubleshooting

This guide covers common networking issues in vCloud Kubernetes Engine and provides step-by-step solutions to resolve connectivity, DNS, ingress, and performance problems.

Connectivity Issues

Pod-to-Pod Communication Problems

Symptoms

  • Pods cannot reach other pods in the cluster
  • Timeouts when connecting between services
  • Intermittent connectivity issues

[Figure needed: screenshot showing pod connectivity test failure]

Diagnostic Steps

# Find pod IPs and the nodes they run on
kubectl get pods -o wide

# Test pod-to-pod connectivity (requires ping in the test pod image)
kubectl exec -it test-pod -- ping -c 3 TARGET-POD-IP

# Verify CNI plugin status (Calico)
kubectl get pods -n kube-system | grep calico

# Check network policies
kubectl get networkpolicies -A
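To spot unhealthy CNI (or other system) pods quickly, the output of `kubectl get pods --no-headers` can be filtered with a small helper. This is a minimal sketch: `not_ready_pods` is a hypothetical name, and it assumes the default column layout of a namespace-scoped listing (NAME, READY, STATUS, ...), so it should not be fed `-A` output, which adds a leading NAMESPACE column.

```shell
# not_ready_pods: read `kubectl get pods --no-headers` output (namespace-scoped,
# without -A) on stdin and print pods that are not Running or not fully ready.
# Completed (Succeeded) pods are printed too, because their STATUS is not Running.
not_ready_pods() {
  awk '{
    split($2, ready, "/")                  # READY column, e.g. 0/1
    if ($3 != "Running" || ready[1] != ready[2])
      print $1, $2, $3                     # NAME READY STATUS
  }'
}

# Example (assumes the Calico CNI, as in the command above):
# kubectl get pods -n kube-system --no-headers | grep calico | not_ready_pods
```

Only columns 2 and 3 are read because the RESTARTS column can contain extra words such as "(3h ago)", which shifts the later fields.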

Common Causes and Solutions

CNI Plugin Issues

Problem: CNI plugin not functioning correctly
Investigation:
- Check CNI plugin pods in kube-system namespace
- Review CNI plugin logs
- Verify CNI configuration

Solution:
- Restart CNI plugin pods
- Verify network configuration
- Contact support for CNI issues

Network Policy Blocking

Problem: Network policies preventing communication
Investigation:
- List all network policies
- Check policy selectors and rules
- Verify intended traffic patterns

Solution:
- Review and adjust network policies
- Create allow rules for required traffic
- Temporarily remove a suspect policy to confirm it is the cause (preferably in a non-production environment)
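An allow rule for required traffic can be generated from the shell and piped to `kubectl apply`. This is a sketch under assumptions: pods are labelled with an `app` key, and the function name, namespace, labels, and port in the example are hypothetical placeholders.

```shell
# allow_ingress_policy NAMESPACE FROM_APP TO_APP PORT
# Prints a NetworkPolicy that lets pods labelled app=FROM_APP reach pods
# labelled app=TO_APP on TCP PORT within NAMESPACE.
# The "app" label key is an assumption about how your workloads are labelled.
allow_ingress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-$2-to-$3
  namespace: $1
spec:
  podSelector:
    matchLabels:
      app: $3
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: $2
      ports:
        - protocol: TCP
          port: $4
EOF
}

# Example: allow_ingress_policy shop frontend api 8080 | kubectl apply -f -
```

If no policy selected the target pods before, this one makes them isolated for ingress, so any other clients they serve also need their own allow rules.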

IP Address Conflicts

Problem: IP address conflicts between pods or nodes
Investigation:
- Check pod IP allocations
- Verify CIDR range configurations
- Look for duplicate IP assignments

Solution:
- Restart affected pods
- Check that the pod, service, and node network CIDR ranges do not overlap
- Contact support for persistent conflicts
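Overlapping ranges, for example a pod CIDR that overlaps the node network, are a common source of such conflicts. The helper below is a minimal sketch that checks whether two IPv4 CIDR blocks overlap using only POSIX `awk`; the function name and the example ranges are hypothetical.

```shell
# cidrs_overlap A B: exit status 0 if IPv4 CIDR blocks A and B overlap, 1 otherwise.
# An address without a /prefix is treated as a single host (/32).
cidrs_overlap() {
  awk -v a="$1" -v b="$2" '
    # Return the first address of the block as an integer; sets SIZE to the block size.
    function net_start(cidr,    p, o) {
      split(cidr, p, "/")
      if (p[2] == "") p[2] = 32
      split(p[1], o, ".")
      SIZE = 2 ^ (32 - p[2])
      return int((((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]) / SIZE) * SIZE
    }
    BEGIN {
      a_lo = net_start(a); a_hi = a_lo + SIZE - 1
      b_lo = net_start(b); b_hi = b_lo + SIZE - 1
      # Two ranges overlap when each one starts before the other ends.
      exit !(a_lo <= b_hi && b_lo <= a_hi)
    }'
}

# Hypothetical example: cidrs_overlap 10.244.0.0/16 10.244.128.0/17 && echo "conflict"
```

Per-node pod ranges can be listed with `kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'`; depending on the CNI configuration, Calico's own IP pools may be the authoritative pod range instead.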

External Connectivity Problems

Symptoms

  • Pods cannot reach external services
  • DNS resolution failures for external domains
  • Internet connectivity not working

[Figure needed: screenshot showing external connectivity test]

Diagnostic Commands

# Test external connectivity
kubectl exec -it test-pod -- curl -I https://google.com

# Check DNS resolution
kubectl exec -it test-pod -- nslookup google.com

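# Find policies that restrict egress, then inspect their rules
kubectl get networkpolicies -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,TYPES:.spec.policyTypes
kubectl describe networkpolicy POLICY-NAME -n NAMESPACE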

# Check security groups
# (Review in vCloud interface)

Solutions

DNS Configuration Issues

Problem: DNS not resolving external domains
Investigation:
- Check CoreDNS pods status
- Verify DNS configuration
- Test DNS forwarding

Solution:
- Restart CoreDNS pods
- Verify DNS server configuration
- Check upstream DNS servers

Security Group Restrictions

Problem: Security groups blocking outbound traffic
Investigation:
- Review security group rules
- Check required outbound ports
- Verify rule priorities

Solution:
- Update security group rules
- Allow required outbound traffic
- Test with temporary permissive rules

Network Policy Egress Blocks

Problem: Network policies blocking outbound traffic
Investigation:
- Review egress policies
- Check policy scopes and selectors
- Verify intended egress patterns

Solution:
- Adjust network policies for required egress
- Create specific allow rules
- Test egress requirements
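A specific egress allow rule can be generated the same way. This sketch assumes pods are labelled with an `app` key; the function name and the example values (including the documentation range 203.0.113.0/24) are hypothetical.

```shell
# allow_egress_cidr_policy NAMESPACE APP CIDR PORT
# Prints a NetworkPolicy that lets pods labelled app=APP open TCP connections
# to addresses in CIDR on PORT. The "app" label key is an assumption.
allow_egress_cidr_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-$2-egress-$4
  namespace: $1
spec:
  podSelector:
    matchLabels:
      app: $2
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: $3
      ports:
        - protocol: TCP
          port: $4
EOF
}

# Example: allow_egress_cidr_policy shop api 203.0.113.0/24 443 | kubectl apply -f -
```

Once any Egress policy selects a pod, all egress that is not explicitly allowed is denied, including DNS queries to CoreDNS, so a rule like this usually has to be paired with a DNS allow rule (see Network Policy DNS Blocks).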

DNS Resolution Issues

Internal DNS Problems

Symptoms

  • Services cannot be reached by name
  • DNS resolution timeouts
  • Inconsistent name resolution

[Figure needed: screenshot of DNS resolution failure]

Diagnostic Steps

# Test service DNS resolution
kubectl exec -it test-pod -- nslookup kubernetes.default.svc.cluster.local

# Check CoreDNS status
kubectl get pods -n kube-system | grep coredns

# Verify CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml

# Test DNS from different namespaces
kubectl exec -it pod-in-namespace-a -n namespace-a -- nslookup service-in-namespace-b.namespace-b.svc.cluster.local

Common DNS Issues

CoreDNS Pod Problems

Problem: CoreDNS pods not running or unhealthy
Investigation:
- Check CoreDNS pod status and logs
- Verify resource allocation
- Check for resource pressure

Solution:
- Restart CoreDNS pods
- Increase resource allocation if needed
- Scale CoreDNS replicas if necessary

DNS Configuration Errors

Problem: Incorrect DNS configuration
Investigation:
- Review CoreDNS ConfigMap
- Check DNS forwarding rules
- Verify cluster domain configuration

Solution:
- Correct DNS configuration
- Update CoreDNS ConfigMap
- Restart CoreDNS after changes

Network Policy DNS Blocks

Problem: Network policies blocking DNS traffic
Investigation:
- Check policies affecting kube-system namespace
- Verify DNS port accessibility (53/UDP, 53/TCP)
- Test DNS from different pod contexts

Solution:
- Allow DNS traffic in network policies
- Ensure pods can reach the CoreDNS pods in the kube-system namespace
- Create explicit DNS allow rules
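A sketch of an explicit DNS allow rule, assuming two common defaults: CoreDNS pods carry the label `k8s-app: kube-dns`, and namespaces have the automatic `kubernetes.io/metadata.name` label (set by Kubernetes 1.21 and later). Verify both in your cluster, for example with `kubectl get pods -n kube-system --show-labels`; the function name is hypothetical.

```shell
# allow_dns_egress_policy NAMESPACE
# Prints a NetworkPolicy that lets every pod in NAMESPACE send DNS queries
# (UDP and TCP port 53) to the CoreDNS pods in kube-system.
# Assumes CoreDNS pods are labelled k8s-app=kube-dns.
allow_dns_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}

# Example: allow_dns_egress_policy shop | kubectl apply -f -
```

Because `podSelector: {}` selects every pod in the namespace for Egress, this policy also puts all of them under egress isolation. Apply it in namespaces that already restrict egress, or add allow rules for the other traffic those pods need.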

External DNS Problems

Symptoms

  • Cannot resolve external domain names
  • Slow external DNS resolution
  • Partial external DNS functionality

Diagnostic Approach

# Test external DNS resolution
kubectl exec -it test-pod -- nslookup google.com

# Check DNS forwarding configuration
kubectl exec -it test-pod -- cat /etc/resolv.conf

# Test specific DNS servers
kubectl exec -it test-pod -- nslookup google.com 8.8.8.8

# Capture DNS traffic (requires tcpdump in the container image and
# sufficient privileges to capture packets)
kubectl exec -it test-pod -- tcpdump -i any port 53
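When reviewing `/etc/resolv.conf`, the fields that matter are the nameserver (normally the cluster DNS service IP), the search domains, and `ndots`. Pods using the default ClusterFirst DNS policy get `options ndots:5`, so a name with fewer than five dots, such as `google.com`, is tried against each search domain before being sent as-is, which can slow external lookups; a trailing dot (`google.com.`) skips the search list. A minimal sketch that summarizes these fields (function name hypothetical):

```shell
# resolv_summary: read a resolv.conf on stdin and print the nameservers,
# search domains, and ndots value the pod's resolver will use.
resolv_summary() {
  awk '
    $1 == "nameserver" { ns = (ns == "" ? $2 : ns " " $2) }
    $1 == "search"     { for (i = 2; i <= NF; i++) sd = (sd == "" ? $i : sd " " $i) }
    $1 == "options"    { for (i = 2; i <= NF; i++) if ($i ~ /^ndots:/) { split($i, kv, ":"); nd = kv[2] } }
    END {
      print "nameservers: " ns
      print "search: " sd
      print "ndots: " (nd == "" ? "1 (resolver default)" : nd)
    }'
}

# Example (no -it when piping): kubectl exec test-pod -- cat /etc/resolv.conf | resolv_summary
```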

Solutions

Upstream DNS Server Issues

Problem: Upstream DNS servers not accessible
Investigation:
- Test connectivity to configured DNS servers
- Verify DNS server IP addresses
- Check for DNS server timeouts

Solution:
- Configure alternative DNS servers
- Update CoreDNS forwarding configuration
- Verify network connectivity to DNS servers
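To see which upstream servers CoreDNS actually forwards to, the `forward` line of the Corefile can be extracted. This is a sketch: it only reads the root-zone `forward . ...` directive, and the function name is hypothetical.

```shell
# corefile_forwarders: read a CoreDNS Corefile on stdin and print the upstream
# targets of the root-zone "forward ." directive, one per line.
corefile_forwarders() {
  awk '$1 == "forward" && $2 == "." {
    for (i = 3; i <= NF; i++) {
      if ($i == "{") break     # stop at the start of an options block
      print $i
    }
  }'
}

# Example:
# kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' | corefile_forwarders
```

A result of `/etc/resolv.conf` means CoreDNS uses the resolvers configured on the node. Each listed IP can then be tested with `kubectl exec -it test-pod -- nslookup google.com UPSTREAM-IP`.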

Ingress and Load Balancer Issues

Ingress Not Accessible

Symptoms

  • External URLs not reachable
  • Ingress controller not getting external IP
  • HTTP/HTTPS requests timing out

[Figure needed: screenshot showing ingress accessibility issues]

Diagnostic Steps

# Check ingress status
kubectl get ingress -A

# Verify ingress controller pods
kubectl get pods -n ingress-nginx

# Check ingress controller service
kubectl get svc -n ingress-nginx

# Test from within cluster
kubectl exec -it test-pod -- curl -H "Host: yourdomain.com" http://INGRESS-CONTROLLER-IP

Common Ingress Issues

Load Balancer Not Provisioned

Problem: EXTERNAL-IP stuck in "<pending>" state
Investigation:
- Check load balancer provisioning
- Verify vCloud integration
- Review cloud provider quota

Solution:
- Verify load balancer service configuration
- Check vCloud resource availability
- Contact support for provisioning issues
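To list every LoadBalancer service that is still waiting for an address, the service listing can be filtered. A minimal sketch, assuming the default `kubectl get svc -A` column layout; the function name is hypothetical.

```shell
# pending_lb_services: read `kubectl get svc -A --no-headers` output on stdin and
# print NAMESPACE/NAME for LoadBalancer services whose EXTERNAL-IP is <pending>.
# Columns assumed: NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
pending_lb_services() {
  awk '$3 == "LoadBalancer" && $5 == "<pending>" { print $1 "/" $2 }'
}

# Example: kubectl get svc -A --no-headers | pending_lb_services
```

For each result, `kubectl describe svc NAME -n NAMESPACE` shows an Events section where provisioning errors from the cloud integration typically appear.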

Ingress Controller Problems

Problem: Ingress controller pods unhealthy
Investigation:
- Check pod logs for errors
- Verify resource allocation
- Review configuration

Solution:
- Restart ingress controller pods
- Increase resource allocation
- Fix configuration errors

DNS and Certificate Issues

Problem: SSL/TLS certificate problems
Investigation:
- Check certificate status
- Verify cert-manager operation
- Review DNS configuration

Solution:
- Regenerate certificates
- Fix DNS configuration
- Update cert-manager configuration

Load Balancer Health Issues

Symptoms

  • Intermittent connectivity through load balancer
  • Load balancer health checks failing
  • Uneven traffic distribution

Diagnostic Approach

  1. Check Backend Health: Review ingress controller backend status
  2. Monitor Statistics: Use HAProxy statistics for load balancer analysis
  3. Test Connectivity: Test backend services directly, bypassing the load balancer
  4. Review Configuration: Verify load balancer configuration

Solutions

Backend Health Problems

Problem: Ingress controller backends marked unhealthy
Investigation:
- Check ingress controller pod health
- Verify health check endpoints
- Review resource constraints

Solution:
- Fix pod health issues
- Adjust health check parameters
- Scale ingress controller if needed

Performance Issues

Network Performance Problems

Symptoms

  • Slow network connectivity
  • High network latency
  • Bandwidth limitations

[Figure needed: screenshot of network performance monitoring]

Performance Testing

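# Start an iperf3 server pod, then note its IP (TARGET-IP)
kubectl run iperf3-server --image=networkstatic/iperf3 -- -s
kubectl get pod iperf3-server -o wide

# Test network bandwidth (the image's entrypoint is already iperf3)
kubectl run network-test --image=networkstatic/iperf3 --rm -it --restart=Never -- -c TARGET-IP

# Test latency
kubectl exec -it test-pod -- ping -c 10 TARGET-IP

# Check node and pod CPU/memory usage (requires metrics-server;
# kubectl top does not report network throughput)
kubectl top nodes
kubectl top pods -A

# Remove the server pod when finished
kubectl delete pod iperf3-server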
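When repeating latency tests, it helps to reduce ping output to the two numbers that matter. A minimal sketch (function name hypothetical) that handles the iputils and BusyBox summary formats:

```shell
# ping_summary: read ping output on stdin and print packet loss and average RTT.
# Handles the iputils ("rtt min/avg/max/mdev = ...") and BusyBox
# ("round-trip min/avg/max = ...") summary lines; both report times in ms.
ping_summary() {
  awk '
    /packet loss/   { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
    /min\/avg\/max/ { split($0, s, "="); split(s[2], v, "/"); avg = v[2] }
    END { print "loss=" loss " avg_ms=" avg }'
}

# Example (no -it when piping): kubectl exec test-pod -- ping -c 10 TARGET-IP | ping_summary
```

With 100% loss there is no round-trip line, so `avg_ms` is printed empty.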

Optimization Solutions

Resource Allocation

Problem: Insufficient network resources
Investigation:
- Monitor node network utilization
- Check CNI plugin resource usage
- Review network bandwidth allocation

Solution:
- Increase node network capacity
- Optimize CNI plugin configuration
- Distribute network load

Configuration Optimization

Problem: Suboptimal network configuration
Investigation:
- Review CNI plugin settings
- Check ingress controller configuration
- Analyze traffic patterns

Solution:
- Tune CNI plugin parameters
- Optimize ingress controller settings
- Implement traffic optimization

Getting Support

Information to Gather

Before contacting support, collect:

Network Configuration

  • Cluster Network Settings: CNI plugin, CIDR ranges
  • VPC Network Configuration: Network interfaces and settings
  • Security Groups: Applied security group rules
  • Ingress Configuration: Ingress controller and service settings

Error Information

  • Error Messages: Exact error text and codes
  • Log Extracts: Relevant log entries from affected components
  • Timeline: When issues started and any recent changes
  • Impact Scope: Which services or users are affected

Diagnostic Output

# Network diagnostics to include
kubectl get nodes -o wide
kubectl get pods -A -o wide
kubectl get svc -A
kubectl get ingress -A
kubectl get networkpolicies -A

# Component status
kubectl get pods -n kube-system
kubectl get pods -n ingress-nginx

# Recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
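To hand this information over in one piece, the commands above can be wrapped in a collection function. This is a sketch: the function and file names are hypothetical, it saves the full event list rather than the last 20 lines, and it exports network policies as YAML so their rules are included.

```shell
# collect_network_diag [OUTDIR]: run the support diagnostics and save each
# command's output (including errors) to its own file in OUTDIR.
# Prints the directory name on success.
collect_network_diag() {
  outdir=${1:-network-diag-$(date +%Y%m%d-%H%M%S)}
  mkdir -p "$outdir" || return 1
  kubectl get nodes -o wide              > "$outdir/nodes.txt"              2>&1
  kubectl get pods -A -o wide            > "$outdir/pods.txt"               2>&1
  kubectl get svc -A                     > "$outdir/services.txt"           2>&1
  kubectl get ingress -A                 > "$outdir/ingress.txt"            2>&1
  kubectl get networkpolicies -A -o yaml > "$outdir/networkpolicies.yaml"   2>&1
  kubectl get pods -n kube-system        > "$outdir/kube-system-pods.txt"   2>&1
  kubectl get pods -n ingress-nginx      > "$outdir/ingress-nginx-pods.txt" 2>&1
  kubectl get events -A --sort-by='.lastTimestamp' > "$outdir/events.txt"  2>&1
  echo "$outdir"
}

# Example: dir=$(collect_network_diag) && tar czf "$dir.tar.gz" "$dir"
```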

Support Escalation

Level 1: Documentation and Basic Troubleshooting

  1. Review Guides: Check relevant troubleshooting sections
  2. Basic Tests: Perform basic connectivity and DNS tests
  3. Configuration Review: Verify network configuration settings

Level 2: Advanced Diagnostics

  1. Log Analysis: Detailed analysis of component logs
  2. Performance Testing: Network performance and latency testing
  3. Configuration Validation: Deep dive into network configurations

Level 3: Expert Support

  1. Complex Issues: Multi-component networking problems
  2. Performance Optimization: Advanced performance tuning
  3. Integration Issues: vCloud integration problems

Prevention and Best Practices

Proactive Monitoring

  1. Network Health: Regular monitoring of network component health
  2. Performance Metrics: Continuous monitoring of network performance
  3. Capacity Planning: Monitor network capacity and plan for growth
  4. Alert Configuration: Set up alerts for network issues

Configuration Management

  1. Documentation: Maintain network configuration documentation
  2. Change Control: Implement change control for network changes
  3. Testing: Test network changes in non-production environments
  4. Backup: Backup network configurations

Regular Maintenance

  1. Component Updates: Keep network components updated
  2. Performance Review: Regular performance review and optimization
  3. Security Review: Regular security configuration review
  4. Cleanup: Regular cleanup of unused network resources

For additional cluster troubleshooting, see the Cluster Management Troubleshooting section.