Network Troubleshooting
This guide covers common networking issues in vCloud Kubernetes Engine and gives step-by-step fixes for connectivity, DNS, ingress, and performance problems.
Connectivity Issues
Pod-to-Pod Communication Problems
Symptoms
- Pods cannot reach other pods in the cluster
- Timeouts when connecting between services
- Intermittent connectivity issues
[Figure needed: screenshot showing pod connectivity test failure]
Diagnostic Steps
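# Test pod-to-pod connectivity (-c 4 stops after four probes; the pod image must include ping)
kubectl exec -it test-pod -- ping -c 4 TARGET-POD-IP
# Check pod networking
kubectl get pods -o wide
# Verify CNI plugin status (this pattern assumes Calico; adjust it for other CNI plugins)
kubectl get pods -n kube-system | grep calico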
# Check network policies
kubectl get networkpolicies -A
Common Causes and Solutions
CNI Plugin Issues
Problem: CNI plugin not functioning correctly
Investigation:
- Check CNI plugin pods in kube-system namespace
- Review CNI plugin logs
- Verify CNI configuration
Solution:
- Restart CNI plugin pods
- Verify network configuration
- Contact support for CNI issues
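The investigation and restart steps above can be scripted roughly as follows. This is a sketch, not a supported tool: it assumes the CNI plugin is Calico running as a DaemonSet named calico-node with the label k8s-app=calico-node in kube-system (the diagnostic commands in this guide grep for calico). The function names and the CNI_* variables are hypothetical; override the variables for other CNI plugins.

```shell
# Assumption: Calico DaemonSet "calico-node" labeled k8s-app=calico-node.
CNI_NAMESPACE="${CNI_NAMESPACE:-kube-system}"
CNI_SELECTOR="${CNI_SELECTOR:-k8s-app=calico-node}"
CNI_DAEMONSET="${CNI_DAEMONSET:-calico-node}"

# List CNI pods with their node placement and restart counts.
cni_status() {
  kubectl get pods -n "$CNI_NAMESPACE" -l "$CNI_SELECTOR" -o wide
}

# Show recent log lines from every CNI pod, prefixed with the pod name.
cni_logs() {
  kubectl logs -n "$CNI_NAMESPACE" -l "$CNI_SELECTOR" --tail=50 --prefix
}

# Restart the CNI pods through the DaemonSet's update strategy and wait.
cni_restart() {
  kubectl rollout restart "daemonset/$CNI_DAEMONSET" -n "$CNI_NAMESPACE" &&
    kubectl rollout status "daemonset/$CNI_DAEMONSET" -n "$CNI_NAMESPACE" --timeout=300s
}
```

A typical sequence would be cni_status, then cni_logs, and cni_restart only if the logs point at the plugin itself.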
Network Policy Blocking
Problem: Network policies preventing communication
Investigation:
- List all network policies
- Check policy selectors and rules
- Verify intended traffic patterns
Solution:
- Review and adjust network policies
- Create allow rules for required traffic
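- Test in a non-production environment with the suspect policy temporarily removed (network policies cannot be disabled in place, so export the policy first and reapply it afterwards)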
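As an illustration of an allow rule for required traffic, the sketch below prints a NetworkPolicy manifest that admits one group of pods to another on a single TCP port. The function name, the app label key, and the example values are all hypothetical; substitute the labels your workloads actually carry.

```shell
# Print a NetworkPolicy that allows ingress to pods labeled app=<target>
# from pods labeled app=<source> in the same namespace on one TCP port.
# Arguments: namespace, source app label, target app label, port.
allow_policy_manifest() {
  namespace="$1"
  source_app="$2"
  target_app="$3"
  port="$4"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-${source_app}-to-${target_app}
  namespace: ${namespace}
spec:
  podSelector:
    matchLabels:
      app: ${target_app}
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: ${source_app}
      ports:
        - protocol: TCP
          port: ${port}
EOF
}
```

For example, allow_policy_manifest default frontend backend 8080 | kubectl apply -f - would apply the rule. Because network policies are additive, an allow rule like this takes effect alongside any existing policies.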
IP Address Conflicts
Problem: IP address conflicts between pods or nodes
Investigation:
- Check pod IP allocations
- Verify CIDR range configurations
- Look for duplicate IP assignments
Solution:
- Delete the affected pods so that their controller recreates them with newly allocated IP addresses
- Verify CIDR range planning
- Contact support for persistent conflicts
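Looking for duplicate IP assignments by eye is error-prone on larger clusters. The following sketch lists any pod IP held by more than one running pod. The function name is hypothetical, and pods that use host networking are skipped because they share their node's address by design.

```shell
# Print every pod IP assigned to more than one running pod, followed by
# the namespace/name of each pod holding it. hostNetwork pods are skipped
# because they legitimately share the node IP.
find_duplicate_pod_ips() {
  kubectl get pods -A --no-headers --field-selector=status.phase=Running \
    -o custom-columns=IP:.status.podIP,HOSTNET:.spec.hostNetwork,NS:.metadata.namespace,NAME:.metadata.name |
    awk '$1 != "<none>" && $2 != "true" {
           count[$1]++
           pods[$1] = pods[$1] " " $3 "/" $4
         }
         END {
           for (ip in count) if (count[ip] > 1) print ip ":" pods[ip]
         }'
}
```

No output means no duplicates were found among running pods.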
External Connectivity Problems
Symptoms
- Pods cannot reach external services
- DNS resolution failures for external domains
- Internet connectivity not working
[Figure needed: screenshot showing external connectivity test]
Diagnostic Commands
# Test external connectivity
kubectl exec -it test-pod -- curl -I https://google.com
# Check DNS resolution
kubectl exec -it test-pod -- nslookup google.com
# Verify egress rules
kubectl get networkpolicies -A -o yaml | grep -A 10 egress
# Check security groups
# (Review in vCloud interface)
Solutions
DNS Configuration Issues
Problem: DNS not resolving external domains
Investigation:
- Check CoreDNS pods status
- Verify DNS configuration
- Test DNS forwarding
Solution:
- Restart CoreDNS pods
- Verify DNS server configuration
- Check upstream DNS servers
Security Group Restrictions
Problem: Security groups blocking outbound traffic
Investigation:
- Review security group rules
- Check required outbound ports
- Verify rule priorities
Solution:
- Update security group rules
- Allow required outbound traffic
- Test with temporary permissive rules
Network Policy Egress Blocks
Problem: Network policies blocking outbound traffic
Investigation:
- Review egress policies
- Check policy scopes and selectors
- Verify intended egress patterns
Solution:
- Adjust network policies for required egress
- Create specific allow rules
- Test egress requirements
DNS Resolution Issues
Internal DNS Problems
Symptoms
- Services cannot be reached by name
- DNS resolution timeouts
- Inconsistent name resolution
[Figure needed: screenshot of DNS resolution failure]
Diagnostic Steps
# Test service DNS resolution
kubectl exec -it test-pod -- nslookup kubernetes.default.svc.cluster.local
# Check CoreDNS status
kubectl get pods -n kube-system | grep coredns
# Verify CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml
# Test DNS from different namespaces
kubectl exec -it pod-in-namespace-a -- nslookup service-in-namespace-b.namespace-b.svc.cluster.local
Common DNS Issues
CoreDNS Pod Problems
Problem: CoreDNS pods not running or unhealthy
Investigation:
- Check CoreDNS pod status and logs
- Verify resource allocation
- Check for resource pressure
Solution:
- Restart CoreDNS pods
- Increase resource allocation if needed
- Scale CoreDNS replicas if necessary
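The CoreDNS checks and remedies above map onto a few kubectl commands. This sketch assumes CoreDNS runs as a Deployment named coredns in kube-system with pods labeled k8s-app=kube-dns, which is a common default but is not confirmed for vCloud Kubernetes Engine; the function names are hypothetical.

```shell
# Assumption: Deployment "coredns" in kube-system, pods labeled k8s-app=kube-dns.

# Show recent log lines from all CoreDNS pods.
coredns_logs() {
  kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100 --prefix
}

# Restart CoreDNS with a rolling update and wait for it to finish.
coredns_restart() {
  kubectl rollout restart deployment/coredns -n kube-system &&
    kubectl rollout status deployment/coredns -n kube-system --timeout=120s
}

# Scale CoreDNS to the given number of replicas, e.g. coredns_scale 3.
coredns_scale() {
  kubectl scale deployment/coredns -n kube-system --replicas="$1"
}
```

On a managed platform the replica count may be reconciled back to a platform default, so verify the result with kubectl get deployment coredns -n kube-system after scaling.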
DNS Configuration Errors
Problem: Incorrect DNS configuration
Investigation:
- Review CoreDNS ConfigMap
- Check DNS forwarding rules
- Verify cluster domain configuration
Solution:
- Correct DNS configuration
- Update CoreDNS ConfigMap
- Restart CoreDNS after changes
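When reviewing forwarding rules, it can help to pull only the forward directives out of the Corefile. The sketch below does this with the coredns ConfigMap already referenced in this guide; the function name is hypothetical and the Corefile key name is assumed to be the conventional one.

```shell
# Print the "forward" directives from the CoreDNS Corefile.
# Assumption: the ConfigMap stores the configuration under the key "Corefile".
coredns_forwarders() {
  kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' |
    awk '$1 == "forward" { sub(/^[ \t]+/, ""); print }'
}
```

A line such as forward . /etc/resolv.conf means CoreDNS forwards non-cluster names to the resolvers configured on the node, so upstream problems there surface as external DNS failures in pods.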
Network Policy DNS Blocks
Problem: Network policies blocking DNS traffic
Investigation:
- Check policies affecting kube-system namespace
- Verify DNS port accessibility (53/UDP, 53/TCP)
- Test DNS from different pod contexts
Solution:
- Allow DNS traffic in network policies
- Ensure kube-system namespace accessibility
- Create explicit DNS allow rules
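An explicit DNS allow rule might look like the manifest printed by the sketch below. It assumes CoreDNS pods are labeled k8s-app=kube-dns and that the kube-system namespace carries the automatic kubernetes.io/metadata.name label; the function and policy names are hypothetical.

```shell
# Print a NetworkPolicy that lets every pod in the given namespace send
# DNS queries (53/UDP and 53/TCP) to the cluster DNS pods in kube-system.
dns_egress_policy_manifest() {
  namespace="$1"
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: ${namespace}
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
EOF
}
```

For example, dns_egress_policy_manifest my-namespace | kubectl apply -f - would apply it. The namespaceSelector and podSelector sit in the same list item, so both must match; splitting them into separate items would allow traffic to either.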
External DNS Problems
Symptoms
- Cannot resolve external domain names
- Slow external DNS resolution
- Partial external DNS functionality
Diagnostic Approach
# Test external DNS resolution
kubectl exec -it test-pod -- nslookup google.com
# Check DNS forwarding configuration
kubectl exec -it test-pod -- cat /etc/resolv.conf
# Test specific DNS servers
kubectl exec -it test-pod -- nslookup google.com 8.8.8.8
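# Check DNS traffic (the pod image must include tcpdump and the pod needs permission to capture packets; -n avoids extra DNS lookups; stop with Ctrl+C)
kubectl exec -it test-pod -- tcpdump -n -i any port 53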
Solutions
Upstream DNS Server Issues
Problem: Upstream DNS servers not accessible
Investigation:
- Test connectivity to configured DNS servers
- Verify DNS server IP addresses
- Check for DNS server timeouts
Solution:
- Configure alternative DNS servers
- Update CoreDNS forwarding configuration
- Verify network connectivity to DNS servers
Ingress and Load Balancer Issues
Ingress Not Accessible
Symptoms
- External URLs not reachable
- Ingress controller not getting external IP
- HTTP/HTTPS requests timing out
[Figure needed: screenshot showing ingress accessibility issues]
Diagnostic Steps
# Check ingress status
kubectl get ingress -A
# Verify ingress controller pods
kubectl get pods -n ingress-nginx
# Check ingress controller service
kubectl get svc -n ingress-nginx
# Test from within cluster
kubectl exec -it test-pod -- curl -H "Host: yourdomain.com" http://INGRESS-CONTROLLER-IP
Common Ingress Issues
Load Balancer Not Provisioned
Problem: External IP stuck in "Pending" state
Investigation:
- Check load balancer provisioning
- Verify vCloud integration
- Review cloud provider quota
Solution:
- Verify load balancer service configuration
- Check vCloud resource availability
- Contact support for provisioning issues
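To check whether a load balancer has been provisioned, the external address can be read from the service status. This is a sketch with a hypothetical function name; the service name in the usage note is the common ingress-nginx default and may differ in your cluster.

```shell
# Print the external IP or hostname of a LoadBalancer service.
# Prints "pending" and returns non-zero if none has been assigned yet.
# Arguments: namespace, service name.
lb_external_address() {
  namespace="$1"
  service="$2"
  address="$(kubectl get svc "$service" -n "$namespace" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}{.status.loadBalancer.ingress[0].hostname}')"
  if [ -n "$address" ]; then
    echo "$address"
  else
    echo "pending"
    return 1
  fi
}
```

If lb_external_address ingress-nginx ingress-nginx-controller reports pending, the Events section of kubectl describe svc ingress-nginx-controller -n ingress-nginx usually shows why provisioning has not completed.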
Ingress Controller Problems
Problem: Ingress controller pods unhealthy
Investigation:
- Check pod logs for errors
- Verify resource allocation
- Review configuration
Solution:
- Restart ingress controller pods
- Increase resource allocation
- Fix configuration errors
DNS and Certificate Issues
Problem: SSL/TLS certificates are missing, expired, or not issued, sometimes because DNS records do not point at the ingress
Investigation:
- Check certificate status
- Verify cert-manager operation
- Review DNS configuration
Solution:
- Regenerate certificates
- Fix DNS configuration
- Update cert-manager configuration
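Checking certificate status can start with a list of certificates that are not ready. The sketch below assumes cert-manager is installed (so the certificates.cert-manager.io resource exists) and that its default table output has the columns NAMESPACE, NAME, READY, SECRET, AGE; the function name is hypothetical.

```shell
# Print namespace/name for every cert-manager Certificate whose READY
# column is not "True".
# Assumption: default table columns are NAMESPACE NAME READY SECRET AGE.
certs_not_ready() {
  kubectl get certificates.cert-manager.io -A --no-headers |
    awk '$3 != "True" { print $1 "/" $2 }'
}
```

For each certificate listed, kubectl describe certificate NAME -n NAMESPACE shows the conditions and events that explain why issuance has not completed.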
Load Balancer Health Issues
Symptoms
- Intermittent connectivity through load balancer
- Load balancer health checks failing
- Uneven traffic distribution
Diagnostic Approach
- Check Backend Health: Review ingress controller backend status
- Monitor Statistics: Use HAProxy statistics for load balancer analysis
- Test Connectivity: Direct tests to backend services
- Review Configuration: Verify load balancer configuration
Solutions
Backend Health Problems
Problem: Ingress controller backends marked unhealthy
Investigation:
- Check ingress controller pod health
- Verify health check endpoints
- Review resource constraints
Solution:
- Fix pod health issues
- Adjust health check parameters
- Scale ingress controller if needed
Performance Issues
Network Performance Problems
Symptoms
- Slow network connectivity
- High network latency
- Bandwidth limitations
[Figure needed: screenshot of network performance monitoring]
Performance Testing
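# Test network bandwidth (requires an iperf3 server listening at TARGET-IP; --command overrides the image entrypoint so the full command line is used as written)
kubectl run network-test --image=networkstatic/iperf3 --restart=Never --rm -it --command -- iperf3 -c TARGET-IP
# Test latency
kubectl exec -it test-pod -- ping -c 10 TARGET-IP
# Check node and pod resource pressure (kubectl top reports CPU and memory, not network throughput)
kubectl top nodes
kubectl top pods -A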
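A bandwidth test needs an iperf3 server as well as a client. The sketch below starts a temporary server pod, runs a client against it, and cleans up. It assumes the networkstatic/iperf3 image named in this guide and the default iperf3 port 5201; the pod and function names are hypothetical.

```shell
# Start an iperf3 server pod and wait until it is ready.
iperf_server_start() {
  kubectl run iperf3-server --image=networkstatic/iperf3 --restart=Never \
    --port=5201 --command -- iperf3 -s &&
    kubectl wait --for=condition=Ready pod/iperf3-server --timeout=60s
}

# Print the server pod's IP address.
iperf_server_ip() {
  kubectl get pod iperf3-server -o jsonpath='{.status.podIP}'
}

# Run a 10-second client test against the given IP; the client pod is
# removed automatically when the test ends.
iperf_client_run() {
  kubectl run iperf3-client --image=networkstatic/iperf3 --restart=Never \
    --rm -i --command -- iperf3 -c "$1" -t 10
}

# Remove the server pod.
iperf_cleanup() {
  kubectl delete pod iperf3-server --ignore-not-found
}
```

One possible sequence is iperf_server_start && iperf_client_run "$(iperf_server_ip)"; iperf_cleanup. If both pods land on the same node, the result reflects local traffic only, so check placement with kubectl get pods -o wide before drawing conclusions about cross-node bandwidth.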
Optimization Solutions
Resource Allocation
Problem: Insufficient network resources
Investigation:
- Monitor node network utilization
- Check CNI plugin resource usage
- Review network bandwidth allocation
Solution:
- Increase node network capacity
- Optimize CNI plugin configuration
- Distribute network load
Configuration Optimization
Problem: Suboptimal network configuration
Investigation:
- Review CNI plugin settings
- Check ingress controller configuration
- Analyze traffic patterns
Solution:
- Tune CNI plugin parameters
- Optimize ingress controller settings
- Implement traffic optimization
Getting Support
Information to Gather
Before contacting support, collect:
Network Configuration
- Cluster Network Settings: CNI plugin, CIDR ranges
- VPC Network Configuration: Network interfaces and settings
- Security Groups: Applied security group rules
- Ingress Configuration: Ingress controller and service settings
Error Information
- Error Messages: Exact error text and codes
- Log Extracts: Relevant log entries from affected components
- Timeline: When issues started and any recent changes
- Impact Scope: Which services or users are affected
Diagnostic Output
# Network diagnostics to include
kubectl get nodes -o wide
kubectl get pods -A -o wide
kubectl get svc -A
kubectl get ingress -A
kubectl get networkpolicies -A
# Component status
kubectl get pods -n kube-system
kubectl get pods -n ingress-nginx
# Recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
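To attach this output to a support request, the commands above can be captured into one directory. The following is a sketch with a hypothetical function name and file layout, not an official support tool.

```shell
# Save the network diagnostics listed above into one directory.
# Argument: output directory (defaults to ./network-diagnostics).
collect_network_diagnostics() {
  outdir="${1:-network-diagnostics}"
  mkdir -p "$outdir" || return 1
  kubectl get nodes -o wide        > "$outdir/nodes.txt" 2>&1
  kubectl get pods -A -o wide      > "$outdir/pods.txt" 2>&1
  kubectl get svc -A               > "$outdir/services.txt" 2>&1
  kubectl get ingress -A           > "$outdir/ingress.txt" 2>&1
  kubectl get networkpolicies -A   > "$outdir/networkpolicies.txt" 2>&1
  kubectl get pods -n kube-system  > "$outdir/kube-system-pods.txt" 2>&1
  kubectl get pods -n ingress-nginx > "$outdir/ingress-nginx-pods.txt" 2>&1
  kubectl get events -A --sort-by='.lastTimestamp' 2>&1 |
    tail -20 > "$outdir/events.txt"
  echo "$outdir"
}
```

Running collect_network_diagnostics ./support-bundle writes one text file per command and prints the directory name. Review the files for sensitive data before sharing them.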
Support Escalation
Level 1: Documentation and Basic Troubleshooting
- Review Guides: Check relevant troubleshooting sections
- Basic Tests: Perform basic connectivity and DNS tests
- Configuration Review: Verify network configuration settings
Level 2: Advanced Diagnostics
- Log Analysis: Detailed analysis of component logs
- Performance Testing: Network performance and latency testing
- Configuration Validation: Deep dive into network configurations
Level 3: Expert Support
- Complex Issues: Multi-component networking problems
- Performance Optimization: Advanced performance tuning
- Integration Issues: vCloud integration problems
Prevention and Best Practices
Proactive Monitoring
- Network Health: Regular monitoring of network component health
- Performance Metrics: Continuous monitoring of network performance
- Capacity Planning: Monitor network capacity and plan for growth
- Alert Configuration: Set up alerts for network issues
Configuration Management
- Documentation: Maintain network configuration documentation
- Change Control: Implement change control for network changes
- Testing: Test network changes in non-production environments
- Backup: Back up network configurations
Regular Maintenance
- Component Updates: Keep network components updated
- Performance Review: Regular performance review and optimization
- Security Review: Regular security configuration review
- Cleanup: Regular cleanup of unused network resources
For additional cluster troubleshooting, see the Cluster Management Troubleshooting section.