Network Troubleshooting

This guide covers common networking issues in vCloud Kubernetes Engine and provides step-by-step solutions to resolve connectivity, DNS, ingress, and performance problems.

Connectivity Issues

Pod-to-Pod Communication Problems

Symptoms

Pods cannot reach other pods in the cluster
Timeouts when connecting between services
Intermittent connectivity issues

Figure (placeholder): screenshot showing a pod connectivity test failure

Diagnostic Steps

# Test pod-to-pod connectivity (-c 4 stops after four probes; the image must include ping)
kubectl exec -it test-pod -- ping -c 4 TARGET-POD-IP

# Check pod networking
kubectl get pods -o wide

# Verify CNI plugin status (assumes Calico, which may run in kube-system or, for operator-based installs, calico-system)
kubectl get pods -A | grep -i calico

# Check network policies
kubectl get networkpolicies -A
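Intermittent problems are easier to judge from a packet-loss figure than from scrolling ping output. The following is a minimal sketch, not a kubectl feature: a hypothetical helper, ping_loss, that reads ping output and prints the loss percentage from the summary line. It assumes the summary contains "N% packet loss", which iputils and BusyBox ping both print.

```shell
# ping_loss (hypothetical helper): reads ping output on stdin and prints the
# packet-loss percentage from the summary line, e.g. "20" for 20% loss.
ping_loss() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example (omit -t when piping, so the TTY does not add carriage returns):
#   kubectl exec test-pod -- ping -c 20 TARGET-POD-IP | ping_loss
```

Running it several times, or against several target pods, shows whether loss is constant, bursty, or limited to particular destinations.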

Common Causes and Solutions

CNI Plugin Issues

Problem: CNI plugin not functioning correctly
Investigation:
- Check CNI plugin pods in kube-system namespace
- Review CNI plugin logs
- Verify CNI configuration

Solution:
- Restart CNI plugin pods
- Verify network configuration
- Contact support for CNI issues

Network Policy Blocking

Problem: Network policies preventing communication
Investigation:
- List all network policies
- Check policy selectors and rules
- Verify intended traffic patterns

Solution:
- Review and adjust network policies
- Create allow rules for required traffic
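- Confirm the cause by temporarily removing a suspect policy in a non-production environment (NetworkPolicies cannot be disabled, only deleted, so save a copy first)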
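An allow rule for a specific flow can be sketched as a small manifest generator. This hypothetical helper, allow_ingress_policy, prints a NetworkPolicy that admits TCP traffic on one port from pods labelled app=SOURCE_APP to pods labelled app=TARGET_APP in the same namespace. The app label key, the policy name pattern, and the example values are assumptions to adapt to your workloads, and policies only take effect if the cluster's CNI plugin enforces NetworkPolicy.

```shell
# allow_ingress_policy (hypothetical helper): prints a NetworkPolicy manifest.
# Usage: allow_ingress_policy NAMESPACE TARGET_APP SOURCE_APP PORT
# Selects pods labelled app=TARGET_APP in NAMESPACE and allows ingress from
# pods labelled app=SOURCE_APP in the same namespace on TCP PORT.
allow_ingress_policy() {
  if [ "$#" -ne 4 ]; then
    echo "usage: allow_ingress_policy NAMESPACE TARGET_APP SOURCE_APP PORT" >&2
    return 2
  fi
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-$3-to-$2
  namespace: $1
spec:
  podSelector:
    matchLabels:
      app: $2
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: $3
    ports:
    - protocol: TCP
      port: $4
EOF
}

# Example: validate server-side first, then apply.
#   allow_ingress_policy shop backend frontend 8080 | kubectl apply --dry-run=server -f -
#   allow_ingress_policy shop backend frontend 8080 | kubectl apply -f -
```

Because NetworkPolicies are additive, this adds an allowed path without changing existing policies; removing it later restores the previous behaviour.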

IP Address Conflicts

Problem: IP address conflicts between pods or nodes
Investigation:
- Check pod IP allocations
- Verify CIDR range configurations
- Look for duplicate IP assignments

Solution:
- Restart affected pods
- Verify CIDR range planning
- Contact support for persistent conflicts
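Overlapping ranges, for example a pod CIDR that overlaps the service CIDR or the node network, are a frequent root cause of conflicts. A minimal sketch for checking two IPv4 CIDR ranges against each other using only shell arithmetic; the helper names are hypothetical, it assumes a shell with 64-bit arithmetic (bash and dash on 64-bit systems), and the example ranges are placeholders for the values in your cluster's network settings.

```shell
# ip_to_int (hypothetical helper): converts a dotted-quad IPv4 address to an integer.
ip_to_int() {
  IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
  echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

# cidr_range (hypothetical helper): expects ADDRESS/PREFIX and prints the first
# and last address of the range as integers (host bits are masked off).
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( base & ~(size - 1) ))
  echo "$first $(( first + size - 1 ))"
}

# cidrs_overlap (hypothetical helper): exit status 0 if the two ranges share
# at least one address, i.e. first1 <= last2 and first2 <= last1.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Example:
#   cidrs_overlap 10.244.0.0/16 10.96.0.0/12 && echo "ranges overlap"
```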

External Connectivity Problems

Symptoms

Pods cannot reach external services
DNS resolution failures for external domains
Internet connectivity not working

Figure (placeholder): screenshot showing an external connectivity test

Diagnostic Commands

# Test external connectivity
kubectl exec -it test-pod -- curl -I https://google.com

# Check DNS resolution
kubectl exec -it test-pod -- nslookup google.com

# Verify egress rules
kubectl get networkpolicies -A -o yaml | grep -A 10 egress

# Check security groups
# (Review in vCloud interface)

Solutions

DNS Configuration Issues

Problem: DNS not resolving external domains
Investigation:
- Check CoreDNS pods status
- Verify DNS configuration
- Test DNS forwarding

Solution:
- Restart CoreDNS pods
- Verify DNS server configuration
- Check upstream DNS servers

Security Group Restrictions

Problem: Security groups blocking outbound traffic
Investigation:
- Review security group rules
- Check required outbound ports
- Verify rule priorities

Solution:
- Update security group rules
- Allow required outbound traffic
- Confirm the cause with a temporary, narrowly scoped permissive rule, and remove it afterwards

Network Policy Egress Blocks

Problem: Network policies blocking outbound traffic
Investigation:
- Review egress policies
- Check policy scopes and selectors
- Verify intended egress patterns

Solution:
- Adjust network policies for required egress
- Create specific allow rules
- Test egress requirements
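A specific egress allow rule can be sketched the same way. This hypothetical helper, allow_egress_policy, prints a NetworkPolicy that lets every pod in a namespace open TCP connections to one destination CIDR on one port; the function name and the example values (203.0.113.0/24 is a documentation range) are placeholders. Once any policy with an Egress type selects a pod, all egress not allowed by some policy is dropped, so DNS on port 53 usually needs its own allow rule.

```shell
# allow_egress_policy (hypothetical helper): prints a NetworkPolicy manifest.
# Usage: allow_egress_policy NAMESPACE NAME CIDR PORT
# Selects every pod in NAMESPACE (empty podSelector) and allows egress to
# CIDR on TCP PORT.
allow_egress_policy() {
  if [ "$#" -ne 4 ]; then
    echo "usage: allow_egress_policy NAMESPACE NAME CIDR PORT" >&2
    return 2
  fi
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: $2
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: $3
    ports:
    - protocol: TCP
      port: $4
EOF
}

# Example:
#   allow_egress_policy shop allow-https-egress 203.0.113.0/24 443 | kubectl apply -f -
```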

DNS Resolution Issues

Internal DNS Problems

Symptoms

Services cannot be reached by name
DNS resolution timeouts
Inconsistent name resolution

Figure (placeholder): screenshot of a DNS resolution failure

Diagnostic Steps

# Test service DNS resolution
kubectl exec -it test-pod -- nslookup kubernetes.default.svc.cluster.local

# Check CoreDNS status
kubectl get pods -n kube-system | grep coredns

# Verify CoreDNS configuration
kubectl get configmap coredns -n kube-system -o yaml

# Test DNS from different namespaces
kubectl exec -it -n namespace-a pod-in-namespace-a -- nslookup service-in-namespace-b.namespace-b.svc.cluster.local

Common DNS Issues

CoreDNS Pod Problems

Problem: CoreDNS pods not running or unhealthy
Investigation:
- Check CoreDNS pod status and logs
- Verify resource allocation
- Check for resource pressure

Solution:
- Restart CoreDNS pods
- Increase resource allocation if needed
- Scale CoreDNS replicas if necessary

DNS Configuration Errors

Problem: Incorrect DNS configuration
Investigation:
- Review CoreDNS ConfigMap
- Check DNS forwarding rules
- Verify cluster domain configuration

Solution:
- Correct DNS configuration
- Update CoreDNS ConfigMap
- Restart CoreDNS after changes
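When reviewing the ConfigMap, the two settings that most often matter are the zones served by the kubernetes plugin (which should include the cluster domain) and the targets of the forward plugin. A minimal sketch of a hypothetical helper, corefile_summary, that prints both from a Corefile; it assumes each plugin directive starts on its own line, as in the usual CoreDNS ConfigMap.

```shell
# corefile_summary (hypothetical helper): reads a Corefile on stdin and prints
# "kubernetes-zone <zone>" for each zone of the kubernetes plugin and
# "upstream <target>" for each target of a "forward ." directive.
corefile_summary() {
  awk '
    $1 == "kubernetes" {
      for (i = 2; i <= NF && $i != "{"; i++) print "kubernetes-zone " $i
    }
    $1 == "forward" && $2 == "." {
      for (i = 3; i <= NF && $i != "{"; i++) print "upstream " $i
    }'
}

# Example:
#   kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' | corefile_summary
# An upstream of /etc/resolv.conf typically means CoreDNS forwards to the
# resolvers configured on the node it runs on.
```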

Network Policy DNS Blocks

Problem: Network policies blocking DNS traffic
Investigation:
- Check policies affecting kube-system namespace
- Verify DNS port accessibility (53/UDP, 53/TCP)
- Test DNS from different pod contexts

Solution:
- Allow DNS traffic in network policies
- Ensure application pods can reach the CoreDNS pods in the kube-system namespace
- Create explicit DNS allow rules
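An explicit DNS allow rule is the usual fix when a default-deny egress policy breaks name resolution. A minimal sketch of a hypothetical helper, allow_dns_egress_policy, that prints a NetworkPolicy letting every pod in a namespace reach the cluster DNS pods on port 53 over UDP and TCP. It assumes the DNS pods carry the label k8s-app=kube-dns (check with kubectl get pods -n kube-system -l k8s-app=kube-dns) and that namespaces have the automatic kubernetes.io/metadata.name label set by recent Kubernetes versions; clusters using NodeLocal DNSCache send queries to a node-local address instead, which this selector does not cover.

```shell
# allow_dns_egress_policy (hypothetical helper): prints a NetworkPolicy manifest.
# Usage: allow_dns_egress_policy NAMESPACE
# Allows every pod in NAMESPACE to reach pods labelled k8s-app=kube-dns in the
# kube-system namespace on port 53 (UDP and TCP). Label values are assumptions.
allow_dns_egress_policy() {
  if [ "$#" -ne 1 ]; then
    echo "usage: allow_dns_egress_policy NAMESPACE" >&2
    return 2
  fi
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
EOF
}

# Example:
#   allow_dns_egress_policy shop | kubectl apply -f -
```

The namespaceSelector and podSelector sit in the same "to" entry, so traffic must match both (DNS pods inside kube-system), not either one.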

External DNS Problems

Symptoms

Cannot resolve external domain names
Slow external DNS resolution
Partial external DNS functionality

Diagnostic Approach

# Test external DNS resolution
kubectl exec -it test-pod -- nslookup google.com

# Check the pod's resolver settings (nameserver, search domains, ndots)
kubectl exec -it test-pod -- cat /etc/resolv.conf

# Test specific DNS servers
kubectl exec -it test-pod -- nslookup google.com 8.8.8.8

# Capture DNS traffic (requires tcpdump in the container image and permission to capture packets)
kubectl exec -it test-pod -- tcpdump -i any port 53
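Slow or partial external resolution is often explained by the search list and the ndots option in the pod's /etc/resolv.conf: a name with fewer dots than ndots is tried against each search domain before being tried as written, and every attempt is a separate query. A minimal sketch of a hypothetical helper, dns_query_order, that prints the order in which a glibc-style resolver tries a name; other resolver libraries such as musl differ in details, so treat the output as an approximation.

```shell
# dns_query_order (hypothetical helper): reads resolv.conf text on stdin and
# prints, one per line, the absolute names a glibc-style resolver would query
# for NAME, in order.
# Usage: dns_query_order NAME
dns_query_order() {
  awk -v name="$1" '
    $1 == "search" { for (i = 2; i <= NF; i++) search[++n] = $i }
    $1 == "options" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^ndots:/) ndots = substr($i, 7) + 0
    }
    END {
      if (ndots == "") ndots = 1                 # resolver default
      if (name ~ /\.$/) { print name; exit }     # trailing dot: absolute name only
      dots = gsub(/\./, ".", name)               # count the dots in the name
      if (dots >= ndots) print name "."          # enough dots: try as written first
      for (i = 1; i <= n; i++) print name "." search[i] "."
      if (dots < ndots) print name "."           # too few dots: as written comes last
    }'
}

# Example:
#   kubectl exec test-pod -- cat /etc/resolv.conf | dns_query_order google.com
```

With the usual Kubernetes pod settings (ndots:5 plus the namespace, svc, and cluster search domains), google.com is looked up under every search domain before being tried as written; querying google.com. with a trailing dot skips the search list.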

Solutions

Upstream DNS Server Issues

Problem: Upstream DNS servers not accessible
Investigation:
- Test connectivity to configured DNS servers
- Verify DNS server IP addresses
- Check for DNS server timeouts

Solution:
- Configure alternative DNS servers
- Update CoreDNS forwarding configuration
- Verify network connectivity to DNS servers
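Upstream servers are set by the forward directive in the CoreDNS Corefile. A minimal sketch of a hypothetical helper, corefile_set_upstreams, that rewrites the targets of every "forward ." line while keeping its indentation and any option block; the server addresses in the example are documentation placeholders. On some managed clusters the CoreDNS ConfigMap is reconciled by the platform, so confirm that edits are supported before applying, and keep a backup.

```shell
# corefile_set_upstreams (hypothetical helper): reads a Corefile on stdin and
# prints it with the targets of every "forward ." directive replaced by the
# given servers. Usage: corefile_set_upstreams SERVER [SERVER...]
corefile_set_upstreams() {
  if [ "$#" -eq 0 ]; then
    echo "usage: corefile_set_upstreams SERVER [SERVER...]" >&2
    return 2
  fi
  awk -v servers="$*" '
    $1 == "forward" && $2 == "." {
      match($0, /^[ \t]*/)                   # keep the original indentation
      indent = substr($0, 1, RLENGTH)
      brace = ($NF == "{") ? " {" : ""       # keep an options block opener
      print indent "forward . " servers brace
      next
    }
    { print }'
}

# Example (back up first, review Corefile.new, then apply):
#   kubectl get configmap coredns -n kube-system -o yaml > coredns-backup.yaml
#   kubectl get configmap coredns -n kube-system -o jsonpath='{.data.Corefile}' \
#     | corefile_set_upstreams 192.0.2.10 192.0.2.11 > Corefile.new
#   kubectl create configmap coredns -n kube-system --from-file=Corefile=Corefile.new \
#     --dry-run=client -o yaml | kubectl apply -f -
```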

Ingress and Load Balancer Issues

Ingress Not Accessible

Symptoms

External URLs not reachable
Ingress controller not getting external IP
HTTP/HTTPS requests timing out

Figure (placeholder): screenshot showing ingress accessibility issues

Diagnostic Steps

# Check ingress status
kubectl get ingress -A

# Verify ingress controller pods
kubectl get pods -n ingress-nginx

# Check ingress controller service
kubectl get svc -n ingress-nginx

# Test from within cluster
kubectl exec -it test-pod -- curl -H "Host: yourdomain.com" http://INGRESS-CONTROLLER-IP

Common Ingress Issues

Load Balancer Not Provisioned

Problem: The EXTERNAL-IP of the ingress controller's LoadBalancer service stays "<pending>"
Investigation:
- Check load balancer provisioning
- Verify vCloud integration
- Review cloud provider quota

Solution:
- Verify load balancer service configuration
- Check vCloud resource availability
- Contact support for provisioning issues
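To find every affected service, not only the ingress controller, here is a minimal sketch of a hypothetical helper, pending_load_balancers, that lists LoadBalancer services whose EXTERNAL-IP is still <pending>. It relies on the default column order of kubectl get svc -A (NAMESPACE, NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, and so on); for each result, the Events section of kubectl describe svc usually records why provisioning has not completed.

```shell
# pending_load_balancers (hypothetical helper): reads the output of
# "kubectl get svc -A --no-headers" on stdin and prints NAMESPACE/NAME for
# every LoadBalancer service whose EXTERNAL-IP column is still <pending>.
pending_load_balancers() {
  awk '$3 == "LoadBalancer" && $5 == "<pending>" { print $1 "/" $2 }'
}

# Example:
#   kubectl get svc -A --no-headers | pending_load_balancers
#   kubectl describe svc ingress-nginx-controller -n ingress-nginx
```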

Ingress Controller Problems

Problem: Ingress controller pods unhealthy
Investigation:
- Check pod logs for errors
- Verify resource allocation
- Review configuration

Solution:
- Restart ingress controller pods
- Increase resource allocation
- Fix configuration errors

DNS and Certificate Issues

Problem: SSL/TLS certificate problems
Investigation:
- Check certificate status
- Verify cert-manager operation
- Review DNS configuration

Solution:
- Regenerate certificates
- Fix DNS configuration
- Update cert-manager configuration

Load Balancer Health Issues

Symptoms

Intermittent connectivity through load balancer
Load balancer health checks failing
Uneven traffic distribution

Diagnostic Approach

Check Backend Health: Review ingress controller backend status
Monitor Statistics: Use HAProxy statistics for load balancer analysis
Test Connectivity: Direct tests to backend services
Review Configuration: Verify load balancer configuration

Solutions

Backend Health Problems

Problem: Ingress controller backends marked unhealthy
Investigation:
- Check ingress controller pod health
- Verify health check endpoints
- Review resource constraints

Solution:
- Fix pod health issues
- Adjust health check parameters
- Scale ingress controller if needed

Performance Issues

Network Performance Problems

Symptoms

Slow network connectivity
High network latency
Bandwidth limitations

Figure (placeholder): screenshot of network performance monitoring

Performance Testing

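# Test network bandwidth: start an iperf3 server pod, then run a client against it
# (networkstatic/iperf3 already uses iperf3 as its entrypoint, so only options follow --)
kubectl run iperf3-server --image=networkstatic/iperf3 -- -s
kubectl get pod iperf3-server -o wide    # note the pod IP to use as TARGET-IP
kubectl run iperf3-client --image=networkstatic/iperf3 --rm -it --restart=Never -- -c TARGET-IP
kubectl delete pod iperf3-server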

# Test latency
kubectl exec -it test-pod -- ping -c 10 TARGET-IP

# Check node and pod CPU/memory usage (requires metrics-server; kubectl top does not report network throughput)
kubectl top nodes
kubectl top pods -A
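Comparing average latency to a pod on the same node with latency to a pod on another node (kubectl get pods -o wide shows placement) helps separate node-level problems from overlay or CNI overhead. A minimal sketch of a hypothetical helper, ping_rtt_avg, that extracts the average round-trip time from ping's summary line; it assumes the iputils or BusyBox summary formats.

```shell
# ping_rtt_avg (hypothetical helper): reads ping output on stdin and prints the
# average round-trip time in milliseconds from the summary line
# ("rtt min/avg/max/mdev = ..." from iputils, "round-trip min/avg/max = ..."
# from BusyBox).
ping_rtt_avg() {
  awk -F' = ' '/min\/avg\/max/ { split($2, v, "/"); print v[2] }'
}

# Example:
#   kubectl exec test-pod -- ping -c 10 TARGET-IP | ping_rtt_avg
```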

Optimization Solutions

Resource Allocation

Problem: Insufficient network resources
Investigation:
- Monitor node network utilization
- Check CNI plugin resource usage
- Review network bandwidth allocation

Solution:
- Increase node network capacity
- Optimize CNI plugin configuration
- Distribute network load

Configuration Optimization

Problem: Suboptimal network configuration
Investigation:
- Review CNI plugin settings
- Check ingress controller configuration
- Analyze traffic patterns

Solution:
- Tune CNI plugin parameters
- Optimize ingress controller settings
- Implement traffic optimization

Getting Support

Information to Gather

Before contacting support, collect:

Network Configuration

Cluster Network Settings: CNI plugin, CIDR ranges
VPC Network Configuration: Network interfaces and settings
Security Groups: Applied security group rules
Ingress Configuration: Ingress controller and service settings

Error Information

Error Messages: Exact error text and codes
Log Extracts: Relevant log entries from affected components
Timeline: When issues started and any recent changes
Impact Scope: Which services or users are affected

Diagnostic Output

# Network diagnostics to include
kubectl get nodes -o wide
kubectl get pods -A -o wide
kubectl get svc -A
kubectl get ingress -A
kubectl get networkpolicies -A

# Component status
kubectl get pods -n kube-system
kubectl get pods -n ingress-nginx

# Recent events
kubectl get events -A --sort-by='.lastTimestamp' | tail -20
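Collecting these outputs as separate files makes them easier to attach and review. A minimal sketch of a hypothetical helper, collect_network_diagnostics, that runs the commands above (keeping the full event list instead of the last 20 lines), saves each output including errors to its own file, and packs the result into a .tar.gz archive. It assumes kubectl already points at the affected cluster and that the ingress controller runs in the ingress-nginx namespace, as in the commands above.

```shell
# collect_network_diagnostics (hypothetical helper): saves each diagnostic
# command's output (stdout and stderr) to its own file under DIR, then writes
# DIR.tar.gz next to it for attaching to a support request.
# Usage: collect_network_diagnostics [DIR]   (default: ./network-diagnostics)
collect_network_diagnostics() {
  dir=${1:-network-diagnostics}
  dir=${dir%/}
  mkdir -p "$dir" || return 1
  kubectl get nodes -o wide          > "$dir/nodes.txt" 2>&1
  kubectl get pods -A -o wide        > "$dir/pods.txt" 2>&1
  kubectl get svc -A                 > "$dir/services.txt" 2>&1
  kubectl get ingress -A             > "$dir/ingress.txt" 2>&1
  kubectl get networkpolicies -A     > "$dir/networkpolicies.txt" 2>&1
  kubectl get pods -n kube-system    > "$dir/kube-system-pods.txt" 2>&1
  kubectl get pods -n ingress-nginx  > "$dir/ingress-nginx-pods.txt" 2>&1
  kubectl get events -A --sort-by='.lastTimestamp' > "$dir/events.txt" 2>&1
  tar -czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}
```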

Support Escalation

Level 1: Documentation and Basic Troubleshooting

Review Guides: Check relevant troubleshooting sections
Basic Tests: Perform basic connectivity and DNS tests
Configuration Review: Verify network configuration settings

Level 2: Advanced Diagnostics

Log Analysis: Detailed analysis of component logs
Performance Testing: Network performance and latency testing
Configuration Validation: Deep dive into network configurations

Level 3: Expert Support

Complex Issues: Multi-component networking problems
Performance Optimization: Advanced performance tuning
Integration Issues: vCloud integration problems

Prevention and Best Practices

Proactive Monitoring

Network Health: Regular monitoring of network component health
Performance Metrics: Continuous monitoring of network performance
Capacity Planning: Monitor network capacity and plan for growth
Alert Configuration: Set up alerts for network issues

Configuration Management

Documentation: Maintain network configuration documentation
Change Control: Implement change control for network changes
Testing: Test network changes in non-production environments
Backup: Backup network configurations

Regular Maintenance

Component Updates: Keep network components updated
Performance Review: Regular performance review and optimization
Security Review: Regular security configuration review
Cleanup: Regular cleanup of unused network resources

For additional cluster troubleshooting, see the Cluster Management Troubleshooting section.
