Metrics Dashboard

The metrics dashboard shows near-real-time cluster metrics: node and pod counts plus cluster-wide CPU and memory utilization. Data refreshes automatically every 30 seconds.

Accessing the Dashboard

  1. Go to the cluster details page
  2. Click on the "Metrics" tab in the cluster navigation
  3. The dashboard loads with real-time data and begins auto-refreshing

![Figure needed]

Screenshot of cluster navigation with Metrics tab highlighted

Dashboard Loading

  • Initial Load: Dashboard loads current metrics data
  • Auto-refresh Start: 30-second refresh timer begins automatically
  • Connection Status: Visual indicators show data connection status

Auto-refresh Functionality

Automatic Updates

  • Refresh Interval: Data updates every 30 seconds automatically
  • Background Process: Updates occur without interrupting viewing
  • Visual Indicators: Subtle indicators during data refresh
  • Seamless Experience: No page reload or interface disruption

![Figure needed]

Screenshot showing auto-refresh indicator and last updated timestamp

Manual Refresh Controls

  • Refresh Button: Click to force immediate data update
  • Last Updated: Timestamp shows when data was last refreshed
  • Status Indicators: Visual cues for data freshness and connectivity
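The refresh behavior described above can be sketched as a small piece of state. This is a minimal illustration, not the dashboard's actual code: `RefreshState` and its methods are hypothetical names, and the idea that a manual refresh restarts the 30-second countdown is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

REFRESH_INTERVAL_S = 30.0  # auto-refresh interval stated in the text


@dataclass
class RefreshState:
    """Tracks when dashboard data was last refreshed (hypothetical helper)."""

    last_updated: Optional[float] = None  # epoch seconds; None before the first load

    def is_due(self, now: float) -> bool:
        # The first load is always due; afterwards wait a full interval.
        return self.last_updated is None or now - self.last_updated >= REFRESH_INTERVAL_S

    def mark_refreshed(self, now: float) -> None:
        # Called after an automatic or a manual refresh succeeds.
        # Assumption: a manual refresh restarts the 30-second countdown.
        self.last_updated = now

    def age_seconds(self, now: float) -> Optional[float]:
        # Basis for a "last updated N seconds ago" label.
        return None if self.last_updated is None else now - self.last_updated
```

Passing `now` in explicitly, instead of reading the clock inside the class, keeps the logic easy to test.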

Connection Management

  • Error Handling: Graceful handling of temporary connection issues
  • Retry Logic: Automatic retry for failed refresh attempts
  • Status Display: Clear indication of connection problems

Cluster Resource Overview

Summary Cards

The dashboard begins with high-level cluster statistics:

![Figure needed]

Screenshot of summary cards showing total nodes and running pods

Total Nodes Card

  • Node Count: Total number of nodes in the cluster
  • Status Breakdown: Visual indicators for node health states
  • Quick Stats: Ready vs. total node count display
  • Health Overview: Overall node fleet health indication

Running Pods Card

  • Pod Count: Total number of running pods across all nodes
  • Status Distribution: Pod state breakdown (Running, Pending, Failed)
  • Health Indicators: Overall pod health across the cluster
  • Capacity Information: Pod allocation efficiency
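The numbers on both summary cards reduce to simple counts. The following sketch assumes simplified input records (`ready` and `phase` fields); the real dashboard's data shape is not documented here.

```python
from collections import Counter


def summarize_nodes(nodes):
    """nodes: list of dicts like {"name": "n1", "ready": True} (assumed shape)."""
    ready = sum(1 for node in nodes if node["ready"])
    return {"total": len(nodes), "ready": ready}


def summarize_pods(pods):
    """pods: list of dicts like {"name": "p1", "phase": "Running"} (assumed shape)."""
    # Count pods per phase, e.g. Running, Pending, Failed.
    return dict(Counter(pod["phase"] for pod in pods))
```

For example, `summarize_nodes` on three nodes with one not ready yields the "2 of 3 ready" style figure that the Total Nodes card displays.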

Resource Utilization Display

CPU Utilization

  • Overall Percentage: Cluster-wide CPU usage as a percentage
  • Visual Progress Bar: Color-coded utilization indicator
    • Green (0-70%): Healthy utilization range
    • Orange (70-85%): Warning range, monitor closely
    • Red (85%+): Critical range, action required
  • Usage Details: Current usage vs. total allocatable capacity
  • Trend Indication: Visual indicators for usage direction
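The color bands above map to a short threshold check. The listed ranges share their boundary values (70% and 85%), so this sketch assumes a boundary value belongs to the higher band; `utilization_color` is a hypothetical name.

```python
def utilization_color(percent: float) -> str:
    """Map a utilization percentage to the progress-bar color.

    Assumption: exactly 70% counts as orange and exactly 85% as red.
    """
    if percent < 0:
        raise ValueError("utilization cannot be negative")
    if percent >= 85:
        return "red"     # critical range, action required
    if percent >= 70:
        return "orange"  # warning range, monitor closely
    return "green"       # healthy range
```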

![Figure needed]

Screenshot of CPU utilization display with color-coded progress bar

Memory Utilization

  • Overall Percentage: Cluster-wide memory usage as a percentage
  • Visual Progress Bar: Same color coding scheme as CPU
  • Memory Details: Used vs. total available memory
  • Unit Conversion: Automatic unit display (GB, MB, etc.)
  • Capacity Tracking: Available vs. allocated memory breakdown
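Automatic unit display can be approximated by dividing a byte count until it fits the largest suitable unit. This sketch assumes decimal units (1 GB = 1,000,000,000 bytes) to match the "GB, MB" labels above; the dashboard may instead use binary units, and `format_bytes` is a hypothetical helper.

```python
def format_bytes(num_bytes: float) -> str:
    """Render a byte count with the largest unit that keeps the value below 1000.

    Assumption: decimal (power-of-1000) units.
    """
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that fits, or at the largest unit available.
        if value < 1000 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000
```

A "used vs. total" label is then two calls joined together, for example `f"{format_bytes(used)} / {format_bytes(total)}"`.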

![Figure needed]

Screenshot of memory utilization display with detailed breakdown

Data Accuracy and Timing

Real-time Data Collection

  • Source: Data is collected from the Kubernetes Metrics Server
  • Processing: Real-time calculation of cluster-wide statistics
  • Aggregation: Per-node values are combined into cluster-wide totals
  • Validation: Data validation and error checking
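As an illustration of the aggregation and validation steps, a cluster-wide percentage can be computed as total usage divided by total allocatable capacity. This is an assumed formula, not a documented one, and the node record shape (CPU in millicores, memory in bytes) is hypothetical.

```python
def cluster_utilization(nodes):
    """Return cluster-wide CPU and memory utilization as percentages.

    nodes: list of dicts with "usage" and "allocatable" sub-dicts (assumed shape),
    CPU in millicores and memory in bytes.
    """
    result = {}
    for resource in ("cpu", "memory"):
        used = sum(node["usage"][resource] for node in nodes)
        allocatable = sum(node["allocatable"][resource] for node in nodes)
        # Validation: report None instead of dividing by zero capacity.
        result[resource] = round(100 * used / allocatable, 1) if allocatable > 0 else None
    return result
```

Dividing the sums, instead of averaging per-node percentages, weights large nodes more heavily than small ones, which matches the "current usage vs. total allocatable capacity" framing.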

Update Frequency

  • Metrics Collection: Kubernetes metrics updated every 15 seconds
  • Dashboard Refresh: Interface updates every 30 seconds
  • Data Latency: Displayed values typically lag actual usage by 15-45 seconds
  • Accuracy: Sufficient for capacity planning and routine monitoring
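The upper end of the latency range is consistent with adding the two intervals: a change that happens just after a collection waits up to 15 seconds to be sampled, then up to 30 seconds for the next dashboard refresh. The sketch below only restates that arithmetic; it assumes the two timers are independent and ignores network and processing time.

```python
COLLECTION_INTERVAL_S = 15  # metrics collection interval from the text
DASHBOARD_REFRESH_S = 30    # dashboard refresh interval from the text


def worst_case_staleness(collection_s=COLLECTION_INTERVAL_S, refresh_s=DASHBOARD_REFRESH_S):
    """Upper bound on how old a displayed value can be, in seconds.

    Assumption: collection and refresh run on independent timers.
    """
    return collection_s + refresh_s
```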

Dashboard Interface Elements

Layout Structure

  • Header Section: Summary cards and quick statistics
  • Main Content: Detailed metrics and analytics
  • Navigation: Integrated with cluster navigation structure
  • Footer: Status information and last update timestamp

Interactive Elements

  • Clickable Cards: Summary cards may link to detailed views
  • Hover Information: Additional details on mouse hover
  • Status Tooltips: Explanatory tooltips for status indicators
  • Refresh Controls: Manual refresh and status controls

Visual Design

  • Color Consistency: Consistent color scheme throughout
  • Progress Indicators: Visual representation of utilization
  • Status Icons: Intuitive iconography for different states
  • Typography: Clear, readable font sizes and hierarchy

Data Interpretation

Understanding Utilization Percentages

CPU Utilization Ranges:

  • 0-30%: Low utilization, potential for consolidation
  • 30-70%: Optimal utilization range
  • 70-85%: High utilization, monitor for scaling needs
  • 85%+: Critical utilization, immediate action required

Memory Utilization Ranges:

  • 0-40%: Low utilization, potential for consolidation
  • 40-75%: Good utilization range
  • 75-90%: High utilization, plan for scaling
  • 90%+: Critical utilization, urgent scaling needed

Resource Planning Guidelines

  • Scale Up Triggers: Sustained CPU above 80% or sustained memory above 85%
  • Scale Down Opportunities: Sustained CPU below 30% together with sustained memory below 40%
  • Optimization Targets: 50-70% CPU, 60-80% Memory utilization
  • Capacity Buffer: Maintain 20-30% headroom for spikes
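The scale-up and scale-down guidelines above can be expressed as a small decision function. The text does not define "sustained", so this sketch assumes it means every sample in an observation window meets the condition; `scaling_hint` is a hypothetical name.

```python
def scaling_hint(samples):
    """Suggest a scaling direction from (cpu_percent, memory_percent) samples.

    Assumption: "sustained" means the condition holds for every sample
    in the window the caller passes in.
    """
    if not samples:
        return "hold"  # no data, no recommendation
    cpu_high = all(cpu > 80 for cpu, _ in samples)
    mem_high = all(mem > 85 for _, mem in samples)
    both_low = all(cpu < 30 and mem < 40 for cpu, mem in samples)
    if cpu_high or mem_high:
        return "scale-up"
    if both_low:
        return "scale-down"
    return "hold"
```

Requiring the whole window to qualify keeps a single spike or dip from triggering a recommendation; the window length is left to the caller.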

Use Cases

Daily Health Checks

  1. Morning Review: Check overall cluster health
  2. Resource Status: Verify resource utilization is within normal ranges
  3. Trend Awareness: Notice any significant changes from the previous day
  4. Issue Identification: Spot potential problems early

Capacity Planning

  1. Usage Trends: Monitor resource usage patterns over time
  2. Growth Planning: Plan for additional capacity based on trends
  3. Cost Optimization: Identify opportunities for resource optimization
  4. Scaling Decisions: Make informed decisions about cluster scaling

Performance Monitoring

  1. Bottleneck Identification: Find resource constraints
  2. Workload Analysis: Understand cluster workload characteristics
  3. Optimization Opportunities: Identify areas for improvement
  4. SLA Monitoring: Ensure resource availability meets requirements

Best Practices

Regular Monitoring

  1. Consistent Schedule: Check the dashboard at regular intervals
  2. Trend Awareness: Look for patterns and changes over time
  3. Documentation: Record significant observations and changes
  4. Alert Setup: Configure alerts for critical thresholds

Data Utilization

  1. Baseline Understanding: Establish normal operating ranges
  2. Seasonal Patterns: Recognize cyclical usage patterns
  3. Anomaly Detection: Quickly identify unusual behavior
  4. Correlation Analysis: Correlate metrics with application behavior

Proactive Management

  1. Early Warning: Use metrics for early problem detection
  2. Preventive Scaling: Scale before reaching critical thresholds
  3. Resource Planning: Plan capacity changes based on data
  4. Cost Management: Optimize resource allocation for cost efficiency

Next: Learn about Node Health Monitoring for detailed node-level metrics.