Version: v2.9.0

Yeedu High Availability

Yeedu High Availability Introduction

High Availability (HA) is a cornerstone of Yeedu’s architecture, ensuring uninterrupted service, fault tolerance, and operational resilience. This document outlines how Yeedu leverages cloud-native infrastructure, Kubernetes orchestration, and service-level redundancy to deliver enterprise-grade HA across all core components.


Overview

Yeedu is designed from the ground up with high availability in mind. All cloud and internal components are structured to maintain service continuity, enable automatic recovery, and deliver optimal performance during peak loads or unexpected failures.


Recovery Objectives

Recovery Time Objective (RTO)

Yeedu’s core services are engineered for an RTO of under 5 minutes. Most services are configured with auto-healing policies that restart or replace a failed pod within 2 to 5 seconds, so recovery from a single pod failure typically completes well within that objective.

Recovery Point Objective (RPO)

Yeedu targets an RPO of ≤ 10 seconds for critical data, based on log synchronization to Object Storage every 10 seconds and frequent cache replication.


Cloud Infrastructure Components

The foundational cloud infrastructure components that enable high availability for Yeedu services include:

| Service | Description | HA Strategy |
|---|---|---|
| Postgres | Primary backend database | Uses 2 databases for high availability and failover. |
| Object Storage | Persistent data lake for logs and jobs | Uses cloud-native HA object store. |
| NFS | Shared file system across services | Mounted via HA-enabled network drives. |
| Kubernetes | Cloud-hosted orchestration engine. All Yeedu services run on Kubernetes. | Built-in self-healing, auto-scaling, probing. |

Each of these components is set up in the cloud with high availability features enabled during resource provisioning.


Core Yeedu Internal Services

Yeedu’s high availability and distributed architecture are powered by the following key internal service components:

1. Log Synchronizer

  • Purpose: Synchronizes application and platform logs to Object Storage every 10 seconds.
  • HA Mechanism:
    • Kubernetes Liveness and Readiness probes ensure service health.
    • On failure, a new replica is spun up within 2–5 seconds, minimizing downtime.
    • Horizontal Pod Autoscaling (HPA) ensures scalability during load surges.
  • Replicas: Min: 1, Max: 1
  • Init Time (Approx.): 3–5 seconds
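
The periodic synchronization described above can be sketched as a small buffer-and-flush loop. This is a minimal illustration, not Yeedu's implementation: the `LogSynchronizer` class, the `upload` callable, and the injectable `clock` are all hypothetical names introduced here, and the 10-second interval is taken from the Purpose line above.

```python
import time
from typing import Callable, List


class LogSynchronizer:
    """Buffers log lines and flushes them at a fixed interval (hypothetical sketch)."""

    def __init__(
        self,
        upload: Callable[[List[str]], None],  # assumed: writes one batch to object storage
        interval_s: float = 10.0,             # sync interval stated in the section above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upload = upload
        self._interval_s = interval_s
        self._clock = clock
        self._buffer: List[str] = []
        self._last_flush = clock()

    def append(self, line: str) -> None:
        """Queue a log line for the next flush."""
        self._buffer.append(line)

    def tick(self) -> bool:
        """Flush the buffer if the interval has elapsed; return True if an upload ran."""
        now = self._clock()
        if now - self._last_flush < self._interval_s or not self._buffer:
            return False
        batch, self._buffer = self._buffer, []
        try:
            self._upload(batch)
        except Exception:
            # Put the batch back so a failed upload does not drop log lines.
            self._buffer = batch + self._buffer
            raise
        self._last_flush = now
        return True
```

In this sketch, anything appended since the last successful flush is what would be lost if the pod died, which is why the flush interval bounds the loss window for logs.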

2. Yeedu Broker

  • Purpose: Coordinates distributed state and message flow.
  • Architecture: Based on the Raft consensus protocol (leader-election model).
  • HA Mechanism:
    • Seamless failover from leader to follower nodes.
    • Automatic node recovery via healing policies.
    • Resilience behavior similar to Kafka under node failures.
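
Leader failover in a Raft cluster only works while a majority of nodes is reachable. The short sketch below shows that arithmetic; the function names are illustrative and the cluster sizes are examples, not a statement of how many Broker nodes a given Yeedu deployment runs.

```python
def raft_quorum(cluster_size: int) -> int:
    """Smallest number of nodes that forms a majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes can fail while a majority is still available."""
    return cluster_size - raft_quorum(cluster_size)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"nodes={n} quorum={raft_quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

A three-node cluster therefore keeps electing leaders with one node down, while a single-node cluster has no failover capacity at all.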

3. Yeedu Cache

  • Purpose: High-speed data access layer.
  • Architecture: Master-replica distributed model.
  • HA Mechanism:
    • Data replication across a minimum of 3 replicas.
    • Ensures redundancy and availability of in-memory state.
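
In a master-replica model, failover means promoting one of the replicas when the master is lost. The sketch below shows one common selection rule: promote the healthy replica that has applied the most of the master's write stream. The `Replica` type and `choose_new_master` function are hypothetical, and the source does not state which rule Yeedu Cache actually uses.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Replica:
    name: str
    healthy: bool
    replication_offset: int  # how far this replica has caught up with the master


def choose_new_master(replicas: Iterable[Replica]) -> Optional[Replica]:
    """Pick the healthy replica with the highest offset; ties are broken by name."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None  # no safe promotion target
    return min(candidates, key=lambda r: (-r.replication_offset, r.name))
```

Choosing the most caught-up replica minimizes how much in-memory state is lost at promotion time.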

4. Yeedu Monitor Dashboard

  • Purpose: UI for system metrics and diagnostics.
  • Backend: Postgres (HA-configured).
  • HA Mechanism:
    • Minimum of 3 dashboard service replicas, with autoscaling based on user activity.
    • Relies on HA configuration of the Postgres DB.
  • Replicas: Min: 1, Max: 1
  • Init Time (Approx.): 3–5 seconds

5. Yeedu UI

  • Purpose: End-user interface.
  • HA Mechanism:
    • Default minimum of 3 replicas, maximum of 5.
    • Scales horizontally once CPU utilization reaches 65%.
    • Kubernetes self-healing ensures prompt recovery in failure scenarios.
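
The scaling rule above can be illustrated with the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current × currentMetric / targetMetric), clamped to the configured minimum and maximum. The sketch below is a simplification (it ignores stabilization windows and pod readiness). The 65% target and the 3 to 5 replica bounds come from this section; the 0.1 tolerance is the Kubernetes default and is assumed here rather than confirmed for Yeedu.

```python
import math


def desired_replicas(
    current: int,
    cpu_utilization_pct: float,
    target_pct: float = 65.0,   # CPU threshold from the section above
    min_replicas: int = 3,
    max_replicas: int = 5,
    tolerance: float = 0.1,     # Kubernetes default; assumed, not confirmed for Yeedu
) -> int:
    """Simplified HPA calculation: scale by the metric ratio, then clamp to the bounds."""
    ratio = cpu_utilization_pct / target_pct
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to the target: no scaling
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, `desired_replicas(3, 90.0)` yields 5, while `desired_replicas(5, 20.0)` yields 3 because the result never drops below the minimum. When the minimum and maximum are equal, the clamp pins the replica count regardless of load.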

6. Yeedu REST API

  • Purpose: Central API layer interfacing with Broker, Cache, Vault.
  • HA Mechanism:
    • Maintains 1–3 replicas via autoscaling.
    • Scaling triggered on CPU threshold (65%).
    • Readiness/liveness probes and self-healing ensure uninterrupted access.
  • Replicas: Min: 1, Max: 3
  • Init Time (Approx.): 3–5 seconds

7. Yeedu Vault

  • Purpose: Secure secret management and storage.
  • Backend: Postgres (HA-enabled).
  • HA Mechanism:
    • Inherits HA from Postgres backend.
    • Ensures secrets are always available and accessible securely.
  • Replicas: Min: 3, Max: 3
  • Init Time (Approx.): 8–10 seconds

8. Yeedu History Server

  • Purpose: Stores job and pipeline execution histories.
  • Backend: Object Storage.
  • HA Mechanism:
    • Configurable with 1 to 3 replicas.
    • Automatically scaled based on usage and data inflow.
  • Replicas: Min: 1, Max: 1
  • Init Time (Approx.): 3–5 seconds

9. Yeedu Reactors

  • Purpose: Executes backend compute functions and jobs.
  • Dependencies: Broker, Cache, Vault.
  • HA Mechanism:
    • Redundant replicas maintained.
    • Scaling and recovery based on health checks and load patterns.

10. Yeedu Functions Scheduler

  • Purpose: Schedules backend compute functions and jobs.
  • HA Mechanism:
    • Ensures redundancy through autoscaling.
    • Scaling and recovery based on health checks and load patterns.
  • Replicas: Min: 1, Max: 1
  • Init Time (Approx.): 3–5 seconds


Kubernetes-Level HA Assurance

All Yeedu components are deployed within a Kubernetes Cluster, providing the following native HA capabilities:

  • Liveness & Readiness Probes: Actively monitor service health and responsiveness.
  • Pod Auto-Healing: Automatically restarts failed containers.
  • Horizontal Pod Autoscaler (HPA): Dynamically adjusts replicas based on CPU/memory usage.
  • State Management: Ensures the desired replica count is always maintained.
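
As a rough illustration of how probes and a desired replica count are declared, the sketch below builds a Deployment manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The field names are standard Kubernetes fields; the image name, port, probe paths, and timing values are placeholders chosen for the example, not Yeedu's actual settings.

```python
import json


def build_deployment(name: str, image: str, replicas: int, port: int = 8080) -> dict:
    """Return a Deployment manifest with liveness and readiness probes."""
    labels = {"app": name}
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Liveness: the kubelet restarts the container after
        # failureThreshold consecutive failed checks.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": port},  # hypothetical path
            "initialDelaySeconds": 5,
            "periodSeconds": 1,
            "failureThreshold": 3,
        },
        # Readiness: a failing pod is removed from Service endpoints
        # but is not restarted.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": port},  # hypothetical path
            "periodSeconds": 1,
            "failureThreshold": 1,
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,  # the controller keeps this many pods running
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def liveness_detection_seconds(manifest: dict) -> int:
    """Approximate time to detect a dead container: period x failure threshold."""
    probe = manifest["spec"]["template"]["spec"]["containers"][0]["livenessProbe"]
    return probe["periodSeconds"] * probe["failureThreshold"]


if __name__ == "__main__":
    manifest = build_deployment("yeedu-restapi", "example.invalid/yeedu-restapi:dev", replicas=1)
    print(json.dumps(manifest, indent=2))
```

With these placeholder values a failed container is detected after roughly 3 seconds; tighter or looser probe settings shift that detection time accordingly.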

Final Kubernetes Component Configuration

The table below lists the minimum and maximum replica counts and the approximate initialization time for each Kubernetes component in Yeedu:

| Component Name | Min Replicas | Max Replicas | Init Time (Approx.) |
|---|---|---|---|
| yeedu-rabbitmq | 3 | 3 | 1 minute |
| yeedu-ldap | 1 | 1 | 5 seconds |
| yeedu-redis | 1 | 1 | 10 seconds |
| yeedu-grafana | 1 | 1 | 3–5 seconds |
| yeedu-influxdb | 1 | 1 | 3–5 seconds |
| yeedu-reactors-cosi | 1 | 1 | 3–5 seconds |
| yeedu-reactors-monitor | 1 | 1 | 3–5 seconds |
| yeedu-restapi | 1 | 3 | 3–5 seconds |
| yeedu-reactors-log-sync | 1 | 1 | 3–5 seconds |
| yeedu-history-server | 1 | 1 | 3–5 seconds |
| yeedu-functions-scheduler | 1 | 1 | 3–5 seconds |
| yeedu-functions-celery | 2 | 2 | 3–5 seconds |
| yeedu-functions-proxy | 2 | 2 | 3–5 seconds |
| yeedu-vault | 3 | 3 | 8–10 seconds |

Recovery Timelines

The table below summarizes the HA mechanism each Yeedu component uses to recover automatically, along with its typical recovery time:

| Component | HA Mechanism | Recovery Time |
|---|---|---|
| Kubernetes Pods | Auto-restart via probes | 2–5 seconds |
| Log Synchronizer | Auto-restart + pod scaling | 2–5 seconds |
| Yeedu Broker | Raft-based leader failover | Immediate |
| Yeedu Cache | Master-replica fallback | 2–5 seconds |
| Monitor Dashboard | Autoscaled replicas + HA Postgres | 2–5 seconds |
| Yeedu UI | Autoscaling + self-healing | < 5 seconds |
| Yeedu REST API | Autoscaling + self-healing | < 5 seconds |
| Yeedu Vault | Backed by HA Postgres | Transparent |
| Yeedu History Server | Object Storage + scaled replicas | 2–5 seconds |
| Yeedu Reactors | Redundant replicas + health checks | 2–5 seconds |
| Postgres / Object Store / NFS | Cloud-native HA setup | Transparent |

Total Recovery Time: ~5 minutes or less across all services


Conclusion

Yeedu’s architecture leverages:

  • Cloud-native HA infrastructure,
  • Microservice redundancy,
  • Kubernetes orchestration features,
  • Consensus algorithms (Raft for Broker),
  • Dynamic autoscaling policies.

This end-to-end approach ensures fault tolerance, scalability, and enterprise-grade availability for mission-critical deployments.