Yeedu High Availability
Introduction
High Availability (HA) is a cornerstone of Yeedu’s architecture, ensuring uninterrupted service, fault tolerance, and operational resilience. This document outlines how Yeedu leverages cloud-native infrastructure, Kubernetes orchestration, and service-level redundancy to deliver enterprise-grade HA across all core components.
Overview
Yeedu is designed from the ground up with high availability in mind. All cloud and internal components are structured to maintain service continuity, enable automatic recovery, and deliver optimal performance during peak loads or unexpected failures.
Recovery Objectives
Recovery Time Objective (RTO)
Yeedu’s services are engineered to achieve an RTO of under 5 minutes for core services. Most services are configured with auto-healing policies that restart or replace failed pods within 2–5 seconds, ensuring near-zero downtime.
Recovery Point Objective (RPO)
Yeedu targets an RPO of ≤ 10 seconds for critical data, enabled by continuous log synchronization and frequent cache replication.
Cloud Infrastructure Components
The foundational cloud infrastructure components that enable high availability for Yeedu services include:
| Service | Description | HA Strategy |
| --- | --- | --- |
| Postgres | Primary backend database | Two database instances provide high availability and failover. |
| Object Storage | Persistent data lake for logs and jobs | Uses a cloud-native, highly available object store. |
| NFS | Shared file system across services | Mounted via HA-enabled network drives. |
| Kubernetes | Cloud-hosted orchestration engine; all Yeedu services run on Kubernetes. | Built-in self-healing, autoscaling, and health probing. |
Each of these components is set up in the cloud with high availability features enabled during resource provisioning; a sketch of how a client can target the paired Postgres instances follows.
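As an illustration of the Postgres failover strategy above, the sketch below shows how a client could reference both database instances in a single connection string. This is a minimal example assuming a libpq-compatible driver; the host names (`pg-primary`, `pg-standby`) and resource names are hypothetical, not the manifest Yeedu ships.

```yaml
# Hypothetical ConfigMap: the multi-host connection string lets libpq-based
# clients fail over to whichever Postgres instance currently accepts writes.
apiVersion: v1
kind: ConfigMap
metadata:
  name: yeedu-postgres-conn          # illustrative name only
data:
  DATABASE_URL: "postgresql://yeedu@pg-primary:5432,pg-standby:5432/yeedu?target_session_attrs=read-write"
```

With `target_session_attrs=read-write`, the driver skips any host that is currently a read-only standby, so a failover becomes transparent to the application once the standby is promoted.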
Core Yeedu Internal Services
Yeedu’s high availability and distributed architecture are powered by the following key internal service components:
1. Log Synchronizer
- Purpose: Synchronizes application and platform logs to Object Storage every 10 seconds.
- HA Mechanism:
- Kubernetes liveness and readiness probes monitor service health (see the probe sketch after this list).
- On failure, a new replica is spun up within 2–5 seconds, minimizing downtime.
- Horizontal Pod Autoscaling (HPA) ensures scalability during load surges.
- Replicas: Min: 1, Max: 1
- Init Time (Approx.): 3–5 seconds
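A minimal sketch of the probe configuration described above, assuming an HTTP health endpoint on port 8080; the container image, paths, and timings are illustrative rather than Yeedu's actual manifest:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yeedu-reactors-log-sync
spec:
  replicas: 1                          # matches the documented Min/Max of 1
  selector:
    matchLabels:
      app: yeedu-reactors-log-sync
  template:
    metadata:
      labels:
        app: yeedu-reactors-log-sync
    spec:
      containers:
        - name: log-sync
          image: yeedu/log-sync:latest   # placeholder image
          livenessProbe:                 # kubelet restarts the container if this fails
            httpGet:
              path: /healthz             # assumed health endpoint
              port: 8080
            periodSeconds: 5
            failureThreshold: 2
          readinessProbe:                # pod receives traffic only when ready
            httpGet:
              path: /ready               # assumed readiness endpoint
              port: 8080
            initialDelaySeconds: 5       # covers the ~3–5 s init time
            periodSeconds: 5
```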
2. Yeedu Broker
- Purpose: Coordinates distributed state and message flow.
- Architecture: Based on Raft consensus protocol (leader-election model).
- HA Mechanism:
- Seamless failover from leader to follower nodes.
- Automatic node recovery via healing policies.
- Resilience under node failures comparable to Kafka (see the quorum sketch below).
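The sketch below illustrates, under assumed resource names, how a Raft-based broker is typically kept available on Kubernetes: an odd number of replicas so a majority survives one node failure, plus a PodDisruptionBudget so voluntary evictions (for example, node drains) never drop the cluster below quorum. It is an illustration of the pattern, not Yeedu's shipped configuration:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: yeedu-broker                # hypothetical name
spec:
  serviceName: yeedu-broker
  replicas: 3                       # odd count: 2 of 3 keep a Raft majority
  selector:
    matchLabels:
      app: yeedu-broker
  template:
    metadata:
      labels:
        app: yeedu-broker
    spec:
      containers:
        - name: broker
          image: yeedu/broker:latest   # placeholder image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: yeedu-broker-pdb
spec:
  minAvailable: 2                   # never drop below the Raft majority
  selector:
    matchLabels:
      app: yeedu-broker
```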
3. Yeedu Cache
- Purpose: High-speed data access layer.
- Architecture: Master-replica distributed model.
- HA Mechanism:
- Data replication across a minimum of 3 replicas.
- Ensures redundancy and availability of in-memory state.
4. Yeedu Monitor Dashboard
- Purpose: UI for system metrics and diagnostics.
- Backend: Postgres (HA-configured).
- HA Mechanism:
- Dashboard service replicas are autoscaled based on user activity (replica counts are listed in the configuration table below).
- Relies on HA configuration of the Postgres DB.
- Replicas: Min: 1, Max: 1
- Init Time (Approx.): 3–5 seconds
5. Yeedu UI
- Purpose: End-user interface.
- HA Mechanism:
- Default minimum of 3 replicas, maximum of 5.
- Scales horizontally once CPU utilization reaches 65% (see the autoscaler sketch after this list).
- Kubernetes self-healing ensures prompt recovery in failure scenarios.
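A minimal HorizontalPodAutoscaler sketch matching the numbers above (3–5 replicas, 65% CPU); the object and Deployment names are assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: yeedu-ui                    # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: yeedu-ui                  # the UI Deployment to scale
  minReplicas: 3
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65    # scale out once average CPU hits 65%
```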
6. Yeedu REST API
- Purpose: Central API layer interfacing with the Broker, Cache, and Vault.
- HA Mechanism:
- Maintains between 1 and 3 replicas via autoscaling, per the configured minimum and maximum below.
- Scaling triggered on CPU threshold (65%).
- Readiness/liveness probes and self-healing ensure uninterrupted access.
- Replicas: Min: 1, Max: 3
- Init Time (Approx.): 3–5 seconds
7. Yeedu Vault
- Purpose: Secure secret management and storage.
- Backend: Postgres (HA-enabled).
- HA Mechanism:
- Inherits HA from Postgres backend.
- Ensures secrets remain available and securely accessible (see the startup-probe sketch after this list).
- Replicas: Min: 3, Max: 3
- Init Time (Approx.): 8–10 seconds
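The sketch below shows one way the 8–10 second initialization window can be accommodated so Kubernetes does not restart Vault pods that are still booting; the image, port, and endpoint are assumptions, not Yeedu's actual values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yeedu-vault
spec:
  replicas: 3                        # matches the documented Min/Max of 3
  selector:
    matchLabels:
      app: yeedu-vault
  template:
    metadata:
      labels:
        app: yeedu-vault
    spec:
      containers:
        - name: vault
          image: yeedu/vault:latest    # placeholder image
          startupProbe:                # tolerates up to ~10 s of startup
            httpGet:
              path: /healthz           # assumed health endpoint
              port: 8200
            periodSeconds: 2
            failureThreshold: 5        # 5 × 2 s = 10 s grace before restart
          livenessProbe:               # takes over once startup succeeds
            httpGet:
              path: /healthz
              port: 8200
            periodSeconds: 5
```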
8. Yeedu History Server
- Purpose: Stores job and pipeline execution histories.
- Backend: Object Storage.
- HA Mechanism:
- Configurable with 1 to 3 replicas.
- Automatically scaled based on usage and data inflow.
- Replicas: Min: 1, Max: 1
- Init Time (Approx.): 3–5 seconds
9. Yeedu Reactors
- Purpose: Executes backend compute functions and jobs.
- Dependencies: Broker, Cache, Vault.
- HA Mechanism:
- Redundant replicas maintained.
- Scaling and recovery based on health checks and load patterns.
10. Yeedu Functions Scheduler
- Purpose: Schedules backend compute functions and jobs.
- HA Mechanism:
- Ensures redundancy through autoscaling.
- Scaling and recovery based on health checks and load patterns.
- Replicas: Min: 1, Max: 1
- Init Time (Approx.): 3–5 seconds
Kubernetes-Level HA Assurance
All Yeedu components are deployed within a Kubernetes Cluster, providing the following native HA capabilities:
- Liveness & Readiness Probes: Actively monitor service health and responsiveness.
- Pod Auto-Healing: Automatically restarts failed containers.
- Horizontal Pod Autoscaler (HPA): Dynamically adjusts replica counts based on CPU/memory utilization (see the note after this list).
- Desired-State Reconciliation: The Kubernetes control plane continuously restores the declared replica count for every Deployment and StatefulSet.
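One practical detail behind the CPU-based autoscaling listed above: the HPA measures utilization as a percentage of each container's CPU request, so autoscaled services must declare resource requests. The Deployment fragment below uses illustrative values, not Yeedu's defaults:

```yaml
# Deployment fragment (illustrative values only)
spec:
  template:
    spec:
      containers:
        - name: yeedu-ui             # hypothetical container name
          image: yeedu/ui:latest     # placeholder image
          resources:
            requests:
              cpu: "500m"            # 65% utilization ≈ 325m of actual usage
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
```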
Final Kubernetes Component Configuration
The table below lists the replica configuration and approximate initialization time for each Kubernetes component in Yeedu:
| Component Name | Min Replicas | Max Replicas | Init Time (Approx.) |
| --- | --- | --- | --- |
| yeedu-rabbitmq3 | 3 | 3 | 1 minute |
| yeedu-ldap | 1 | 1 | 5 seconds |
| yeedu-redis | 1 | 1 | 10 seconds |
| yeedu-grafana | 1 | 1 | 3–5 seconds |
| yeedu-influxdb | 1 | 1 | 3–5 seconds |
| yeedu-reactors-cosi | 1 | 1 | 3–5 seconds |
| yeedu-reactors-monitor | 1 | 1 | 3–5 seconds |
| yeedu-restapi | 1 | 3 | 3–5 seconds |
| yeedu-reactors-log-sync | 1 | 1 | 3–5 seconds |
| yeedu-history-server | 1 | 1 | 3–5 seconds |
| yeedu-functions-scheduler | 1 | 1 | 3–5 seconds |
| yeedu-functions-celery | 2 | 2 | 3–5 seconds |
| yeedu-functions-proxy | 2 | 2 | 3–5 seconds |
| yeedu-vault | 3 | 3 | 8–10 seconds |
Recovery Timelines
The table below summarizes how each Yeedu component recovers automatically using HA mechanisms and their typical recovery times:
| Component | HA Mechanism | Recovery Time |
| --- | --- | --- |
| Kubernetes Pods | Auto-restart via probes | 2–5 seconds |
| Log Synchronizer | Auto-restart + pod scaling | 2–5 seconds |
| Yeedu Broker | Raft-based leader failover | Immediate |
| Yeedu Cache | Master-replica fallback | 2–5 seconds |
| Monitor Dashboard | Autoscaled replicas + HA Postgres | 2–5 seconds |
| Yeedu UI | Autoscaling + self-healing | < 5 seconds |
| Yeedu REST API | Autoscaling + self-healing | < 5 seconds |
| Yeedu Vault | Backed by HA Postgres | Transparent |
| Yeedu History Server | Object Storage + scaled replicas | 2–5 seconds |
| Yeedu Reactors | Redundant replicas + health checks | 2–5 seconds |
| Postgres / Object Store / NFS | Cloud-native HA setup | Transparent |
Total Recovery Time: ~5 minutes or less across all services
Conclusion
Yeedu’s architecture leverages:
- Cloud-native HA infrastructure
- Microservice redundancy
- Kubernetes orchestration features
- Consensus algorithms (Raft for the Broker)
- Dynamic autoscaling policies
This end-to-end approach ensures fault tolerance, scalability, and enterprise-grade availability for mission-critical deployments.