Requirement Checklist
Software Requirements
Ensure the following command-line tools (CLIs) are installed:
CLI | Version |
---|---|
kubectl | v1.27.3 |
helmfile | v0.142.0 |
helm | v3.9.3 |
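The table above can be checked with a short script. The following is a minimal sketch, not part of the Yeedu tooling: it assumes each CLI prints its version through the commands in `VERSION_COMMANDS`, and it only reports whether the installed version matches the listed one. The source does not say whether newer versions are acceptable, so a "differs" result is a prompt to check, not a failure.

```python
import re
import shutil
import subprocess

# Versions copied from the table above.
LISTED_VERSIONS = {
    "kubectl": "v1.27.3",
    "helmfile": "v0.142.0",
    "helm": "v3.9.3",
}

# Assumption: these commands print each tool's client version.
VERSION_COMMANDS = {
    "kubectl": ["kubectl", "version", "--client"],
    "helmfile": ["helmfile", "--version"],
    "helm": ["helm", "version", "--short"],
}

_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def extract_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints, or None."""
    match = _VERSION_RE.search(text)
    return tuple(int(part) for part in match.groups()) if match else None


def check_tool(name):
    """Return (status, found_version); status is 'missing', 'unknown', 'differs', or 'ok'."""
    if shutil.which(name) is None:
        return "missing", None
    result = subprocess.run(
        VERSION_COMMANDS[name], capture_output=True, text=True, check=False
    )
    found = extract_version(result.stdout + result.stderr)
    if found is None:
        return "unknown", None
    expected = extract_version(LISTED_VERSIONS[name])
    return ("ok" if found == expected else "differs"), found


if __name__ == "__main__":
    for tool, listed in LISTED_VERSIONS.items():
        status, found = check_tool(tool)
        print(f"{tool}: {status} (listed {listed}, found {found})")
```

Running the script on the installation workstation prints one line per tool, which makes a missing or mismatched CLI visible before the deployment starts.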
Cloud Requirements
- Microsoft Azure
- Google Cloud Platform
- Amazon Web Services
1. Azure Subscription and Resource Group
Make sure to have an active Azure subscription and a designated resource group.
2. Azure Service Principal
Create an Azure service principal with admin access to ADLS2, AKS, ACR, and LAW.
3. Azure Kubernetes Service (AKS)
Configure Azure Kubernetes Service with the following recommendations:
- Namespace: yeedu (Recommended)
- Cluster Version: v1.26.10
- Number of Nodes: 2
- Min CPU: 16
- Min Memory: 32GB
- Machine Type: Standard_D4as_v5 (Recommended)
4. Azure Data Lake Gen 2 Blob Storage (ADLS2)
Set up an Azure Data Lake Gen 2 Storage account. Ensure the service principal from Step 2 has the necessary read and write permissions.
5. Azure PostgreSQL
Deploy an Azure PostgreSQL instance with the following configurations:
- CPU: 8 CPUs
- Memory: 32GB RAM
- Machine Type: Standard_D8_v4
- Persistent Disk: 500GB PremiumV2_LRS (SSD)
- Backup Policy: Daily (For Production)
- High Availability and DR: Recommended for Production
6. Azure Container Registry (ACR)
Create an Azure Container Registry for storing Docker images. Refer to the setup-docker-registry section for the list of container registries to create and for instructions on uploading Yeedu images.
7. Log Analytics Workspace
Set up a Log Analytics Workspace to store logs. Obtain the LAW_ID and LAW_SECRET values. Ensure the service principal from Step 2 has the necessary write access.
8. Firewalls
Ensure the following ports are open:
Service Name | Port | Access From Workstation | Access From Apache Spark Cluster | Access From Yeedu Control Plane Server |
---|---|---|---|---|
REST API | 8080 | Yes | Yes | Yes |
Jupyter Notebook | 8888-9088 | No | No | Yes |
RabbitMQ UI | 15672 | Yes | No | No |
Redis | 6379 | No | No | Yes |
RabbitMQ | 5672 | No | Yes | Yes |
PostgreSQL | 5432 | Yes | Yes | Yes |
History-server | 10000 | Yes | No | Yes |
Grafana | 3000 | Yes | Yes | Yes |
Influx DB | 8086 | No | Yes | Yes |
LDAP | 389 | No | No | Yes |
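Once the firewall rules are in place, reachability can be spot-checked from the machine in question. The sketch below is an illustration only: it tests the ports that the table marks as accessible from a workstation, and it assumes that all of those services answer on one host, which may not match a given deployment. The host passed to `check_from_workstation` is a placeholder that you supply.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Ports the table marks "Yes" under "Access From Workstation".
WORKSTATION_PORTS = {
    "REST API": 8080,
    "RabbitMQ UI": 15672,
    "PostgreSQL": 5432,
    "History-server": 10000,
    "Grafana": 3000,
}


def check_from_workstation(host):
    """Return {service: reachable} for every workstation-facing port on host."""
    return {
        service: port_is_open(host, port)
        for service, port in WORKSTATION_PORTS.items()
    }
```

A successful TCP connection only shows that the port is reachable; it does not confirm that the expected service is the one listening.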
Make sure to fulfill these prerequisites for a successful Yeedu installation.
Azure File Share prerequisites (NFS only)
- Create an Azure Storage Account with Premium tier and NFS protocol enabled.
- Configure a Private Endpoint to ensure secure access.
- Access Control: Ensure the private endpoint is accessible.
1. Google Cloud Platform (GCP) project
Make sure to have an active GCP project.
2. Google Cloud Platform (GCP) Service Account
Create a service account with the following permissions: Compute Engine Admin, VPC Reader, GCR Reader, Storage Object Admin, Kubernetes Admin.
3. Google Kubernetes Engine (GKE)
Configure Google Kubernetes Engine with the following recommendations:
- Namespace: yeedu (Recommended)
- Cluster Version: v1.26.10
- Number of Nodes: 2
- Min CPU: 16
- Min Memory: 32GB
4. Google Cloud Storage (GCS)
Create a bucket in the same project with Storage Object Admin access. Ensure the service account from Step 2 has the necessary read and write permissions.
5. Google Cloud SQL (PostgreSQL)
Deploy a Google Cloud SQL (PostgreSQL) instance with the following configurations:
- CPU: 8 CPUs
- Memory: 32GB RAM
- Database Name: yeedu
- Persistent Disk: 500GB (pd-ssd)
- Backup Policy: Daily (For Production)
- High Availability and DR: Recommended for Production
6. Google Container Registry (GCR)
Create a Google Container Registry (GCR) for storing Docker images. Refer to the setup-docker-registry section for the list of container registries to create and for instructions on uploading Yeedu images.
7. Google Stackdriver
Set up Google Stackdriver to store logs. Ensure the service account from Step 2 has the necessary write access.
8. Firewalls
Ensure the following ports are open:
Service Name | Port | Access From Workstation | Access From Apache Spark Cluster | Access From Yeedu Control Plane Server |
---|---|---|---|---|
REST API | 8080 | Yes | Yes | Yes |
Jupyter Notebook | 8888-9088 | No | No | Yes |
RabbitMQ UI | 15672 | Yes | No | No |
Redis | 6379 | No | No | Yes |
RabbitMQ | 5672 | No | Yes | Yes |
PostgreSQL | 5432 | Yes | Yes | Yes |
History-server | 10000 | Yes | No | Yes |
Grafana | 3000 | Yes | Yes | Yes |
Influx DB | 8086 | No | Yes | Yes |
LDAP | 389 | No | No | Yes |
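When the table has to be translated into firewall rules, it can help to hold it as data and derive the port list per traffic source. The sketch below copies the rows above verbatim; the source keys (`workstation`, `spark`, `control_plane`) and the function name are illustrative and not part of any Yeedu or GCP API.

```python
# Rows copied from the table above:
# (service, first port, last port, from workstation, from Spark cluster, from control plane)
RULES = [
    ("REST API", 8080, 8080, True, True, True),
    ("Jupyter Notebook", 8888, 9088, False, False, True),
    ("RabbitMQ UI", 15672, 15672, True, False, False),
    ("Redis", 6379, 6379, False, False, True),
    ("RabbitMQ", 5672, 5672, False, True, True),
    ("PostgreSQL", 5432, 5432, True, True, True),
    ("History-server", 10000, 10000, True, False, True),
    ("Grafana", 3000, 3000, True, True, True),
    ("Influx DB", 8086, 8086, False, True, True),
    ("LDAP", 389, 389, False, False, True),
]

# Index of the access column for each traffic source.
SOURCES = {"workstation": 3, "spark": 4, "control_plane": 5}


def port_ranges_for(source):
    """Return the ports to open for a source, sorted numerically, as 'N' or 'N-M' strings."""
    column = SOURCES[source]
    ranges = sorted((row[1], row[2]) for row in RULES if row[column])
    return [str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges]


if __name__ == "__main__":
    for name in SOURCES:
        print(name, ",".join(port_ranges_for(name)))
```

The comma-separated output per source can be pasted into whichever rule format the firewall tooling expects.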
Make sure to fulfill these prerequisites for a successful Yeedu installation.
Google Cloud Filestore prerequisites
- Provision a Filestore instance within a Google Cloud project.
- Ensure the Filestore instance resides in the same VPC network and subnet as the connected resources.
- Access Control: Grant admin access to the listed IP ranges.
1. Amazon Web Services (AWS) account
Make sure to have an active AWS account.
2. AWS Access & Secret Key Pair
Create an AWS Access & Secret Key Pair with the following permissions: Create EC2 machines, Read VPC, Storage Object Admin, Read and Write access to ECR, CloudWatch Log Writer.
3. Elastic Kubernetes Service (EKS)
Configure Elastic Kubernetes Service with the following recommendations:
- Namespace: yeedu (Recommended)
- Cluster Version: v1.26.10
- Number of Nodes: 2
- Min CPU: 16
- Min Memory: 32GB
4. S3 Storage Bucket
Create an S3 Storage Bucket for storing configuration files and logs. Ensure the access key pair from Step 2 has the necessary read and write permissions.
5. AWS RDS (PostgreSQL)
Deploy an AWS RDS (PostgreSQL) instance with the following configurations:
- CPU: 8 CPUs
- Memory: 32GB RAM
- Database Name: yeedu
- Persistent Disk: 500GB
- Backup Policy: Daily (For Production)
- High Availability and DR: Recommended for Production
6. Elastic Container Registry (ECR)
Create an Elastic Container Registry (ECR) for storing Docker images. Refer to the setup-docker-registry section for the list of container registries to create and for instructions on uploading Yeedu images.
7. AWS CloudWatch
Set up AWS CloudWatch to view machine logs, and create log groups in CloudWatch. Ensure the access key pair from Step 2 has the necessary write access.
8. Firewalls
Ensure the following ports are open:
Service Name | Port | Access From Workstation | Access From Apache Spark Cluster | Access From Yeedu Control Plane Server |
---|---|---|---|---|
REST API | 8080 | Yes | Yes | Yes |
Jupyter Notebook | 8888-9088 | No | No | Yes |
RabbitMQ UI | 15672 | Yes | No | No |
Redis | 6379 | No | No | Yes |
RabbitMQ | 5672 | No | Yes | Yes |
PostgreSQL | 5432 | Yes | Yes | Yes |
History-server | 10000 | Yes | No | Yes |
Grafana | 3000 | Yes | Yes | Yes |
Influx DB | 8086 | No | Yes | Yes |
LDAP | 389 | No | No | Yes |
Make sure to fulfill these prerequisites for a successful Yeedu installation.
EFS prerequisites
- Create an EFS file system and corresponding mount targets.
- The EC2 instances and EFS mount targets should be in the same VPC, or in connected VPCs, so that they can communicate over the network.
- Allow inbound traffic on port 2049 from EC2 instances to the EFS mount targets.
- Availability Zone: regional
- Permissions: elasticfilesystem:ClientRootAccess, elasticfilesystem:ClientWrite, elasticfilesystem:ClientMount
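The three permissions above could be expressed as an IAM policy document. The sketch below is one possible shape, assuming they are granted through an identity-based policy scoped to a single file system; the source does not say where the permissions should be attached. The ARN is a placeholder, with a hypothetical region, account ID, and file system ID.

```python
import json

# Actions copied from the permissions list above.
EFS_ACTIONS = [
    "elasticfilesystem:ClientRootAccess",
    "elasticfilesystem:ClientWrite",
    "elasticfilesystem:ClientMount",
]


def efs_client_policy(file_system_arn):
    """Build an IAM policy document allowing the three EFS client actions on one file system."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": EFS_ACTIONS,
                "Resource": file_system_arn,
            }
        ],
    }


# Placeholder ARN: the region, account ID, and file system ID are hypothetical.
EXAMPLE_ARN = (
    "arn:aws:elasticfilesystem:us-east-1:111122223333:"
    "file-system/fs-0123456789abcdef0"
)

if __name__ == "__main__":
    print(json.dumps(efs_client_policy(EXAMPLE_ARN), indent=2))
```

The printed JSON can be reviewed and adapted before it is attached to the role or user that mounts the file system.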