We are seeking a Senior Site Reliability Engineer to ensure the reliability of the cloud infrastructure behind Lucid Motors' cloud-based services and internal systems.
Requirements
- Provide reliability engineering for cloud services deployed and managed in the KSA region.
- Implement continuous integration and delivery (CI/CD) using ArgoCD, Jenkins, Maven, and Docker.
- Architect highly available cloud systems, ensuring Disaster Recovery (DR) measures are in place.
- Containerize and deploy microservices and data pipelines on Kubernetes using Helm.
- Configure auto-scaling and monitor the performance of Kubernetes and the applications running on it, using Prometheus and Grafana or similar tools.
- Perform SRE activities such as availability and reliability monitoring and reporting.
- Deploy, configure, and maintain tools and services such as Kafka, Spark, Trino, Airflow, MQTT, and microservices.
- Set up infrastructure as a service using Terraform.
- Work with and deploy from the code repositories in GitLab, and participate in peer reviews.
- Support NoSQL databases such as Elasticsearch, MongoDB, and Cassandra, as well as other open-source services.
- Set up and monitor various applications and services.
- Continuously improve alerting and automate recovery processes.
- Participate in the on-call rotation to maintain service SLAs per business needs.
- Work with product owners, engineering managers, and other team members using Agile Scrum and Kanban practices.
- Perform impact analysis during incidents and take appropriate action.
Benefits
- Medical
- Dental
- Vision
- Life insurance
- Disability insurance
- Vacation
- 401k