Lyrebird Studio
Cloud DevOps Engineer
- Built NestJS services that expose training/inference workloads and handle millions of requests daily. Mainly used Amazon ECS, Amazon ElastiCache for Redis, Amazon DynamoDB, AWS Lambda, and Amazon SQS.
- Built a PoC for migrating ML workloads to Kubernetes to improve GPU and resource allocation and utilization, which resulted in up to 40% cost reduction. Used Karpenter, KEDA, Bottlerocket, and Argo CD.
- Configured a Kubernetes cluster for our self-hosted CI/CD runners and integrated it with a caching mechanism, reducing build/deploy times by 50% and costs by 60%.
- Adopted the S3 Express One Zone storage class at launch, gaining up to 90% better storage performance for applications that serve users with sub-second latency. Blog post can be read here.
- Worked with the research and development team, whose high-quality machine learning models serve millions of users worldwide, to create a cloud-based working environment with powerful accelerators (GPUs, TPUs, and Inf2) and to deploy those models to production on seamless infrastructure.
- Developed observability, monitoring, and alerting mechanisms on AWS, and integrated them with Slack for real-time notifications and incident management. Also used Incident Manager, Amazon GuardDuty, and AWS WAF for threat detection, mitigation, and automated incident response across the cloud infrastructure.
- GenAI
- AWS
- Python
- TypeScript
- GCP