DevOps Engineer
Role Overview
We're seeking an exceptional DevOps Engineer to take ownership of the cloud infrastructure and deployment systems that power our mission in applied atmospheric science. In this role, you'll be the technical foundation our scientists, ML engineers, and software engineers build on — designing and operating the cloud environments, deployment pipelines, and infrastructure-as-code that keep our systems reliable, scalable, and fast.
What You'll Do
- Own and operate our cloud infrastructure across AWS, GCP, and other compute providers, ensuring high availability, cost efficiency, and security
- Lead the development and maintenance of Terraform-based infrastructure-as-code across all environments
- Design and implement CI/CD pipelines that enable rapid, reliable delivery of our modeling products
- Guide architectural decisions for cloud deployments, establishing best practices and patterns for the engineering team
- Manage containerized workloads, including Docker image builds, container registries, and orchestration on AWS
- Build and maintain scalable compute environments for HPC and GPU workloads, including cluster management and job scheduling
- Implement observability (metrics, logging, tracing, and alerting) across our infrastructure stack
- Partner with software engineers and scientists to optimize data pipeline infrastructure for processing massive meteorological datasets
- Drive security, compliance, and cost governance across our cloud footprint
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience)
- Strong hands-on experience with AWS and GCP, including core services (compute, networking, storage, IAM)
- Deep proficiency with Terraform for infrastructure-as-code at production scale
- Experience designing and operating CI/CD systems (GitHub Actions, GitLab CI, or similar)
- Solid understanding of containerization and orchestration (Docker, Kubernetes, or equivalent)
- Experience managing networking, security groups, VPCs, and IAM policies in multi-cloud environments
- Familiarity with Linux system administration and shell scripting
Nice to Have
- Experience with HPC environments and job schedulers (SLURM, AWS ParallelCluster, or similar)
- Familiarity with GPU compute infrastructure (provisioning, scheduling, cost management)
- Background working with scientific or data-intensive workloads
- Experience with container runtimes for HPC (Apptainer/Singularity, Pyxis/enroot)
- Knowledge of serverless and event-driven architectures (AWS Lambda, GCP Cloud Run, Modal)
- Familiarity with workflow orchestration platforms (Dagster, Airflow, Prefect)
- Prior work in a research-oriented or scientific computing environment
- Experience with cost optimization and FinOps practices in cloud environments
Compensation & Benefits
- Total compensation of around $250,000 at the top of the band
- $160,000–$200,000 cash compensation
- Meaningful early-stage equity
- Full health, dental, and vision insurance
- In-person five days a week in San Francisco, CA