DevOps / Platform Engineer – Healthcare SaaS (Azure, Python/Django)

Role Summary
We are seeking a senior, hands-on DevOps / Platform Engineer to own production readiness, reliability, security, multi-tenancy, and cost optimization for our Azure-based, AI-native healthcare platform. This role is critical to ensuring the company operates as a secure, compliant, highly available SaaS platform as we onboard regulated healthcare customers and scale multi-tenant workloads. You will act as the platform owner, working closely with engineering, security, and leadership.

Key Responsibilities

Production Readiness & Reliability
Own production readiness across development, staging, and production environments
Design and implement:
  Safe deployment strategies (blue-green, rolling, canary)
  Automated rollbacks
  Health checks, monitoring, and alerting
Operate and maintain high-availability, fault-tolerant systems
Lead incident response, root-cause analysis (RCA), and preventive remediation
Establish SLOs, SLIs, and operational runbooks

Multi-Tenancy & Scalability
Design and operate secure multi-tenant infrastructure for a healthcare SaaS platform
Implement tenant isolation across:
  Compute
  Network
  Data
  Configuration and secrets
Enable tenant-aware deployments and customer onboarding
Ensure scalability without cross-tenant impact, data leakage, or performance degradation

Azure Cloud Infrastructure
Design, build, and operate Azure-first infrastructure, including:
  AKS and containerized microservices
  App Services, Azure Functions, and background workers
  Azure SQL, Cosmos DB, Blob Storage
  VNets, private endpoints, NSGs, firewalls, and ingress controls
Manage infrastructure using Infrastructure as Code (Terraform preferred; Bicep/ARM acceptable)
Ensure environments are reproducible, auditable, and secure by default

Azure Security & Identity
Implement and manage Azure security and identity services:
  Microsoft Entra ID (RBAC, managed identities, conditional access)
  Microsoft Defender for Cloud
  Azure Key Vault (secrets, keys, certificates)
Enforce least-privilege access, strong authentication, and audit logging
Support SOC 2 Type I & II and HIPAA-aligned security controls
Partner with security and engineering teams on threat modeling and compliance readiness

CI/CD & Release Engineering
Build and maintain CI/CD pipelines using GitHub Actions and/or Azure DevOps
Enable:
  Zero-downtime deployments
  Versioned APIs and backward compatibility
  Environment-specific configuration and secrets
Improve release reliability, deployment speed, and developer productivity

Cost Optimization (FinOps)
Monitor and optimize Azure cloud spend across environments and tenants
Implement:
  Budgets, alerts, and cost attribution
  Environment-level and tenant-level cost visibility
Right-size compute, storage, and networking resources
Partner with engineering and leadership on cost forecasting and optimization

AI & Data Platform Enablement
Support AI-native workloads, including Azure OpenAI-based services
Operate document ingestion and event-driven pipelines (fax, PDFs, clinical data)
Ensure secure handling of PHI and regulated healthcare data across pipelines
Support scalable, resilient background processing and async workloads

Customer Onboarding & Integrations
Support production onboarding of new healthcare customers
Enable repeatable, automated deployment and go-live processes
Support integrations with payer platforms, EHRs, and external vendors
Act as a technical escalation point during customer launches

Qualifications

Required
7+ years of experience in DevOps, SRE, or Platform Engineering
Strong hands-on experience with Microsoft Azure
Experience operating production SaaS platforms
Deep experience with:
  Kubernetes / AKS
  Docker and containerized workloads
  CI/CD pipelines
  Infrastructure as Code (Terraform, Bicep, or ARM)
Experience designing and operating multi-tenant architectures
Strong understanding of cloud security, identity, and access management
Experience with regulated or compliance-driven environments (SOC 2, HIPAA, etc.)
Experience with cloud cost optimization / FinOps
Experience with Azure OpenAI or AI/ML platforms

Nice to Have
Experience supporting Python / Django production systems
Prior ownership of production incident management and reliability metrics