Site Reliability Engineer - US

Remote position, United States only


Looking for a role where you can shape the infrastructure of data-intensive, real-time services? Join the Site Reliability teams at Weights & Biases! As a key player in keeping our high-volume, low-latency environments performing around the clock, you'll collaborate with product engineers to manage millions of requests so that our customers always have reliable data. With your expertise in distributed systems, cloud providers, and containerized deployment systems, you'll help us scale and support our rapidly growing user base. Plus, enjoy flexible time off, competitive salary and equity, and a remote-first culture with in-office flexibility in San Francisco. Apply now and bring your diverse and creative perspective to our inclusive team!

Job Description

At Weights & Biases, our mission is to build the best developer tools for machine learning. Weights & Biases is a Series C company with $250 million in funding and a rapidly growing user base. Our platform is an essential part of the daily work of machine learning engineers, from academic research institutions like FAIR and UC Berkeley to massive enterprise teams including iRobot, OpenAI, Toyota Research Institute, Samsung, NVIDIA, Salesforce, Blue Cross Blue Shield, Lyft, and more.

The Site Reliability teams at Weights & Biases are responsible for ensuring that our high-volume, low-latency environments continue to perform around the clock. These teams collaborate closely with our product engineers so that Weights & Biases can handle millions of requests and our customers always have dependable, actionable data at their fingertips. You'll be responsible for shaping the infrastructure of our data-intensive, real-time services as we continue to grow at petabyte scale.


  • Keep our services reliable, available, fast and cost-efficient
  • Respond to, investigate and fix service issues, whether they are deep in the OS kernel or in the application code
  • Build tools and production frameworks to make our engineering team's lives easier
  • Design, build and maintain the infrastructure we need to support orders of magnitude more customers


  • Experience managing, monitoring, and debugging large-scale distributed systems in production
  • In-depth knowledge of at least one cloud provider (AWS, GCP, Azure, VMware, etc.)
  • Strong grasp of at least one higher-level language and its ecosystem (Go, Python, TypeScript, etc.)
  • Deep understanding of infrastructure-as-code (IaC) concepts and tools (Terraform, SaltStack, Ansible, etc.)
  • Experience with containerized deployment systems such as Kubernetes
  • Familiarity with CI/CD tools (Jenkins, GitHub Actions, FluxCD, Argo, etc.)
  • Experience with monitoring and scaling production SQL databases (MySQL / PostgreSQL preferred)

Our Benefits

  • ๐Ÿ๏ธ Flexible time off
  • ๐Ÿฉบ Medical, Dental, and Vision for employees and Family Coverage
  • ๐Ÿ  Remote first culture with in-office flexibility in San Francisco
  • ๐Ÿ’ต Home office budget with a new high-powered laptop
  • ๐Ÿฅ‡ Truly competitive salary and equity
  • ๐Ÿšผ 12 weeks of Parental leave (U.S. specific)
  • ๐Ÿ“ˆ 401(k) (U.S. specific)
  • Supplemental benefits may be available depending on your location

We encourage you to apply even if your experience doesn't perfectly align with the job description, as we seek out diverse and creative perspectives. Team members who love to learn and collaborate in an inclusive environment will flourish with us. We are an equal opportunity employer and do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. If you need additional accommodations to feel comfortable during your interview process, reach out at [email protected].
