RemoteJobs.org

© 2026 RemoteJobs.org. All rights reserved.
    TechBiz Global

    Senior AI DevOps / LLMOps

    Full-time
    Verified Remote
    Remote · DevOps · Today

    About this role

    At TechBiz Global, we provide recruitment services to top clients in our portfolio. We are currently seeking a Senior AI DevOps / LLMOps specialist to join one of our clients' teams. If you're looking for an exciting opportunity to grow in an innovative environment, this could be the perfect fit for you.

    Key Responsibilities

    1. Automation of Build-to-Production
    • Design and implement robust CI/CD pipelines tailored for AI, covering model weights, dataset versioning, and application code.
    • Develop specialized workflows for PromptOps, ensuring that system prompts are version-controlled, tested for regressions, and deployed with the same rigor as traditional code.
    • Automate the deployment of Agentic workflows, managing the complexities of stateful AI interactions and multi-agent handoffs.

    2. AI Infrastructure as Code (IaC)
    • Provision and manage high-performance compute environments (GPU clusters, TPU pods) using Terraform, Pulumi, or Ansible.
    • Define and enforce Policy-as-Code for AI endpoints to ensure compliance with security, cost-usage limits, and data residency requirements.
    • Maintain a consistent environment across Hybrid Infrastructure, ensuring seamless parity between On-Premises development and Cloud production.

    3. Safe Experimentation & Controlled Releases
    • Architect Progressive Delivery strategies for AI, including Canary releases, Blue-Green deployments, and Shadowing (where new models run in parallel with production to compare outputs).
    • Build “Evaluation-in-the-Loop” gates within the pipeline to automatically test for bias, hallucination, and performance degradation before a release.
    • Implement A/B testing frameworks specifically designed for LLM outputs and agentic behavior.

    4. Monitoring & Observability
    • Establish deep observability into Inference Endpoints, tracking metrics like tokens-per-second, latency, and drift in model accuracy.
    • Integrate feedback loops that capture production “edge cases” to feed back into the training and fine-tuning pipelines.

    Requirements

    Must-Have Technical Skills:

    • Orchestration: Advanced Kubernetes (K8s) skills, specifically with Kubeflow, Ray, or NVIDIA Triton.

    • CI/CD & IaC: Expertise in GitHub Actions/GitLab CI, and Terraform or Pulumi.

    • AI Tooling: Experience with Weights & Biases, MLflow, LangSmith, or Arize Phoenix.

    • Hardware: Understanding of GPU virtualization, CUDA drivers, and on-premises hardware management.

    • Security: Familiarity with Open Policy Agent (OPA) and secret management (Vault).

    Experience:

    • 10+ years in DevOps, SRE, or Cloud Engineering.

    • 2+ years of hands-on experience in MLOps or LLMOps, specifically moving LLMs from notebook to production.

    • Proven experience managing Hybrid Cloud environments (e.g., AWS/Azure + Private Data Center).

    Highlights

    • Full-time, remote position

    • Fluent English is required

    About TechBiz Global

    TechBiz Global
