About Working at Envelio
Too easy is boring! Together, we’re on a mission to drive the energy transition forward. We love what we do, and no challenge is too big for us. We take ownership of our work and grow with every new task. In short: Own it, love it, grow with it.
We’re a down-to-earth team of coffee and mate lovers. Our geeky sense of humor leads to a ritualistic use of emojis and the encyclopedic accumulation of useless trivia. More than 150 Envelians from over 20 different countries are already on board. Join us and grow with us!
Your Role
As Team Lead Platform Operations (all genders), you’ll build and lead a highly technical team of about 6 people, focusing on the stable, secure, and predictable operation of our product: the Intelligent Grid Platform (IGP).
Your team is responsible for Product Operations: You ensure that customer IGP environments run reliably, drive operational processes such as incident handling and releases, and derive systematic improvements from real production signals.
You will work closely with Product, Customer Success, and Engineering teams. You will also collaborate closely with the SRE/Infrastructure team, which is responsible for the platform’s foundation (cluster provisioning, deployment pipelines, observability tools, etc.), while your team focuses on the day-to-day operational management of the IGP for customers.
Together with Engineering, SRE/Infrastructure, and Customer Success, you help gradually evolve our operating model toward 24/7 reliability for customer environments (processes, ownership, and escalation).
How You Make a Difference
You coach and mentor your team members and help them develop through one-on-one meetings and regular feedback
You own and continuously improve day-to-day IGP operations across customer environments
You ensure rapid, structured handling of customer-related issues (e.g., IGP Incidents / HOTs) and see that follow-ups lead to lasting fixes
You clarify ownership and escalation paths for production issues and coordinate efficiently across squads and with Customer Success
You drive operational excellence: calm incident communication, pragmatic problem-solving, and a blameless culture of continuous improvement
You balance short-term operational work (restoring service) with long-term investments (reducing toil, improving reliability, enhancing tooling and runbooks)
You set priorities, plan capacity, and manage the roadmap/backlog for operations-related work
You shape the team through the recruitment process and design individual development paths
Your Profile
Perfection is a myth! We’re much more interested in the person behind the screen. So these criteria are meant more as a guide for you. We’re excited to see how your individual skills fit with us.
You have extensive experience operating complex cloud applications and know how to reliably run services under real-world constraints
You operate production services on cloud infrastructure (AWS/Azure/GCP) and are familiar with typical failure modes
You have hands-on experience troubleshooting Linux systems and basic networking (logs, system status, connectivity)
You are familiar with modern operational models such as containers/Kubernetes (or comparable) and can evaluate deployments in production (rollouts, rollbacks)
You are confident in incident management, root cause analysis, and prioritization under time pressure
You have proven experience leading and developing a team in an operations-focused environment
You are skilled at managing stakeholders and coordinating across teams (engineering squads, product, customer success)
You achieve lasting reductions in operational effort through improved processes, automation, and documentation
You communicate clearly, especially in high-pressure situations, and ensure alignment on next steps
You are fluent in both spoken and written German and English
How We Develop Software
Clearly defined responsibilities for product topics and efficient coordination between squads and Customer Success
Structured incident management (restore service, communicate clearly, then analyze root causes)
Release processes with pragmatic risk management (safe changes, fast rollbacks when needed)
Monitoring and alerting hygiene (signal over noise)
Comprehensive runbooks and automation to reduce operational burden in the long term
Our Tech Stack
Multi-cloud, hybrid on-prem setup with Kubernetes and Helm as standard
Applications primarily in Python and TypeScript
Standard backend services such as PostgreSQL, RabbitMQ, Redis
GitLab & GitLab CI
Terraform for Infrastructure as Code
Your Benefits
Tailor your work mode to your lifestyle – fully remote (or hybrid with an office option)
Option to work remotely from abroad (up to three months per year from anywhere in the EU or the US)
State-of-the-art technology and a modern tech stack
Top-notch hardware (16-inch MacBooks, 2 monitors at your workstation)
30 vacation days + 3 corporate holidays
Support for your health through our partnership with Urban Sports Club
Flexible use of a monthly mobility budget (e.g., Jobrad, public transit)
Time and budget for personal growth
Optional company pension plan
Regular company and team events