Job Title: ML Engineer (Speech-to-Speech) — Subject Matter Expert

Level: Senior SME

Department: Software Development

Status: Contract (10-15 hours/week)

Work location: Fully Remote

Compensation: Hourly ($250)

Company Overview: At Vosyn, we embrace the exciting, game-changing world of Artificial Intelligence, driving innovation and pioneering impactful projects across various industries. Our incubator, AI Venture Lab, nestled in the heart of Office146.com, is a crucible of entrepreneurial spirit, supported by intelligent processes and industry-leading best practices. We believe in fostering a culture of flexibility, continuous improvement, and solution-focused strategies. Here, every idea is welcomed, nurtured, and has the potential to scale to new heights. Currently, we're at the forefront of a significant IPO endeavor, a true unicorn in the making. We invite you to be part of our journey and leave your imprint on the future of AI. At Vosyn, you will have the opportunity to engage with a fast-growing global organization with diversity of thought, experience, and cultures.

About the Role: We are seeking an experienced ML Engineer SME to provide strategic guidance and technical leadership on key components of our end-to-end speech-to-speech (S2S) pipeline. As a senior project advisor, you will collaborate with the VosynCore team, identifying solutions to complex challenges, particularly in text-to-speech (TTS) model development and optimization. Your expertise will be crucial in driving project progress and ensuring our S2S pipeline meets or exceeds industry standards for quality and performance.

Key Responsibilities:

Provide expert-level advice and mentorship on the architecture, training, and production of text-to-speech (TTS) models
Guide the implementation of robust testing methodologies for TTS models using industry standards like Mean Opinion Score (MOS) testing
Share expertise in distributed training, monitoring, and deployment of large-scale ML models on cloud platforms
Lead latency optimization initiatives in real-time systems for high-quality speech-to-speech conversion
Provide guidance on tuning TTS models for precise control over speech characteristics
Share in-depth knowledge of various TTS model architectures and waveform generation methods
Mentor the team on implementing advanced deep learning models for audio processing
Guide the development of transformer architectures for complex TTS model development

Required Qualifications:

5+ years of proven experience in machine learning development focused on audio generation and TTS systems
Extensive expertise in audio signal processing, particularly for human voices
Deep experience with TTS models, including waveform generation and spectrogram-based methods
Proven expertise in tuning TTS models for duration control and speech characteristics
Strong proficiency in Python and machine learning frameworks such as PyTorch
Experience with advanced deep learning models like WaveNet and transformer-based architectures
Demonstrated experience in distributed training and deployment of ML models on cloud platforms
Strong understanding of evaluation metrics for TTS systems
Proven ability to provide technical leadership and actionable guidance
Excellent communication and mentoring skills

Preferred Qualifications:

Experience with real-time audio processing systems
Background in speech synthesis research
Knowledge of multiple languages and accents in TTS
Experience with ML model optimization techniques
Publications or patents in related fields

Additional Perks:

Be part of the invigorating journey of a pre-seed AI startup in stealth mode
Engage directly with senior management and strategic advisory board members
Gain valuable experience in the bleeding-edge AI space
Remote-first culture with flexible working arrangements

DEI and Workplace Safety: At Vosyn Inc., we are committed to fostering a diverse, equitable, and inclusive workplace where every employee feels valued and supported. We believe that diversity of thought, background, and experience enriches our company culture and enhances innovation. We are an equal-opportunity employer and encourage candidates from all walks of life to apply. As part of our commitment to creating a safe and healthy work environment, we prioritize workplace safety, adhering to all relevant regulations and promoting a culture of responsibility. We believe that a safe and inclusive workplace is essential for the well-being and success of our team members.

Recruitment Process:

Initial Screening: Review of application and preliminary assessment
Video Interview: In-depth discussion of ML expertise and role expectations
Technical Panel Interview: Deep dive into audio processing experience and ML architecture approach
Final Selection: Assessment of overall fit and alignment
Offer and Onboarding: Equity compensation details and onboarding process

Join a dynamic global organization that champions diversity in thought, experience, and culture. Our team is composed of top experts from around the world. We invite you to leverage your expertise, mentor future leaders, and thrive with us in this exciting journey.

Apply Now: Vosyn Careers

Salary: $250.00 / hr

Originally posted on Himalayas
