- Use Python, Apache Spark and/or Hadoop to design, develop and maintain scalable data processing pipelines.
- Implement complex data transformations and aggregations to support the analysis and reporting of large volumes of data.
- Optimise data access and processing performance in large-scale environments.
- Implement machine learning models and algorithms in collaboration with data science teams.
- Ensure data integrity and system stability through comprehensive testing and error handling.
- Lead the integration of big data technologies with existing systems and infrastructure.
- Mentor and guide junior engineers and other team members.
Requirements
- Knowledge of Python programming and big data frameworks such as Apache Spark or Hadoop.
- Experience with workflow orchestration tools such as Dagster.
- Familiarity with messaging systems, especially Apache Kafka.
- Understanding of caching solutions such as Redis.
- Excellent analytical and problem-solving skills.
- Excellent communication and teamwork skills.
Benefits
- Work remotely (hybrid available)
- Join us as an employee or contractor
- Professional challenges and development opportunities in a stable market environment