Senior Data Engineer
Engineering and Automation Hub
We are seeking a talented and highly motivated individual with a strong technical engineering background and relevant experience to join the Vegetable R&D Engineering and Automation Hub as a Senior Data Engineer. The Engineering and Automation Hub's mission is to create data-driven, end-to-end digital workflows that improve operational efficiency and support predictive breeding for the R&D organization. The ideal candidate will have strong expertise in Big Data technologies such as Google BigQuery and experience implementing ETL processes to manage data pipelines efficiently.
You will play a crucial role in developing and maintaining key projects in the Vegetable R&D portfolio. This involves collaborating with a diverse group of global, cross-functional scientists, engineers, developers, plant breeders, and IT teams.
YOUR TASKS AND RESPONSIBILITIES
1. Develop and troubleshoot SQL queries on Google BigQuery, and design scalable ETL pipelines using technologies such as Metaflow and Python.
2. Oversee database management by optimizing SQL queries, implementing effective indexing strategies, and conducting regular performance tuning to enhance overall efficiency and responsiveness.
3. Establish materialized views for performance optimization, and create data quality checks and validation procedures to ensure data integrity.
4. Monitor, troubleshoot, and document data pipeline issues, implementing error handling and recovery mechanisms as needed.
5. Develop comprehensive data archiving and retention policies that balance storage efficiency, compliance requirements, and cost-effectiveness, utilizing tiered storage solutions and automated lifecycle management techniques.
6. Write clean, maintainable code following team standards and industry best practices. This includes writing comprehensive unit tests, participating in code reviews, and engaging in agile scrum development practices.
WHO YOU ARE (Education/Experience)
1. Bachelor's degree in computer science, software engineering, or a related discipline, plus a minimum of 5 years of experience.
2. Proven experience building large-scale data pipelines for production applications.
3. History of working in Agile Scrum teams.
SKILLS (Technical & Soft)
1. Strong expertise working with Google BigQuery, with advanced skills in SQL optimization, database performance tuning, data modeling, and using materialized views for enhanced efficiency.
2. Proven experience building ETL pipelines using AWS data services (e.g., RDS, Lambda, Step Functions) and orchestration frameworks like Metaflow or Airflow.
3. Familiarity with data streaming patterns and technologies like Apache Kafka.
4. Proficient in writing unit tests using frameworks (e.g., Pytest).
5. Experience with version control systems (e.g., Git/GitHub) and CI/CD pipelines (e.g., GitHub Actions).
6. Excellent verbal and written communication skills, with the ability to work independently and collaboratively.
7. Strong attention to detail and commitment to code quality.