Job Description
Location: [Insert Location]
Language: [Insert Language]
Contract Type: Open-ended contract
Mission:
* Data modeling and pipeline development with Spark on Scala to ingest and transform data from several sources (Kafka topics, APIs, HDFS, structured databases…); see the sketch after this list.
* Data transformation and quality checks to ensure data consistency and accuracy.
* Setting up CI/CD pipelines to automate deployments, unit testing, and development management.
* Implementing orchestrators and scheduling processes to automate data pipeline execution (Airflow as a service).
* Modifying existing code to meet business requirements and continuously improving it for better performance and maintainability.
* Ensuring the performance and security of the data infrastructure and following data engineering best practices.
* Contributing to production support, correcting incidents and anomalies, and implementing functional and technical enhancements to ensure the stability of production processes.
* Writing technical documentation to ensure knowledge retention.
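As a purely illustrative sketch of this kind of pipeline (not code from the team), the snippet below reads a Kafka topic with Spark Structured Streaming on Scala, applies a basic quality filter, and writes date-partitioned Parquet to HDFS. The broker address, topic name, schema, and paths are hypothetical placeholders, and it assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object EventIngestion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-ingestion")
      .getOrCreate()

    // Hypothetical event schema; a real pipeline would derive this
    // from the actual source contract.
    val eventSchema = StructType(Seq(
      StructField("event_id", StringType),
      StructField("event_time", TimestampType),
      StructField("payload", StringType)
    ))

    // Read the topic as a streaming source (placeholder broker and topic).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Kafka values arrive as bytes: parse them as JSON, then drop
    // records failing basic quality checks (missing id or timestamp).
    val events = raw
      .select(from_json(col("value").cast("string"), eventSchema).as("e"))
      .select("e.*")
      .filter(col("event_id").isNotNull && col("event_time").isNotNull)
      .withColumn("event_date", to_date(col("event_time")))

    // Persist to HDFS as date-partitioned Parquet; the checkpoint
    // lets the job restart without reprocessing or duplicating data.
    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/events")            // placeholder path
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .partitionBy("event_date")
      .start()
      .awaitTermination()
  }
}
```

In practice, a job like this would be wired into the Airflow scheduling and CI/CD automation described above.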
Profile:
Good knowledge of:
* Spark on Scala
* HDFS and structured databases (SQL)
Full understanding of:
* S3 storage
* Shell scripting
Some knowledge of (optional, a plus):
* Elasticsearch and Kibana
* HVault
* Dremio as a data virtualization tool