Please find the JD below:
The ideal candidate will be responsible for developing high-quality data applications.

Responsibilities:
- Design, develop, and maintain efficient and scalable solutions using PySpark (see the sketch after this list)
- Ensure data quality and integrity by implementing robust testing, validation, and cleansing processes
- Integrate data from various sources, including databases, APIs, and external datasets
- Optimize and tune PySpark jobs for performance and reliability
- Document data engineering processes, workflows, and best practices
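For illustration, here is a minimal sketch of the kind of PySpark pipeline work these responsibilities describe; the connection details, paths, and table/column names (orders, order_id, fx_rates, order_date) are all hypothetical:

```python
# Minimal sketch only; all names, paths, and connection details are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Integrate data from multiple sources: a relational database and an external dataset.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/sales")  # assumed JDBC source
    .option("dbtable", "public.orders")
    .load()
)
rates = spark.read.csv("s3://bucket/fx_rates.csv", header=True, inferSchema=True)

# Basic data-quality rules: deduplicate, validate amounts, drop incomplete rows.
clean = (
    orders.dropDuplicates(["order_id"])
    .filter(F.col("amount") > 0)
    .na.drop(subset=["order_id", "currency"])
)

# Broadcast the small lookup table to avoid a shuffle, then write partitioned columnar output.
result = clean.join(F.broadcast(rates), on="currency", how="left")
result.write.mode("overwrite").partitionBy("order_date").parquet("s3://bucket/curated/orders")
```

The broadcast hint on the small lookup table is an example of the job tuning the optimization bullet refers to.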
Requirements:
- Strong understanding of databases, data modelling, and ETL tools and processes
- Strong programming skills in Python and proficiency with PySpark and SQL
- Experience with relational databases, Hadoop, Spark, Hive, and Impala
- Excellent communication and collaboration skills