6 to 8 Years of Relevant Experience
We are seeking a highly skilled and motivated Data Engineer with 6–8 years of experience in building scalable data pipelines and implementing robust data engineering solutions. This role involves working with modern data tools and frameworks such as Apache Airflow, Python, and PySpark to support the reliable delivery, transformation, and integration of data.
Key Responsibilities
- Design, develop, and maintain data pipelines and ELT processes using Apache Airflow, Python, and PySpark to ensure efficient and reliable data delivery (an illustrative sketch of this pattern follows this list).
- Build custom data connectors to ingest structured and unstructured data from diverse sources and formats.
- Collaborate with cross-functional teams to gather requirements and translate business needs into scalable technical solutions.
- Implement DataOps principles and best practices for efficient and resilient data operations.
- Design and deploy CI/CD pipelines for automated data integration, transformation, and deployment.
- Monitor and troubleshoot data workflows to proactively resolve ingestion, transformation, and loading issues.
- Perform data validation and testing to ensure accuracy, consistency, and compliance.
- Stay current with emerging trends and best practices in data engineering and analytics.
- Maintain comprehensive documentation of data workflows, pipelines, and technical specifications to support governance and knowledge sharing.
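To give a concrete flavor of the pipeline and ELT work described above, the following is a minimal, illustrative Airflow DAG sketch of the extract-transform-load pattern. All identifiers (the DAG id, task ids, schedule, and placeholder callables) are hypothetical and assume Apache Airflow 2.4+ with the PythonOperator; it is a sketch, not a prescribed implementation for this role.

```python
# Illustrative sketch only -- all names (dag_id, task ids, schedule) are
# hypothetical placeholders. Assumes Apache Airflow 2.4+.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull raw records from a source system (placeholder)."""
    print("extracting raw data")


def transform():
    """Reshape and clean the data; a real pipeline might hand this step off to a PySpark job."""
    print("transforming data")


def load():
    """Write the result to the warehouse (placeholder)."""
    print("loading data")


with DAG(
    dag_id="example_elt_pipeline",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    # Simple linear extract -> transform -> load dependency chain.
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```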
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- 6+ years of experience in data engineering, ELT development, and data modeling.
- Strong proficiency in Apache Airflow for orchestration and Apache Spark for data transformation.
- Experience with SSIS or similar workflow orchestration platforms.
- Proven expertise in developing custom connectors for ingesting data from varied sources.
- Deep understanding of SQL and relational databases, with a focus on performance tuning and optimization.
- Hands-on experience with CI/CD for data pipelines and implementation of DataOps practices.
- Familiarity with data governance practices and frameworks.
- Knowledge of distributed systems and handling large-scale datasets.
- Experience with real-time data streaming technologies like Apache Kafka.
- Strong knowledge of software development practices, including version control (e.g., Git) and code review processes.
- Experience working in Agile/Scrum environments and collaborating with cross-functional teams.
- Familiarity with data visualization tools such as Apache Superset, including dashboard development.
- Excellent problem-solving skills and attention to detail.
- Strong communication skills, capable of translating complex technical concepts for non-technical stakeholders.
- Ability to work effectively in a fast-paced, dynamic environment with changing priorities.
Required Skills
- Python
- PySpark
- SQL
- Apache Airflow
- Trino
- Hive
- Snowflake
- Agile/Scrum methodologies
Nice to Have
- Linux
- OpenShift
- Kubernetes
- Apache Superset