Job Description: AWS ETL Developer
10+ Years of Relevant Experience
Must Have Skills
- Proficiency in programming languages such as Python, Scala, or similar
- Solid understanding of machine learning frameworks such as TensorFlow and PyTorch
- Strong experience in data classification, including the identification of PII data entities (a Comprehend-based sketch follows this list)
- Knowledge and experience with retrieval-augmented generation (RAG) and agent-based workflows
- Deep understanding of how to re-rank and improve LLM outputs using indexes and vector stores
- Ability to leverage AWS services (e.g., SageMaker, Comprehend, Entity Resolution) to solve complex data and AI-related challenges
- Ability to manage and deploy machine learning models and frameworks at scale using AWS infrastructure
- Strong analytical and problem-solving skills, with the ability to innovate and develop new approaches to data engineering and AI/ML
- Experience with AWS ETL services such as AWS Glue, Lambda, and Data Pipeline for data processing and integration
- Experience in core AWS Services: AWS IAM, VPC, EC2, S3, RDS, Lambda, CloudWatch, CloudTrail
- Experience designing, developing, and implementing scalable, high-performance database solutions using Amazon DocumentDB
- Expertise in data modelling with NoSQL databases such as DocumentDB
- Ability to manage and optimize DocumentDB clusters for reliability, scalability, and performance
- Ability to develop and maintain queries, schemas, and indices for efficient DocumentDB data retrieval (see the indexing sketch after this list)
- Experience integrating DocumentDB seamlessly with AWS services and applications
- Experience creating and managing backups, disaster recovery plans, and data security protocols for DocumentDB
- Ability to troubleshoot DocumentDB performance issues and resolve their root causes
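The PII classification and AWS AI service items above map naturally onto Amazon Comprehend's PII detection API. Below is a minimal sketch, assuming boto3 credentials are already configured and that synchronous detection of short documents is sufficient; the region, threshold, and sample text are illustrative only.

```python
import boto3

# Comprehend client; the region is an illustrative assumption
comprehend = boto3.client("comprehend", region_name="us-east-1")

def find_pii_entities(text: str, threshold: float = 0.9) -> list[dict]:
    """Return PII entities Comprehend detects in `text` above a confidence threshold."""
    response = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    return [
        {
            "type": entity["Type"],  # e.g. NAME, EMAIL, SSN
            "value": text[entity["BeginOffset"]:entity["EndOffset"]],
            "score": entity["Score"],
        }
        for entity in response["Entities"]
        if entity["Score"] >= threshold
    ]

if __name__ == "__main__":
    sample = "Contact Jane Doe at jane.doe@example.com about record 12345."
    for hit in find_pii_entities(sample):
        print(hit)
```

For larger corpora, the same classification would typically run as an asynchronous Comprehend job over data in S3 rather than as per-document calls.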
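The DocumentDB items are usually exercised through a MongoDB-compatible driver such as pymongo. Below is a minimal indexing and query sketch; the cluster endpoint, credentials, database, collection, and field names are all illustrative assumptions.

```python
from pymongo import ASCENDING, MongoClient

# Connection details are placeholders; DocumentDB normally requires TLS with the
# Amazon-provided CA bundle and credentials pulled from a secrets store.
client = MongoClient(
    "mongodb://user:password@my-docdb-cluster.cluster-xxxx.us-east-1.docdb.amazonaws.com:27017",
    tls=True,
    tlsCAFile="global-bundle.pem",
    retryWrites=False,  # DocumentDB does not support retryable writes
)

patients = client["platform"]["patients"]

# Compound index so lookups by study and enrollment date avoid collection scans
patients.create_index([("study_id", ASCENDING), ("enrolled_at", ASCENDING)])

# A query that the index above can serve
recent = patients.find(
    {"study_id": "STUDY-001", "enrolled_at": {"$gte": "2024-01-01"}}
).sort("enrolled_at", ASCENDING)

for doc in recent.limit(5):
    print(doc["_id"], doc.get("enrolled_at"))
```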
Additional Core Experience
- 10+ years of experience in developing Data Lakes with data ingestion from various sources (relational databases, flat files, APIs, streaming)
- Proficiency in Python, PySpark, and Spark for efficient data processing
- Experience architecting and implementing robust ETL pipelines using AWS Glue, including defining extraction, transformation, and loading procedures (see the Glue job sketch after this list)
- Expertise in AWS services: IAM, VPC, EC2, S3, RDS, Lambda, CloudWatch, CloudFormation, CloudTrail
- Experience designing and developing event-driven data pipelines using AWS Glue
- Experience orchestrating and scheduling jobs using Airflow (see the DAG sketch after this list)
- Experience developing event-driven distributed systems using a serverless architecture
- Experience with CI/CD pipelines (GitHub Actions, Jenkins)
- 10+ years of experience with AWS cloud data ingestion patterns and practices, using S3 as the storage backbone
- 10+ years of experience using IaC tools like Terraform
- Experience in data migration from on-prem to AWS Cloud using AWS DMS
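As referenced above, an AWS Glue ETL job is typically a PySpark script that Glue invokes with job arguments. Below is a minimal extract-transform-load sketch that writes columnar Parquet; the catalog database, table, column, and S3 path names are illustrative assumptions.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue boilerplate: resolve job arguments and set up contexts
args = getResolvedOptions(sys.argv, ["JOB_NAME", "target_path"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read from a Glue Data Catalog table (names are placeholders)
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_zone", table_name="patient_events"
)

# Transform: drop rows without an event id and remove obviously sensitive columns
df = source.toDF().dropna(subset=["event_id"]).drop("ssn", "email")

# Load: write Parquet to the curated zone, partitioned by event date
df.write.mode("overwrite").partitionBy("event_date").parquet(args["target_path"])

job.commit()
```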
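Orchestrating such a job with Airflow usually means a DAG that triggers it on a schedule. Below is a minimal sketch using the Amazon provider's GlueJobOperator (Airflow 2.4+ assumed); the DAG id, schedule, Glue job name, and script argument are illustrative assumptions.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Daily pipeline that runs the curated-zone Glue job (all names are placeholders)
with DAG(
    dag_id="patient_events_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_etl = GlueJobOperator(
        task_id="run_patient_events_etl",
        job_name="patient-events-etl",  # name of an existing Glue job (assumed)
        script_args={"--target_path": "s3://curated-zone/patient_events/"},
        wait_for_completion=True,
    )
```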
Nice to Have Skills
- Data modelling experience with NoSQL Databases like DocumentDB
- Familiarity with LLMs for data classification and PII identification
- Experience in developing data audit, compliance, and retention standards for governance
- Automation of data governance processes
- Experience with column-oriented data file formats like Apache Parquet and table formats like Apache Iceberg
- Knowledge of AWS AI Services (AWS Entity Resolution, AWS Comprehend)
Project Overview
- This role sits within one of the workstreams of Project Acuity
- PASD Data Platform is a centralized web application for internal PASD users across the Recruitment Business
- Supports both marketing and operational use cases
- Building a patient-level database to enhance reporting capabilities and stakeholder engagement
Role Scope / Deliverables
- Role: AWS ETL Developer on the AWS Cloud team
- Responsible for automating provisioning of AWS infrastructure for the Data Platform
- Develop scalable Data Pipelines for data migration to AWS Cloud
- Manage the data lifecycle in alignment with business requirements
- Apply 10+ years of IT experience in data engineering and AWS-based solutions
Required Skills for AWS ETL Developer Job
- AWS
- Python
- PySpark
- Spark
- CI/CD pipelines (GitHub Actions, Jenkins)
- Data Lakes
Our Hiring Process
- Screening (HR Round)
- Technical Round 1
- Technical Round 2
- Final HR Round