Job Description: Spark (PySpark) Developer
Experience: 6 Years
- The developer must have sound knowledge of Apache Spark and Python programming.
- Deep experience developing data processing tasks in PySpark, such as reading data from external sources, merging datasets, performing data enrichment, and loading into target destinations (see the batch ETL sketch after this list).
- Experience deploying and operationalizing code is an added advantage, including knowledge and skills in DevOps, version control, and containerization.
- Create Spark jobs for data transformation and aggregation
- Produce unit tests for Spark transformations and helper methods (a unit-test sketch follows the list below)
- Write Scaladoc-style documentation for all code
- Design data processing pipelines that perform batch and real-time/stream analytics on structured and unstructured data (see the streaming sketch after this list)
- Spark query tuning and performance optimization, with a good understanding of file formats (ORC, Parquet, Avro) and compression techniques for optimizing queries and processing
- SQL database integration (Microsoft SQL Server, Oracle, PostgreSQL, and/or MySQL)
- Experience working with storage systems such as HDFS, S3, Cassandra, and/or DynamoDB
- Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)
- Experience building scalable, high-performance data lake solutions in the cloud
- Hands-on expertise in cloud services such as AWS and/or Microsoft Azure.
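
The following is a minimal batch ETL sketch of the read / merge / enrich / load pattern described above. The bucket paths, table names, and columns (orders, customers, order_date, and so on) are hypothetical placeholders, not a prescribed implementation.

```python
# Minimal PySpark batch ETL sketch: read from external sources, merge,
# enrich, and load into a target destination. All paths and column names
# below are hypothetical.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

def enrich_orders(orders: DataFrame, customers: DataFrame) -> DataFrame:
    """Join orders to customers and derive a total_price column.

    Kept as a pure DataFrame-in / DataFrame-out function so it can be
    unit tested without touching external storage.
    """
    return (
        orders.join(customers, on="customer_id", how="left")
        .withColumn("total_price", F.col("quantity") * F.col("unit_price"))
    )

# Read from external sources (hypothetical S3 locations).
orders = spark.read.option("header", True).csv("s3a://raw-bucket/orders/")
customers = spark.read.parquet("s3a://raw-bucket/customers/")

# Merge and enrich, then load the result as snappy-compressed Parquet,
# partitioned by an assumed order_date column.
enriched = enrich_orders(orders, customers)
(enriched.write
    .mode("overwrite")
    .partitionBy("order_date")
    .option("compression", "snappy")
    .parquet("s3a://curated-bucket/enriched_orders/"))
```

Writing Parquet with snappy compression and partitioning on a frequently filtered column is one common way the file-format and compression knowledge called out above pays off at query time.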
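And a sketch of a unit test for that helper, run against a local SparkSession; the etl_job module name and the sample values are illustrative assumptions.

```python
# Unit-test sketch for the enrich_orders helper, using a local SparkSession.
# The etl_job module name and the sample rows are illustrative assumptions.
from pyspark.sql import SparkSession

from etl_job import enrich_orders  # hypothetical module holding the helper

def test_enrich_orders():
    spark = (SparkSession.builder
             .master("local[1]")
             .appName("enrich-orders-test")
             .getOrCreate())
    orders = spark.createDataFrame(
        [(1, 100, 2, 5.0)],
        ["order_id", "customer_id", "quantity", "unit_price"],
    )
    customers = spark.createDataFrame([(100, "Acme")], ["customer_id", "name"])

    result = enrich_orders(orders, customers).collect()[0]

    # The left join should pick up the customer name, and total_price
    # should be quantity * unit_price = 2 * 5.0 = 10.0.
    assert result["name"] == "Acme"
    assert result["total_price"] == 10.0
```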
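For the real-time/stream analytics point, a minimal Structured Streaming sketch is shown below; the Kafka broker address, topic, and checkpoint path are hypothetical, and the spark-sql-kafka connector is assumed to be on the classpath.

```python
# Minimal Structured Streaming sketch: windowed counts over a Kafka topic.
# Broker address, topic, and checkpoint path are hypothetical; the
# spark-sql-kafka connector is assumed to be available.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-analytics-sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Count events per 1-minute window using the Kafka message timestamp,
# which the Kafka source exposes as a `timestamp` column.
counts = events.groupBy(F.window(F.col("timestamp"), "1 minute")).count()

query = (
    counts.writeStream
    .outputMode("update")
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()
```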
Our Hiring Process
- Screening (HR Round)
- Technical Round 1
- Technical Round 2
- Final HR Round