Title: Senior Data Engineer - Data Quality, Ingestion & API Development
Notice period - Immediate joiners
Location - Remote
Job Overview
We are seeking an experienced Senior Data Engineer to lead the development of a scalable data ingestion framework while ensuring high data quality through robust validation. The successful candidate will also be responsible for designing and implementing robust APIs for seamless data integration. This role is ideal for someone with deep expertise in building and managing big data pipelines using modern AWS-based technologies, and who is passionate about driving quality and efficiency in data processing systems.
Key Responsibilities
Data Ingestion Framework:
- Design & Development: Architect, develop, and maintain an end-to-end data ingestion framework that efficiently extracts, transforms, and loads data from diverse sources.
- Framework Optimization: Use AWS services such as AWS Glue, Lambda, EMR, ECS, EC2, and Step Functions to build highly scalable, resilient, and automated data pipelines.
Data Quality & Validation:
- Validation Processes: Develop and implement automated data quality checks, validation routines, and error-handling mechanisms to ensure the accuracy and integrity of incoming data.
- Monitoring & Reporting: Establish comprehensive monitoring, logging, and alerting systems to proactively identify and resolve data quality issues.
API Development:
- Design & Implementation: Architect and develop secure, high-performance APIs to enable seamless integration of data services with external applications and internal systems.
- Documentation & Best Practices: Create thorough API documentation and establish standards for API security, versioning, and performance optimization.
Collaboration & Agile Practices:
- Cross-Functional Communication: Work closely with business stakeholders, data scientists, and operations teams to understand requirements and translate them into technical solutions.
- Agile Development: Participate in sprint planning, code reviews, and agile ceremonies, while contributing to continuous improvement initiatives and CI/CD pipeline development (using tools like GitLab).
Required Qualifications
Experience & Technical Skills:
- Professional Background: At least 5 years of relevant experience in data engineering with a strong emphasis on analytical platform development.
- Programming Skills: Proficiency in Python and/or PySpark, along with SQL, for developing ETL processes and handling large-scale data manipulation.
- AWS Expertise: Extensive experience using AWS services including AWS Glue, Lambda, Step Functions, and S3 to build and manage data ingestion frameworks.
- Data Platforms: Familiarity with big data systems (e.g., AWS EMR, Apache Spark, Apache Iceberg) and databases like DynamoDB, Aurora, Postgres, or Redshift.
- API Development: Proven experience in designing and implementing RESTful APIs and integrating them with external and internal systems.
- CI/CD & Agile: Hands-on experience with CI/CD pipelines (preferably with GitLab) and Agile development methodologies.
Soft Skills:
- Strong problem-solving abilities and attention to detail.
- Excellent communication and interpersonal skills with the ability to work independently and collaboratively.
- Capacity to quickly learn and adapt to new technologies and evolving business requirements.
Preferred Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- Experience with additional AWS services such as Kinesis, Firehose, and SQS.
- Familiarity with data lakehouse architectures and modern data quality frameworks.
- Prior experience in a role that required proactive data quality management and API-driven integrations in complex, multi-cluster environments.