Company: System Staffing Group
Job description:
The Data Engineering team needs a Big Data engineering professional who can assist with important test automation and development tasks associated with mission-critical projects.
- Expected start is as soon as feasible.
- Expected duration of this role is 12 months, with the possibility of extension for up to an additional 12 months. Initial ramp-up time is expected to be about two months (a conservative estimate based on the current 100% remote work situation).
- This engineer will work closely with the team’s primary test engineer and the broader team to align cohesively with ongoing projects.
The Python Developer will perform the following tasks:
- Primary focus will be on QA test automation.
- Define a test automation strategy for new data pipelines, or modify an existing strategy to expand QA coverage of an existing pipeline.
- Create pipeline-specific input datasets and expected datasets to implement QA automation.
- Modify existing input datasets and expected datasets when business requirements for existing data pipelines change.
- Create or modify Python scripts that trigger pipeline-specific QA tests and validate the results against the expected outputs.
- Integrate with Jenkins CI/CD automation to run nightly QA tests automatically.
- Perform manual testing where automation is not feasible or where QA tests need to be run on an ad hoc basis.
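For candidates gauging the kind of scripting involved, validating a pipeline's output against an expected dataset might look roughly like the sketch below. It is illustrative only, not the team's actual tooling: the function names, the `id` key column, and the CSV layout are all assumptions.

```python
import csv
import io


def load_rows(text, key):
    """Parse CSV text into a dict of rows keyed by the given column."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key]: row for row in reader}


def diff_datasets(actual_text, expected_text, key="id"):
    """Compare actual pipeline output with the expected dataset.

    Hypothetical helper: reports keys that are missing from the output,
    keys that were not expected, and keys whose rows differ.
    """
    actual = load_rows(actual_text, key)
    expected = load_rows(expected_text, key)
    return {
        "missing": sorted(k for k in expected if k not in actual),
        "unexpected": sorted(k for k in actual if k not in expected),
        "mismatched": sorted(
            k for k in expected if k in actual and actual[k] != expected[k]
        ),
    }
```

A nightly job could fail the build whenever any of the three lists returned by `diff_datasets` is non-empty.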
The role requires an engineer who is data-savvy and has an overall system quality mindset.
- Advanced Python scripting skills are a must: for example, the ability to work with ease with data formats such as CSV, JSON, and XML in Python and to create modular, reusable code. Should be able to write object-oriented code and understand list comprehensions in Python.
- Comfortable working in a Linux environment with bash, grep, awk, ssh, xargs, etc.
- Thinks critically about corner cases in data pipelines and creates test cases to simulate those conditions. Should understand the significance of test coverage and understand fault injection.
- Understands the problems associated with processing large datasets (tens of TB) and is conceptually familiar with the technologies available to solve those problems.
- Knowledge of Hadoop or Spark is not expected but would be a plus.
ADDITIONAL INFORMATION FROM MANAGER:
1. To get started, can you give me a quick overview of your team and what you do?
The team is responsible for developing and deploying data pipelines that help create a unified enterprise data analytics and storage platform.
2. The person who will be successful in this role will be somebody who?
The person needs to be curious about data in general and must have a mindset of delivering high-quality, data-centric solutions.
3. What would you say are the TOP 3 must-have skills you’re looking for? (Measurable skills, technologies, etc.)
b) Data Manipulation
Location: San Diego, CA
Job date: Sat, 21 Aug 2021 05:12:36 GMT