Data Engineer - SQL, Spark, Python, Scala

Mavensoft Technologies · Portland, OR

Company

Mavensoft Technologies

Location

Portland, OR

Type

Full Time

Job Description

Job title: Data Engineer - SQL, Spark, Python, Scala (Remote)

Duration: 4 months (contract)

Key Skills: Big Data/Hadoop, Delta Lake, Python, Scala, SQL, Cloudera, Apache Hadoop, Hortonworks, MongoDB, Java, Apache Cassandra, Apache Hive, Hadoop Distributed File System (HDFS), Cloudera Impala, Apache Kafka, NoSQL databases, MapReduce, etc.

Role responsibilities:

  • Design and build reusable components, frameworks and libraries at scale to support analytics products.
  • Design and implement product features in collaboration with business and technology stakeholders
  • Identify and solve data management issues to improve data quality
  • Clean, prepare, and optimize data for ingestion and consumption
  • Collaborate on the implementation of new data management projects and the restructuring of the current data architecture
  • Implement automated workflows and routines using workflow scheduling tools
  • Build continuous integration, test-driven development, and production deployment frameworks
  • Collaboratively review designs, code, test plans, and dataset implementations produced by other data engineers to maintain data engineering standards
  • Analyze and profile data for designing scalable solutions
  • Troubleshoot data issues and perform root cause analysis to proactively resolve product and operational issues
  • Develop architecture and design patterns to process and store high volume data sets
  • Participate in an Agile/Scrum methodology to deliver high-quality software releases every two weeks through sprints
  • Requires experience with Cloudera, Apache Hadoop, Hortonworks, MongoDB, Java, Apache Cassandra, Apache Hive, Hadoop Distributed File System (HDFS), Cloudera Impala, Apache Kafka, NoSQL databases, MapReduce, etc.

The following qualifications and technical skills will position you well for this role:

  • 5+ years of experience with detailed knowledge of data warehouse technical architectures, infrastructure components, ETL/ELT, and reporting/analytics tools
  • 3+ years' experience in Big Data stack environments (Hadoop, Spark, Hive, and data lakes)
  • 3+ years' experience working with multiple file formats (Parquet, Avro, Delta Lake) and APIs
  • 3+ years' experience in cloud environments such as AWS (serverless technologies like AWS Lambda and API Gateway; NoSQL stores like DynamoDB; EMR and S3)
  • Experience with relational and non-relational databases
  • Strong coding experience in languages such as Python, Scala, and Java
  • Experience building real-time streaming data pipelines
  • Experience with pub/sub systems such as Kafka
  • Strong understanding of data structures and algorithms
  • Experience building lambda, kappa, microservice, and batch architectures
  • Experience with CI/CD processes and source control tools such as GitHub, and related development processes
  • A passion for data solutions and a willingness to pick up new programming languages, technologies, and frameworks

These are the characteristics that we strive for in our own work. We would love to hear from candidates who embody the same:

  • Desire to work collaboratively with your teammates to come up with the best solution to a problem
  • Demonstrated experience and ability to deliver results on multiple projects in a fast-paced, agile environment
  • Excellent problem-solving and interpersonal communication skills
  • Strong desire to learn and share knowledge with others

Date Posted

02/04/2023

