• Big Data Engineer, Catalog DataWorks

    Location US-CA-Cupertino
    Posted Date 12/6/2018 1:06 PM
    Job ID 756438
    Country United States
  • Job Description


    Amazon's Catalog DataWorks team is looking for highly motivated data engineers. We are embarking on multiple new initiatives to reorganize Amazon's catalog of billions of products into new and interesting views that drive features Amazon's customers love. Today, these views power hundreds of popular features such as product recommendations, clustering of similar products, and shopping with Alexa. We will build a new near-real-time Catalog Data Lake on AWS to enable engineers and scientists across Amazon to solve customer problems faster. Come join us on this exciting journey!


    As a data engineer on this team, you will own the Catalog Data Lake end-to-end. You will work closely with business partners to synthesize technical requirements. Your focus is at the team level on a major portion of existing or new data architecture (e.g., a large or significant dataset, or mid-size data solutions). You will create coherent logical data models that drive the physical design. Your responsibilities may range from optimizing operational data storage to processing semi-structured data streams to building self-service business intelligence infrastructure for analysts. You will take on projects and enhancements that improve data processes (e.g., data auditing solutions, management of manually maintained tables, and automation of ad-hoc or manual operational steps). You will use industry technologies like Spark, MapReduce, NoSQL, and Parquet, as well as modern AWS offerings like EMR, Glue, Athena, and Redshift. We are fortunate to be at the cusp of innovation in both the e-commerce business and cloud technology. As a key stakeholder, you will continually develop ETLs, queries, and new patterns, algorithms, and models for ranking, anomaly detection, pricing, and more.
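
    For illustration only, the sketch below shows the kind of ETL this role involves: a minimal PySpark job that reads semi-structured catalog events and writes partitioned Parquet for querying with Athena or Glue. The S3 paths, column names, and schema are hypothetical placeholders, not the team's actual implementation.

        # Minimal, illustrative PySpark ETL (hypothetical paths and columns).
        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = (
            SparkSession.builder
            .appName("catalog-events-to-parquet")  # hypothetical job name
            .getOrCreate()
        )

        # Hypothetical input: newline-delimited JSON events describing catalog updates.
        events = spark.read.json("s3://example-bucket/catalog-events/2018/12/")

        # Light transformation: select fields, derive a partition column, drop duplicates.
        cleaned = (
            events
            .select("asin", "marketplace_id", "event_type", "event_time")
            .withColumn("event_date", F.to_date("event_time"))
            .dropDuplicates(["asin", "event_time"])
        )

        # Write columnar output partitioned by date for downstream analytics.
        (
            cleaned.write
            .mode("overwrite")
            .partitionBy("event_date")
            .parquet("s3://example-bucket/catalog-data-lake/events/")
        )

        spark.stop()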

    Basic Qualifications

    • Bachelor's degree or higher in computer science or math is required.

    • Data engineering knowledge, including ETL, machine learning models, NoSQL, and Hive/SparkSQL/Athena
    • Strong computer science fundamentals: algorithms, data structures, design patterns, and programming (Java, Scala/Python)

    • At least 5 years of software development experience.

    • At least 3 years of experience using big data systems.

    Preferred Qualifications

    • Experience building or working on large-scale distributed systems such as Hadoop or Spark in the cloud.

    • Experience with industry-standard big data technologies (Spark, Kafka, Hive, Presto, NoSQL, or AWS equivalents)

    • Data modeling experience with columnar data formats (Parquet, ORC, etc.)
    • Experience with ETL patterns
