Amazon's Catalog DataWorks team is looking for highly motivated data engineers. We are embarking on multiple new initiatives to re-organize Amazon's catalog of billions of products, in new and interesting views, that drive several features Amazon's customers love. Today, these views drive hundreds of popular features like product recommendations, clustering of similar products, and shopping with Alexa. We will build a new near real-time Catalog Data Lake on AWS, to enable engineers and scientists across Amazon to solve customer problems faster. Come join us on this exciting journey!
As a data engineer on this team, you will own the Catalog Data Lake end-to-end. You will work closely with business partners to synthesize technical requirements.Your focus is at the team level on a major portion of existing or new data architecture (e.g., large or significant dataset, mid-size data solutions). You create coherent Logical Data Models that drive physical design.Your responsibilities may range from optimizing operational data storage to processing semi-structured data streams to building self-service business intelligence infrastructure for analysts. You take on projects and make enhancements that improve data processes (e.g., data auditing solutions, management of manually maintained tables, automating, ad-hoc or manual operation steps). You will use industry technologies like Spark, MapReduce, NoSQL, Parquet as well as modern AWS offerings like EMR, Glue, Athena, and Redshift. We are fortunate to be at the cusp of innovation in both the e-commerce business as well as cloud technology. As a key stakeholder, you will constantly develop ETLs, Queries, new patterns, algorithms, models for ranking, anomaly, pricing, etc.