• Site Reliability Engineering Manager

    Location US-WA-Seattle
    Posted Date 4 days ago (8/14/2018 2:16 PM)
    Job ID
  • Job Description

    Amazon CloudFront is a web service for content delivery and one of the cornerstones of AWS. We bring the Internet closer to end users, speeding up the user experience. CloudFront integrates with other Amazon Web Services to give developers and businesses an easy way to distribute content (webpages, games, movies, live streams, etc.) to end users with low latency and high data transfer speeds. CloudFront uses a global network of edge locations to cache and route requests to the best possible location based on known latency, available capacity and many other factors. We build for massive scale and performance.

    Managers of Site Reliability Engineering (SRE) are ultimately responsible for the day-to-day health and uptime of the services our SRE team supports. As CloudFront grows, we need to keep our systems reliable as they continue to scale. We’re looking for an ambitious self-starter who has a passion for cloud computing and can serve as a strong advocate for our customers. We believe passion and personality matter; as such, we need leaders who can manage diverse, smart, and driven engineers while balancing day-to-day people management with moving the business forward both technically and culturally. We are looking for strong technical candidates with proven analytical, problem-solving, and troubleshooting capabilities who can resolve production escalations and continually improve both production and pre-production environments.

    Your responsibilities include, but are not limited to:
    • Creating and enhancing regression metrics and automated tests
    • Identifying and developing processes, tools, automation, and software changes to address top operational issues
    • Working in close collaboration with software development leadership and support operations technical leads to shape the future roadmap and establish strong operational readiness across teams
    • Leading change and motivating engineers to develop simple, elegant solutions to complex operational or reliability challenges
    • Applying hands-on technical skills to partner with team members, and being comfortable diving into the fray as needed
    • Recruiting, hiring, and mentoring talented SREs

    Here are some reasons you should come work with us:
    • You will provide a service that is core to all modern Internet businesses.
    • You will be in charge of the complete software development lifecycle, defining, prioritizing, designing, building, and testing new features.
    • You will operate an AWS business with globally distributed servers and customers at a massive scale.
    • You will own one of the lowest latency and highest throughput services in all of AWS.
    • You will get the opportunity to work closely with a great team of software developers.
    • You will understand what it takes to grow and operate a global business at our scale.

    We:

    • Are a group of technologists from diverse backgrounds.
    • Obsess over our customers’ needs and experience.
    • Are owners. We love building innovative new technologies and improving our existing ones.
    • Wear multiple hats. We enjoy the prototyping and tinkering stages, as well as the rigor of making solutions production-ready.
    • Are fast-growing, agile and collaborative.

    You:

    • Enjoy seeing the impact your work has on real customers.
    • Are comfortable in an agile environment and create order from ambiguity.
    • Build strong teams with others as passionate as you about this mission.
    • Take ownership and do what it takes to get the job done.
    • Want to create massive-scale services used by millions of people.
    • Use data to make decisions and validate assumptions.
    • Learn from others and help grow those in your team to achieve their best.

    Basic Qualifications

    • 5+ years of software support, reliability, or operations engineering experience
    • 5+ years of successful engineering management experience in a technical operations environment, including managing a team of at least eight engineers
    • Previous operational responsibility for business-critical production systems
    • Ability to thrive in a high-pressure, highly customer-oriented environment
    • Ability to contribute to multiple projects and demands simultaneously
    • Experience establishing and evolving engineering development processes
    • Strong debugging, troubleshooting, and problem-solving skills
    • Strong verbal and written communication skills
    • Computer Science fundamentals and technical architecture and design skills
    • Motivated self-starter

    Preferred Qualifications

    • Bachelor’s or advanced degree in Computer Science or a closely related field
    • Deep knowledge of Internet protocols such as HTTP, DNS, TCP, and UDP
    • Thorough understanding of web services technologies such as SOAP, HTTP, WSDL, XSD, and REST
    • Prior systems development engineering experience
    • Familiarity with Linux development environments
    • Familiarity with kernel engineering
    • Experience with DDoS mitigation and network engineering
    • Strong object-oriented programming skills, preferably in Java, C++, or C
    • Experience with test-driven development (TDD) and continuous integration
    • Work experience delivering one or more V1 products, ideally in a startup or similar setting
    • Experience building large-scale web services backed by cloud services, such as AWS
    • Experience operating large-scale systems