The AWS Government Cloud Solutions Operational Excellence team is looking for experienced systems engineers who dive deep into large-scale operations data and have a knack for building tooling and analytics that drive continuous process improvement.
Improving quality and velocity for our service teams is our primary mission. To get us there, you know how to "talk the talk" and "walk the walk" when building and operating large-scale 24x7 systems. You won't be telling people how they should improve; you will be identifying and ranking operational hot spots, engineering mitigations, and driving tooling to increase everyone's operational scalability. You know that people alone cannot completely solve problems at large scale. To truly scale further than you can foresee, you need integrated tooling that is a part of everything we do. You know we need to measure operational problem areas the same way we measure success. You think big, build quickly, and continually iterate to get us there.
Systems Development Engineers within our team are instrumental in creating, automating, deploying, operating, and scaling a massive always-on distributed system. We are seeking passionate engineers with strong systems engineering skills who proactively automate away problems and constantly look to improve quality of service. The ideal candidate will have thrived operating complex systems and diagnosing and resolving the hardest corner-case problems.
AWS is looking to hire a highly motivated, best-in-class, hands-on Systems Development Engineer "Data Diver" to join our Government Cloud Solutions Operations team.
As a systems engineering data diver, you will:
- Dive into operational support systems and databases and dig out operations data artifacts to help us answer Operational Risk, Effort, and Priority questions for our service teams and customers
- Define ways to measure Risk and Effort and produce scorecards from operational data artifacts
- Devise, develop, and champion AWS operations best practices
- Audit and improve system metrics, alarms, and architectures to ensure high availability
- Monitor service trends to identify opportunities to improve existing frameworks, tools, and processes
- Dig out data artifacts from operational systems to implement operational metrics and key performance indicators (KPIs) that measure service team quality and velocity
- Identify operational hot spots and drive operational excellence through automated tooling and operational scorecards
- Be obsessed with our customers' needs and translate them into actionable operational improvements for the AWS service teams
- Work with multiple service teams to plan, deploy, and support large-scale AWS services and features
We have multiple positions that are perfect for someone who can think about the big picture and iterate on operational improvements to get us there. If you like to learn and are curious, this is the position for you.