Do you have a passion for innovating on a planet-scale service that is core to all modern internet businesses, at a company that operates like a group of startups? This is an excellent opportunity to join one of Amazon’s world-class teams of site reliability engineers (SREs) and work with some of the best and brightest, while also developing your skills and career within one of the most dynamic, innovative and progressive technology companies anywhere.
Amazon CloudFront is one of the fastest growing, lowest latency and highest throughput services in all of AWS. CloudFront is a fast content delivery network (CDN) service that securely delivers data, videos, applications, and APIs to customers globally with low latency and high transfer speeds, all within a developer-friendly environment. We bring the Internet closer to end users, speeding up the user experience. CloudFront is integrated with AWS, both through physical locations that are directly connected to the AWS global infrastructure and through other AWS services. CloudFront works with services including AWS Shield for DDoS mitigation, Amazon S3, Elastic Load Balancing or Amazon EC2 as origins for your applications, and Lambda@Edge to run custom code closer to customers’ users and to customize the user experience. CloudFront uses a global network of edge locations to cache content and route requests to the best possible location based on known latency, available capacity and many other factors.
Related CloudFront video: https://www.youtube.com/watch?v=wRaPw1tx6LA
CloudFront’s Site Reliability Engineering (SRE) team is looking for experienced System Development Engineers to build out automation of critical service reliability and efficiency functions that ensure a massively scaled, fault-tolerant and globally distributed service for our end users. As an SRE, you will be a builder and not simply a maintainer. We are looking for strong technical candidates with proven analytical, problem-solving, and troubleshooting capabilities who can resolve production escalations by identifying the root cause and iterating on improvements to both production and pre-production environments. This is not a typical checkbox monitoring role but an engineering role that specializes in understanding not just how a system is supposed to work but why it does not meet its service level objectives. Amazon SREs use best practices and tools such as 5 Whys, fishbone diagrams, white-box and black-box monitoring, system resiliency, load balancing/sharing/shedding best practices, failure mode and effects analysis (FMEA), incident management, risk and dependency mapping, and predictive service provisioning and capacity planning.
Your responsibilities include but are not limited to:
- Creating and enhancing regression metrics and automated tests
- Identifying and developing processes, tools, automation, and software changes to address top operational issues
- Working in close collaboration with software development leadership and support operations technical leads to shape the future roadmap and establish strong operational readiness across teams
- Leading change to develop simple, elegant solutions to complex operational or reliability challenges
- Utilizing hands-on technical skills to partner with team members and be comfortable diving into the fray as needed
You will diagnose complex problems, develop metrics to measure them, and build monitoring solutions to manage them. You will build automation and systems to maintain good “fleet hygiene” and to manage software and hardware lifecycles.
Here are some reasons you should come work with us:
- You will operate an AWS business with globally distributed servers and customers at a massive scale.
- You will own one of the lowest latency and highest throughput services in all of AWS.
- You will get the opportunity to work closely with a great team of system and software developers and principal engineers.
- You will understand what it takes to grow and operate a global business at AWS scale.
- You will be in charge of the complete software development lifecycle: defining, prioritizing, designing, building, and testing new features.
As a team, we:
- Are a group of technologists from diverse backgrounds.
- Obsess over our customers’ needs and experience.
- Are owners. We love building innovative new technologies and improving our existing ones.
- Wear multiple hats. We enjoy the prototyping and tinkering stages, as well as the rigor of making solutions production-ready.
- Are fast-growing, agile and collaborative.
As a candidate, you:
- Enjoy seeing the impact your work has on real customers.
- Are comfortable in an agile environment and create order from ambiguity.
- Build strong teams with others as passionate as you are about this mission.
- Take ownership and do what it takes to get the job done.
- Want to create services at a massive scale used by millions of people.
- Use data to make decisions and validate assumptions.
- Learn from others and help those on your team grow to achieve their best.