At Amazon, we create and deliver services that delight our customers. Our mission is to be the earth's most customer-centric company. That is a uniquely challenging objective, and we want Amazonians to be the best people on the planet in their areas of expertise to help us get there!
We are now looking for a world-class Systems Engineer or Software Engineer who has proven experience working on a blend of these roles and can demonstrate that their combined expertise is greater than the sum of its parts.
You have a deep understanding of software development, infrastructure and systems operations. You work where the rubber meets the road (or the software meets the hardware). You are either a software developer who knows operations very well, or a systems engineer who knows software development very well. You are able to switch hats multiple times a day, speaking with developers and system administrators as if you were one of them. You are expected to influence architectural decisions about the software, technical decisions about infrastructure and networking, and business decisions about service features and SLAs.
Your ability to navigate between development and operations, and to understand their different objectives and concerns, allows you to visualise and act on the bigger picture, taking on the responsibility of bridging the gap between these areas. You architect, design and implement top-class solutions to ensure that services achieve stellar availability, scalability and security levels. Because you trust your vision and your decisions, you are comfortable being the owner of demanding SLOs, and you are confident that you can continuously deliver them. You know that automation is your best friend, and you see efficient operations as an exciting software problem.
In this role, you will be expected to define, create, deliver and own artifacts and activities like:
- Service level objectives and agreements
- Service deployment plans (cloud infrastructure) for high availability
- Service operation policies and procedures
- Availability and security risk plans
- Disaster recovery plans
- Data retention and disposal plans
- Operational runbooks
- Availability/Security incident tracking, analysis, root-cause evaluation and implementation of corrective measures
- Infrastructure as code - templates / scripts / configuration management tools
- Service monitoring, alarming and logging (design, implementation, maintenance)
- Build tools, deployment procedures and schedules
- Infrastructure sizing, auto-scaling and costing proposals
- Security audits, pen-tests, CTFs
You have a hands-on attitude and are able to deliver sound technical designs, high-quality code and comprehensive technical documentation.