We are currently seeking a Operations to serve as a resource within our mission-critical Centers. The position will help ensure overall availability and reliability to meet or exceed defined levels of Engineering Operations. Due to the scale of our Centers, which are quickly increasing in server/network demand, maintaining a level of integrity has become more than current staff can manage on their own. His/her involvement will have a direct and immediate impact on improving the resiliency, efficiency and capacities of our in an effort to better manage our capital by driving costs to the floor.
- Assisting in the operation, and of all , mechanical, and HVAC equipment within the /
- This equipment supports mission-critical servers and must maintain better than 99.999% uptime
- Assist in and monitoring of all systems to include incidents/events, problems, changes, monitoring, problem escalation/notification/resolution and all other aspects of
- Monitoring and troubleshooting of all mechanical, , HVAC systems, voice/, chiller systems and generators
- Provides assistance to contractor or to ensure proper operation and of all equipment covered under level I plus areas in which they are certified, such as making and running fiber optic, certification, HVAC repair (training/certification), UPS and/or generator certifications and training
- Provides assistance to contractor or to deploy new equipment, such as racks and cabling, and to perform other tasks as necessary
- Responds to internal Rack customer , repair and additions/expansion requests through the ticketing system
- Operates under minimal supervision
- Perform walkthroughs to verify proper operation of Equipment and Monitoring Systems
- Maintain changes in state in Mission Critical infrastructure in of corrective/ preventive
- Test quality, performance, safety, and reliability of products, equipment, and processes
In addition to acting as a First Responder to critical events, this individual will also be responsible for providing leadership and project management in relation to startup of new and upgrade existing . He/she will lead efforts between architecture, , negotiation, and other teams to develop specifications, designs, and cost estimates for future projects. This individual will continue to maintain reliability and performance while keeping operating costs in at a minimum.
A100, part of the Amazon group of companies, is an equal opportunity employer.
- Ability to solve problems at their root, stepping back to understand the broader context.
- Aptitude for troubleshooting and problem solving.
- Ability to maintain SLAs through the implementation of proactive issue detection and immediate response.
- Ability to write, oversee and follow procedures, system documentation, and issue tracking entries.
- Shows good judgment and instincts in decision making.
- Ability to prioritize in complex, environment.
- Able to take ownership of issues brought to them by their customer base. If unable to resolve an issue alone, demonstrates a willingness to actively engage other teams to drive it to resolution.
- Knowledge of level and mechanical system.