Act as the primary point of responsibility for the overall health, performance, and capacity of one or more of our Internet-facing services.
Solve difficult and open-ended problems, like virtualizing Mac OS X with GPUs, diagnosing latency problems between devices in different scenarios, or improving the speed and response time of applications.
Coordinate with external data centers during provisioning, outages, or maintenance.
Be acquainted with Rails and deployment tools like Capistrano.
Assist in the rollout and deployment of new product features and installations, to facilitate rapid iteration and constant growth.
Write programs and scripts to automate AWS-, Ansible-, and Django-based tasks.
Be aware of current CVEs, potential attack vectors, and vulnerabilities, and apply patches as soon as possible.
Handle incident response, troubleshooting, and fixes for various products and services.
Work closely with development teams to ensure that platforms are designed with scale, operability, and performance in mind.
Participate in a 24x7 rotation for production issue escalations.
Qualifications and Skills
Strong background in Linux/Unix Administration
Experience with automation and configuration management tools and technologies
Ability to use a wide variety of open source technologies and cloud services (experience with AWS is required; AWS certification is a big plus)
Good experience with SQL and MySQL
A working understanding of code and scripting (shell scripting, cron jobs, PHP, Python, Perl, and/or Ruby)
Knowledge of best practices and IT operations for an always-up, always-available service