What are your responsibilities:
- Monitoring application and infrastructure alerts and reacting to them quickly.
- Monitoring SLO alerts and Error Budgets for the applications.
- Participating in and managing the Incident Response Flow for the applications.
- Identifying and implementing automation opportunities to ensure success.
- Closing open support tickets within SLAs.
- Coordinating with various stakeholders to carry out blameless postmortems.
What qualifies you for this role:
- Hands-on experience with Linux platforms.
- Knowledge of a technology stack that includes Kubernetes, Docker, PostgreSQL, Grafana, Prometheus, GitLab, Terraform, Jira, Confluence and more.
- Work experience in IT operations or a production support model.
- Knowledge of a scripting language such as Python, Bash, Go or Perl.
- Team player, flexible and ready to work in an (incentive-based) 24x7 support model.
What can you expect:
- Enjoy working in an energetic and inclusive environment where you explore, learn and contribute.
- Be part of the Operations Team, which works very closely with talented and experienced Site Reliability Engineers.
- Agile working in self-organised teams (Scrum, Kanban).
- Intensive knowledge exchange with experienced colleagues.
- Working on platforms that include Kubernetes, OpenShift, Docker, PostgreSQL, Grafana, Prometheus, Elasticsearch, GitLab, Jira, Terraform and more.