As a Site Reliability Engineer (SRE), you will play a critical role in ensuring the reliability, scalability, and performance of our production systems. You will collaborate closely with software engineering and operations teams to build and maintain tools for automation, monitoring, and operations. Your expertise will be crucial in designing resilient and scalable architectures, optimizing application performance, and resolving complex technical issues to deliver a seamless user experience.
Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
- Proven experience in a Site Reliability Engineer or similar role, with a focus on designing and implementing scalable systems.
- Strong proficiency in programming languages, scripting, and automation (Java, ReactJS, etc.).
- Experience with cloud platforms such as AWS, Azure, or GCP, and container orchestration tools like Kubernetes.
- Deep understanding of networking, system administration, Windows, and Linux/Unix-based environments.
- Excellent problem-solving skills and the ability to troubleshoot complex issues in distributed systems.
- Strong communication skills and the ability to work effectively with stakeholders and in a collaborative team environment.
Preferred Qualifications:
- Master’s degree in Computer Science, Engineering, or a related technical field.
- Certification in cloud platforms or DevOps methodologies (e.g., AWS Certified DevOps Engineer, Google Professional Cloud DevOps Engineer).
- Experience with CI/CD pipelines and configuration management tools (e.g., Ansible).
- Knowledge of monitoring and logging tools such as Prometheus, Grafana, and the ELK stack.
- Experience with Agile/Scrum methodologies and practices.