RunOps / Infrastructure Operations Engineer (L2/L3)
We are seeking a skilled RunOps / Infrastructure Operations Engineer to manage and support enterprise IT infrastructure across on-premises and cloud environments. The role focuses on ensuring high availability, performance, security, and reliability of critical systems, along with supporting disaster recovery (DR) readiness and execution.
The ideal candidate will have hands-on experience in infrastructure operations, incident management, and troubleshooting across compute, network, virtualization, and identity platforms.
Key Responsibilities
Monitor and manage enterprise infrastructure including servers, virtualization platforms, cloud environments, and network components
Ensure high availability and performance of applications and infrastructure services
Support and maintain Disaster Recovery (DR) and Business Continuity Planning (BCP) processes, including periodic testing and validation
Perform incident management, root cause analysis (RCA), and problem resolution for production issues
Provide L2/L3 support and act as an escalation point for complex infrastructure issues
Manage identity and access services, ensuring secure authentication and authorization mechanisms
Support remote access and application delivery platforms
Configure and maintain backup, replication, and failover mechanisms
Collaborate with application, security, and network teams to ensure seamless operations
Maintain and enforce infrastructure security standards, policies, and compliance requirements
Develop and maintain technical documentation, SOPs, and operational runbooks
Participate in on-call support and handle critical incidents or outages
Required Skills & Experience
4–10 years of experience in IT Infrastructure / RunOps / Production Support roles
Strong understanding of:
Server and OS administration (Windows/Linux)
Virtualization and hypervisors
Networking fundamentals (TCP/IP, DNS, VPN, firewalls)
Identity & Access Management (Active Directory or similar)
Experience with at least one of the following cloud platforms:
Amazon Web Services
Microsoft Azure
Google Cloud Platform
Exposure to backup, DR, and high availability solutions
Experience with monitoring and alerting tools
Strong troubleshooting and incident management skills
Understanding of security best practices in enterprise environments