Job Description
NBCUniversal is one of the world's leading media and entertainment companies. We create world-class content, which we distribute across our portfolio of film, television, and streaming, and bring to life through our global theme park destinations, consumer products, and experiences. We own and operate leading entertainment and news brands, including NBC, NBC News, NBC Sports, Telemundo, NBC Local Stations, Bravo, and Peacock, our premium ad-supported streaming service. We produce and distribute premier filmed entertainment and programming through our powerhouse film and television studios, including Universal Pictures, DreamWorks Animation, and Focus Features, and the four global television studios under the Universal Studio Group banner, and operate industry-leading theme parks and experiences around the world through Universal Destinations & Experiences, including Universal Orlando Resort, home to Universal Epic Universe, and Universal Studios Hollywood. NBCUniversal is a subsidiary of Comcast Corporation. Visit www.nbcuniversal.com for more information.
Our impact is rooted in improving the communities where our employees, customers, and audiences live and work. We have a rich tradition of giving back and ensuring our employees have the opportunity to serve their communities. We champion an inclusive culture and strive to attract and develop a talented workforce to create and deliver a wide range of content reflecting our world.
The Staff Reliability Engineer (SRE) for Workplace Engineering is responsible for the reliability, performance, security, and operational excellence of enterprise workplace collaboration & endpoint services used globally by employees and partners. This role applies an engineering mindset to operations: defining service level indicators and objectives (SLIs/SLOs), reducing toil through automation, improving observability, and strengthening incident response, all to ensure a consistent, high-quality collaboration experience across messaging, meetings, voice, file sharing, knowledge sharing, device management platforms, and Copilot / AI engineering.
- Microsoft 365: Teams (chat, meetings, webinars, Teams Phone), SharePoint Online, OneDrive, Exchange Online, Microsoft Entra ID (Azure AD), Microsoft Purview, Defender for Office 365, Intune (Endpoint Management).
- Hybrid messaging and identity integrations (as applicable): Exchange Server, directory synchronization, mail flow and routing
- Collaboration endpoints and devices: Teams Rooms, certified headsets/cameras, conference room AV integrations
- Ecosystem integrations: Power Platform (Power Automate/Apps), Graph API, third-party conferencing/messaging where in use (e.g., Zoom/Slack), mail hygiene/security gateways
- Architect and optimize global Microsoft Intune and Jamf Pro environments.
- Orchestrate Windows Update for Business (WUfB), third-party application patching, and compliance policies to maintain a hardened security posture
- Automate packaging and deployment of Windows applications, maintaining a rigorous cadence for third-party updates.
- Leverage PowerShell and Graph API to automate repetitive configuration tasks and self-healing remediations.
- Partner with Security Operations to remediate vulnerabilities.
- Develop and enforce Configuration Profiles, Compliance Policies, and Conditional Access rules
- Own the reliability and scaling of Azure Virtual Desktop (AVD) and Windows 365 (Cloud PC), optimizing for both performance and cost-efficiency.
- Define and operationalize SLIs/SLOs and error-budget policies for collaboration services (Teams chat/meetings/voice, SharePoint/OneDrive, Exchange) with clear customer-impact measurements.
- Own end-to-end reliability engineering: capacity planning, performance tuning, resilience reviews, dependency mapping, and proactive risk reduction for critical collaboration journeys.
- Demonstrated expertise in developing, operationalizing, and scaling AI engineering capabilities, including platform design, model lifecycle management, automation, reliability, and enterprise adoption.
- Strong knowledge of AI governance frameworks, with experience establishing guardrails for responsible AI use, risk management, security, compliance, data controls, and ongoing operational oversight.
- Build and evolve observability for collaboration platforms: health dashboards, telemetry standards, alert strategy (high signal/low noise), and synthetic monitoring aligned to user experience.
- Lead incident response for high-severity events: establish incident roles, drive rapid triage/mitigation, coordinate cross-team communication, and produce blameless post-incident reviews with durable corrective actions.
- Engineer automation to reduce operational toil: provisioning, policy/config drift detection, lifecycle management, reporting, and remediation using PowerShell and APIs; establish reusable runbooks and self-service patterns.
- Strengthen change and release practices: production readiness reviews, controlled rollouts, maintenance windows, validation plans, and rollback strategies to reduce customer impact.
- Partner with Security/Compliance to ensure collaboration services meet governance requirements (identity and access, DLP, retention, eDiscovery, information protection) while balancing usability and reliability.
- Provide Staff-level technical leadership: set engineering standards, mentor engineers, influence roadmap priorities, and align stakeholders on reliability tradeoffs and investment.
- Establish and lead reliability operating mechanisms (on-call standards, incident command readiness, postmortem quality, action-item governance, and quarterly reliability reviews) to improve consistency across teams.
- Coach, mentor, and sponsor engineers across levels: provide technical guidance, review designs and postmortems, and raise the bar on documentation, runbooks, and operational readiness.
- Drive cross-organization alignment on reliability priorities and investment by presenting trends, risks, and proposals to leadership; secure commitments and ensure delivery against measurable outcomes.
- Serve as an escalation point for complex cross-domain issues spanning identity, messaging, endpoints, and network dependencies; engage vendors as needed and ensure issues are driven to resolution.
Qualifications
- 12+ years of experience in reliability engineering, systems engineering, DevOps, or large-scale collaboration/communications operations (enterprise or SaaS), including ownership of production services
- Deep expertise with collaboration platforms and ecosystems: Microsoft 365 (Teams, including voice/meetings/Rooms; SharePoint Online; OneDrive; Exchange Online) and their dependencies (identity, endpoints, networking)
- Hands-on experience defining SLIs/SLOs, building observability (metrics/logs/traces), and operating an incident management program (on-call, severity model, communications, postmortems)
- Strong automation skills with PowerShell and APIs (Microsoft Graph preferred); ability to build tooling that improves reliability and reduces toil
- Experience with cloud identity and access (Microsoft Entra ID/Azure AD, Conditional Access, MFA, RBAC/PIM) and collaboration governance (Purview, DLP, retention, eDiscovery) preferred
- Bachelor's degree in Computer Science/Engineering (or equivalent practical experience)
Desired Characteristics
- Executive-level written and verbal communication skills; able to translate reliability data into clear decisions, tradeoffs, and action plans
- Proven ability to influence across functions (Security, Network, End User Computing, Architecture, Product/Program) without formal authority
- Strong systems thinking and customer satisfaction focus, centered on user journeys (chat, meetings, voice, file sharing) and measurable experience outcomes
- Demonstrated technical leadership through mentorship, sponsorship, and talent development; builds an inclusive, high-performing engineering culture
- High bar for operational excellence; insists on clear ownership, durable fixes, strong postmortems, and measurable follow-through
- Comfort operating in ambiguity and driving large, multi-quarter improvements with measurable results
Hybrid: This position currently has a hybrid schedule which requires contributing from the office a minimum of four days per week. The Company reserves the right to change in-office requirements at any time.
Additional Information
As part of our selection process, external candidates may be required to attend an in-person interview with an NBCUniversal employee at one of our locations prior to a hiring decision. NBCUniversal's policy is to provide equal employment opportunities to all applicants and employees without regard to race, color, religion, creed, gender, gender identity or expression, age, national origin or ancestry, citizenship, disability, sexual orientation, marital status, pregnancy, veteran status, membership in the uniformed services, genetic information, or any other basis protected by applicable law.
If you are a qualified individual with a disability or a disabled veteran and require support throughout the application and/or recruitment process as a result of your disability, you have the right to request a reasonable accommodation. You can submit your request to [email protected].
Skills Required
- 12+ years of experience in reliability engineering, systems engineering, DevOps, or large-scale collaboration operations
- Deep expertise in Microsoft 365 and its dependencies
- Hands-on experience defining SLIs/SLOs and incident management
- Strong automation skills with PowerShell and APIs
- Experience with cloud identity and access management (Microsoft Entra ID, Azure AD)
- Bachelor's degree in Computer Science/Engineering or equivalent
NBCUniversal Compensation & Benefits Highlights
- Healthcare Strength: Health coverage includes medical, prescription, dental, vision, life, and disability, with mental-health resources, and many benefits start on day one. This breadth and early eligibility point to robust core healthcare support.
- Parental & Family Support: Paid parental leave is outlined at 16 weeks for a primary caregiver and 4 weeks for a non-primary caregiver, alongside fertility, adoption, and caregiving programs. These offerings indicate strong support for family-building and caregiving needs.
- Leave & Time Off Breadth: The U.S. time-off framework includes vacation, company holidays, personal “myDays,” caregiving days, flexible sick time, and compassionate/bereavement leave. This structure provides substantial paid time away and flexibility across personal and family situations.
NBCUniversal Insights
What We Do
From film, television, news, theme parks, interactive media, and streaming, our people are at the center of it all. Here, we solve complex and business-critical problems. That’s why we’re looking for people to help us continue our evolution, imagining and delivering the most innovative and disruptive products and services through the latest tech advancements in the industry. Here you can develop solutions. You’ll develop solutions that allow engineers to broadcast live TV from the comfort of their homes. These solutions will enable the use of our collection of hundreds of thousands of distinct intellectual properties across our film, television, and streaming brands. Here you can transform. You’ll make decisions and solve complex problems by leveraging insights that come from data, building AI to help enable solutions to optimize every aspect of our content ecosystem. Here you can build. You’ll build emerging immersive technologies that are used to power the broadcasts and streaming of global events like the Super Bowl and Olympics. You can create secure, elastic, cloud-based services connecting parts of our global platform ecosystem that affect tens of millions of viewers, consumers, and businesses that consume and love NBCUniversal’s content. And while you design, build, and architect your career, we have the culture to make sure you’re supported. Here you can work and still live your best life! We’re leaders in our fields. We hire smart people and trust them to get the job done. We are never too busy to develop a fellow colleague. We understand our goals, or we ask. When we see something that needs doing, we do it. We make data-driven decisions. We fiercely believe in our talent and their growth. If you're ready to make an impact, here you can.
Why Work With Us
For us, it's more than just a work life. It's a daily passion. We take great pride in our legacy. We find fun in the challenge. We collaborate and inspire others. We're always creating, always solving, and always ahead of the competition.
NBCUniversal Offices
Hybrid Workspace
Employees engage in a combination of remote and on-site work.






Date Posted
06/25/2026