OCSF explained: The shared data language security teams have been missing

The Emergence of Open Cybersecurity Schema Framework (OCSF) in the Security Industry

In the cybersecurity realm, discussions have been centered around models, copilots, and agents over the past year. However, a more subtle transformation is taking place beneath the surface: Vendors are converging around a unified method to articulate security data. The Open Cybersecurity Schema Framework (OCSF) is positioning itself as a leading contender for this role.

OCSF offers vendors, enterprises, and practitioners a standardized approach to represent security events, findings, objects, and context. This standardized representation reduces the need for rewriting field names and custom parsers, allowing more time for activities like correlating detections, conducting analytics, and creating workflows that can seamlessly operate across different products. In a landscape where security teams are integrating endpoint, identity, cloud, SaaS, and AI telemetry, a shared infrastructure has been a long-desired goal, and OCSF is now making it achievable.

Decoding OCSF in Layman’s Terms

OCSF serves as an open-source framework for cybersecurity schemas. It is intentionally vendor-agnostic and neutral, not bound by storage format, data collection, or ETL preferences. Essentially, it provides application teams and data engineers with a common structure for events, enabling analysts to work with a more consistent language for threat detection and investigation.

While this may sound mundane, its significance becomes apparent when considering the daily operations within a security operations center (SOC). Security teams expend considerable effort normalizing data from diverse tools to correlate events effectively. For instance, identifying an employee logging in from San Francisco on their laptop at 10 a.m. and then accessing a cloud resource from New York at 10:02 a.m. could unveil a leaked credential.

Establishing a system capable of correlating such events is a daunting task. Various tools describe the same concept using distinct fields, nesting structures, and assumptions. OCSF aims to alleviate this burden by aiding vendors in aligning their schemas with a universal model, enabling customers to transfer data seamlessly through lakes, pipelines, and security incident and event management (SIEM) tools without the need for laborious translations at every juncture.

The Rapid Evolution of OCSF in the Last Two Years

The visible progress of OCSF has been most pronounced in the past couple of years. The initiative was unveiled in August 2022 by Amazon AWS and Splunk, building on contributions from Symantec, Broadcom, and other prominent infrastructure players like Cloudflare, CrowdStrike, IBM, Okta, Palo Alto Networks, Rapid7, Salesforce, Securonix, Sumo Logic, Tanium, Trend Micro, and Zscaler.

The OCSF community has maintained a consistent pace of releases during this period.

The community has expanded rapidly, with AWS disclosing in August 2024 that OCSF had evolved from a 17-company initiative to a community encompassing over 200 participating organizations and 800 contributors. This number surged to 900 when OCSF joined the Linux Foundation in November 2024.

OCSF’s Pervasive Influence Across the Industry

OCSF has permeated the observability and security landscape extensively. AWS Security Lake converts natively supported AWS logs and events into OCSF format, storing them in Parquet. AWS AppFabric can produce OCSF-normalized audit data. AWS Security Hub findings utilize OCSF, and AWS offers an extension for cloud-specific resource details.

Splunk can transform incoming data into OCSF using edge and ingest processors. Cribl facilitates seamless conversion of streaming data into OCSF and compatible formats.

Palo Alto Networks can transmit Strata logging Service data into Amazon Security Lake in OCSF format. CrowdStrike occupies a dual role in the OCSF ecosystem, translating Falcon data into OCSF for Security Lake and positioning Falcon Next-Gen SIEM to ingest and parse OCSF-formatted data. OCSF stands out as a rare standard that has transitioned from an abstract concept to operational plumbing adopted widely across the industry.

The Role of AI in Reinforcing the OCSF Narrative

With the deployment of AI infrastructure in enterprises, large language models (LLMs) serve as the core component, surrounded by intricate distributed systems like model gateways, agent runtimes, vector stores, tool calls, retrieval systems, and policy engines. These elements generate new forms of telemetry, spanning multiple product boundaries. Consequently, security teams in SOCs are increasingly focused on capturing and analyzing this data. The primary concern often revolves around understanding the actions of an agentic AI system, rather than solely focusing on the text it generates, and evaluating whether these actions lead to security breaches.

This underscores the criticality of the underlying data model. An AI assistant that misguidedly calls a tool, retrieves incorrect data, or sequences risky actions triggers a security event that necessitates cross-system comprehension. In such a scenario, a shared security schema becomes invaluable, particularly when AI is utilized on the analytical front to expedite data correlation.

OCSF’s Emphasis on AI in 2025

Consider a scenario where a company employs an AI assistant to aid employees in accessing internal documents and triggering tools like ticketing systems or code repositories. If the assistant begins fetching incorrect files, utilizing unauthorized tools, and divulging sensitive information in its responses, updates in OCSF versions 1.5.0, 1.6.0, and 1.7.0 empower security teams to reconstruct the sequence of events by flagging anomalous behavior, highlighting system access, and tracing the assistant’s tool interactions step by step. This enhanced visibility allows teams to delve into the entire chain of actions leading to the issue, rather than merely scrutinizing the end result provided by the AI.

Future Prospects for OCSF

Envision a scenario where a company deploys an AI customer support bot, and one day, the bot starts furnishing detailed responses containing internal troubleshooting guidance intended solely for staff. With the impending changes in OCSF 1.8.0, the security team could discern the model responsible for the interaction, the provider supplying it, the role of each message, and the variations in token counts throughout the conversation.

A sudden surge in prompt or completion tokens could signify that the bot received an unusually extensive hidden prompt, extracted excessive background data from a vector database, or generated an excessively lengthy response heightening the risk of sensitive information leakage. This actionable insight provides investigators with a tangible lead on the misstep in the interaction, rather than leaving them with the final output.

Significance of OCSF in the Broader Market

The overarching narrative underscores the swift evolution of OCSF from a collaborative endeavor to a tangible standard integrated into everyday security products. Over the past two years, OCSF has gained robust governance, frequent updates, and practical backing across data lakes, ingestion pipelines, SIEM workflows, and partner ecosystems.

In a landscape where AI expands the security horizon through various vulnerabilities, scams, and novel attack vectors, security teams rely on OCSF to seamlessly link data from diverse systems while preserving context along the way to safeguard critical information.

Nikhil Mungel boasts over 15 years of experience in constructing distributed systems and AI teams in SaaS organizations.