How Olivia Chen Breaks Down the Modern Data Stack and Why the Architecture Conversation Matters [Ad]

The modern data stack is one of the most discussed and frequently misunderstood topics in enterprise technology today. Data engineers, CTOs, and startup founders all have opinions on how data should move through an organization, but the vocabulary around these conversations (mesh, fabric, ETL, reverse ETL) can obscure more than it illuminates. Olivia Chen, a data scientist, early-stage investor in B2B tech, and contributor to the Behind The Things newsletter on Substack, has made it part of her professional mission to cut through the noise.

Her piece, "Trends in Modern Data Stack, Part 1," published in November 2022, is a clear-eyed breakdown of the architecture patterns reshaping how companies store, move, and act on data. Written from the dual vantage point of a practitioner and an investor, the article covers five trends, though this piece focuses primarily on the first two: data mesh and data fabric, and the evolution from Extract, Transform, Load (ETL) to ELT (Extract, Load, Transform) and reverse ETL. Together, they tell a coherent story about where enterprise data infrastructure is headed.

Data Mesh and Data Fabric: Design Before Technology

One of the central observations in Chen's analysis is that both data mesh and data fabric are concepts, not products. This distinction matters because it shifts the discussion away from vendor comparisons and toward organizational structure and culture, a framing that is rare in technology writing, which tends to default to tool-level analysis.

Data mesh, as Chen explains it, is fundamentally about decentralization. Instead of a single, centralized data team controlling all pipelines, each business function (sales, finance, supply chain, marketing) owns its own data pipeline, sets its own governance rules, and makes its own data-driven decisions. The definition Chen draws on from data expert Juan Sequeda describes data mesh as "a paradigm shift towards a distributed architecture that attempts to find an ideal balance between centralization and decentralization of metadata and data management."

Chen maps this out in practical terms. A mature data mesh organization has three distinct layers operating in concert: domain-specific teams treating data as a product, a shared platform team providing self-serve infrastructure, and a centralized data governance operation she calls DataGovOps. Each of these layers supports the others. Without the platform team, individual domains would duplicate effort. Without governance, decentralization devolves into fragmentation.
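
To make those three layers concrete, here is a minimal Python sketch of how they might fit together. Every name in it (DataProduct, PlatformRegistry, GOVERNANCE_RULES) is invented for illustration; nothing here comes from Chen's piece or from any particular product.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the three data mesh layers described above.

# Layer 3: centralized governance ("DataGovOps") sets rules every domain must meet.
GOVERNANCE_RULES = {"owner_required": True, "schema_required": True}

@dataclass
class DataProduct:
    """Layer 1: a domain team publishes data as a product it owns."""
    name: str
    domain: str        # e.g. "sales", "finance", "supply_chain", "marketing"
    owner: str         # the domain team accountable for this data's quality
    schema: dict = field(default_factory=dict)

class PlatformRegistry:
    """Layer 2: shared self-serve infrastructure all domains publish through."""
    def __init__(self):
        self._catalog = {}

    def publish(self, product: DataProduct) -> None:
        # Governance is enforced centrally, while ownership stays with the domain.
        if GOVERNANCE_RULES["owner_required"] and not product.owner:
            raise ValueError(f"{product.name}: every data product needs an owner")
        if GOVERNANCE_RULES["schema_required"] and not product.schema:
            raise ValueError(f"{product.name}: every data product needs a schema")
        self._catalog[f"{product.domain}.{product.name}"] = product

registry = PlatformRegistry()
registry.publish(DataProduct(
    name="weekly_revenue",
    domain="finance",
    owner="finance-data-team",
    schema={"week": "date", "revenue_usd": "float"},
))
```

The point of the sketch is the division of labor: the domain supplies and owns the product, the platform provides the shared publishing path, and governance constrains both without doing either's job.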

Data fabric, by contrast, leans toward integration rather than decentralization. Where data mesh pushes authority out to the edges, data fabric wraps a policy layer around all data sources, cloud-based and on-premises alike, to create a unified view. According to IBM's architecture documentation, data fabric "facilitates end-to-end integration of various data pipelines and cloud environments, allowing for a cohesive and holistic view of data across different functions." Chen's synthesis of these two concepts is useful: data fabric provides the technical foundation for integration; data mesh provides the organizational logic for ownership.
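
The contrast is easier to see in miniature. In the hypothetical sketch below, a single policy-aware layer routes requests across heterogeneous sources; the source names, roles, and access policy are all invented for illustration, not drawn from any vendor's fabric product.

```python
# Illustrative sketch of the data fabric idea: one policy-aware access layer
# over heterogeneous sources, cloud and on-premises alike.

SOURCES = {
    "orders": {"location": "cloud_warehouse"},
    "inventory": {"location": "on_prem_erp"},
}

ACCESS_POLICY = {
    "analyst": {"orders"},                     # analysts may read orders only
    "supply_chain": {"orders", "inventory"},   # supply chain may read both
}

def query(dataset: str, role: str) -> str:
    """Route a request through the fabric: enforce policy, then dispatch."""
    if dataset not in SOURCES:
        raise KeyError(f"unknown dataset: {dataset}")
    if dataset not in ACCESS_POLICY.get(role, set()):
        raise PermissionError(f"role '{role}' may not read '{dataset}'")
    # A real fabric would dispatch to the right connector here;
    # this stub just reports where the data would come from.
    return f"reading '{dataset}' from {SOURCES[dataset]['location']}"

print(query("inventory", "supply_chain"))
```

The caller never needs to know whether a dataset lives in the cloud warehouse or the on-premises system; the fabric's policy layer answers both "where is it" and "who may see it" in one place.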

Her insight here is worth dwelling on. The two are often positioned as competitors or alternatives in industry coverage, but Chen argues they are more complementary, one governing how data flows, the other governing who is responsible for it. For anyone buying or building data products, understanding where a company sits on the mesh-fabric spectrum tells you a great deal about who the decision-makers are and what problems they are actually trying to solve.

ETL, ELT, and the Reverse ETL Wave

The second major thread in Chen's piece traces the lifecycle of the extract-transform-load pattern and explains why its inversion, ELT, opened the door to an entirely new category of tooling.

The traditional ETL approach required data to be transformed before it landed in the data warehouse. This worked for structured data but introduced bottlenecks and eliminated the possibility of preserving raw data for downstream analysis. ELT reversed the sequence: extract the data, load it into the warehouse first, then transform it there. The shift sounds minor, but the implications are significant. The data warehouse becomes the single source of truth, and raw data is preserved indefinitely, enabling use cases that ETL would have ruled out entirely, including real-time analytics.
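
A toy example makes the ordering difference visible. Everything in it (the sample rows, the transform function, the dictionaries standing in for warehouses) is a hypothetical illustration, not any vendor's API.

```python
# Minimal sketch contrasting ETL and ELT orderings.

raw_rows = [{"amount": "19.99", "currency": "usd"},
            {"amount": "5.00", "currency": "USD"}]

def transform(rows):
    # Normalize for analysis: consistent casing, numeric types.
    return [{"amount": float(r["amount"]), "currency": r["currency"].upper()}
            for r in rows]

# --- ETL: transform first, so only cleaned rows ever reach the warehouse ---
etl_warehouse = {"sales_clean": transform(raw_rows)}     # raw data is gone

# --- ELT: load raw rows first, then transform inside the warehouse ---
elt_warehouse = {"sales_raw": raw_rows}                  # raw data preserved
elt_warehouse["sales_clean"] = transform(elt_warehouse["sales_raw"])
```

The only change is the order of operations, but because the raw table survives in the ELT version, it can serve questions nobody had thought to ask at load time.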

What made ELT viable was the arrival of scalable, cloud-based data warehouses, most notably Snowflake. Snowflake's architecture separates storage from compute, allowing organizations to scale each independently, a design that made storing large volumes of raw data economically practical for the first time. Chen nods to this context directly, noting that ELT "is only made possible with the invention of the scalable cloud-based data warehouse."

Reverse ETL, the third phase Chen describes, extends the pattern a step further. Once the warehouse contains clean, transformed data, reverse ETL tools push that data back into downstream applications (CRMs, marketing platforms, sales tools), allowing operational teams to act on it without switching systems. The practical effect is that data engineers can build once and distribute broadly, rather than maintaining separate pipelines for every team that wants data.
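
In sketch form, the build-once-distribute-broadly idea looks like this. The table, the destinations, and the sync stub are all invented for illustration; a print statement stands in for the authenticated API call a real connector would make.

```python
# Hypothetical reverse ETL sketch: one transformed warehouse table syncs
# out to several downstream operational tools.

warehouse = {
    "customer_health": [
        {"customer_id": "c-101", "health_score": 87},
        {"customer_id": "c-102", "health_score": 34},
    ]
}

def sync(rows, destination: str) -> None:
    # A real connector would handle auth, batching, and retries;
    # the print stands in for the destination's API call.
    for row in rows:
        print(f"-> {destination}: upsert {row['customer_id']} "
              f"(health={row['health_score']})")

# The table is built once in the warehouse, then distributed everywhere.
for destination in ("crm", "marketing_platform", "support_desk"):
    sync(warehouse["customer_health"], destination)
```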

A 2023 report from Gartner noted that operationalizing analytics (moving insights directly into the tools people already use) is one of the primary challenges organizations face as they mature their data practices. Reverse ETL is one of the more direct answers to that challenge.

Chen's investor perspective adds a layer of candor that is often absent from technical writing on these topics. She views the reverse ETL space as ripe for consolidation: the tools themselves are useful but not transformative, the use cases are well-defined, and the natural endpoint is acquisition by larger data warehouse providers looking to expand their surface area. Her attention turns instead to transformation and data quality: the messier, less glamorous work of ensuring that data flowing into AI and machine learning pipelines is clean enough to be useful.

The Bigger Picture: Data as Infrastructure, Not Just Output

What ties Chen's analysis together is a consistent argument that the data stack conversation has too often been about tools when it should be about architecture and culture. Data mesh will take years to mature, she writes, not because the technology is complex, but because it requires "culture change, organization shift, the embracement of data-driven decision-making within the enterprise leadership, and the endorsement of data as the future fuel of business."

This is the kind of observation that gets left out of product documentation and vendor white papers but shapes every real-world implementation of any data system. The technical patterns described above (mesh, fabric, ELT, reverse ETL) are well-defined. The organizational shifts required to make them work are considerably harder.

Chen's work across her newsletter and professional practice reflects a commitment to making these distinctions legible for a broad audience. Her background spans consulting at Deloitte, an MBA from the University of Chicago Booth School of Business, and her current focus on early-stage B2B technology investment. That combination of practitioner experience, investment pattern recognition, and a habit of writing clearly for non-specialists positions Olivia Chen to translate between the technical and the strategic in a way that is genuinely useful.

Implications for the Data Industry

The trends Olivia Chen lays out in her Substack piece are not predictions so much as a map of where the data industry was already heading in 2022, now playing out in real time. Cloud-native data warehouses have continued to consolidate their position as the organizational center of data operations. The reverse ETL category has seen exactly the kind of consolidation she anticipated: several of the early players have been acquired or merged with warehouse providers. Data mesh conversations have become commonplace in large enterprise technology departments, though Chen's caution about the cultural difficulty of implementation has proven well-founded.

What remains unresolved, and where the opportunity lies from both a technical and an investment standpoint, is the quality and governance layer. According to a 2024 survey by Atlan, more than 60 percent of data practitioners report that poor data quality is a primary obstacle to scaling AI and machine learning initiatives. Chen identified this gap early, noting that data engineers spend "hours and hours detecting anomalies and getting data ready for AI/ML applications." The tooling built to address that gap has improved, but the problem has grown faster than the solutions.
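
The kind of check involved is simple to sketch, even if doing it at scale is not. The column names, sample rows, and thresholds below are invented for illustration of the anomaly-detection work Chen describes.

```python
# Minimal sketch of a pre-ML data quality check.

rows = [{"user_id": "u1", "age": 34},
        {"user_id": "u2", "age": -3},      # anomaly: impossible value
        {"user_id": None, "age": 28}]      # anomaly: missing key

def quality_report(rows):
    """Flag rows that should be quarantined before reaching an ML pipeline."""
    issues = []
    for i, row in enumerate(rows):
        if row["user_id"] is None:
            issues.append((i, "missing user_id"))
        if not 0 <= row["age"] <= 120:
            issues.append((i, f"age out of range: {row['age']}"))
    return issues

for index, problem in quality_report(rows):
    print(f"row {index}: {problem}")
```

Writing one such rule is trivial; writing, maintaining, and monitoring thousands of them across every pipeline is the unglamorous work the quality-and-governance layer exists to absorb.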

The modern data stack is, ultimately, a story about how organizations learn to treat data as a first-class operational asset rather than a byproduct of transactions. Chen's contribution to that conversation, through her Substack, her investment activity, and her broader public writing, is to keep the organizational and architectural questions at the center of a discussion that easily slides into product comparisons.

Image: Google DeepMind - Pexels


This is a sponsored post. The views expressed are the advertiser’s own.
