datapro.news
The Trust Deficit: What the AI Backlash Means to You

THIS WEEK: Three years on from the ChatGPT moment, public sentiment has curdled. The question is whether data professionals are paying enough attention.

Samuel Williams
May 20, 2026

Dear Reader…

If you have been following our coverage over recent months, a pattern will be familiar to you. We have tracked the efficiency illusion exposed by the METR trial, documented the token cost reckoning now landing on engineering budgets, and watched agentic AI move from novelty to infrastructure challenge at speed. Each week, the picture has grown more complex.

This week, we are stepping back from the tooling and looking at the wider political and social current running beneath all of it, because what is happening in public discourse around AI is not noise. It is signal. And if you are a data professional with your head down in pipelines, governance frameworks, and deployment cycles, the socio-political environment forming around your work deserves your full attention.

The Numbers That Should Concern You

Three years after ChatGPT rewired corporate ambitions and popular imagination alike, a reckoning is underway. According to polling data cited by Techstrong.ai, 46% of registered voters now hold negative feelings toward artificial intelligence, while just 26% report a positive outlook. More strikingly, 57% of respondents believe the risks of AI outweigh its benefits.

Here is the paradox that readers of this publication will find most instructive. Usage of platforms like ChatGPT climbed from 48% of those surveyed in December 2025 to 56% by March 2026. People distrust the technology and use it anyway. This is not a story about abandonment. It is a story about accountability, and that distinction matters enormously for how data teams position their work.

The Architecture of Distrust

The backlash has several interlocking causes. Understanding them as a system, rather than a series of isolated grievances, is the first obligation.

Researchers at Sabanci University describe a phenomenon they call "AI Washing": the deliberate or negligent exaggeration of a system's intelligence, dressing up basic rules-based logic as autonomous, adaptive reasoning. When those claims fail in production, the resulting cycle of mistrust becomes self-reinforcing. The researchers also give the public rejection of these failures a name: "AI Booing." Regular readers will hear the echo of our piece earlier this month on the AI paradox. The pattern is consistent: inflated claims upstream, eroded trust downstream, and users who eventually notice.

Beyond the reputational damage, there is a harder infrastructure reality emerging. The rapid expansion of hyperscale data centres has triggered water shortages and surging utility bills in affected communities. Project cancellations have quadrupled across several markets, and a number of US state legislatures are now actively considering moratoriums on new data facility construction. For anyone working on infrastructure roadmaps for 2026 and beyond, this is no longer a theoretical regulatory risk.

Most structurally significant of all, research from SureBright documents that 95% of enterprise AI pilot projects have failed to yield meaningful revenue growth despite more than $40 billion in generative AI investment. That is a number worth sitting with. The gap between boardroom expectation and operational reality is not a communications problem. It is, as it almost always is, a data problem.

The AI Tax: A Familiar Burden by Another Name

Those of you who have been with us since our April coverage of token economics will find much that resonates in what practitioners are now calling the "AI Tax." According to research published by Leon Ginsburg on Medium, 81% of AI professionals report significant, unresolved data quality problems within their organisation's core data storage systems. When models move from test sandboxes into live production, pipelines regularly fail because of upstream data quality issues, inconsistent master data definitions, missing process logs, and fragmented shadow pipelines operating outside centralised governance.

DAMA practitioners will recognise this immediately. The disciplines of data quality management, metadata management, and data governance that underpin the DMBOK were not invented as abstract compliance exercises. They exist precisely because data in the wild is messy, contested, and politically loaded. The AI moment has not changed this. It has amplified it.

The practical response is a shift from batch monitoring to real-time data observability: automated anomaly detection and live lineage tracking capable of identifying what researchers describe as "silent failures". These are statistical data drift events or unexpected schema changes that do not technically break a pipeline but quietly degrade the accuracy of downstream AI outputs. Architecturally, many organisations pair this with a Data Mesh model, treating data as a decentralised product with strict service level agreements and clearly defined ownership.
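
To make "silent failure" concrete, here is a minimal sketch of the two checks, assuming each batch arrives as a list of dicts. The function names, the equal-width binning, and the 0.2 alert threshold for the Population Stability Index (PSI) are illustrative assumptions; 0.2 is a commonly used rule of thumb, not a standard, and this is not any particular observability product's method.

```python
import math
from typing import Dict, List, Sequence

Row = Dict[str, object]

def schema_of(rows: List[Row]) -> Dict[str, str]:
    """Infer a {column: type name} map from the first row of a batch."""
    return {col: type(val).__name__ for col, val in rows[0].items()} if rows else {}

def schema_changes(reference: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Report added, removed, and retyped columns, none of which necessarily crash a job."""
    issues = [f"missing column: {c}" for c in reference.keys() - current.keys()]
    issues += [f"new column: {c}" for c in current.keys() - reference.keys()]
    issues += [f"type change: {c} {reference[c]} -> {current[c]}"
               for c in reference.keys() & current.keys() if reference[c] != current[c]]
    return sorted(issues)

def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index using equal-width bins fitted to the reference data."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values into edge bins
        return [max(c / len(values), 1e-6) for c in counts]  # floor avoids log(0)

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def silent_failure_alerts(reference: List[Row], current: List[Row],
                          column: str, psi_threshold: float = 0.2) -> List[str]:
    """Combine the schema check and a drift check on one numeric column."""
    alerts = schema_changes(schema_of(reference), schema_of(current))
    score = psi([float(r[column]) for r in reference], [float(r[column]) for r in current])
    if score > psi_threshold:
        alerts.append(f"drift: {column} PSI={score:.2f}")
    return alerts
```

Neither failure raises an exception: a column that silently turns from float to string, or a distribution that shifts upward, both pass through a pipeline untouched. Run against each incoming batch, a non-empty alert list would notify the owning team, with the reference batch being whatever snapshot the model was last validated on.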

Model Collapse: The Recursive Threat

Beyond pipeline hygiene lies a more insidious technical risk that has not received enough column space in our industry: model collapse. As the volume of web-scraped training data increasingly consists of AI-generated content, future models risk being trained recursively on their own synthetic outputs. The mathematics are unforgiving. Each recursive generation compounds sampling bias, overrepresenting high-probability central patterns while progressively erasing rare tail events. By the third or fourth generation, the model no longer registers that financial anomalies, unusual customer behaviours, or edge-case scenarios exist.

Research from UC San Diego formalises this decay mathematically: as the number of recursive training generations increases, the variance of the generated data decays asymptotically toward zero. For data teams in financial services, where synthetic data is frequently used to stay within privacy regulations while generating realistic transaction sequences, this is not theoretical. Generative Adversarial Networks used for this purpose are prone to mode collapse, failing to represent rare but critical events such as sudden credit defaults or liquidity shocks.
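
The variance decay is easy to reproduce in a toy setting. The sketch below is the classic textbook illustration, not the UC San Diego researchers' own formulation: a single Gaussian is repeatedly refitted, by maximum likelihood, to samples drawn from the previous generation's fit. Because the maximum-likelihood variance estimate is biased, each refit shrinks the expected variance by a factor of (n-1)/n, so the fitted distribution narrows and values that were ordinary tail events under the original distribution become vanishingly rare. The sample size of 20 and the 300 generations are arbitrary choices for illustration.

```python
import random
import statistics
from typing import List

def recursive_gaussian_fit(generations: int, n_samples: int, seed: int = 0) -> List[float]:
    """Toy model collapse: each generation is trained only on the previous generation's output.

    Generation 0 is a standard normal (variance 1.0). Each step draws n_samples
    from the current model, then refits mean and variance by maximum likelihood.
    Returns the fitted variance after every generation.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible run
    mu, var = 0.0, 1.0
    history = [var]
    for _ in range(generations):
        samples = [rng.gauss(mu, var ** 0.5) for _ in range(n_samples)]
        mu = statistics.fmean(samples)
        # Biased MLE: given the current variance, the expected refit is (n-1)/n times it.
        var = statistics.pvariance(samples, mu)
        history.append(var)
    return history

history = recursive_gaussian_fit(generations=300, n_samples=20)
for g in (0, 50, 100, 300):
    print(f"generation {g:>3}: variance {history[g]:.2e}")
```

No real data re-enters the loop, which is exactly the condition a real-data anchor is meant to prevent.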

The operational response requires teams to maintain what researchers call a "non-shrinking real-data anchor": a permanent repository of verified human-generated data. Synthetic data may be added to expand capability, but the absolute volume of real-world data must never decrease. Strict ingestion ratios, for instance 70% verified human data to 30% synthetic, should be enforced with automated pipeline alerts. This is not glamorous work. It is the kind of disciplined data stewardship that this community has always known matters.
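
The two rules above (never shrink the real-data anchor, and hold the ingestion ratio) translate directly into a pipeline gate. This is a minimal sketch: the `TrainingMix` record, the `anchor_alerts` function, and the idea of comparing against the previous run's human row count are hypothetical, and the 70% default simply mirrors the 70/30 example.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class TrainingMix:
    human_rows: int      # verified human-generated rows (the real-data anchor)
    synthetic_rows: int  # synthetic rows added to expand coverage

def anchor_alerts(current: TrainingMix, previous_human_rows: int,
                  min_human_share: float = 0.7) -> List[str]:
    """Return alert messages; an empty list means the mix passes both rules."""
    alerts = []
    # Rule 1: the absolute volume of real data must never decrease.
    if current.human_rows < previous_human_rows:
        alerts.append(f"real-data anchor shrank: {previous_human_rows} -> {current.human_rows}")
    # Rule 2: enforce the ingestion ratio (70% human by default, matching the 70/30 example).
    total = current.human_rows + current.synthetic_rows
    share = current.human_rows / total if total else 0.0
    if share < min_human_share:
        alerts.append(f"human share {share:.0%} is below {min_human_share:.0%}")
    return alerts

print(anchor_alerts(TrainingMix(600, 400), previous_human_rows=600))
```

Wired into an ingestion job, a non-empty result would raise an alert or fail the run before retraining begins, rather than leaving the check to a quarterly review.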

Shadow AI and the Invisible Breach

We covered the Claude Agentic Code Incident back in March, and the implications have only grown since. Perhaps the most pressing immediate risk for governance teams today is not what competitors are doing with AI, but what colleagues are doing with it without authorisation.

Shadow AI, the unsanctioned use of AI tools by employees circumventing corporate IT constraints, is structurally different from the Shadow IT problem organisations managed in the last decade. Traditional Shadow IT introduced unauthorised hardware or applications onto the corporate network, detectable through access controls. Shadow AI involves sensitive corporate data leaving the organisation through behaviours that appear entirely normal, such as copying text into a browser, using an unapproved extension, or uploading a document to a public summarisation tool.

The threat has escalated with what Zscaler researchers describe as "Agentic AI," autonomous systems such as Microsoft Copilot or Salesforce Einstein embedded within trusted corporate SaaS environments. Because these agents operate with user-level permissions, they can autonomously read, summarise, and transmit sensitive enterprise data without any deliberate action by the employee. Traditional data loss prevention strategies were not designed for this.

Practical containment requires a structured programme. In the first ten days, teams must establish real-time visibility over every AI application and model endpoint accessed across the enterprise. Legacy firewalls are not adequate for inspecting multi-turn, WebSocket-based natural language sessions. Between days eleven and twenty, inline inspection of AI-directed traffic must be implemented, capable of identifying and blocking proprietary source code, credentials, and personally identifiable information before they leave the secure corporate boundary. In the final phase, blanket prohibition should give way to role-based access policies and sanctioned alternatives, because prohibitions without alternatives simply drive usage underground.
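
The inline-inspection and role-based phases can be illustrated with a simple outbound prompt filter. This is a minimal sketch and not how Zscaler or any other vendor implements inspection: the detector set, the `inspect_prompt` function, and the per-role allowlist are hypothetical, and regular expressions catch only the most obvious leaks.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical detectors; production DLP engines use far richer classifiers.
DETECTORS: Dict[str, re.Pattern] = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def inspect_prompt(text: str, role: str,
                   allowed_by_role: Dict[str, Set[str]]) -> Tuple[str, List[str]]:
    """Decide whether AI-directed text may leave the corporate boundary.

    Returns ("allow", []) or ("block", [detectors that fired and are not
    permitted for this role]).
    """
    findings = sorted(name for name, pattern in DETECTORS.items() if pattern.search(text))
    permitted = allowed_by_role.get(role, set())  # role-based policy, not blanket prohibition
    blocked = [name for name in findings if name not in permitted]
    return ("block", blocked) if blocked else ("allow", [])

policy = {"support": {"email_address"}}  # e.g. support staff may send customer emails to a sanctioned tool
print(inspect_prompt("Reply to jane@example.com", "support", policy))
print(inspect_prompt("Summarise: AKIAABCDEFGHIJKLMNOP", "analyst", policy))
```

In practice a check like this would sit in a proxy on the traffic path, where it can see WebSocket sessions, rather than in application code that employees can simply route around.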

Three Questions Every Data Professional Should Be Asking Right Now

Before we turn to what a sound response looks like, it is worth pausing. The AI reckoning is not a problem to be solved in a sprint. It is a shift in operating conditions that demands honest self-assessment. If you are leading a data team, advising on governance, or building the pipelines that AI systems depend on, these questions should be sitting uncomfortably in the back of your mind.

Can you actually prove your AI outputs? Not explain them, prove them. If a regulator, an auditor, or an angry stakeholder asked you to demonstrate the origin, integrity, and consent trail of the data underpinning a live AI decision, how long would that take your team and what would you find?
Do you know where your organisation's sensitive data is actually going? Not where policy says it should go. Where it is going right now, via the tools your colleagues installed last month without telling IT.
Is your organisation guilty of AI Washing? It is an uncomfortable question, but a necessary one. Are the capabilities you are communicating to stakeholders and customers grounded in what your systems actually do, or in what the vendor deck promised they would do?

There are no easy answers here. But the data professionals who are asking these questions now will be far better positioned than those who are not.

Provenance, Not Just Lineage

Underlying all of these challenges is a governance distinction that the data management community has historically under-emphasised, and one we will return to in greater depth in coming editions: the difference between data lineage and data provenance. Lineage tracks how data flows and transforms across systems. Provenance tracks where it originated, under what consent conditions, and whether it has been altered. In the language of TrueScreen's digital provenance framework, lineage is a travel itinerary. Provenance is a birth certificate.

In an environment where regulators, consumers, and institutional investors are all demanding greater accountability for AI-driven decisions, provenance is fast becoming the primary unit of trust. Tools such as the C2PA cryptographic manifest standard and zero-knowledge machine learning (zkML) proofs can help organisations provide cryptographically verifiable evidence of data authenticity and origin, including signed claims about whether content is human- or machine-generated.
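
The "birth certificate" idea can be sketched as a signed record capturing origin, consent basis, and a content fingerprint. This is a deliberately simplified illustration, not C2PA: real C2PA manifests use certificate-based digital signatures and a defined manifest format, whereas this sketch uses an HMAC with a shared key as a stand-in. The `ProvenanceRecord` fields, the function names, and the hard-coded key are all hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from typing import Tuple

SIGNING_KEY = b"demo-key"  # hypothetical; a real system would use a managed signing key

@dataclass(frozen=True)
class ProvenanceRecord:
    source: str           # where the data originated
    consent_basis: str    # under what consent conditions it was collected
    human_generated: bool
    content_sha256: str   # fingerprint of the content at capture time

def _sign(record: ProvenanceRecord) -> str:
    payload = json.dumps(asdict(record), sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def issue_record(content: bytes, source: str, consent_basis: str,
                 human_generated: bool) -> Tuple[ProvenanceRecord, str]:
    """Create a provenance record for content at the point of capture, plus its signature."""
    record = ProvenanceRecord(source, consent_basis, human_generated,
                              hashlib.sha256(content).hexdigest())
    return record, _sign(record)

def verify_record(content: bytes, record: ProvenanceRecord, signature: str) -> bool:
    """True only if the record is unaltered and the content still matches its fingerprint."""
    if not hmac.compare_digest(_sign(record), signature):
        return False  # the record itself was tampered with
    return hashlib.sha256(content).hexdigest() == record.content_sha256

rec, sig = issue_record(b"txn,42.00,GBP", "core-banking", "contract", human_generated=True)
print(verify_record(b"txn,42.00,GBP", rec, sig))  # content unchanged since capture
print(verify_record(b"txn,99.00,GBP", rec, sig))  # content altered after capture
```

Lineage would record how this row moved through the warehouse; the record above answers the different question of where it came from and whether it is still the same data.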

For data professionals trained in DAMA disciplines, this is a moment of vindication as much as challenge. The skills practitioners have built around data quality, metadata governance, lineage tracking, and stewardship are precisely what the AI accountability moment demands. The task is not to start over. It is to bring those disciplines forward into a domain where the stakes have never been higher.

The backlash is structural. The response must be too.

That’s a wrap for this week

Happy Engineering, Data Pros