datapro.news
🔭 The new frontier the last 24 months has revealed

THIS WEEK: A reflection on where data engineering has come from, and where it is going...

Samuel Williams
June 10, 2026

Dear Reader…

Datapro.news turns two years old this month.

I did not plan to mark this milestone with a retrospective. But sitting here at issue 98, having filed dispatches from many of the significant moments in data engineering over the past two years, it feels wrong not to stop and take stock. We have covered the DeepSeek shock, the EU AI Act enforcement deadline, the arrival of production robotics, the agent sprawl crisis, the collapse of enterprise AI ROI, and the slow, consequential shift from data custodian to something that does not yet have a clean job title. We have been on this frontier together, issue by issue, and this week I want to step back from the weekly news cycle and ask: What does it all actually mean? And where does it leave you?

Because the profession you are practising today is not the one you were practising when this newsletter's issue count was still in the forties.

There is a moment that several data engineers have described to me, independently, in roughly the same terms. It is sometime in early 2024. They are at their desk, building a retrieval pipeline, experimenting with embeddings, iterating on a prompt. The model does something impressive. There is a feeling of standing at the edge of genuinely new territory, with exactly the right skills to navigate it.

What was less obvious at the time was that the frontier itself was shifting exponentially.

Twenty-four months on, the profession they were practising then and the one they practise now share a family resemblance, but they are not the same job. The tools, the architecture, the risk profile, the regulatory exposure, and the career conversation have all shifted. What follows is an attempt to map what actually happened, to separate the signal from the retrospective noise, and to ask honestly what it means for the people doing this work.

The Hype Hangover

Cast your mind back to early 2024. The LLM gold rush was at its peak. Every engineering team was bolting retrieval-augmented generation onto something, every startup was promising that prompts would replace pipelines, and data engineers were being told their moment had finally arrived. The data estate was suddenly strategic. The data engineer was suddenly interesting at board level.

What followed was instructive. Study after study through 2025 put the share of enterprise AI initiatives failing to reach production at scale at around 95 per cent, and that failure rate turned out to have little to do with the models. The models kept improving. The failures were in the data: inconsistent quality, missing lineage, batch architectures trying to serve real-time inference, and governance frameworks that had been deferred until they became blockers.

The explorers who had rushed ahead to claim the AI frontier had left their supply lines unmanned. The territory was real. The maps were wrong.

For data engineers, this was a peculiar kind of vindication that felt nothing like a win. The work they had been doing for years, the unglamorous foundations of enterprise data infrastructure, turned out to matter enormously. But the organisations that needed it most were precisely the ones that had underinvested in it longest.

We covered this moment across several editions. The pattern, when you read those issues back to back, is striking: week after week, a different organisation, a different failure mode, the same root cause.

Three Shifts That Changed the Job

If 2024 was the year of the hype hangover, 2025 was the year when three separate forces converged and changed the shape of the work simultaneously.

The first was economic. The DeepSeek moment of January 2025 did not simply lower the price of reasoning. It exposed the subsidy model that Western AI vendors had been running and forced data engineers to think seriously, for the first time, about model procurement as a portfolio decision: frontier models for tasks requiring genuine complexity, open-weight models for the long tail. "Compute economist" was not a job title anyone had studied for, yet it became a working reality within a matter of quarters. If you have been reading this newsletter, you will recognise it as a thread that has run through almost every issue since.

The second force was regulatory. The EU AI Act's first enforcement milestone, which arrived on 2 August 2025, did something that years of voluntary standards had failed to achieve. It turned compliance into an architectural constraint. Data engineers who had spent careers optimising for throughput and latency found themselves building audit trails, bias detection pipelines, and contestability mechanisms. Governance was no longer the legal team's problem to manage around. It was in the brief from the start.

The third force was physical. The arrival of large behaviour models and production-ready robotics at enterprise scale, signalled by the Boston Dynamics Atlas debut at CES 2026 and followed by deployments in logistics and manufacturing environments over the course of this year, changed what data quality actually means in certain sectors. It now means time-aligned sensor streams and the telemetry of machines operating in physical space. For most data engineers, this remains adjacent rather than immediate. But the direction of travel is visible, and the infrastructure patterns being forged in industrial robotics today will propagate into broader enterprise stacks within the decade.

What It Did to Careers

Here is where the retrospective becomes personal, because this is the question that has landed in my inbox more than any other over nearly 100 editions: What does this mean for my career?

The honest answer is that the past two years separated two groups that had coexisted comfortably inside the profession for a long time. The distinction is not seniority, toolchain, or sector. It is instinct.

The first group are engineers whose core instinct is infrastructure: Make things reliable, observable, and governed. Build the thing that holds together when a fleet of autonomous agents hammers your estate with unpredictable query load at two in the morning. Ask what happens when the model is wrong and nobody notices. Ask who is responsible when the retrieval corpus goes stale and an agent ships yesterday's policy as today's truth.

These engineers have found the past two years accelerating, demanding, and genuinely relevant. The work they do now carries new names (context engineering, model operations, evaluation loop design, governance architecture), but the underlying instinct, rigour under pressure, is the same one they brought to batch ETL five years ago. The explorer has always been latent in this profession. The past two years made it visible.

The second group are engineers whose core relationship with data has been tactical: Move it, transform it, deliver it to a dashboard, move on. For this group, the automation frontier has arrived earlier and harder than most forecasts suggested. Standing up a standard pipeline, a conventional RAG layer, or a familiar warehouse schema is now materially faster with AI assistance than by hand. That efficiency gain is real, and it has shown up clearly in hiring data across the sector through 2025 and into 2026.

This is not a comfortable thing to write two years in, and it should not be a comfortable thing to read. The explorer who charts new territory discovers that what the return journey requires of them has changed. The skills that kept you valuable in 2023 are not the same set that will keep you valuable in 2027.

The engineers who have navigated this best are not necessarily those with the deepest knowledge of any single tool. They are the ones who learnt to ask the right questions when the maps ran out: What is this system actually accountable for? What happens when it is wrong? Who finds out, and when?

What the Next Horizon Looks Like

From where we stand now, two currents are clearly visible ahead.

The first is that the role of autonomous systems in enterprise data stacks is going to grow, not plateau. The shift from managing data pipelines to managing the context that feeds autonomous agents is already underway at scale. Agent sprawl (fleets of AI workers consuming your data estate with no institutional memory and no tolerance for ambiguity) has become one of the most acute operational challenges in the profession. The data engineer who can design for this, building context that is fresh, governed, and contestable, is not doing a different job from before. They are doing the same job with higher stakes and a broader blast radius.

The second current is accountability. As systems become more autonomous, the question of responsibility for their outputs is becoming more pressing, legally and institutionally. Regulators are already asking it. Courts will ask it. The answer, inside technical organisations, will point to the data engineering function. The explorer who charts new territory also names what they find there, and in naming it, accepts responsibility for what happens next.

The Profession Has Changed Shape

When I filed the first edition of this newsletter, data engineering was a profession that was perpetually promising to become strategic. Now, it is load-bearing inside every organisation that runs on AI. That is the good news. The harder news is that the job is no longer one thing.

The shape of the profession in 2026 is a spectrum. At one end sits routine automation work that AI is steadily claiming. At the other sit complex systems thinking, governance design, real-time architecture, and model operations, which remain deeply human in both demand and accountability. The distance between those two ends has never been greater, and neither has the urgency of moving along it.

I have spent 97 editions trying to give you an honest map of this territory, week by week, as it was being charted. The terrain has shifted faster than any of us expected when we started. The engineers who will thrive in the coming years are not the ones who know the most about any single tool. They are the ones who have learnt to navigate by instinct when the maps run out: who can hold rigour and adaptability in the same hand, and who understand that on any genuinely new frontier, the question "what am I actually responsible for?" is more valuable than any certification, toolchain, or model release.

The profession has changed shape. The explorer's instinct, which data engineering was always quietly enacting, has simply become visible at last.

Here is to the next stretch of the journey.