6 Ways Gen AI is improving Data Modelling
Generative AI can automate and enhance various stages of the data modelling process. We did a bit of digging to find some examples of where Data Engineers are using it in everyday tasks.
1. Automated Schema Generation
Similar to using AI to draft a document outline, Engineers are using it to analyse existing data structures and generate schema recommendations, speeding up the initial phases of database design. This not only reduces the time required to create and revise data models, but also helps ensure that the architecture diagrams reflect user intentions, especially if your customer lacks a technical understanding of data modelling. So a soft benefit is getting stakeholder buy-in faster.
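To make this concrete, here is a minimal sketch of the idea, assuming the OpenAI Python client (any LLM endpoint would work the same way); the model name, sample data and prompt are illustrative rather than a recommendation:

```python
from openai import OpenAI  # assumes the openai>=1.0 Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A small, hypothetical sample of the raw data you want modelled.
sample_rows = """order_id,customer_email,order_date,total_amount,currency
1001,jane@example.com,2024-05-01,129.99,GBP
1002,sam@example.com,2024-05-02,40.00,GBP"""

prompt = (
    "You are a data modeller. Propose a normalised relational schema "
    "(CREATE TABLE statements with keys and sensible types) for data "
    f"shaped like this CSV sample:\n\n{sample_rows}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name; use whichever model you have access to
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)  # review the draft DDL before applying it
```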
2. Optimisation Suggestions
By understanding the relationships and usage patterns within a database, data lake or Data Warehouse, Generative AI can suggest indexing strategies or model modifications to improve performance. This capability reduces human error and helps ensure that the different data models you manage align with business requirements, improving the accuracy and scalability of your data engineering.
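As a rough illustration of how this might look in practice, the sketch below hands the current DDL and a couple of slow queries to an LLM and asks for suggestions; it again assumes the OpenAI Python client, and the schema, queries and model name are placeholders:

```python
from openai import OpenAI  # same assumption as above: the openai Python client

client = OpenAI()

# Hypothetical inputs: current DDL plus a couple of frequently-run slow queries.
schema_ddl = "CREATE TABLE sales (order_id INT, customer_id INT, order_date DATE, amount DECIMAL);"
slow_queries = """SELECT customer_id, SUM(amount) FROM sales WHERE order_date >= '2024-01-01' GROUP BY customer_id;
SELECT * FROM sales WHERE customer_id = 42 ORDER BY order_date DESC;"""

prompt = (
    f"Given this schema:\n{schema_ddl}\n\n"
    f"and these frequently-run slow queries:\n{slow_queries}\n\n"
    "Suggest indexes, partitioning or model changes to improve performance, with trade-offs."
)

suggestion = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(suggestion.choices[0].message.content)  # treat as input to a review, not a verdict
```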
3. Natural Language Processing (NLP) to ERD Translation
Using NLP, Generative AI can translate plain-English descriptions of data requirements directly into complex Entity Relationship Diagrams (ERDs). Although still in its infancy, emerging specialist tools such as flow.bi are collapsing the time it takes to deliver models.
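One lightweight way to experiment with this today, without a specialist tool, is to have an LLM emit a Mermaid erDiagram definition, which most documentation tools and IDEs can render. A minimal sketch, again assuming the OpenAI Python client and an illustrative model name:

```python
from openai import OpenAI

client = OpenAI()

# A plain-English requirement, invented for illustration.
requirements = (
    "A customer places many orders; each order has one or more order lines; "
    "each order line refers to exactly one product."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{
        "role": "user",
        "content": f"Translate this requirement into a Mermaid 'erDiagram' definition only:\n{requirements}",
    }],
)

print(response.choices[0].message.content)  # paste into a Mermaid renderer to view the ERD
```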

4. Enhanced Data Augmentation for Machine Learning
Various tools, such as opensynthetics.com, can create synthetic data that mirrors real-world distributions, contributing to robust machine learning models. This is particularly beneficial when labelled data is limited, as the generated data can fill in the gaps and provide more diverse examples for the model to learn from. Equally, it can help with denoising datasets where you want to reduce inaccuracy.
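As a toy illustration of what "mirroring real-world distributions" means, the sketch below fits a multivariate normal to a small numeric dataset and samples synthetic rows from it. Dedicated tools such as those mentioned above go much further (mixed types, constraints, privacy guarantees), and the data here is invented:

```python
import numpy as np
import pandas as pd

# Hypothetical "real" dataset with two correlated numeric features.
rng = np.random.default_rng(42)
real = pd.DataFrame({
    "age": rng.normal(40, 12, 500).clip(18, 80),
    "income": rng.normal(35_000, 8_000, 500).clip(0, None),
})
real["income"] += real["age"] * 300  # introduce a correlation worth preserving

# Fit a multivariate normal to the real data and sample synthetic rows from it,
# so the synthetic set mirrors the original means, variances and correlations.
mean = real.mean().to_numpy()
cov = real.cov().to_numpy()
synthetic = pd.DataFrame(rng.multivariate_normal(mean, cov, size=1_000), columns=real.columns)

print(real.corr(), synthetic.corr(), sep="\n\n")  # the correlation matrices should be close
```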
5. Predictive Data Distribution and Scaling
AI can help predict data growth patterns and guide resource allocation, paving the way for data platforms that evolve with demand. This ensures that data models are not only efficient but also scalable, capable of handling increasing amounts of data and more complex models without a corresponding increase in human effort.
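A simple way to picture this is a trend fit over historical storage volumes. The sketch below uses a plain polynomial fit over hypothetical monthly sizes; an AI-assisted platform would typically combine richer signals such as query load, seasonality and newly onboarded sources:

```python
import numpy as np

# Hypothetical history: total warehouse size in GB at the end of each of the last 12 months.
months = np.arange(12)
size_gb = np.array([210, 225, 242, 260, 281, 305, 331, 360, 392, 428, 467, 510])

# Fit a simple growth trend (here a quadratic) and project six months ahead
# to guide storage and compute allocation.
coeffs = np.polyfit(months, size_gb, deg=2)
future_months = np.arange(12, 18)
forecast = np.polyval(coeffs, future_months)

for m, gb in zip(future_months, forecast):
    print(f"month +{m - 11}: ~{gb:.0f} GB")
```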
6. Natural Language Interface for Data Exploration
Agentic interfaces or chatbots empower non-technical users to effortlessly query complex datasets, fostering accessibility and encouraging data-driven decision-making. Check out this article from DataCamp on Using ChatGPT For Data Science Projects.
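Under the hood, most of these interfaces translate a question into a query against a known schema. A minimal sketch of that loop, assuming the OpenAI Python client, an illustrative model name and a toy SQLite table standing in for the real warehouse:

```python
import sqlite3
from openai import OpenAI

client = OpenAI()

# A tiny in-memory database standing in for the warehouse (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 200.0)])

question = "What is the total sales amount per region?"
schema = "sales(region TEXT, amount REAL)"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{
        "role": "user",
        "content": f"Schema: {schema}\nWrite a single SQLite query (SQL only, no prose) answering: {question}",
    }],
)

sql = response.choices[0].message.content
print(sql)  # in a real assistant you would validate this, then run it, e.g. conn.execute(sql)
```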
How you build data models, and how quickly, is a rapidly evolving space as specialist RAG LLMs make their debut. If you are not experimenting with how Gen AI can improve your own flow of work, you may be missing a trick. For more insights, and to join the discussion, come on over to the Data Innovators Exchange - a community for Data Pros.
GDPR and your right to be deleted
Next up we have one of the more popular masterclasses that you will find on the @thedataradioshow channel. This relates to the challenges associated with a person’s right to be deleted from a data system - one of the tenets of GDPR.
Here are the highlights from this in-depth discussion with the experts: Michael Olschimke and Nols Ebersohn.
Enhanced Data Integrity Practices:
Nols and Michael provide insights into how to manage data relationships effectively, ensuring that the integrity of your data remains intact even as relationships evolve over time. For a data engineer, understanding the use of tools like effectivity satellites helps in building robust data models that accurately reflect business realities and support compliance requirements.
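As a deliberately simplified illustration of the effectivity idea, the pandas sketch below end-dates a customer-to-address relationship whenever a newer one arrives for the same driving key; the column names and data are hypothetical, and a real effectivity satellite carries more metadata than this:

```python
import pandas as pd

# Hypothetical link records: which address each customer was associated with, by load date.
links = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "address_id":  [10, 11, 20],
    "load_date":   pd.to_datetime(["2024-01-01", "2024-03-01", "2024-02-01"]),
}).sort_values(["customer_id", "load_date"])

# Derive effectivity: each relationship is effective from its load date until the
# next relationship for the same driving key (the customer) appears.
eff = links.copy()
eff["effective_from"] = eff["load_date"]
eff["effective_to"] = (
    eff.groupby("customer_id")["load_date"].shift(-1).fillna(pd.Timestamp.max)
)

print(eff[["customer_id", "address_id", "effective_from", "effective_to"]])
```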
Advanced GDPR Compliance Techniques:
How to handle GDPR-related data deletions, a critical area of compliance, continues to be an issue for Data Engineers the world over. The discussion around separating personal data from non-personal data and using retention classes provides actionable strategies to meet legal obligations without compromising the quality or usability of the data.
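To make the separation idea tangible, here is a small pandas sketch in which personal attributes live in their own table keyed by a surrogate ID, so honouring a deletion request touches only that table. In a Data Vault this would typically be a dedicated PII satellite, and the data and function names here are invented for illustration:

```python
import pandas as pd

# Hypothetical raw records mixing personal and non-personal attributes.
orders = pd.DataFrame({
    "customer_id": ["C1", "C2"],
    "customer_name": ["Jane Doe", "Sam Smith"],                 # personal data
    "customer_email": ["jane@example.com", "sam@example.com"],  # personal data
    "order_total": [129.99, 40.00],                             # non-personal business data
})

# Separate personal data into its own table keyed by the surrogate customer_id.
pii = orders[["customer_id", "customer_name", "customer_email"]]
facts = orders[["customer_id", "order_total"]]

def handle_deletion_request(pii_table: pd.DataFrame, customer_id: str) -> pd.DataFrame:
    """Honour a right-to-be-deleted request by removing only the personal data;
    the non-personal facts remain usable for analytics."""
    return pii_table[pii_table["customer_id"] != customer_id]

pii = handle_deletion_request(pii, "C1")
print(pii)
print(facts)
```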
Optimising Real-Time Data Analytics:
For those involved in building or maintaining real-time analytics systems, the talk provides strategies to optimise performance. The use of pivot tables and bridge tables to minimize data read times and improve the efficiency of real-time data processing is particularly valuable for data engineers looking to enhance the performance of their analytics pipelines.
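As a rough sketch of why bridge tables help, the example below pre-resolves a multi-way join once (for instance on a schedule), so query-time reads become a single scan and aggregation; the tables and columns are hypothetical:

```python
import pandas as pd

# Hypothetical source tables: an order link plus related descriptive data.
orders_link = pd.DataFrame({"customer_id": [1, 1, 2], "product_id": [10, 11, 10], "order_id": [500, 501, 502]})
order_totals = pd.DataFrame({"order_id": [500, 501, 502], "total": [20.0, 35.0, 20.0]})
customers = pd.DataFrame({"customer_id": [1, 2], "segment": ["retail", "wholesale"]})

# Build the bridge once: it pre-resolves the joins that real-time queries would
# otherwise repeat on every request.
bridge = orders_link.merge(order_totals, on="order_id").merge(customers, on="customer_id")

# Query time: one scan of the bridge instead of a multi-way join.
print(bridge.groupby("segment")["total"].sum())
```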
Architectural Flexibility and Implementation Guidance:
In addition, Michael & Nols discuss various architectural approaches, like Data Vault and Lakehouse, for implementing and managing large-scale data systems efficiently. Well-designed architectures are necessary to build scalable, flexible data environments that can adapt to changing business needs while maintaining compliance with regulations like GDPR.
Automation of Compliance Processes:
Understanding how to automate GDPR compliance, particularly in the context of data deletions and retention, is a key takeaway. For data engineers, the ability to set up automated systems that ensure ongoing compliance without manual intervention is a core data governance task today.
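A minimal sketch of what such automation can look like: a retention policy expressed as configuration and a purge routine that a scheduler runs unattended. The retention classes, durations and record layout below are invented for illustration:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: how long each retention class of personal data may be kept.
RETENTION_CLASSES = {
    "marketing_consent": timedelta(days=365),
    "transaction_pii": timedelta(days=365 * 7),
}

records = [
    {"id": 1, "retention_class": "marketing_consent", "loaded_at": datetime(2023, 1, 10, tzinfo=timezone.utc)},
    {"id": 2, "retention_class": "transaction_pii", "loaded_at": datetime(2020, 6, 1, tzinfo=timezone.utc)},
]

def purge_expired(records: list[dict], now: datetime) -> list[dict]:
    """Keep only records still inside their retention window; a scheduled job would
    run this (or the equivalent DELETE statements) without manual intervention."""
    return [r for r in records if now - r["loaded_at"] <= RETENTION_CLASSES[r["retention_class"]]]

print(purge_expired(records, datetime.now(timezone.utc)))
```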
That’s a wrap for this week.
For more on these topics check out the Data Radio Show - on YouTube and where you get your podcasts.