Data quality is vital for machine learning and AI success

By Space & Time
13 Jun 2024

AI assistance is rapidly being built into, or amplified within, analytics and marketing solutions to make interpreting data, generating insights and creating reports more efficient for a growing number of business stakeholders. On top of that, 73% of all new martech solutions released in the second half of 2023 were AI products, distributed across a wide range of categories such as content marketing, business analytics, data science and ecommerce intelligence [1]. The common theme across all of these applications is that data quality is fundamental to machine learning and AI success – whether you are making use of the built-in features of an individual platform or looking to take wider business data analysis and insights to the next level.

Starting small: making the most of GA4

Let us start by briefly focusing on an individual platform: GA4. Before we even consider connecting web or app analytics data to our cloud data warehouse for inclusion in wider reporting projects, we invariably audit the accuracy and extent of user and campaign tracking. This is not only to make sure key interactions with the website, conversion points and ecommerce functionality are collected, optimised and tailored for measurement, but also to increase the benefits of using GA4's built-in machine learning and predictive analytics capabilities.

GA4 offers some user-friendly ways of interpreting data and surfacing key results. You can simply search via ‘Analytics Intelligence’, or generate custom ‘Insights and Recommendations’ to see changes in your data over time that may need further investigation and to flag immediate anomalies in website performance and tracking that could need fixing. All of these features are informed and enriched by well-structured data and detailed event tracking. Additionally, at the simplest level, just ensuring URL tagging consistently identifies the right products, sources, mediums and ad content improves the accuracy of what GA4 can tell you about your specific digital marketing efforts.
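As a minimal sketch of what a URL-tagging audit can look like, the hypothetical helper below (the required parameter list and lowercase rule are assumptions, not GA4 requirements) flags landing-page URLs whose UTM parameters are missing or inconsistently cased – mixed case is a common cause of duplicate source/medium rows in reports:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical check: which UTM parameters we require, and that their
# values are consistently lowercase across all tagged URLs.
REQUIRED_UTM = ["utm_source", "utm_medium", "utm_campaign"]

def check_utm_tagging(url: str) -> list[str]:
    """Return a list of tagging problems found in `url` (empty list = OK)."""
    params = parse_qs(urlparse(url).query)
    problems = []
    for key in REQUIRED_UTM:
        values = params.get(key)
        if not values:
            problems.append(f"missing {key}")
        elif values[0] != values[0].lower():
            # Mixed case creates duplicate sources/mediums in reporting
            problems.append(f"{key} not lowercase: {values[0]}")
    return problems

print(check_utm_tagging(
    "https://example.com/product?utm_source=Newsletter&utm_medium=email"
))
# → ['utm_source not lowercase: Newsletter', 'missing utm_campaign']
```

A check like this can be run over a campaign spreadsheet or a crawl of live landing pages before the data ever reaches GA4.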

Having a quality ecommerce set-up can also make a property eligible to use GA4’s machine learning to generate predictive metrics – purchase probability, churn probability and predicted revenue. Crucially, those can be used to build predictive audiences for targeting campaigns in Google Ads and DV360.

Considering your wider data environment

‘As you start your GenAI journey, you must start first by getting your data house in order. Because you might not know what questions to ask your data today, but at some point you will – and if you haven’t retained that data, those answers will always elude you’ [2]

Of course, Google Analytics is only one of many data sources that will be critical for analysing business and marketing performance. How can some of the AI and machine learning methods available in GA4 be applied in an equivalent way to all of this data?

Firstly, even if you do not have a grand AI project in mind, bringing your data sources together will inevitably deliver value for reporting, team efficiency and security. Our most sophisticated dashboards are fed by single custom ‘reporting layers’ from a cloud data warehouse with numerous connections to analytics, ad platform, CRM and other data sources. Those dashboards allow the combined data set to be viewed at group level or drilled down seamlessly to product level (or to other segments they are individually designed to support), and encourage the creation of cross-platform measurements. This not only saves the time spent gathering data from disparate platforms, but also provides a more holistic view of the impact of marketing efforts, investment and brand health.

Getting all of those filters and ways of cutting through the data to play nicely in the dashboard does not just happen, though. Our data engineers need to create robust automated connections, preprocess the data to remove inconsistencies or unrequired fields, and find product identifiers to group data together across sources. This process is constantly maintained: new channels, additional metrics and new products need to be integrated, and anomalies identified and resolved. AI can even be applied at this stage of data preparation and management within the data warehouse to support the work of data engineers and developers – for example, checking code and making suggestions to increase efficiency and free up time to push forward on new applications. Google’s Duet AI for BigQuery is one example of this and is billed as an ‘AI-powered collaborator’ [3].
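To illustrate the kind of product-identifier grouping described above, here is a small sketch (the column names, feed names and sample values are illustrative assumptions, not the actual pipeline) that normalises IDs from two sources so rows join correctly in a combined reporting layer:

```python
import pandas as pd

# Two feeds that describe the same products with differently formatted IDs.
ga4 = pd.DataFrame({"item_id": ["SKU-001 ", "sku-002"], "sessions": [120, 45]})
crm = pd.DataFrame({"product_code": ["sku-001", "SKU-003"], "orders": [10, 3]})

def normalise_id(s: pd.Series) -> pd.Series:
    # Strip whitespace and force lowercase so 'SKU-001 ' matches 'sku-001'
    return s.str.strip().str.lower()

ga4["product_id"] = normalise_id(ga4["item_id"])
crm["product_id"] = normalise_id(crm["product_code"])

# Outer join keeps products that appear in only one source, so gaps
# surface as NaN rather than silently disappearing from the reporting layer.
combined = ga4.merge(crm, on="product_id", how="outer")
print(combined[["product_id", "sessions", "orders"]])
```

In a real warehouse this normalisation would typically live in the transformation layer (e.g. scheduled SQL), but the principle – canonicalise identifiers before joining – is the same.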

Getting on the right track

Simply being able to ask questions of your data using natural language will become a more prominent feature of nearly all business intelligence and reporting tools this year. Our long-term plan for all customer, analytics and ad platform data was to process it into something combined, flexible and easy to connect to myriad BI solutions.

As a result, it is highly suited to experiments in surfacing insights in a variety of ways – including making outputs from chat-based interfaces more intelligible and deeply relevant to each business and its products. At the core of that is an understanding of how important consistent data is to each individual platform it feeds, and of putting automated checks in place to swiftly resolve anomalies in what is being ingested into the combined data set, or to immediately identify supply issues.
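One simple form such an automated ingestion check can take is a z-score test on daily row counts, sketched below (the thresholds and sample figures are assumptions for illustration): a day whose volume deviates sharply from the recent norm is flagged as a possible supply issue before it skews the combined data set.

```python
import statistics

def flag_anomalies(daily_counts: list[int], z_threshold: float = 3.0) -> list[int]:
    """Return indices of days whose row count is a > z_threshold outlier."""
    mean = statistics.mean(daily_counts)
    stdev = statistics.pstdev(daily_counts)
    if stdev == 0:
        return []  # perfectly flat series: nothing to flag
    return [i for i, n in enumerate(daily_counts)
            if abs(n - mean) / stdev > z_threshold]

# Day 4 looks like a feed outage: the count collapses from ~10,000 to 150.
counts = [10050, 9980, 10120, 9900, 150, 10010]
print(flag_anomalies(counts, z_threshold=2.0))
# → [4]
```

In practice a check like this would run on each source table after every load, alerting the team rather than printing, but the logic is the same.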

Regardless of what AI projects and plans you have in mind, it is never too early to focus on your data centralisation and transformation processes – and there is no downside to doing so.

