Chapter 05: The Role Data & AI Play in AdTech & Programmatic Advertising
The programmatic advertising ecosystem relies on specialised data platforms to collect, organise, analyse, and activate user information across the digital landscape. These sophisticated systems form the technological backbone that enables the precision targeting, real-time decisioning, and performance measurement capabilities that define modern digital advertising.
In this chapter, we’ll look at the different types of data used in programmatic advertising, the various ways data is collected, and the platforms that manage and activate data.
Table of contents
The key data platforms and their role in AdTech & programmatic advertising
Other platforms that collect data
How is data collected in programmatic advertising?
The different types of data: Zero-party, first-party, second-party, and third-party data
The role data plays in AdTech & programmatic advertising
Examples of how data is used in different digital advertising channels
The key data platforms and their role in AdTech & programmatic advertising
There are many different types of data platforms used in programmatic advertising.
Although AdTech platforms such as DSPs and SSPs collect data during real-time bidding auctions, they lack the storage, management, and activation capabilities offered by dedicated data platforms.
Each data platform serves a distinct function in the data value chain, from initial collection through analysis and ultimately to campaign activation.
While they may operate invisibly to consumers, these platforms process trillions of data points daily, forming the essential infrastructure on which programmatic advertising operates.
Data management platforms (DMPs)
A data management platform (DMP) is a centralised system that collects, organises, and activates audience data from various sources, primarily focusing on anonymous, cookie-based identifiers and mobile advertising IDs.
DMPs emerged in the early 2010s as advertisers sought ways to leverage growing volumes of digital data for more effective targeting.
These platforms have traditionally been the workhorses of audience data in AdTech, ingesting signals from websites, apps, and third-party providers to create actionable audience segments.
For example, LiveRamp’s DMP processes trillions of data points monthly across thousands of attributes to build audience profiles that advertisers can target programmatically.
The key functions of a data management platform
A DMP’s capabilities typically include:
- Data collection and aggregation: DMPs ingest data from multiple sources, including websites, apps, advertising platforms, and third-party data providers. For example, a retail advertiser might collect browsing behaviour across their website, app engagement data, and purchase history, then enhance this with demographic information from data partners.
- Audience segmentation: DMPs enable the creation of targetable audience segments based on behaviour, demographics, interests, and intent. A travel company might create segments like “luxury travelers,” “family vacationers,” or “last-minute bookers” based on site behaviour and previous bookings.
- Lookalike modelling: DMPs can identify users who resemble existing high-value customers, expanding targetable audiences. An auto brand might start with known car buyers, then build lookalike models to find similar prospects across the web.
- Data activation: DMPs connect with demand-side platforms (DSPs) and other media buying platforms to make audience segments actionable for targeting. When an advertiser wants to reach their “frequent shoppers” segment, the DMP pushes this audience to connected buying platforms.
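The segmentation step can be illustrated with a short sketch based on the travel example above. The profile fields (`avgBookingValue`, `travellers`, `includesChildren`, `daysBeforeTrip`) and the thresholds are assumptions made for illustration; a real DMP would evaluate far richer signals.

```javascript
// Hypothetical segment rules for a travel advertiser. Field names and
// thresholds are illustrative assumptions, not taken from any real DMP.
const segmentRules = {
  luxury_travelers: (p) => p.avgBookingValue >= 5000,
  family_vacationers: (p) => p.travellers >= 3 && p.includesChildren,
  last_minute_bookers: (p) => p.daysBeforeTrip <= 7,
};

// Return the names of every segment whose rule matches the profile.
function assignSegments(profile, rules = segmentRules) {
  return Object.entries(rules)
    .filter(([, matches]) => matches(profile))
    .map(([name]) => name);
}

const profile = {
  userId: 'cookie-abc123', // anonymous identifier, as DMPs typically use
  avgBookingValue: 6200,
  travellers: 2,
  includesChildren: false,
  daysBeforeTrip: 3,
};

console.log(assignSegments(profile));
// → [ 'luxury_travelers', 'last_minute_bookers' ]
```

Keeping the rules in a plain object means a new segment can be added without touching the matching logic, which is roughly how segment builders expose rules to non-developers.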
DMPs typically integrate with multiple platforms in the advertising ecosystem, creating a bridge between data sources and activation channels.
They send audience segments to DSPs for targeted media buying, provide audience data to ad servers for creative personalisation, enrich performance analysis in analytics platforms with audience insights, and incorporate data from third-party providers to enhance first-party data with additional attributes.
This integration capability has made DMPs central hubs in the data ecosystem, though their influence is evolving as privacy regulations and cookie deprecation change the landscape.
Despite their historical importance, DMPs face significant challenges as the advertising landscape evolves.
Their heavy reliance on third-party cookies, which browsers such as Safari and Firefox already block or restrict by default, threatens their core functionality.
Many DMPs have limited ability to handle personally identifiable information (PII), typically operating with anonymous, probabilistic matching rather than deterministic identity. They often lack real-time capabilities for immediate data activation, creating latency between data collection and campaign execution.
These limitations have led to the rise of complementary platforms like CDPs, which address some of these shortcomings by focusing on first-party data and persistent customer profiles.
Customer data platforms (CDPs)
Customer data platforms (CDPs) represent the next evolution in data management, designed specifically to address the limitations of traditional DMPs. These platforms collect, integrate, and organise customer data from multiple sources to create comprehensive, individual-level profiles of known customers.
Unlike DMPs, CDPs are designed to work with identified, first-party data, maintaining persistent profiles that transcend individual sessions or devices.
This fundamental difference has made CDPs increasingly valuable as privacy regulations and browser policies limit third-party tracking capabilities.
The CDP Institute defines a CDP as “packaged software that creates a persistent, unified customer database that is accessible to other systems.”
This seemingly simple definition encompasses sophisticated technology that can transform how brands understand and interact with their customers across touchpoints.
Major CDP providers like Segment (now part of Twilio), Tealium, and ActionIQ have seen rapid growth as organisations prioritise first-party data strategies in response to privacy changes.
How a CDP works
- Data collection: The CDP gathers data from multiple sources—web and mobile activity, CRM systems, transactional platforms, and even offline files and APIs.
- Data normalisation and enrichment: The collected data is standardised, cleaned, and enhanced with useful identifiers (like device IDs or IP addresses) to ensure consistency.
- Single customer view (SCV) creation: The platform unifies all data points into one comprehensive profile per user—known as the single customer view (SCV). It includes behaviour, preferences, and historical interactions.
- Audience creation: Based on these unified profiles, the CDP creates dynamic audience segments (e.g., frequent buyers, lapsed users, and cart abandoners).
- Activation: These segments are pushed into external platforms—like marketing tools, websites, mobile apps, and ecommerce platforms—for personalised campaigns. Users receive customised experiences through targeted ads, tailored emails, and on-site messages, powered by real-time data from the CDP.
- Reporting and analytics: The CDP provides reporting and analytics tools so companies can see how their audiences are performing across their marketing, sales, and customer support activities.
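The unification and audience-creation steps above can be sketched in a few lines. The event shape (`customerId`, `source`, `type`, `value`) and the segment thresholds are assumptions, and the events are assumed to arrive in chronological order with identities already resolved.

```javascript
// Merge raw events from several sources into one profile per customer.
// Assumes events are in chronological order and already carry a resolved
// customerId; both are simplifications for this sketch.
function buildSingleCustomerView(events) {
  const profiles = new Map();
  for (const event of events) {
    if (!profiles.has(event.customerId)) {
      profiles.set(event.customerId, {
        customerId: event.customerId,
        sources: new Set(),
        purchases: 0,
        totalSpend: 0,
        abandonedCart: false,
      });
    }
    const profile = profiles.get(event.customerId);
    profile.sources.add(event.source);
    if (event.type === 'purchase') {
      profile.purchases += 1;
      profile.totalSpend += event.value;
      profile.abandonedCart = false; // a purchase clears the open cart
    } else if (event.type === 'cart_add') {
      profile.abandonedCart = true;
    }
  }
  return [...profiles.values()];
}

// Dynamic segments are simply filters over the unified profiles.
const segments = {
  frequentBuyers: (p) => p.purchases >= 2,
  cartAbandoners: (p) => p.abandonedCart,
};

const events = [
  { customerId: 'c1', source: 'web', type: 'purchase', value: 40 },
  { customerId: 'c1', source: 'pos', type: 'purchase', value: 25 },
  { customerId: 'c2', source: 'app', type: 'cart_add', value: 60 },
];

const profiles = buildSingleCustomerView(events);
console.log(profiles.filter(segments.frequentBuyers).map((p) => p.customerId)); // → [ 'c1' ]
console.log(profiles.filter(segments.cartAbandoners).map((p) => p.customerId)); // → [ 'c2' ]
```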
Key capabilities of CDPs
Modern CDPs offer several core capabilities that distinguish them from other data platforms.
- Identity resolution: CDPs create persistent customer profiles by connecting data across devices and channels through deterministic matching. For instance, a CDP can recognise that a customer who logged in to the website on a laptop, signed in to the mobile app, and used an in-store loyalty program is the same person, creating a single customer view that persists over time.
- First-party data unification: CDPs consolidate data from owned channels (website, app, CRM, email, in-store) into unified customer profiles. A retailer might combine online browsing data, purchase history, loyalty program activity, and customer service interactions for each individual customer, creating a holistic understanding of customer relationships.
- Real-time data processing: Many CDPs operate in real-time, enabling immediate personalisation and activation. When a customer abandons a shopping cart, a CDP can instantly trigger a personalised email or retargeting campaign, creating timely interventions that improve conversion rates.
- Direct integration with engagement platforms: CDPs connect directly with email platforms, personalisation engines, and customer service systems, not just advertising channels. This allows consistent messaging across all customer touchpoints, creating coherent experiences regardless of where and how customers interact with the brand.
The difference between a DMP and a CDP
The fundamental differences between CDPs and DMPs reflect their different purposes and design philosophies.
CDPs focus on known customers with persistent, PII-based identity, while DMPs primarily target anonymous users through cookie or device ID tracking. CDPs maintain long-term customer profiles, while DMPs typically retain data for shorter periods (often 90 days).
CDPs primarily work with first-party data collected directly from customer interactions, while DMPs incorporate both third-party and first-party data sources.
CDPs support omnichannel orchestration across web, email, app, and other touchpoints, while DMPs focus primarily on digital advertising optimisation. These distinctions have become more significant as privacy changes limit third-party tracking capabilities.
In practice, because CDP profiles feed email, on-site personalisation, and customer service systems as well as advertising channels, their uses extend beyond media buying to broader customer experience initiatives.
Data clean rooms
Data clean rooms represent one of the most significant recent innovations in the AdTech landscape, having emerged in response to growing privacy concerns and data-sharing restrictions.
These secure environments enable multiple parties to analyse combined datasets without exposing raw, user-level data to each other, preserving consumer privacy while unlocking valuable collaborative insights.
How a data clean room works
- Company A and Company B upload their data to isolated servers in the data clean room.
- The data clean room matches IDs from the two isolated servers, maps them together, and incorporates various privacy controls, such as encryption.
- The data is then activated and used for various activities, such as audience targeting, look-alike modelling, and reporting and analytics.
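A heavily simplified sketch of the matching step follows. It assumes both parties pseudonymise email addresses with the same keyed hash (HMAC-SHA-256) before upload and that the clean room returns only aggregate counts. The key, the function names, and the data are hypothetical, and commercial clean rooms layer much stronger controls on top of this idea.

```javascript
const crypto = require('crypto');

// Both parties pseudonymise emails with a keyed hash using a key agreed for
// this collaboration. The key value here is a placeholder assumption.
const SHARED_KEY = 'demo-key-agreed-by-both-parties';
const pseudonymise = (email) =>
  crypto.createHmac('sha256', SHARED_KEY).update(email.trim().toLowerCase()).digest('hex');

// Hypothetical clean-room function: it sees both pseudonymised datasets but
// returns only aggregate numbers, never the matched rows themselves.
function overlapReport(idsA, idsB) {
  const setA = new Set(idsA);
  const setB = new Set(idsB);
  let overlap = 0;
  for (const id of setA) {
    if (setB.has(id)) overlap += 1;
  }
  return { sizeA: setA.size, sizeB: setB.size, overlap };
}

const companyA = ['ann@example.com', 'bob@example.com', 'cy@example.com'].map(pseudonymise);
const companyB = ['BOB@example.com ', 'cy@example.com', 'dee@example.com'].map(pseudonymise);

console.log(overlapReport(companyA, companyB));
// → { sizeA: 3, sizeB: 3, overlap: 2 }
```

Normalising before hashing (trimming and lowercasing) matters: without it, `BOB@example.com ` and `bob@example.com` would produce different digests and never match.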
As walled gardens have restricted data export and regulations like GDPR and CCPA have limited data sharing, clean rooms have become essential infrastructure for privacy-compliant analytics and activation.
The concept originated with Google’s Ads Data Hub, launched in 2017, which allows advertisers to analyse campaign data within Google’s ecosystem without directly accessing user-level information.
Since then, the clean room landscape has expanded and now includes several distinct types of implementations, each serving different use cases and partnership models.
Companies like LiveRamp, InfoSum, and Decentriq have developed independent solutions for broader data partnerships across multiple organisations, and major platforms like Amazon and Meta have created their own data clean rooms for analysing data within their ecosystems.
This proliferation reflects the growing need for privacy-safe data collaboration in a more regulated advertising ecosystem.
The key functions of a data clean room
Data clean rooms implement several key mechanisms to balance data utility with privacy protection.
- Secure data sharing: Participants upload encrypted or anonymised data into a protected environment with strict access controls and data governance. These environments may use advanced cryptographic techniques, such as homomorphic encryption, that allow computations on encrypted data without decrypting it.
- Privacy-preserving analysis: Analysis occurs at an aggregate level, with privacy thresholds that prevent individual-level data exposure. For example, results might only show segments with at least 200 users, implementing k-anonymity principles that prevent re-identification of specific individuals.
- Controlled outputs: Only approved aggregate insights and activation signals leave the clean room, not raw data. These outputs undergo privacy checks to ensure they don’t inadvertently reveal protected information through combinations of attributes or filtering techniques.
- Contractual and technical safeguards: Both legal agreements and technical measures ensure data cannot be misused or exposed. These multilayered protections create a secure foundation for data collaboration that satisfies privacy requirements while enabling valuable analysis.
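The privacy-threshold idea can be sketched as an aggregation that suppresses small groups before anything leaves the clean room. The function name and row shape are assumptions; the threshold of 200 simply mirrors the example above, and real thresholds vary by provider.

```javascript
// Minimum audience size before a segment may leave the clean room.
// 200 mirrors the example threshold in the text; real values vary.
const MIN_SEGMENT_SIZE = 200;

// rows: matched user-level records that stay inside the clean room.
// Returns per-group counts, dropping any group below the threshold.
function aggregateWithThreshold(rows, key, minSize = MIN_SEGMENT_SIZE) {
  const counts = new Map();
  for (const row of rows) {
    counts.set(row[key], (counts.get(row[key]) || 0) + 1);
  }
  return [...counts.entries()]
    .filter(([, count]) => count >= minSize)
    .map(([segment, users]) => ({ segment, users }));
}

// Synthetic data: 250 users in one region, 30 in another.
const rows = [
  ...Array.from({ length: 250 }, () => ({ region: 'north' })),
  ...Array.from({ length: 30 }, () => ({ region: 'south' })),
];

console.log(aggregateWithThreshold(rows, 'region'));
// → [ { segment: 'north', users: 250 } ]
```

The 30-user group is omitted entirely rather than reported with a small count, since even a small count could help single out individuals when combined with other filters.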
As privacy regulations tighten and walled gardens restrict data sharing, clean rooms have become increasingly important infrastructure for data-driven marketing.
They enable valuable data collaboration while maintaining user privacy and data security, creating a sustainable approach to analytics and activation in a privacy-first world.
Data warehouses
While less visible to marketers than other AdTech platforms, data warehouses provide the foundational infrastructure that enables sophisticated analytics and decision-making throughout the programmatic advertising ecosystem.
These centralised repositories store structured and semi-structured data from multiple sources, creating comprehensive information resources for reporting, analysis, and algorithm training.
In AdTech, they store vast amounts of advertising and customer data, often processing petabytes of information daily.
Data warehouses have evolved significantly with the growth of cloud computing, transitioning from on-premises systems with limited capacity to scalable cloud platforms that can handle virtually unlimited data volumes.
This evolution has democratised access to big data capabilities, allowing companies of all sizes to implement sophisticated data strategies that were previously only available to the largest organisations.
Modern data warehouse providers like Snowflake, Google BigQuery, Amazon Redshift, and Databricks have transformed how AdTech companies manage and analyse their data assets.
The role of data warehouses in the AdTech ecosystem
Data warehouses fulfill several critical functions in programmatic advertising.
- Historical data storage: Data warehouses maintain long-term storage of campaign performance, user behaviour, and conversion data, often spanning years of activity. This historical perspective enables trend analysis and seasonal planning that inform strategic decisions.
- Data integration: They consolidate information from multiple platforms (DSPs, ad servers, web analytics, etc.) into a unified view for comprehensive analysis. This integration overcomes the fragmentation that characterises the AdTech landscape, creating a single source of truth for performance assessment.
- Business intelligence: Data warehouses power reporting dashboards and custom analysis that drive business decision-making. These interfaces transform raw data into actionable insights that guide campaign optimisation and budget allocation.
- Machine learning foundation: They provide the training data for predictive models and AI applications in advertising. The vast stores of historical performance data enable the development of algorithms that can predict outcomes and optimise campaigns in real time.
Technological advancements in data warehouses
Modern data warehouses incorporate several technological advancements that have transformed their capabilities.
- Snowflake pioneered the separation of storage and compute resources, allowing each to scale independently based on requirements. This architecture enables cost-effective handling of massive datasets while maintaining query performance.
- Google BigQuery offers serverless, high-performance SQL querying that eliminates infrastructure management requirements, allowing analysts to focus on insights rather than maintenance.
- Amazon Redshift delivers petabyte-scale data warehouse capabilities with columnar storage optimised for analytical workloads.
- Databricks combines data lake and data warehouse functionality, enabling both structured and unstructured data analysis within a unified platform.
These advanced platforms have dramatically increased the accessibility and utility of data warehouse technology.
Other platforms that collect data
Apart from the data platforms listed above, various platforms within the programmatic ecosystem collect different types of data, each serving specific functions in the advertising value chain.
Data providers
Data providers, also known as data brokers, are companies that collect different types of consumer data from various sources:
- Demographic information: Age, gender, income, education
- Interest and intent data: Purchase interests, research behaviour
- Purchase history: Categories and products bought
- Location histories: Physical world movement patterns
- Household composition: Family structure, life stage, presence of children
- B2B firmographic data: Company size, industry, job titles
Many data providers also operate as data management platforms, making their data available to companies for ad targeting, market research, and measurement.
Measurement providers
Measurement and attribution companies collect data to measure the performance of advertising campaigns.
They collect various types of data:
- Cross-platform exposure data: Ad views across different channels and devices
- Conversion path information: The sequence of touchpoints leading to conversion
- Brand impact metrics: Awareness, consideration, and perception changes
- Viewability and attention data: How ads are viewed and engaged with
- Cross-device user journeys: How users move between devices
- Offline and online connection points: Linking digital exposure to physical world actions
One well-known measurement provider is Nielsen, which combines digital panel data with broader measurement methodologies to provide advertisers with unified audience insights across channels.
How is data collected in programmatic advertising?
The programmatic advertising ecosystem relies on continuous data collection to power targeting, optimisation, and measurement.
Several technical methods enable this data flow, each serving different purposes within the ecosystem.
JavaScript
JavaScript is a programming language that runs in web browsers and enables interactive functionality on websites.
In programmatic advertising, JavaScript is the primary mechanism for collecting user behaviour data on websites.
Key JavaScript collection methods
- Analytics scripts: Code snippets from tools like Google Analytics that track page views, clicks, scrolling behaviour, and user engagement.
- Tag management systems: Platforms like Google Tag Manager or Piwik PRO that centralise and manage multiple tracking scripts, reducing page load impact while facilitating data collection.
- Conversion tracking: JavaScript code that fires when users complete valuable actions, such as purchases, sign-ups, or form submissions.
- A/B testing frameworks: Scripts that enable experimentation and user experience optimisation while collecting behavioural data.
How it works in practice
When a user visits a website, embedded JavaScript code executes in their browser, collecting information about their interactions and sending this data back to collection servers.
For example:
```javascript
// Sample analytics tracking code
analytics.track('Product Viewed', {
  product_id: '123',
  product_name: 'Running Shoes',
  product_price: 129.99,
  category: 'Footwear',
  currency: 'USD'
});
```
This code might fire when a user views a product, sending valuable behavioural data that could later be used for retargeting or segmentation.
JavaScript-based collection offers significant flexibility and depth, capturing detailed interaction data, but it requires proper implementation and can be blocked by ad blockers or a web browser’s privacy settings.
Pixels and tags
Pixels (or tags) are small, often transparent 1x1px image files or code snippets placed on websites or in emails that trigger data collection when loaded by a user’s browser.
Common types of pixels and tags
- Conversion pixels: Placed on confirmation pages to track when users complete valuable actions after seeing an ad.
- Retargeting pixels: Collect information about products or content viewed to enable retargeting campaigns.
- Audience pixels: Build user profiles based on site visits for future targeting.
- Attribution pixels: Track which channels and touchpoints contributed to conversions.
How pixels work
- A user takes an action (like visiting a page or opening an email)
- This triggers a request to load the pixel from a tracking server
- The request includes parameters about the user and their actions
- The tracking platform records this information and associates it with the user’s profile
- Often, a cookie is simultaneously set or updated to maintain user identity
For example, when a user views a product on an e-commerce site, a Facebook pixel might fire:
```html
<img height="1" width="1" src="https://www.facebook.com/tr?id=123456789&ev=PageView&product_id=ABC123&value=49.99" style="display:none" />
```
This invisible pixel sends information to Facebook about the products the user viewed, which can later be used for retargeting on Facebook’s platforms.
Pixels are widely used due to their simplicity and reliability, but they face the same challenges as JavaScript regarding ad blocking and privacy controls.
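The request a pixel makes is just a URL with query parameters, so the step where "the request includes parameters about the user and their actions" can be sketched as a small URL builder. The endpoint `tracker.example.com` and the parameter names are hypothetical.

```javascript
// Build a tracking-pixel URL with safely encoded query parameters.
// The endpoint and parameter names are hypothetical.
function buildPixelUrl(endpoint, params) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

const pixelUrl = buildPixelUrl('https://tracker.example.com/pixel.gif', {
  ev: 'PageView',
  product_id: 'ABC 123',
  value: 49.99,
});

console.log(pixelUrl);
// → https://tracker.example.com/pixel.gif?ev=PageView&product_id=ABC+123&value=49.99

// In a browser, requesting the image is what sends the data:
//   new Image().src = pixelUrl;
```

Using `URL` and `searchParams` rather than string concatenation ensures values containing spaces or `&` are encoded and cannot corrupt the other parameters.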
Piggybacking
Piggybacking (aka “piggyback tags” or “nested pixels”) is the practice of one tracking tag triggering additional, secondary tags – essentially piggybacking on the initial tag’s execution.
How piggybacking works
- A primary tag loads on a webpage
- This primary tag contains instructions to load additional third-party tags
- These secondary tags then collect and transmit data to their respective platforms
- This chain can sometimes extend to other tags, creating a cascade of data collection
Piggybacking allows efficient deployment of multiple tracking solutions, but it can create page performance issues, privacy concerns, and data governance challenges.
Many privacy regulations now require explicit disclosure of all data collection, including piggybacked tags.
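For governance purposes it helps to know every tag that ultimately fires from a single placement. The sketch below models a piggyback chain as a simple map and walks it breadth-first; the vendor names and the map structure are hypothetical, and in practice the chain would be discovered by auditing network requests rather than declared up front.

```javascript
// Map of tag -> tags it loads. Vendor names are hypothetical.
const tagGraph = {
  'primary-analytics': ['retargeting-vendor', 'dmp-sync'],
  'retargeting-vendor': ['exchange-sync'],
  'dmp-sync': ['exchange-sync'],
  'exchange-sync': [],
};

// Walk the chain breadth-first and list every tag that ends up firing,
// with how many hops it is from the tag placed on the page.
function resolveFiredTags(graph, primaryTag) {
  const hopsByTag = new Map([[primaryTag, 0]]);
  const queue = [primaryTag];
  while (queue.length > 0) {
    const tag = queue.shift();
    for (const child of graph[tag] || []) {
      if (!hopsByTag.has(child)) { // skip tags already seen (avoids loops)
        hopsByTag.set(child, hopsByTag.get(tag) + 1);
        queue.push(child);
      }
    }
  }
  return [...hopsByTag.entries()].map(([tag, hops]) => ({ tag, hops }));
}

console.log(resolveFiredTags(tagGraph, 'primary-analytics'));
// → [
//     { tag: 'primary-analytics', hops: 0 },
//     { tag: 'retargeting-vendor', hops: 1 },
//     { tag: 'dmp-sync', hops: 1 },
//     { tag: 'exchange-sync', hops: 2 }
//   ]
```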
APIs
Application programming interfaces (APIs) enable direct, server-to-server data exchange between different platforms and systems without requiring user-facing code like JavaScript or pixels.
How APIs collect data in AdTech
- Server-side tracking: Sends data directly from a company’s servers to advertising platforms, bypassing browser-based collection.
- CRM integrations: Connects customer databases directly with advertising platforms for audience targeting and suppression.
- Conversion APIs: Report conversion events directly from business systems to ad platforms (like Meta’s Conversions API or Google’s Enhanced Conversions).
- Data onboarding: Transfers offline data to online advertising platforms through secure API connections.
Advantages of API-based data collection
- Reliability: Not affected by ad blockers, JavaScript errors, or browser privacy settings.
- Security: Can implement stronger authentication and data encryption.
- Completeness: Captures events even when client-side tracking fails.
- Performance: Doesn’t impact website loading speed or user experience.
- Privacy compliance: Often provides more control over what data is shared.
API-based data collection is growing in importance as client-side tracking faces increasing limitations from privacy regulations, browser restrictions, and consumer protection tools.
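A server-side conversion event might be assembled along the following lines. This is a sketch under stated assumptions: the field names, the `ads.example.com` endpoint, and the `API_TOKEN` variable are all hypothetical, and each real platform defines its own schema and authentication.

```javascript
const crypto = require('crypto');

// Normalise and hash an email so the raw address is never transmitted.
function hashEmail(email) {
  return crypto.createHash('sha256').update(email.trim().toLowerCase()).digest('hex');
}

// Build a conversion event for a hypothetical ad-platform endpoint.
// Field names are illustrative; each real platform defines its own schema.
function buildConversionEvent(order, consentGiven) {
  if (!consentGiven) return null; // respect the user's consent choice
  return {
    event_name: 'Purchase',
    event_time: Math.floor(order.timestamp / 1000), // Unix seconds
    user: { hashed_email: hashEmail(order.email) },
    value: order.total,
    currency: order.currency,
  };
}

const conversionEvent = buildConversionEvent(
  { email: ' Customer@Example.com', total: 129.99, currency: 'USD', timestamp: Date.UTC(2024, 0, 15) },
  true
);
console.log(JSON.stringify(conversionEvent, null, 2));

// Sending would be a server-to-server POST, for example with fetch (Node 18+):
//   await fetch('https://ads.example.com/v1/conversions', {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${API_TOKEN}` },
//     body: JSON.stringify(conversionEvent),
//   });
```

Because the event is built on the server, it is unaffected by ad blockers, but the consent check still has to happen in application code; moving collection server-side does not remove that obligation.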
The different types of data: Zero-party, first-party, second-party, and third-party data
In programmatic advertising, data is classified based on its source and the relationship between the data collector and the consumer.
Understanding these classifications is essential for both strategic planning and privacy compliance.
Interestingly, these data categorisations parallel the cookie classifications discussed in Chapter 4, with similar implications for privacy, control, and data quality.
Zero-party data
Zero-party data is information that consumers explicitly and intentionally share with a brand.
The term was coined by Forrester Research to distinguish it from passively collected first-party data. This data is deliberately provided by users through direct inputs.
How is it collected?
Zero-party data collection methods include:
- Preference centres: Where users select their interests and communication preferences
- Surveys and quizzes: Interactive content that gathers explicit user preferences
- Profile information: Details users provide when creating accounts
- Product configurators: Tools where users customise products to their specifications
- Feedback forms: Direct input about experiences and preferences
What is it used for?
Zero-party data powers:
- Hyper-personalisation: Tailoring experiences based on stated preferences
- Content recommendations: Suggesting relevant content or products
- Communication customisation: Delivering messages aligned with expressed interests
- Product development: Informing new offerings based on explicit customer needs
- Preference-based segmentation: Creating audience groups based on declared interests
First-party data
First-party data is information a company collects directly from its audiences or customers through owned channels and properties.
This data is gathered during direct interactions with a brand’s assets.
How is it collected?
Common first-party data collection methods include:
- Website tracking: Behavioural data from site visits and interactions
- Mobile app usage: In-app behaviour and engagement patterns
- Purchase history: Transaction records from online or in-store purchases
- CRM data: Customer information maintained in relationship management systems
- Subscription information: Data from newsletter or service enrollments
- Customer service interactions: Records from support conversations
What is it used for?
First-party data enables:
- Audience segmentation: Grouping users based on behaviour and attributes
- Personalised marketing: Tailoring messages based on past interactions
- Retargeting campaigns: Re-engaging users who have shown specific interests
- Customer journey optimisation: Improving experiences across touchpoints
- Predictive modelling: Forecasting future behaviour based on historical patterns
- Loyalty programs: Recognising and rewarding customer engagement
Second-party data
Second-party data is essentially another organisation’s first-party data that is shared directly with you through a partnership or commercial arrangement.
It maintains the quality and precision of first-party data while extending reach beyond your own audience.
How is it collected?
Second-party data is typically acquired through:
- Direct data partnerships: Strategic relationships between complementary brands
- Private marketplaces: Controlled environments for data sharing
- Data clean rooms: Secure platforms for privacy-compliant data collaboration
- Publisher partnerships: Arrangements with media properties to access their audience data
- Co-branded initiatives: Joint ventures that generate shared customer data
What is it used for?
Second-party data facilitates:
- Audience expansion: Reaching new users similar to existing customers
- Enhanced customer insights: Gaining a more complete view of shared audiences
- Partnership marketing: Creating co-branded experiences based on shared insights
- Improved targeting: Refining campaign targeting with complementary data
- Market expansion: Identifying opportunities in adjacent customer segments
Third-party data
Third-party data is information collected by entities that don’t have a direct relationship with the consumers the data represents.
It’s typically aggregated from multiple sources, packaged into audience segments, and sold broadly across the advertising ecosystem.
How is it collected?
Third-party data comes from various sources:
- Data aggregators and providers: Companies that compile information from multiple providers
- AdTech platforms: DSPs, SSPs, and ad exchanges that collect third-party data during real-time bidding (RTB) auctions
- Publisher networks: Consortiums that pool audience data
- Public records: Government and publicly available information
- Surveys and panels: Research conducted across broad populations
- Purchase data cooperatives: Anonymised transaction data from multiple retailers
- Web and app tracking networks: Behavioural data collected across digital properties
What is it used for?
Third-party data supports:
- Scale: Reaching large audiences beyond existing customer bases
- Demographic targeting: Accessing segments based on age, income, or education
- Interest-based audiences: Targeting users based on interests or behaviours
- Prospecting: Finding new potential customers with relevant characteristics
- Contextual enhancement: Enriching first-party data with additional attributes
- Market research: Understanding broader consumer trends and segments
As privacy regulations tighten and third-party cookies disappear, the value of zero-party, first-party, and well-governed second-party data continues to increase.
Organisations are increasingly prioritising direct data relationships with consumers while reducing reliance on third-party sources.
Data onboarding
Data onboarding is the process of transferring offline customer data to online platforms where it can be used for digital advertising targeting and measurement. It bridges the gap between physical world information, like in-store purchases or CRM records, and digital advertising environments.
Data onboarding typically refers to importing offline data into an online data platform. However, it can also refer to importing online data into a data platform.
For the sake of consistency, we’ll refer to data onboarding as the process of importing offline data into an online data platform.
How data onboarding works
Data onboarding typically follows these steps:
- Data preparation: An advertiser organises their offline data, typically from point-of-sale (EFTPOS) systems, CRMs, and loyalty programs. This might include customer names, postal addresses, email addresses, phone numbers, purchase history, and segment information.
- Identity resolution: The offline identifiers (like names and addresses) are translated into digital identifiers (like cookie IDs, mobile advertising IDs, or connected TV identifiers). This process usually happens through an onboarding provider like LiveRamp or Neustar.
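- Privacy protection: Personal information is hashed or encrypted to protect consumer privacy. For example, an email address like customer@example.com is typically normalised (trimmed and lowercased) and then passed through a hash function such as SHA-256, producing a fixed-length hexadecimal string that stands in for the address. The value 5f4dcc3b5aa765d61d8327deb882cf99 illustrates the format of an older 32-character MD5 digest; it is not the actual hash of that address.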
- Matching to online identifiers: The hashed identifiers are matched against databases of online users. This matching can be deterministic (exact matches) or probabilistic (likely matches based on multiple signals).
- Audience creation: Once matched, the offline data can be used to create targetable online audience segments. For instance, “customers who purchased in-store in the last 30 days” or “high-value cardholders.”
- Activation: These audience segments are then distributed to DSPs, social platforms, or other advertising systems for campaign targeting.
- Measurement: After campaigns run, conversion data can be onboarded using the same process to connect offline outcomes with online advertising exposures.
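The hashing and deterministic-matching steps above can be sketched as follows. The identity graph, identifier formats, and record fields are hypothetical stand-ins for what an onboarding provider would hold, and SHA-256 over a normalised email is one common choice rather than a universal standard.

```javascript
const crypto = require('crypto');

// Privacy protection: normalise the email, then hash it with SHA-256.
const hashEmail = (email) =>
  crypto.createHash('sha256').update(email.trim().toLowerCase()).digest('hex');

// Hypothetical identity graph held by an onboarding provider:
// hashed email -> online identifiers.
const identityGraph = new Map([
  [hashEmail('ann@example.com'), { cookieId: 'ck_111', maid: 'maid_aaa' }],
  [hashEmail('bob@example.com'), { cookieId: 'ck_222' }],
]);

// Deterministic matching: an exact lookup of each hashed record.
function onboard(offlineRecords, graph) {
  const matched = [];
  const unmatched = [];
  for (const record of offlineRecords) {
    const key = hashEmail(record.email);
    const onlineIds = graph.get(key);
    if (onlineIds) {
      matched.push({ hashedEmail: key, segment: record.segment, onlineIds });
    } else {
      unmatched.push(key);
    }
  }
  return { matched, unmatched };
}

const crmExport = [
  { email: 'Ann@Example.com ', segment: 'in_store_last_30_days' },
  { email: 'zed@example.com', segment: 'high_value_cardholders' },
];

const result = onboard(crmExport, identityGraph);
console.log(result.matched.length, result.unmatched.length); // → 1 1
```

Probabilistic matching would replace the exact `graph.get` lookup with a scoring step over multiple weaker signals, which is outside the scope of this sketch.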
Data onboarding challenges
While data onboarding is a powerful way to bridge the offline and online worlds, it faces several challenges that affect its effectiveness and accessibility, ranging from technical limitations to privacy concerns and operational complexity.
Organisations must navigate these obstacles to maximise the value of their data while staying compliant with evolving regulations and consumer expectations around privacy, so understanding them is essential for realistic planning and effective implementation.
Match rates
Match rates represent perhaps the most fundamental challenge in the onboarding process.
Only a portion of offline records successfully match to online identifiers, with typical match rates ranging from 20% to 70% depending on data quality and onboarding provider capabilities.
This matching limitation means that even in the best scenarios, organisations lose visibility of a significant portion of their customer base during the translation from offline to online environments.
Match rates vary considerably based on several factors:
- Data quality and completeness: Outdated contact information, formatting inconsistencies, or missing fields significantly reduce match rates
- Identity type: Email-based matching typically achieves higher rates than postal address matching
- Recency of data collection: Recently collected data generally matches at higher rates than older records
- User online activity: Customers with higher digital engagement are more likely to match to online identifiers
- Onboarding provider capabilities: Different vendors maintain different identity graphs with varying coverage
The incomplete nature of match rates creates analytical challenges when measuring campaign performance, as the matched audience may not perfectly represent the overall customer base.
This potential sampling bias must be accounted for when evaluating campaign results and calculating return on investment.
Data freshness
Data freshness presents another significant challenge, as the onboarding process takes time, creating potential delays between offline events and online activation.
Traditional onboarding workflows often involve batch processing with significant latency – sometimes days between data submission and audience availability in advertising platforms.
This delay creates particular challenges for time-sensitive campaigns, such as:
- Promotional campaigns tied to short-term events or holidays
- Retargeting efforts aimed at recent in-store shoppers
- Suppression lists for customers who have just made a purchase
- Triggered campaigns based on offline status changes
While real-time onboarding solutions have emerged to address this challenge, they typically involve trade-offs between speed and match quality, with faster processes often achieving lower match rates or requiring more limited data sets.
Fragmented identity
Fragmented identity across digital environments creates another layer of complexity in the onboarding process. Different platforms use different identity spaces, requiring multiple onboarding processes for comprehensive coverage.
A customer successfully onboarded to display advertising platforms might not be recognised in connected TV environments or social media platforms, each of which maintains its own identity space.
This fragmentation forces organisations to:
- Manage multiple onboarding relationships and processes
- Reconcile discrepancies in match rates and performance across platforms
- Develop platform-specific strategies rather than truly unified approaches
- Allocate budget across various onboarding partners to achieve comprehensive coverage
The fragmentation problem has worsened as advertising platforms have increasingly walled off their data environments, limiting the portability of audience identifiers and forcing more siloed approaches to audience activation.
Privacy regulations
Privacy regulations introduce another layer of complexity to the onboarding process, with varying requirements across jurisdictions affecting what data can be onboarded and how it must be processed.
Regulations like GDPR in Europe and CCPA in California impose strict requirements on data sharing and processing that directly impact onboarding workflows:
- Consent requirements may limit which customer records can be legally onboarded
- Data minimisation principles restrict what attributes can accompany identifier matching
- Cross-border data transfer limitations may affect global onboarding initiatives
- Special category data, like health information, may face additional restrictions
- Processing limitations may require explicit disclosure of onboarding activities
As privacy regulations continue to evolve, organisations must continuously update their onboarding processes to maintain compliance while maximising effectiveness within legal boundaries.
Measurement complexity
Measurement complexity represents the final major challenge in the onboarding ecosystem, as connecting offline conversions back to specific campaign exposures involves sophisticated methodologies and statistical modelling.
The technical difficulties include:
- Timing disparities between digital ad exposure and offline actions
- Attribution uncertainty when multiple campaigns reach the same customers
- Identity gaps between measurement and targeting environments
- Data silos between digital analytics and offline transaction systems
- Incrementality assessment to distinguish campaign-driven outcomes from baseline behaviours
These measurement challenges often require advanced analytics capabilities and clean room environments to resolve, adding another layer of complexity to the onboarding ecosystem.
Despite these challenges, data onboarding remains a critical capability for organisations seeking to connect their offline customer relationships with digital advertising environments.
As the technology continues to evolve, solutions are emerging to address each of these limitations, though the fundamental tensions between privacy, coverage, accuracy, and speed are likely to persist in some form as inherent characteristics of the onboarding landscape.
The evolution of data onboarding
Data onboarding has evolved significantly:
- First generation: Basic email-to-cookie matching with limited scale and accuracy
- Second generation: Multi-identifier approaches using emails, addresses, and phone numbers with improved match rates
- Current generation: Persistent people-based identity graphs with cross-device capabilities and privacy-by-design architecture
- Emerging approaches: Privacy-preserving techniques like clean rooms, differential privacy, and federated learning that enable data activation without direct identifier sharing
As third-party cookies become less widely available (several major browsers already block or restrict them by default), data onboarding becomes even more valuable for connecting offline data with digital environments through more persistent and privacy-compliant identity solutions.
The role data plays in AdTech & programmatic advertising
Data is the foundational element that powers almost every aspect of programmatic advertising, transforming what was once a broad, imprecise market into a precision-targeted ecosystem.
Understanding its multifaceted role helps explain why data has become the most valuable currency in digital advertising.
Precision targeting
Data enables advertisers to move beyond contextual or demographic targeting to reach specific individuals based on their actual behaviours, interests, and needs.
Historical example: In the pre-data era, a luxury watch brand might advertise in golf magazines, assuming readers had high incomes. Today, they can precisely target verified high-income individuals across any channel, whether it’s a news website, streaming service, or social media platform.
Practical application: A cruise line uses data to identify and target people who have researched vacations in the past 30 days, have household incomes above $100,000, have no children at home, and have previously taken a cruise – a level of precision impossible without comprehensive data.
Real-time decisioning
Programmatic advertising requires split-second decisions about which impressions to bid on and how much to pay. Data makes these instant decisions possible.
Bidding intelligence: When a DSP evaluates a bid request, it might analyse dozens of data points – the user’s previous interactions with the brand, their propensity to convert, their customer lifetime value, and the historical performance of similar impressions – all within milliseconds.
Value assessment: Data helps assign different values to different users. A financial services company might bid five times higher to reach a user actively researching mortgage rates compared to a general browser of financial news.
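One simple way to reason about value assessment is to bid in proportion to the expected value of an impression. The sketch below is only an illustration: the `expected_value_bid` function, the conversion probabilities, and the conversion value are all made-up assumptions, and production DSPs rely on far richer models and signals.

```python
def expected_value_bid(p_conversion: float, conversion_value: float,
                       margin: float = 0.0) -> float:
    """Return a per-impression bid: expected value minus a reserved margin."""
    if not 0.0 <= p_conversion <= 1.0:
        raise ValueError("p_conversion must be between 0 and 1")
    if not 0.0 <= margin < 1.0:
        raise ValueError("margin must be in [0, 1)")
    return p_conversion * conversion_value * (1.0 - margin)


# Hypothetical inputs: a mortgage researcher is assumed to convert
# five times as often as a general reader of financial news.
casual_bid = expected_value_bid(p_conversion=0.0002, conversion_value=500.0)
researcher_bid = expected_value_bid(p_conversion=0.001, conversion_value=500.0)

print(f"General reader bid per impression: {casual_bid:.2f}")
print(f"Mortgage researcher bid per impression: {researcher_bid:.2f}")
print(f"Ratio: {researcher_bid / casual_bid:.1f}x")
```

Because the bid scales linearly with the predicted conversion probability, a user who is five times more likely to convert justifies a bid five times higher, which mirrors the financial services example above.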
Personalisation
Data allows advertisers to customise creative messages based on user characteristics, behaviours, and contexts.
Creative relevance: A retailer might dynamically adjust ad creative to showcase products a user has previously viewed, display locally available inventory, or feature relevant seasonal items based on the user’s location and weather conditions.
Sequential messaging: Data enables advertisers to tell progressive stories, showing different creative messages based on a user’s previous ad exposures. A user might first see a brand awareness message, then a product feature highlight, and finally a special offer – all orchestrated through data.
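A minimal sketch of sequential messaging, assuming the advertiser already knows how many ads each user has seen. The creative names, the `next_creative` function, and the exposure log are hypothetical and stand in for whatever an ad server or DSP would actually store.

```python
# Hypothetical three-step creative sequence, in the order described above.
SEQUENCE = ["brand_awareness", "product_feature", "special_offer"]


def next_creative(prior_exposures: int, sequence=SEQUENCE) -> str:
    """Pick the creative for a user based on how many ads they have already seen."""
    if prior_exposures < 0:
        raise ValueError("prior_exposures cannot be negative")
    # Once the sequence is exhausted, keep showing the final message.
    index = min(prior_exposures, len(sequence) - 1)
    return sequence[index]


# Hypothetical exposure log: user ID -> number of ads already served.
exposure_counts = {"user-a": 0, "user-b": 1, "user-c": 5}

for user_id, count in exposure_counts.items():
    print(user_id, "->", next_creative(count))
```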
Measurement and attribution
Data connects advertising exposures to business outcomes, enabling performance assessment and optimisation.
Multi-touch attribution: By tracking user journeys across touchpoints, advertisers can understand which channels, messages, and sequences drive conversions. A retail brand might discover that social media creates awareness, display ads build consideration, and search ads capture final conversion intent.
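To make the idea concrete, here is a small sketch of linear attribution, one of the simplest multi-touch models, in which every touchpoint in a converting journey receives an equal share of the credit. It is shown purely as an example of the general approach; the journeys and the `linear_attribution` function are hypothetical, and other models (time-decay, position-based, data-driven) weight touchpoints differently.

```python
from collections import defaultdict


def linear_attribution(journeys):
    """Split each conversion's credit equally across the channels in its journey."""
    credit = defaultdict(float)
    for touchpoints in journeys:
        if not touchpoints:
            continue  # A journey with no recorded touchpoints earns no channel credit.
        share = 1.0 / len(touchpoints)
        for channel in touchpoints:
            credit[channel] += share
    return dict(credit)


# Hypothetical converting journeys, each an ordered list of channel touchpoints.
journeys = [
    ["social", "display", "search"],
    ["display", "search"],
    ["search"],
]

for channel, value in sorted(linear_attribution(journeys).items()):
    print(f"{channel}: {value:.2f} conversions")
```

The credited values always sum to the number of converting journeys, so the model redistributes credit between channels without inflating the conversion total.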
Incrementality testing: Advanced data applications enable advertisers to measure true incremental impact by comparing outcomes between exposed and control audiences. A subscription service might determine that 40% of the conversions from their programmatic campaign were truly incremental, meaning they wouldn’t have occurred otherwise.
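The arithmetic behind an incrementality test can be sketched as follows, assuming a clean split between an exposed group and a control group that saw no ads. The `incrementality_report` function and all of the counts are hypothetical; real tests also need to check statistical significance, which this sketch leaves out.

```python
def incrementality_report(exposed_users, exposed_conversions,
                          control_users, control_conversions):
    """Compare an exposed group with a control group that saw no ads."""
    counts = (exposed_users, exposed_conversions, control_users, control_conversions)
    if min(counts) <= 0:
        raise ValueError("all counts must be positive for this simple sketch")
    exposed_rate = exposed_conversions / exposed_users
    control_rate = control_conversions / control_users
    # Conversions the exposed group would likely have produced without the campaign.
    baseline_conversions = control_rate * exposed_users
    incremental_conversions = exposed_conversions - baseline_conversions
    return {
        "relative_lift": (exposed_rate - control_rate) / control_rate,
        "incremental_conversions": incremental_conversions,
        "incremental_share": incremental_conversions / exposed_conversions,
    }


# Hypothetical test: 0.5% of exposed users convert versus 0.3% of the control group.
report = incrementality_report(
    exposed_users=100_000, exposed_conversions=500,
    control_users=50_000, control_conversions=150,
)
print(f"Relative lift: {report['relative_lift']:.0%}")
print(f"Incremental conversions: {report['incremental_conversions']:.0f}")
print(f"Share of conversions that were incremental: {report['incremental_share']:.0%}")
```

With these made-up figures, roughly 200 of the 500 conversions in the exposed group (40%) are estimated to be incremental, because about 300 would have been expected at the control group's conversion rate.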
Optimising campaign performance
Data drives continuous improvement throughout campaign execution.
Real-time optimisation: As campaign data accumulates, algorithms automatically adjust bidding strategies, creative selections, and targeting parameters to improve performance. Underperforming segments or placements receive less budget, while high-performing opportunities receive more.
Predictive modelling: Historical performance data trains models that predict which impressions are most likely to result in conversions. These predictions become more accurate as more data is collected, creating a virtuous cycle of improvement.
Reducing waste and improving efficiency
Data helps advertisers eliminate ineffective or unnecessary ad exposures.
Audience suppression: Advertisers can exclude existing customers from acquisition campaigns or suppress users who have already converted. An insurance company might stop targeting users who have already purchased policies, reducing wasted impressions.
Frequency management: Data allows advertisers to control how often individuals see their ads, preventing ad fatigue and optimising exposure levels. The optimal frequency varies by campaign objective, with awareness campaigns often needing more impressions per person than direct-response campaigns.
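Audience suppression and frequency management can both be expressed as simple eligibility checks before an ad is served. The sketch below assumes a hypothetical impression log and converted-user set; the `eligible_users` function and the cap of three impressions are illustrative choices, not recommended values.

```python
from collections import Counter


def eligible_users(candidates, converted, impressions_served, frequency_cap):
    """Return candidates who have not converted and are still under the cap."""
    counts = Counter(impressions_served)
    return [
        user for user in candidates
        if user not in converted and counts[user] < frequency_cap
    ]


# Hypothetical campaign state.
candidates = ["u1", "u2", "u3", "u4"]
converted = {"u2"}                                          # suppression list
impressions_served = ["u1", "u1", "u1", "u3", "u4", "u4"]   # one entry per impression

print(eligible_users(candidates, converted, impressions_served, frequency_cap=3))
```

Here `u1` is excluded for reaching the cap and `u2` for having already converted, leaving `u3` and `u4` eligible for further impressions.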
Enabling innovation
Rich data foundations enable the development of new advertising approaches and technologies that extend beyond traditional digital formats.
The combination of robust data collection, advanced identity solutions, and sophisticated analytics platforms has unlocked entirely new advertising channels and approaches that were technologically impossible just a few years ago.
These innovations demonstrate how data has become not just an optimisation tool but a fundamental catalyst for expanding the boundaries of programmatic advertising into new environments and contexts.
Examples of how data is used in different digital advertising channels
Advanced TV targeting
The transformation of television advertising represents one of the most significant data-driven innovations in recent years.
Data bridges traditional television with digital precision, allowing advertisers to reach specific households based on viewing habits, purchase behaviour, and demographics rather than broad program ratings.
This evolution has created several distinct approaches to data-enhanced TV advertising.
Connected TV (CTV)
Connected TV (CTV) leverages digital delivery of television content through internet-connected devices, enabling household-level targeting and measurement similar to other digital channels.
A consumer packaged goods (CPG) brand can now target households with children that have previously purchased related products, rather than broadly targeting all viewers of children’s programming.
Addressable TV
Addressable TV works within traditional cable and satellite distribution systems to deliver different ads to different households watching the same program.
This capability allows an automotive advertiser to show SUV commercials to households with children while showing sports car ads to empty-nesters watching the same content.
Automatic content recognition (ACR)
Automatic content recognition (ACR) uses audio or visual fingerprinting to identify what content is being viewed on smart TVs, creating viewing behaviour profiles that inform targeting decisions.
This technology can tell advertisers which households watch cooking shows regularly, making them prime targets for kitchen appliance advertisements regardless of what content they’re currently watching.
Without comprehensive data infrastructure, these targeting capabilities would be impossible, relegating TV advertising to the broad demographic targeting that characterised the medium for decades.
Data has fundamentally transformed how television advertising functions, making it increasingly integrated with broader programmatic strategies and cross-channel campaigns.
Audio programmatic
The explosive growth of digital audio consumption has created new advertising opportunities that rely heavily on listener data and programmatic technology.
Streaming music platforms like Spotify, Pandora, and Apple Music, along with podcast networks and digital radio providers, use listener data to enable targeted audio advertising that reaches users with relevant messages.
The data infrastructure behind programmatic audio enables several key capabilities.
Contextual audio targeting
Contextual audio targeting is based on what content users are consuming, such as targeting workout-related products during fitness playlists or language learning apps during educational podcasts.
Behavioural segmentation
Behavioural segmentation uses historical listening patterns, app usage, and other digital signals to build listener profiles. A luxury travel brand might target users who frequently listen to travel podcasts and international music and who show high-end listening habits.
Sequential messaging
Sequential messaging delivers a series of audio ads that tell a progressive story as listeners engage with content throughout their day, building awareness and consideration through carefully orchestrated messaging.
Cross-device attribution
Cross-device attribution connects audio ad exposures to actions taken on other devices, solving the measurement challenge inherent in audio’s screenless nature. A retailer might trace a path from a podcast ad to a website visit on a mobile device to a purchase on a desktop computer.
These capabilities demonstrate how data has transformed audio from a mass reach medium to a precisely targetable channel that can reach specific audience segments with tailored messages at optimal moments.
The combination of intimate, voice-based advertising with rich listener data creates powerful opportunities for brands to connect with consumers.
Digital out-of-home (DOOH)
Perhaps the most striking example of data-driven innovation is the transformation of out-of-home advertising – traditionally the least targetable or measurable medium – into an emerging programmatic channel.
Mobile location data helps connect physical world movement patterns with digital screens in the real world, making billboards and place-based media more measurable and targetable than ever before.
Data enables several revolutionary capabilities in DOOH.
Audience-based buying
Audience-based buying allows advertisers to purchase DOOH impressions based on the audiences likely to see the displays rather than just the locations. A luxury retailer can target screens that index highly for affluent shoppers based on mobile movement patterns, regardless of neighbourhood demographics.
Dynamic creative optimisation
Dynamic creative optimisation adjusts billboard content based on contextual factors like weather conditions, traffic patterns, time of day, or nearby events. A quick-service restaurant might show breakfast items during morning commute hours and switch to dinner promotions in the evening.
Cross-channel retargeting
Cross-channel retargeting identifies devices exposed to DOOH advertising and later retargets them with mobile or online messages. A concert promoter could show billboard ads near a music venue, then deliver mobile ads to devices that were in proximity to those billboards.
Attribution measurement
Attribution measurement connects DOOH exposure to subsequent store visits or online actions, solving the long-standing measurement challenge for outdoor advertising. A retailer can now quantify the lift in store visits among consumers exposed to their DOOH campaign compared to control groups.
These innovations demonstrate how data has transformed even the most traditional advertising channels, bringing programmatic capabilities to environments previously considered incompatible with precision targeting or measurement.
As data collection and identity resolution technologies continue to evolve, we can expect further innovation that bridges physical and digital advertising experiences in increasingly seamless ways.
AI in AdTech
Artificial intelligence (AI) is playing an increasingly central role in AdTech, enabling more personalised advertising experiences, predictive outcomes, and efficiency through automation.
At the heart of AI systems is data—massive volumes of behavioural, contextual, and transactional data that fuel the training of machine learning (ML) and AI models.
These models rely on high-quality, diverse datasets to learn patterns, forecast user behaviour, optimise ad delivery, and drive better outcomes across the advertising value chain.
However, using data for AI in AdTech presents both opportunities and challenges.
On the opportunity side, companies can harness AI to automate complex decisions at scale, uncover deep audience insights, and drive performance across media buying, creative optimisation, and targeting.
At the same time, they face challenges related to data quality, fragmentation across platforms, and the need for robust privacy compliance. With regulations like GDPR and CCPA tightening, and with third-party cookies restricted in several browsers, companies must carefully balance innovation with ethical data use, transparency, and user consent.
Applications of AI in AdTech
AI is transforming nearly every aspect of the digital advertising workflow—from strategy and planning to execution and optimisation.
In media buying, AI powers bid optimisation, budget allocation, and audience segmentation in real time.
For creatives, AI enables dynamic creative optimisation (DCO), tailoring ad elements based on a combination of user profiles, behaviour, and context.
AI also plays a key role in fraud detection, predictive analytics, and lookalike modelling, helping advertisers find new users who resemble their high-value customers.
Below are some of the most impactful use cases:
01. Media planning
AI enhances media planning by analysing large datasets—historical campaign performance, user demographics, behavioural trends, seasonal fluctuations, and channel effectiveness—to help advertisers make better decisions about where, when, and how to spend their budgets.
Figure: The user interface of Spyrosoft AdTech’s AI Media Planner.
Traditional planning was manual and reactive, but AI can now forecast audience reach, recommend optimal media mixes, and even simulate budget outcomes.
- Example: Platforms like Skai and Albert AI use machine learning models to recommend spend allocations across channels (e.g., Google, Meta, Amazon) based on performance goals.
- Benefit: Faster planning cycles, more accurate forecasting, and data-backed media mix decisions.
02. Campaign creation
AI accelerates and enhances the campaign creation process by generating audience segments, suggesting targeting parameters, and even writing or designing creative assets.
Natural language processing (NLP) and computer vision can analyse product data, past creatives, or landing pages to auto-generate headlines, descriptions, and imagery tailored to specific users or platforms.
- Example: Tools like Meta Advantage+ and Google Performance Max use AI to dynamically assemble and test combinations of ad creatives and targeting criteria.
- Benefit: Saves time in campaign setup, improves testing velocity, and reduces reliance on large creative teams.
03. Bid optimisation
Real-time bidding (RTB) in programmatic advertising is a natural fit for AI. Machine learning models assess thousands of signals—user behaviour, time of day, device type, context, and historical conversion rates—to determine the optimal bid for each impression in milliseconds.
These models continuously learn and adjust based on outcomes, driving better return on ad spend (ROAS).
- Example: The Trade Desk’s Koa AI, Google Ads Smart Bidding, and Amazon DSP use AI to optimise bids in real time based on performance goals like CPA or ROAS.
- Benefit: Maximises efficiency and performance while reducing manual bid management.
04. Dynamic creative optimisation (DCO)
In DCO, AI selects and assembles the most relevant creative components (e.g., images, copy, CTA) for each user based on contextual signals, behavioural data, or audience profiles.
AI can also test variations and optimise based on which combinations drive the best outcomes.
- Example: Platforms like Celtra and Google Studio enable AI-driven creative assembly and performance optimisation. Smartly.io also uses AI to personalise creatives at scale.
- Benefit: Higher engagement rates, improved personalisation, and faster creative iteration.
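The test-and-optimise loop described above can be illustrated with an epsilon-greedy strategy, a simple technique chosen here for clarity rather than because any named platform is known to use it. All names, creative variants, and counts below are hypothetical.

```python
import random


def observed_ctr(record):
    """Click-through rate for one creative, treating zero impressions as a CTR of 0."""
    impressions = record["impressions"]
    return record["clicks"] / impressions if impressions else 0.0


def choose_creative(stats, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: explore at random with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(sorted(stats))
    return max(sorted(stats), key=lambda name: observed_ctr(stats[name]))


def record_result(stats, creative, clicked):
    """Update the running counts after an impression has been served."""
    stats[creative]["impressions"] += 1
    stats[creative]["clicks"] += 1 if clicked else 0


# Hypothetical performance so far for three creative variants.
stats = {
    "lifestyle_image": {"impressions": 1000, "clicks": 12},
    "product_closeup": {"impressions": 1000, "clicks": 20},
    "discount_banner": {"impressions": 1000, "clicks": 15},
}

choice = choose_creative(stats, epsilon=0.0)  # epsilon=0 disables exploration
record_result(stats, choice, clicked=True)
print(choice, stats[choice])
```

A small non-zero `epsilon` keeps serving weaker variants occasionally, so the system can notice if audience preferences shift instead of locking in an early winner.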
05. Fraud detection
AI is a crucial tool in combating ad fraud, which costs advertisers billions annually. By analysing vast amounts of traffic data in real time, AI models can detect anomalies that suggest invalid traffic (IVT), bots, click farms, or domain spoofing.
These models often use supervised and unsupervised learning to flag suspicious patterns that would be hard to detect manually.
- Example: DoubleVerify and IAS use AI to detect and block fraudulent impressions and traffic sources in real time.
- Benefit: Protects advertiser budgets, maintains inventory quality, and builds trust in the programmatic ecosystem.
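As a very simplified illustration of unsupervised anomaly detection, the sketch below flags traffic sources whose click-through rate is an extreme outlier, using a modified z-score based on the median and the median absolute deviation. This is a basic statistical heuristic chosen for the example, not a description of how any verification vendor works; the source names, CTR values, and the `flag_anomalous_sources` function are hypothetical, and the 3.5 threshold is a commonly used cut-off that is assumed here.

```python
from statistics import median


def flag_anomalous_sources(ctr_by_source, threshold=3.5):
    """Flag sources whose CTR is an outlier by the modified z-score (median/MAD)."""
    values = list(ctr_by_source.values())
    centre = median(values)
    mad = median([abs(value - centre) for value in values])
    if mad == 0:
        return []  # No spread to compare against, so nothing can be flagged.
    flagged = []
    for source, ctr in ctr_by_source.items():
        score = 0.6745 * (ctr - centre) / mad
        if abs(score) > threshold:
            flagged.append(source)
    return flagged


# Hypothetical click-through rates per traffic source.
ctr_by_source = {
    "news-site": 0.010,
    "sports-blog": 0.012,
    "recipe-app": 0.011,
    "weather-app": 0.009,
    "video-portal": 0.013,
    "unknown-exchange": 0.250,  # implausibly high CTR
}

print(flag_anomalous_sources(ctr_by_source))
```

The median-based score is used instead of a mean-based one because a single extreme source would otherwise inflate the average and standard deviation enough to hide itself.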
Agentic AI in AdTech
Agentic AI refers to artificial intelligence systems that can act autonomously, pursue defined goals, and make decisions on behalf of users or organisations with minimal or no human intervention.
Unlike traditional AI models that are task-specific and reactive, agentic AI systems are proactive, adaptive, and capable of managing entire workflows—making them particularly transformative in the fast-paced world of AdTech.
In programmatic advertising, agentic AI can orchestrate a wide range of activities—from media planning to creative testing, budget optimisation, and even performance analysis—all while adapting dynamically to changing market conditions.
These agents operate based on strategic objectives (e.g., “maximise return on ad spend within X budget across Y platforms”) and can autonomously select which channels to use, how much to bid, which creative variants to deploy, and when to pivot strategy.
The rise of agentic AI represents an evolution in how campaigns are conceived, executed, and optimised.
It promises greater speed, scale, and efficiency, especially for brands managing complex, multi-platform campaigns.
However, it also raises important questions about accountability, brand safety, and ethical guardrails. To be effective, agentic AI systems must be trained on high-quality, representative data and operate within clearly defined boundaries to avoid unintended outcomes.
There have already been significant steps toward advancing the possibilities of agentic AI in programmatic advertising. The Ad Context Protocol (AdCP) and the IAB Tech Lab’s Agentic Real-Time Bidding Framework (ARTF) are two standards designed to enable interoperability between different agents for transacting and measuring digital media.
As AdTech continues to evolve, AI will be indispensable in making advertising more relevant, efficient, and privacy-conscious—so long as the underlying data strategies are equally advanced and responsible.
Summary
Data serves as the foundation for expanding programmatic capabilities beyond traditional digital display advertising into new formats and channels.
Organisations with sophisticated data strategies can leverage these emerging channels to reach consumers with relevant messages in previously untargetable environments, creating more integrated and effective cross-channel campaigns.
The future of programmatic innovation will likely continue to follow this pattern, with data and AI enabling new capabilities that transform existing media and create entirely new advertising opportunities.