Identity resolution: Probabilistic vs. deterministic approach
In marketing, identity resolution is the process of assigning a unique identity to a person, device or account. Once that identity exists, you can determine how often and in what contexts it appears.
Historically, identity resolution has relied on a deterministic approach, which assumes that an ID is unique and cannot be shared. In recent years a second approach has emerged, the probabilistic approach, which assumes that IDs can be shared and that what matters is the probability that an ID belongs to a particular person.
So what is the best way to approach identity resolution? To answer this question, we must first understand how the deterministic and probabilistic approaches work.
Deterministic approach
The deterministic approach is based on a set of predefined rules for assigning IDs. For example, we could assume that all Facebook users have a unique and unrepeatable ID. From there, we can use the Facebook ID to link user data between different platforms and channels.
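As a rough illustration of the rule described above, the following Python sketch links records that share exactly the same Facebook ID. The field names (`facebook_id`, `source`, `email`), the sample records and the helper `link_by_exact_id` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def link_by_exact_id(records, key="facebook_id"):
    """Group records that carry exactly the same identifier value."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(key)
        if value is not None:  # a record without the identifier cannot be linked
            groups[value].append(record)
    return dict(groups)


# Hypothetical records collected from different platforms and channels.
records = [
    {"source": "web", "facebook_id": "fb_001", "email": "ana@example.com"},
    {"source": "app", "facebook_id": "fb_001", "email": None},
    {"source": "crm", "facebook_id": "fb_002", "email": "luis@example.com"},
    {"source": "store", "facebook_id": None, "email": "ana@example.com"},
]

profiles = link_by_exact_id(records)
for identity, linked in profiles.items():
    print(identity, [r["source"] for r in linked])
```

In this sketch the "store" record carries no Facebook ID, so the rule leaves it unlinked even though its email matches the "web" record.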
The problem with the deterministic approach is that it breaks as soon as the ID it relies on stops being stable or unique. For example, if a user changes their Facebook ID, we will no longer be able to link their data. There are also people who use multiple IDs for different purposes, which makes it more difficult to assign them a single identity. In these cases the deterministic approach misses links that a probabilistic approach may be able to recover.
Probabilistic approach
Instead of using predefined rules to assign IDs, the probabilistic approach relies on artificial intelligence and machine learning to estimate the probability that an ID belongs to a given person.
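To make the idea of a match probability concrete, here is a minimal Python sketch. It is not a trained model: the weights in `WEIGHTS` and the value of `BIAS` are set by hand purely for illustration, whereas a machine learning model would learn them from labeled data. The field names and the functions `name_similarity`, `exact_match` and `match_probability` are likewise assumptions made for this example.

```python
import math
from difflib import SequenceMatcher

# Hand-picked, hypothetical weights; a real model would learn these from data.
WEIGHTS = {"name": 3.0, "email": 4.0, "city": 1.0}
BIAS = -4.0


def name_similarity(a, b):
    """Fuzzy similarity between two names, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_match(a, b):
    """1.0 if both values are present and equal ignoring case, else 0.0."""
    return 1.0 if a and b and a.lower() == b.lower() else 0.0


def match_probability(rec_a, rec_b):
    """Combine per-field evidence into a probability with a logistic function."""
    score = BIAS
    score += WEIGHTS["name"] * name_similarity(rec_a.get("name"), rec_b.get("name"))
    score += WEIGHTS["email"] * exact_match(rec_a.get("email"), rec_b.get("email"))
    score += WEIGHTS["city"] * exact_match(rec_a.get("city"), rec_b.get("city"))
    return 1.0 / (1.0 + math.exp(-score))


a = {"name": "Ana Garcia", "email": "ana.garcia@example.com", "city": "Madrid"}
b = {"name": "A. Garcia", "email": "ana.garcia@example.com", "city": "Madrid"}
c = {"name": "Luis Perez", "email": "lperez@example.com", "city": "Lima"}

print(f"a vs b: {match_probability(a, b):.2f}")  # high: same email and city, similar name
print(f"a vs c: {match_probability(a, c):.2f}")  # low: no field agrees
```

Unlike an exact-ID rule, the result is a score between 0 and 1, so a threshold still has to be chosen to decide which pairs count as the same person.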
Because it does not depend on predefined rules, the probabilistic approach can be more accurate than the deterministic approach; however, it has its own problems. For example, training the model can be expensive and require a lot of data. It can also be difficult to assess the accuracy of the model without an independent test data set. In short, the probabilistic approach requires more work and analysis to implement correctly.
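One possible way to assess accuracy on an independent test set is to compare the model's predicted probabilities against pairs whose true status is known. The sketch below computes precision and recall at a chosen threshold; the function `precision_recall`, the `held_out` data and the thresholds are invented for illustration and do not come from any real system.

```python
def precision_recall(labeled_pairs, threshold=0.5):
    """labeled_pairs: iterable of (predicted_probability, is_true_match)."""
    tp = fp = fn = 0
    for probability, is_match in labeled_pairs:
        predicted_match = probability >= threshold
        if predicted_match and is_match:
            tp += 1  # correctly linked
        elif predicted_match and not is_match:
            fp += 1  # linked two different people
        elif not predicted_match and is_match:
            fn += 1  # missed a real link
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical held-out pairs: (model probability, whether they truly match).
held_out = [(0.97, True), (0.91, True), (0.62, False), (0.40, True), (0.08, False)]

for threshold in (0.5, 0.9):
    precision, recall = precision_recall(held_out, threshold)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With this toy data, raising the threshold from 0.5 to 0.9 removes the one false link while still missing one real link, which is the kind of trade-off a test set makes visible.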
Which one should I use?
Although the probabilistic approach may be more efficient in some cases, it is not always the best option. In certain situations, such as when sensitive or private data is involved, a probable match is not enough and it is important that the identity resolution be 100% deterministic.
In general, the best approach to identity resolution depends on the type of data and the specific needs of each company. However, the probabilistic approach can be a useful tool for many companies that need to match large amounts of data.