We Don’t Need More Data. We Need More Contextual Data: Data Enrichment vs Fraud

For every company that deals with the online world in any capacity these days, data is central to both its products and its processes.

From trying to get ahead in offering customers and clients ample information about the benefits of your merchandise to reaching conclusions about the next best steps in your growth, or even to identifying those elusive gaps in the market, data is ever-present. But are we using it properly?

Time to Focus on Data Quality?

How much data is out there, precisely? Apparently about 74 zettabytes of data were created just in 2021, while 2020 was responsible for 59 zettabytes, and 2019, for 41 zettabytes of data – with each zettabyte being a trillion gigabytes. Surely, a lack of data is not a problem in our digital world. We don’t need more data; we need more and better ways to make use of that data; we need more contextual data.

One of the ways to achieve this is data enrichment. In simple terms, it could be perceived as an attempt to arrange and organize data in meaningful ways – and specifically, to increase the value of existing data by enriching it with relevant, hyper-specific information that will help utilize the former better.

Data enrichment finds fertile ground in various sectors, as we will see below, and it also comes in various forms, with a wealth of data enrichment tools and apps available as free as well as proprietary software to aid our efforts. These run the full gamut in terms of both purpose and capabilities, with some tailored to specific sectors, from lead generation to comprehensive technical data about websites – as well as fraud detection and prevention. In general terms, these all work by merging data from multiple sources, using a first-party database as a point of reference and enriching it in the process.
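The merging step described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's implementation: the record layout, the two external sources and all field names (`social_profiles`, `ip_country` and so on) are assumptions made up for the example.

```python
# Hypothetical first-party records, keyed by email address.
first_party = {
    "ana@example.com": {"name": "Ana", "signup_country": "PT"},
    "bob@example.com": {"name": "Bob", "signup_country": "US"},
}

# Hypothetical external sources; the field names are illustrative only.
social_lookup = {"ana@example.com": {"social_profiles": 3}}
ip_lookup = {
    "ana@example.com": {"ip_country": "PT"},
    "bob@example.com": {"ip_country": "RO"},
}


def enrich(records, sources):
    """Return a copy of records with fields from each source merged in."""
    enriched = {}
    for key, record in records.items():
        merged = dict(record)  # copy, so the first-party data is not mutated
        for source in sources:
            # First-party fields win if a source reuses a field name.
            for field, value in source.get(key, {}).items():
                merged.setdefault(field, value)
        enriched[key] = merged
    return enriched


profiles = enrich(first_party, [social_lookup, ip_lookup])
```

Here the first-party table is the point of reference: every enriched profile starts from a first-party record, and the external sources only add fields to it.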

Data Is Being Underutilized

Indeed, some industry insiders estimate that as much as 95% of useful data is not being utilized by businesses, with more conservative calculations placing this at around 75%. Which is still a lot. And the main reason given for this? The lack of structure and meaningful connections available.

It should be noted that data enrichment is both a strategy and a process. Several data enrichment tools on the market today are part of a wider suite or platform rather than standalone, albeit still providing similar functionality. This makes practical sense, as data enrichment is largely based on automation and aims to both speed up processes and enable enhanced insights. By integrating it into wider automation solutions, this data is placed within an even larger framework, thus better contextualized.

The efficiency and usefulness of data enrichment depend on the specific purpose of the process. The better the quality and variety of data sources and the better built the enrichment tool, the better the quality of data it will return.

Data Enrichment for Fraud Prevention

One sector that has embraced data enrichment in recent years, utilizing it to enhance its capabilities, is the fight against fraud. Properly enriched, data about each user or transaction is collated, creating a user profile and assigning it a risk score. Such a score can be awarded transparently, alongside a full breakdown of the reasoning, in the case of whitebox solutions, or simply provided as a number with little explanation, when the solution is of the blackbox variety. From there, the enriched data can be used to trigger specific rules according to said score, or consulted by a fraud analyst who is conducting a manual review.
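To make the whitebox idea concrete, here is a rough Python sketch of a scorer that returns both a number and the reasons behind it, then maps the score to an action. The rules, point values and thresholds are all hypothetical and chosen only for illustration; a real solution would tune them to its own data.

```python
# Hypothetical rules: (reason, points, predicate over the enriched profile).
RULES = [
    ("free email provider", 10, lambda p: p.get("free_email", False)),
    ("no social profiles found", 20, lambda p: p.get("social_profiles", 0) == 0),
    ("IP country differs from card country", 30,
     lambda p: p.get("ip_country") != p.get("card_country")),
]


def score(profile):
    """Return (total, breakdown); breakdown lists each rule that fired."""
    breakdown = [(reason, points) for reason, points, test in RULES if test(profile)]
    total = sum(points for _, points in breakdown)
    return total, breakdown


def decide(total, review_at=30, block_at=50):
    """Map a score to an action; the thresholds are illustrative."""
    if total >= block_at:
        return "block"
    if total >= review_at:
        return "manual_review"
    return "allow"


profile = {"free_email": True, "social_profiles": 0,
           "ip_country": "RO", "card_country": "US"}
total, breakdown = score(profile)
action = decide(total)
```

A blackbox solution would expose only `total`; the whitebox variant also hands `breakdown` to the analyst, so each point on the score can be traced to a specific rule.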

What does this mean in practice? Take an online shop that is equipped with a solid anti-fraud solution that includes data enrichment. A fraudster signs up for a new account, using a fake name, a throwaway Gmail account and a stolen credit card. They aim to use this stolen card to buy merchandise they will then resell for a profit. Once they enter their email address, it is automatically used for a reverse email lookup, which turns to OSINT sources to find out more about this person and enrich the user’s profile with this data. The fraud analyst will receive a full profile of the customer, which will include the fact that the email address is not connected to any online profiles or social media; it has not appeared in any known data leaks searchable on Have I Been Pwned; it was created using a free online service; there are no known pictures of the owner; etc. Moreover, there will be information about their IP address and geolocation; device and browser fingerprinting will generate hashes to compare the person to others, or to previous sessions by the same account; and so on.
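One common way to compare a device against others, or against earlier sessions, is to hash a set of device and browser attributes into a single value. The sketch below assumes a small, made-up set of attributes (`user_agent`, `screen`, `timezone`); it is an illustration of the general idea, not how any specific anti-fraud product computes its fingerprints.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a dict of device/browser attributes into a stable hex digest."""
    # sort_keys makes the digest independent of the order of the keys.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical sessions; the attribute names and values are illustrative.
session_a = {"user_agent": "ExampleBrowser/1.0", "screen": "1920x1080", "timezone": "UTC+2"}
session_b = {"timezone": "UTC+2", "screen": "1920x1080", "user_agent": "ExampleBrowser/1.0"}
session_c = {"user_agent": "ExampleBrowser/1.0", "screen": "1366x768", "timezone": "UTC+2"}

same_device = fingerprint(session_a) == fingerprint(session_b)
different_device = fingerprint(session_a) != fingerprint(session_c)
```

An exact hash like this only matches when every attribute is identical, so a production system would likely combine several hashes or use fuzzier matching to tolerate small changes such as a browser update.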

All this and more constitutes data used to enrich the account user’s profile. In turn, this allows either an automated system or a manual reviewer to make an informed decision about whether to allow this person to continue with their purchases or perhaps go through additional vetting or even get outright blacklisted.

Updating Algorithms and Sources

Ecommerce fraud is estimated to have cost companies over $20 billion in 2021 alone, so it is no wonder businesses invest in such solutions to protect their livelihood. But the anti-fraud implementation of data enrichment is not limited to online shops. Ticket and travel websites, fintech companies, banks, lenders, online gaming portals and many more use similar methods to stop fraudsters in their tracks. In the case of lenders, for example, a well-built and properly enriched applicant profile is extremely important to the underwriters’ decision-making process, as lending money to criminals who will never pay it back can literally bankrupt a lender, especially in the case of startups and new players.

Importantly, one benefit of a good data enrichment tool is that it is adaptable and scalable. For those using data enrichment for fraud prevention and mitigation, this means the ability to choose which data sources and methodologies work for their organization and to drop those that no longer do. Thus, the algorithms are fine-tuned to suit their individual needs. Moreover, useful new data sources are being added, which does not just mean additional data to consider, but additional data placed within context, in a manner that speeds up decisions and adds more certainty. In the case of the reverse email or phone lookups that we looked into above, this can take the form of having the system search not just the 5-10 most common social media platforms but other services too, such as GitHub, TripAdvisor and WhatsApp. The more data points consulted, the richer the profile, and the more reliable the conclusion reached.
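The "pick and choose" idea can be pictured as a per-organization switchboard of lookups. In the sketch below, the three lookup functions are stand-ins that return canned values, and their names and output fields are assumptions for the example; real lookups would call external services.

```python
# Stand-in lookups returning canned data; real ones would query external services.
def lookup_social_media(email):
    return {"social_profiles": 0}


def lookup_github(email):
    return {"github_account": False}


def lookup_data_breaches(email):
    return {"breach_count": 0}


# Registry of every source the (hypothetical) tool knows about.
SOURCES = {
    "social_media": lookup_social_media,
    "github": lookup_github,
    "data_breaches": lookup_data_breaches,
}


def build_profile(email, enabled):
    """Run only the lookups this organization has switched on."""
    profile = {"email": email}
    for name, lookup in SOURCES.items():
        if enabled.get(name, False):
            profile.update(lookup(email))
    return profile


# One organization's configuration: breach lookups are turned off.
config = {"social_media": True, "github": True, "data_breaches": False}
profile = build_profile("someone@example.com", config)
```

Adding a new source then means registering one more function in `SOURCES`, while each organization's `config` decides whether it actually contributes to the profile.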

In the anti-fraud industry, this is how data enrichment allows us to go from an abundance of data to an abundance of meaningful data, contextualized in a manner that makes it more useful than before.

About the Author:

Gergo Varga has been fighting online fraud since 2009 at various companies – even co-founding his own anti-fraud startup. He's the author of the Fraud Prevention Guide for Dummies – SEON Special edition. He currently works as the Senior Content Manager / Evangelist at SEON, using his industry knowledge to keep marketing sharp, communicating between the different departments to understand what's happening on the frontlines of fraud detection. He lives in Budapest, Hungary, and is an avid reader of philosophy and history.
