Structuring the Unstructured: The Challenges of Screening Open Source Intelligence

KYC (Know Your Customer) screening is an essential process that lets businesses identify and verify their clients’ identities and surface any potential risks associated with them. By following KYC regulations, businesses mitigate the risk of financial fraud, money laundering and other illegal activities. The information gathered during KYC screening can come from various sources, including both structured and unstructured data sets, and processing these data sets presents several difficulties that businesses must overcome.

What is the importance of both structured and unstructured data for financial institutions?

Financial institutions rely heavily on data to make informed decisions, especially in KYC. Structured data is organised in a predefined format, which makes it easy to search, sort and analyse. It is commonly found in databases and spreadsheets, such as watchlists in the case of KYC screening. Although this data is relatively easy to work with, it usually holds less information than unstructured data and is rarely as up to date.

Unstructured data refers to information that does not have a predefined data model or format, such as news articles and blogs. Because it is not constrained to a fixed structure, this data can hold a wider variety of information, in many formats, that can be used for many different purposes when analysing customers.

Although structured data is easier to process, unstructured data can provide valuable and unexpected insights to financial institutions that would otherwise be missed.

Why is there so much value in unstructured data?

Unstructured data typically accumulates faster, as it does not have to conform to a predefined format. Financial institutions can analyse open source intelligence (OSINT), which ranges from highly structured government reports to unstructured mixed media, such as news articles. Seventy percent of OSINT is unstructured, a share projected to reach 80% by 2025. Despite the difficulties of processing unstructured OSINT, organisations can use it to gain a more comprehensive view of their clients and to mitigate the risk of financial fraud, money laundering and other illegal activities, because it carries additional factors, such as context, that allow for deeper analysis.

Businesses can extract useful facts about their customers from unstructured data, which can then be used to inform business decisions more accurately and mitigate risk effectively.

Furthermore, most of the entities on structured watchlists are compiled by people manually researching unstructured sources such as online news. For example, although official sanctions lists are clear-cut (an entity is either sanctioned or not), determining whether someone is a politically exposed person (PEP) can be somewhat subjective. PEP classification draws mainly on online media, not just on official registers of politicians. Rather than relying on teams of researchers to process all this unstructured data, organisations can now turn to technology.

What are the challenges financial institutions face with structured and unstructured data?

Although structured data is easier to work with, processing it still has challenges. Chief among them is ensuring the accuracy and completeness of the information, which errors, omissions and outdated entries make difficult. These problems can produce incorrect or incomplete KYC profiles, which in turn lead to incorrect screening results and increased risk to the business. To mitigate this risk, businesses must implement strict data quality control measures and regularly update their structured data sets.
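As a rough illustration of what a basic data quality check on a structured data set might look like, the sketch below flags watchlist records with missing fields or stale update dates. The record fields (`name`, `date_of_birth`, `country`, `last_updated`) and the one-year staleness threshold are hypothetical assumptions chosen for the example; real watchlists use their own schemas and refresh policies.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, List

# Hypothetical watchlist record; real lists define their own schemas.
@dataclass
class WatchlistRecord:
    name: str
    date_of_birth: Optional[str]
    country: Optional[str]
    last_updated: date

# Assumed completeness and freshness rules, for illustration only.
REQUIRED_FIELDS = ("name", "date_of_birth", "country")
MAX_AGE = timedelta(days=365)

def quality_issues(record: WatchlistRecord, today: date) -> List[str]:
    """Return a list of data quality problems found in one record."""
    issues = []
    # Completeness: every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        value = getattr(record, field)
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"missing {field}")
    # Freshness: flag records not updated within the assumed threshold.
    if today - record.last_updated > MAX_AGE:
        issues.append("stale record")
    return issues

record = WatchlistRecord("Jane Doe", None, "GB", date(2022, 1, 10))
print(quality_issues(record, today=date(2023, 6, 1)))
# prints ['missing date_of_birth', 'stale record']
```

Records flagged this way could be routed for review or refresh before they feed into screening, rather than silently producing incomplete KYC profiles.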

Unstructured data comes from a wide variety of sources, such as news articles, and has no predefined data model or format. The sheer volume of unstructured data available, and the range of formats it comes in, make it difficult to extract, process and analyse, sometimes requiring manual intervention and specialised tools. On top of this, irrelevant or redundant information makes it harder to isolate relevant facts and combine data for KYC screening, which in turn limits the potential insights.

Businesses must also address the privacy and security concerns associated with processing both structured and unstructured data sets. The information gathered during KYC screening is sensitive and personal, and businesses must ensure that the data is protected and kept confidential. This requires implementing robust security measures, such as encryption and secure data storage, and adhering to strict privacy regulations, such as the EU’s General Data Protection Regulation (GDPR).

Businesses must address these challenges to effectively screen their clients and mitigate the risk of financial fraud, money laundering and other illegal activities.

How can AI and machine learning help banks break down and leverage unstructured data?

Artificial intelligence (AI), specifically natural language processing (NLP), can be used to identify ‘facts’ about individuals and companies within large volumes of unstructured data. This allows for the automation of KYC screening and enables businesses to make decisions more quickly and accurately.
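To make the idea of pulling ‘facts’ out of unstructured text concrete, here is a deliberately simple, rule-based sketch. It is not NLP in the machine learning sense and not how any particular product works. It splits a news snippet into sentences and reports those that mention a named subject alongside a risk keyword. The risk categories and trigger words in `RISK_TERMS` are hypothetical assumptions for illustration.

```python
import re
from typing import List, Tuple

# Hypothetical risk categories and single-word trigger terms (illustrative only).
RISK_TERMS = {
    "money laundering": ("laundering", "laundered"),
    "fraud": ("fraud", "embezzlement"),
    "sanctions": ("sanctioned", "sanctions"),
}

def extract_facts(text: str, subject: str) -> List[Tuple[str, str]]:
    """Return (category, sentence) pairs for sentences that mention the
    subject together with at least one risk term."""
    facts = []
    # Naive sentence split: break on whitespace that follows ., ! or ?.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if subject.lower() not in sentence.lower():
            continue
        # Lowercased word tokens, so trigger terms match whole words only.
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        for category, terms in RISK_TERMS.items():
            if any(term in words for term in terms):
                facts.append((category, sentence.strip()))
    return facts

article = (
    "Acme Ltd announced record profits. "
    "Prosecutors allege that John Smith laundered funds through Acme Ltd. "
    "John Smith denies any fraud."
)
for category, sentence in extract_facts(article, "John Smith"):
    print(category, "->", sentence)
```

Note that the sketch flags "John Smith denies any fraud." as a fraud fact. Keyword matching cannot tell an allegation from a denial, or one John Smith from another. That gap is what context-aware NLP and identity matching aim to close.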

By leveraging AI and machine learning, businesses can also identify emerging and potential risks more effectively, ultimately leading to better decision-making and risk-management strategies.

smartKYC’s solution

To overcome these challenges, smartKYC has created an enterprise-grade automated KYC screening solution that combines a federated search with multilingual natural language processing and peerless name and identity matching, to help with KYC compliance. The platform can quickly and accurately analyse large volumes of both structured and unstructured data to identify potential risks and financial crime. This ensures all required sources are accurately searched, processed and harmonised to extract only pertinent facts for KYC screening purposes.

This is important because KYC compliance is a critical component of anti-money laundering (AML) and counter-terrorism financing (CTF) regulations. By using smartKYC’s solution, financial institutions can ensure they are meeting regulatory requirements while streamlining their KYC processes.

Discover smartKYC

smartKYC is a multi-award-winning risk profiling solution that uses artificial intelligence to fully automate Know Your Customer (KYC) screening processes on all required data sources, at every stage of the client lifecycle, be it pre-KYC, onboarding, periodic refresh or continuous monitoring.

Combining a federated search with multilingual natural language processing, sophisticated name matching and unique identity matching, smartKYC is faster, better and more cost-effective than any other solution on the market. Book your demo today.