The new paradigm for B2B data - Commercial Insurance Perspective - Veridion


By: Alin - 25 November 2022

“For the first time in history, we can directly trace massive problems to bad data and, at the same time, have the ability and the right tools to build systems that address these problems for the long run: high-quality data as infrastructure.” – Florin Tufan, CEO @Soleadify


The way companies use data for making decisions is going through a massive paradigm shift that brings in a new wave of data solutions built by challenger start-ups that compete with multi-billion-dollar legacy players.


The aforementioned legacy players are currently stuck in the following scenario: data used to be something humans would look at, analyze, and base decisions on. That required data to be simple enough for humans to read and interpret.


This is changing. Access to tools (e.g., Snowflake, Databricks) and advances in data science are shifting decisions toward machine interpretation and automated processes, with data at their core. Data is now becoming part of the tech stack, deeply embedded in machine-powered decision engines across all business processes.


Billion-dollar data companies were built – and are still thriving – in the old paradigm. But they fail to meet modern requirements, so end users need to look elsewhere for better solutions. This is why alternative data (read: “data from unofficial sources”) is growing at a roughly 50% CAGR.


Remarkable examples of data being used at scale by leveraging big data solutions can be observed across multiple verticals.


🗺️Maps used to be something you looked at. Now, navigation apps crunch data and provide the fastest route. No human is looking at data.



🎯Ad targeting used to be based on people making assumptions about interests based on location, age, gender, and income level. Now, we have entirely automated ad-targeting solutions that analyze thousands of data points to match ads with eyeballs. No human is looking at age groups and assigning ads; most advertisers never even learn which user traits make a user likely to engage with their ad.



What this paradigm shift looks like for B2B data in commercial insurance


If you work in insurance, you probably know that collecting data to plug into the actuarial models and price each policy takes time, money, and human capital.

  • Forms with tens of questions sent to the client – a glimpse of the past; Soleadify can already source data to pre-fill answers for most of these questions
  • Call centers that call prospects to ask for additional details
  • Underwriters who use a platform provided by legacy players and still spend ~40% of their time doing research (Googling). Source

By taking this route, many insurers end up with poor results:

  • You only know what you know: the most significant risks remain uncaptured by the heuristic-based questionnaire, leading to operational losses in several business lines (Source)
  • Human judgment is non-deterministic: Up to 300% differences in decisions made by different people under the same circumstances (Source)
  • Errors have a massive cost: $6.5B in yearly lost premiums from misclassifying small businesses in the US alone. (Source)


The paradigm shift: Manual pricing and underwriting are expected to be obsolete within the next decade. (Source)

In insurance, the shift towards using data as infrastructure for automation is already very clear:

  • Data and analytics spend in insurance is growing at a CAGR of 14.4% (source)
  • Insurtech platforms are competing to pre-fill forms and help automate pricing, for new policies and renewals alike.
  • Insurtechs with a strong analytics suite help carriers build models and better predict risk at scale.
  • Large insurers are building similar solutions in-house. One of the main providers of data and analytics solutions in the insurance space actually considers the in-house solutions built by large carriers to be its main competition.


This new paradigm requires a new breed of data products

Using data to automate insights, predictions, and decisions with minimal – if any – human supervision requires a new breed of data products. These products are delivered through powerful APIs that can be relied upon when plugged into processes and machine-learning solutions that make or influence trillions of dollars’ worth of decisions.

In this emerging world, detailed information about companies is highly relevant and needs to flow, as close to real time as possible, into all sorts of decision engines.

A restaurant’s cuisine or its policy on deliveries, whether a coffee shop sells CBD oils, the acquisition of a computer vision startup by an automotive corporation, a semiconductor factory’s product offering, a grain supplier’s warehouse locations, a plastic manufacturer’s certifications for its product lineup – all of these are pieces of information that can heavily influence both individual decisions and sector- or industry-level reports.


Traits of a new-age data provider

 ⏩Data updated in real time, instead of year-old data. Global coverage.

  • 15% of businesses are born every year, and another 15% go through a material change in their activity every year. Without recent data, you don’t know which ones changed, let alone how. Stale data leads to poor decisions; recency is key.
    • Automated decision engines can’t rely on people to do research and keep information updated – freshness has to be baked into the data source.


⏩Ever-growing depth, instead of shallow profiles.

    • Because of a lack of data, the world got used to heuristics based on the assumption that 2–3 data points are the key factors in a decision.
    • The lack of providers with accurate depth is why large companies lose 40% of their yearly earnings every decade to an unexpected event – a risk they weren’t aware of.
    • For historical and predictive analysis to surface previously hidden correlations, today’s data providers need not only depth, but also the ability to ingest, assess, and merge new data as it becomes available, so that depth constantly increases.
    • Moreover, some highly relevant data doesn’t exist anywhere, so it needs to be inferred – with the same high regard for accuracy.


⏩Extensive control over data transformation, instead of choices made for you

    • For every decision, the role and weight of each criterion is far from standardised. Definitions differ from team to team, and the most sophisticated data teams require full control over choices and definitions. The interpretation and use of data is where companies gain a competitive edge.
    • For example, let’s assume a company’s website listed “25 team members in Romania, 2 in Canada and 1 in the US”. If you built a deal-sourcing tool for VCs, you could choose to train your models to look for “companies above 25 employees” OR “companies above 25 employees with some of them in North America”. A data provider’s ability to give you control over how you look at employee counts yields wildly different results. Currently, data providers would only give you the employee count, because that’s all they know (the registry only gives you a number, with no context behind it).
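The difference between those two filtering rules can be sketched in a few lines. Everything below – the record shape, field names, and thresholds – is a hypothetical illustration, not Soleadify’s actual schema:

```python
# Two ways to interpret the same employee data, given location context.
companies = [
    {"name": "A", "employee_count": 28,
     "employee_locations": {"Romania": 25, "Canada": 2, "USA": 1}},
    {"name": "B", "employee_count": 40,
     "employee_locations": {"Romania": 40}},
]

def matches_simple(c, min_employees=25):
    # Rule 1: only the headline number, as a legacy registry would provide it.
    return c["employee_count"] > min_employees

def matches_with_context(c, min_employees=25, regions=("Canada", "USA")):
    # Rule 2: same threshold, plus a requirement that some employees sit in
    # North America -- only expressible when location context is available.
    has_presence = any(c["employee_locations"].get(r, 0) > 0 for r in regions)
    return c["employee_count"] > min_employees and has_presence

print([c["name"] for c in companies if matches_simple(c)])        # ['A', 'B']
print([c["name"] for c in companies if matches_with_context(c)])  # ['A']
```

Both companies pass the naive threshold, but only one passes the context-aware rule – the “wildly different results” the text describes.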


⏩Nuance, context and confidence scores around every data point instead of a simple value 

Technical teams, and data science teams above all, need full control over choices and tradeoffs (e.g. from control over the tradeoff between accuracy and coverage, to requesting results using a specific taxonomy over another). Without the context crunching that comes from aggregating tens of millions of sources, it is nearly impossible for data vendors to really offer the control their clients demand.


    • Instead of snapshots, most use cases require some view of a company’s evolution over time.


💡 Consider the following two scenarios:

a. We’re evaluating a supplier located and incorporated in the US.

b. We’re evaluating the same supplier, located and incorporated in the US, except that we now know they manufactured only in Russia until 12 months ago and, moreover, operated only in Russia 36 months ago.

  • The longitudinal view radically changes the analysis, although the snapshot is identical.
  • The longitudinal view is one of the many reasons why data companies build a moat over time.
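A minimal sketch of the two scenarios above, assuming an invented record layout and a toy sanctions-style screening rule (neither is a real schema or a real compliance check):

```python
# Scenario (a): a snapshot of the supplier -- looks clean on its own.
supplier_snapshot = {"name": "Acme Supply Co", "country": "US",
                     "incorporated_in": "US"}

# Scenario (b): the same supplier with history attached,
# as a list of (months_ago, manufacturing_countries) entries.
supplier_history = [
    (36, {"RU"}),   # 36 months ago: operated only in Russia
    (12, {"RU"}),   # until 12 months ago: manufactured only in Russia
    (0,  {"US"}),   # today: located and incorporated in the US
]

def flag_from_snapshot(snapshot, watchlist=frozenset({"RU"})):
    # Only today's country is visible -- history is invisible to this rule.
    return snapshot["country"] in watchlist

def flag_from_history(history, watchlist=frozenset({"RU"}), lookback_months=48):
    # Flag if any entry within the lookback window touches the watchlist.
    return any(months <= lookback_months and bool(countries & watchlist)
               for months, countries in history)

print(flag_from_snapshot(supplier_snapshot))  # False -- snapshot looks clean
print(flag_from_history(supplier_history))    # True  -- history surfaces the risk
```

The snapshot is identical in both scenarios; only the longitudinal record changes the outcome.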


⏩Powerful APIs to enable easy consumption across many complex use cases

  • In today’s world, acquiring the data is barely the beginning of the difficult part.
  • Data, technical, and product teams need to normalize and taxonomize the data, harmonize it with existing data, perform entity resolution, and build complex logic on top.

a. Provider A returns a list of products manufactured by 4M manufacturers in the world. In this case, the client now needs to embark on a 12-month-plus journey to build a search engine on top.

b. Provider B exposes an API that takes a product name as input and returns a list of relevant manufacturers. The search engine is the API.

The results in option (b) are much more accurate than the client would ever manage to get on their own, because:

  • the data supplier leverages far more data and additional context when building the search solution (e.g., Soleadify’s supplier-search API takes into consideration how prevalent the product is within the manufacturer’s entire portfolio – i.e., whether it is one of that manufacturer’s main areas of expertise)
  • the data supplier runs similar search implementations with other clients, generating feedback loops that constantly improve the accuracy of the results over time.
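The “search engine is the API” idea in option (b) can be sketched as follows. The catalog and the prevalence-based scoring rule are invented for illustration – the real API would work over millions of manufacturers and far richer signals:

```python
# Toy manufacturer catalog: maker -> list of products (repeats = emphasis).
CATALOG = {
    "Maker A": ["steel pipes", "steel pipes", "steel pipes", "valves"],
    "Maker B": ["steel pipes", "cables", "valves", "connectors"],
    "Maker C": ["cables", "connectors"],
}

def search_suppliers(product):
    """Rank manufacturers by how prevalent the product is in their portfolio."""
    results = []
    for maker, products in CATALOG.items():
        hits = products.count(product)
        if hits:
            # Prevalence = share of the portfolio that is this product,
            # a proxy for "is this one of their main areas of expertise?"
            results.append((maker, hits / len(products)))
    return sorted(results, key=lambda r: r[1], reverse=True)

print(search_suppliers("steel pipes"))
# [('Maker A', 0.75), ('Maker B', 0.25)]
```

Maker A ranks first because steel pipes dominate its portfolio, even though both makers list the product – exactly the kind of context a flat product list cannot provide.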


And that’s why we founded Soleadify.

The current market leaders in B2B data act, most of the time, as middlemen between users and government-provided data. The major flaw of this approach is that government records don’t represent the “truth” about what a business actually does; their main purpose is regulatory compliance. That’s great for (parts of) AML and KYC, but virtually useless for anything else.


We created the first alternative to the age-old source of truth in B2B data (government records), by turning the vast, unstructured content on the web into a dynamic data set on private and public companies.


Our main approach to tackling this challenge was to build a data factory that transforms real-world business activity into a “source of truth” data set on organisations. This data (our product) is exposed through various APIs (our packaging) that cater to different stages of a wide array of B2B processes.


How does the “data factory” work?

The foundation of this factory is a Google-like approach to sourcing data – making sense of the unstructured content on the web.

    1. We scan the entire web and identify any piece of content that may be speaking about a company. To do this, we process billions of unstructured content sources every day (company websites, social media profiles, news articles, press releases, local association websites, etc.).
    2. We pass all these data points (“candidates”) through our entity resolution engine, which brings together pieces of data around an entity (“Is this an address? Is it the address of a company? Is it the headquarters of a company? Which company?”) and then disambiguates between entities (“Is Apple US the same as Apple Ireland? How about Bob’s Plumbing in London vs. Bob’s Plumbing in Vancouver?”).
    3. The third component is our triangulation engine, which does two things:
        a. it infers missing data by making correlations between different signals and looking for similar companies across our data set
        b. it decides the truth between conflicting sources
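A toy version of step (b) – deciding the truth between conflicting sources – could use confidence-weighted voting, where each source emits an opinion with a confidence score and the value with the highest total confidence wins. The source names, field, and confidence values below are invented for illustration:

```python
from collections import defaultdict

# Each opinion: (source, field, proposed value, confidence in [0, 1]).
opinions = [
    ("company website", "headquarters", "Austin, TX", 0.9),
    ("registry record", "headquarters", "Dover, DE",  0.5),
    ("news article",    "headquarters", "Austin, TX", 0.6),
]

def resolve(opinions, field):
    """Pick the value with the highest aggregate confidence for a field."""
    totals = defaultdict(float)
    for _source, f, value, confidence in opinions:
        if f == field:
            totals[value] += confidence
    winner = max(totals, key=totals.get)
    # Normalize the winner's weight into a confidence score for the decision.
    return winner, totals[winner] / sum(totals.values())

value, score = resolve(opinions, "headquarters")
print(value, round(score, 2))  # Austin, TX 0.75
```

Two weaker sources agreeing can outvote one stronger source, and the decision carries its own confidence score – the same shape of output the article argues every data point should have.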

The value of data is judged by coverage (how much of the universe it covers), depth (how much detail), veracity (how true and how recent) and usefulness (what problems can I solve with it). We look at what makes us different through these 4 lenses.



We can truly say that we have global coverage, with a multilingual data-sourcing infrastructure that supports 20 languages. We currently track 70M companies worldwide, with higher coverage in regions like North America, Europe, Australia, and New Zealand. For now, one of the main points to improve is expanding coverage while accounting for distinct cultural differences that influence content processing (a good example is China, which is currently only partially covered by our global models). Full coverage here



For now, the actual depth of our data can be measured by the 50+ data attributes that we track for all 70 million company profiles. We say “for now” because these are constantly evolving: as we discover new sources, find new ways to add data points, and improve accuracy for existing ones, this figure is subject to constant change. One good example of this kind of volatility is our first extraction of product data: in the span of one month, we managed to extract 300M products from 4M manufacturers across the world. For a better understanding of our data points, you can access our Data Dictionary.



In the space of data science, “veracity” refers to how accurate or truthful a data set is. To achieve a high level of veracity for our data, we enforce three fundamental principles.

    1. Keeping data fresh – The data is refreshed every week, and models are automatically re-trained every 6 months (to deal with concept and model drift).
    2. Data correcting data – For better veracity, we constantly run contextual, multi-signal analysis for each data point. Each signal results in an opinion with a confidence score based on multiple factors. Our technology automatically decides which opinion is true, with accuracy on par with – or beyond – what a human would conclude from the same data.
    3. Supercharged feedback loops – The way our data is consumed makes our clients proactively interested in correcting it. Our data is delivered through APIs in production environments without human oversight, which makes data accuracy a common goal for us, our clients, and our partners.



We look at the usefulness of our solutions through three different perspectives:

    1. The size and urgency of the problems our data solutions can solve
    2. Why our data is able to solve these problems – a combination of data quality outlined above and powerful APIs outlined below
    3. How easy it is to use our data solutions


Let’s discover together how your company can embrace this new paradigm

Access to in-depth, web-captured B2B data can be the stepping stone for the next generation of commercial insurers. By leveraging this type of data, insurance companies can build seamless experiences for commercial lines and serve customers at scale with minimal risk.

Building the framework for this kind of end-to-end experience is quite a difficult challenge. But in the long run, it will make the difference between the players that lead the market and the ones that struggle to survive.

Let’s discover together how Soleadify’s data can help your company streamline commercial insurance underwriting.

