Engineering Challenges

You’re here because we’re considering your profile for our team and want to see how you approach real problems.

This page explains how we approach engineering challenges at Veridion and what we look for when reviewing candidate submissions. Before starting a challenge, it helps to understand the context in which these problems exist and how we evaluate solutions.

The challenges themselves are designed to simulate the kind of open-ended work we deal with every day. They give you the opportunity to show how you investigate a problem, make decisions, and build something that works in a real-world setting.

If you join Veridion, the effort you invest throughout the recruitment process will be rewarded with a bonus added to your fourth salary (after the 3-month probation period).

The way we work

The core principles behind our approach

Understand the problem deeply

Strong solutions usually come from careful investigation of the problem and the data behind it before writing code.

Choose tools based on the problem

The goal is not to showcase a particular tech stack, but to solve the problem as effectively as possible.

Question default approaches

When no established solution exists, don’t assume one. Explore the problem from multiple angles and be willing to define the approach yourself.

Deliver real-world value

Real systems require balancing accuracy, speed, and effort to deliver value to those who rely on them.

Learn and iterate fast

Expect uncertainty and experimentation. Progress comes from testing ideas, learning quickly, and continuously improving the solution.

What makes a great solution

The hardest part is not writing the code. Most solutions fail much earlier.

We review these projects as a signal of the impact you could have if you joined our team.

In practice, that means looking at how you investigate the problem and the data behind it, how you define the actual problem to solve, what decisions and trade-offs you make along the way, how you use tools, and how you turn the idea into a working solution.

Treat it as a small production-ready project rather than a quick exercise. Submissions that consist of a minimal script, a mostly LLM-generated solution, or a solution where the problem is forced to fit a tool rarely demonstrate the depth we’re looking for. We review a large number of projects and are familiar with the patterns these approaches tend to produce.

Problem framing

Did you correctly identify and define the actual problem to solve?

Engineering decisions

Are the key decisions and trade-offs thoughtful and clearly justified?

Use of tools

Are tools used to support the solution, or does the solution appear shaped around the tool instead of the problem?

Execution quality

Does the solution work correctly, respect the requirements and constraints, and cover the relevant cases?

Ownership

Does the solution demonstrate clear ownership of the reasoning, decisions, and implementation, even if AI or other tools were used?

Choose your track

Check out the challenges below and choose the one you think you'll do best at. They are similar to the tasks you would work on once you join our team.

Your submission should reflect how you approach real engineering problems.

#1 Website Technologies Scraper (SW Engineer Intern)

Task

A company develops an analytics plugin for Shopify stores. Their salespeople want to find websites that are using competing products in order to create a marketing campaign to convince the owners to migrate to their solution.

Your goal is to build a tool that identifies technologies used in building a website.

Things to consider

Gather as many technologies as possible.
Provide proof of how you concluded that a website uses a given technology.
You can use any programming language, toolset, or libraries you're comfortable with or find necessary, especially if you know a particular option would work better or be more interesting (we generally prefer Node, Python, and Scala).
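
One way to pair each detection with proof is to keep the matched snippet next to the technology name. The sketch below is a minimal illustration, not a reference solution: the `FINGERPRINTS` table, its three patterns, and the `detect_technologies` helper are all hypothetical, and a real tool would need far more signals than these.

```python
import re

# Hypothetical fingerprint table: technology -> list of (source, regex).
# "html" patterns run against the page body, "header" patterns against
# "name: value" header lines. These few entries are illustrative only.
FINGERPRINTS = {
    "Shopify": [("html", r"cdn\.shopify\.com"), ("header", r"^x-shopid:")],
    "Google Analytics": [("html", r"googletagmanager\.com/gtag/js")],
    "WordPress": [("html", r"/wp-content/")],
}


def detect_technologies(html, headers):
    """Return a list of {"technology", "source", "evidence"} dicts."""
    header_lines = [f"{name.lower()}: {value}" for name, value in headers.items()]
    findings = []
    for tech, patterns in FINGERPRINTS.items():
        for source, pattern in patterns:
            haystacks = [html] if source == "html" else header_lines
            for text in haystacks:
                match = re.search(pattern, text, flags=re.IGNORECASE)
                if match:
                    # Keep a short window around the match as the proof.
                    start = max(match.start() - 20, 0)
                    evidence = text[start:match.end() + 20]
                    findings.append(
                        {"technology": tech, "source": source, "evidence": evidence}
                    )
                    break  # one piece of evidence per pattern is enough
    return findings


sample_html = '<script src="https://cdn.shopify.com/s/files/theme.js"></script>'
sample_headers = {"X-ShopId": "12345"}
results = detect_technologies(sample_html, sample_headers)
```

Each finding carries the source (page body or response header) and the text around the match, so a reviewer can check the claim without re-crawling the site.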

Debate topics

You do not need to implement these, just write a few thoughts in the README.md:

What are the main issues with your current implementation, and how would you tackle them?
How would you scale this solution to crawl millions of domains in a timely manner (1-2 months)?
How would you discover new technologies in the future?
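
For the scaling question, a back-of-the-envelope throughput estimate is a useful starting point. The figures below are assumptions chosen for illustration (5 million domains, 10 seconds of work per domain on average); only the 1-2 month window comes from the brief.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def required_rate(domains, days):
    """Average domains per second needed to finish within the window."""
    return domains / (days * SECONDS_PER_DAY)


def workers_needed(domains, days, seconds_per_domain):
    """Concurrent workers needed if each domain takes seconds_per_domain on average."""
    return math.ceil(required_rate(domains, days) * seconds_per_domain)


# Hypothetical inputs: 5 million domains, 10 seconds per domain.
rate_30_days = required_rate(5_000_000, 30)
workers_30_days = workers_needed(5_000_000, 30, 10)
workers_60_days = workers_needed(5_000_000, 60, 10)
```

The estimate ignores retries, rate limits, and failures, so treat it as a lower bound when sizing the crawl.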

Ready?

Here’s a file which contains 200 different domains. We identified 477 different technologies on them. How many can your tool find?

domain list

Expected Deliverables

Solution explanation / presentation
Provide an explanation or presentation of your solution and results. You have total creative freedom here. Feel free to impress with your thinking process, the paths you took or decided not to take, the reasoning behind your decisions and what led to your approach.
Output
A file containing identified technologies for each of the input domains (JSON, CSV, Parquet, etc).
Code and Logic
Include the code that enabled you to achieve this task for the provided list, along with your answers to the debate topics.

Submit your project

When you’re finished with the challenge, please submit the link to your GitHub project below.

#2 Intent Qualification (ML Engineer Intern)

Task

Your mission is to build a ranking and qualification system that determines whether a company truly matches a user’s request.

Imagine a user asks: “Find logistics companies in Germany.”

A search system retrieves hundreds of companies that might be relevant. But search results are noisy. Among them you might find:

A freight forwarding company in Hamburg (perfect match)
A German software company that builds logistics management tools (debatable)
A Polish company operating a warehouse near the German border (probably not)

Search retrieves candidates, but something still needs to decide which companies actually match the user’s intent.

A naive approach would be to send every candidate company to a large language model and ask: “Does this company match the query?”

This works surprisingly well, but it has serious problems:

Expensive — qualifying hundreds of companies per query quickly becomes costly.
Slow — sequential API calls can take tens of seconds.
Inconsistent — borderline cases may produce different answers across runs.
Overkill — simple queries receive the same expensive treatment as complex reasoning tasks.

Your challenge is to design a smarter qualification system that balances:

Accuracy
Speed
Cost
Scalability

The strongest solutions will combine multiple techniques and apply them intelligently.
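
One common way to combine techniques is a cascade: a cheap check settles the clear cases, and only the ambiguous candidates reach an expensive step. The sketch below is one possible shape under stated assumptions, not the expected design: `cheap_score` is a toy keyword heuristic, `expensive_check` is a stand-in for something like an LLM call, and the `low` and `high` thresholds are arbitrary placeholders.

```python
def cheap_score(query_terms, company):
    """Fraction of query terms found in the company's text fields (toy heuristic)."""
    text = " ".join(
        [
            company.get("description") or "",
            " ".join(company.get("core_offerings") or []),
        ]
    ).lower()
    if not query_terms:
        return 0.0
    hits = sum(1 for term in query_terms if term.lower() in text)
    return hits / len(query_terms)


def qualify(query_terms, companies, expensive_check, low=0.2, high=0.8):
    """Return (accepted companies, number of candidates sent to the expensive step)."""
    accepted, escalated = [], 0
    for company in companies:
        score = cheap_score(query_terms, company)
        if score >= high:
            accepted.append(company)  # clear match, no expensive call
        elif score > low:
            escalated += 1  # ambiguous, worth the expensive call
            if expensive_check(company):
                accepted.append(company)
        # score <= low: clear mismatch, dropped without an expensive call
    return accepted, escalated


companies = [
    {"operational_name": "A", "description": "Freight forwarding and logistics in Germany"},
    {"operational_name": "B", "description": "Software for logistics planning"},
    {"operational_name": "C", "description": "Artisanal bakery"},
]
accepted, escalated = qualify(["freight", "logistics"], companies, lambda c: False)
```

In this toy run only company B is escalated, so the expensive step is called once instead of three times.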

1. Data

You will receive a dataset containing a collection of company profiles. Each company represents a potential candidate that may or may not satisfy a given query.

Each company may include some or all of the following fields:

website – The company’s primary website domain.
operational_name – The commonly used name of the company.
year_founded – The year the company was established.
address – The company’s primary location.
employee_count – Estimated number of employees.
revenue – Estimated annual revenue in USD.
primary_naics – The company’s main NAICS industry classification.
secondary_naics – Additional NAICS industry classifications where applicable.
description – A textual description of the company’s activities, products, or services.
business_model – The primary business model (e.g., B2B, B2C, marketplace, SaaS, etc.).
target_markets – Industries or customer segments the company serves.
core_offerings – Key products or services provided by the company.
is_public – Indicates whether the company is publicly traded.

Example company record:
{
  "operational_name": "Meridian Logistics GmbH",
  "website": "meridian-logistics.de",
  "year_founded": 2003,
  "address": "Munich, Germany",
  "employee_count": 342,
  "revenue": 48000000,
  "primary_naics": {"code": "488510", "label": "Freight Transportation Arrangement"},
  "secondary_naics": [{"code": "493110", "label": "General Warehousing and Storage"}],
  "description": "Full-service freight forwarding and supply chain management company offering customs brokerage, warehousing, and transportation solutions across Europe.",
  "business_model": ["B2B", "Service Provider"],
  "core_offerings": ["freight forwarding", "customs brokerage", "warehousing"],
  "target_markets": ["automotive", "manufacturing"],
  "is_public": false
}

Important: not every company contains every field.
Missing data is common in real-world company datasets, so your solution should remain effective even when some information is unavailable.
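
One way to stay effective with incomplete profiles is to treat a missing field as "unknown" rather than as a mismatch. The sketch below assumes a hypothetical helper, `employee_count_at_least`, that returns three possible answers; how an unknown is handled afterwards (keep, down-rank, or escalate) is a design decision left open here.

```python
def employee_count_at_least(company, threshold):
    """Return True or False when the field is present, None when it is missing."""
    value = company.get("employee_count")
    if value is None:
        return None  # unknown: neither a match nor a mismatch
    return value >= threshold


complete = {"operational_name": "Meridian Logistics GmbH", "employee_count": 342}
partial = {"operational_name": "Meridian Logistics GmbH"}

known_answer = employee_count_at_least(complete, 100)
unknown_answer = employee_count_at_least(partial, 100)
```

Keeping `None` distinct from `False` avoids silently discarding a company only because its profile is sparse.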

2. Objective

Build a system that:

receives a user query
has access to the companies database

and determines which companies truly match the query.

Your solution should return a ranked or filtered list of companies that best satisfy the user’s intent.

The goal is not simply to find companies that are *similar* to the query — but companies that meaningfully satisfy the constraints implied by it.

3. Queries

Your system will be tested on 12 queries of varying complexity.

Some queries are highly structured and map directly to specific fields.

Example:

“Public software companies with more than 1,000 employees.”

Others require interpretation and reasoning.

Example:

“Fast-growing fintech companies competing with traditional banks in Europe.”

Some queries may involve:

supply chains
business relationships
inferred industry roles
vague or subjective criteria

Your system should attempt to handle both structured and judgment-heavy queries.
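
For the structured end of the spectrum, part of a query can be turned into field constraints before any model is involved. The sketch below is a deliberately narrow illustration: `parse_structured_constraints` and its constraint names are hypothetical, and the regular expressions only cover a few phrasings.

```python
import re


def parse_structured_constraints(query):
    """Extract a few field constraints from a query using simple patterns."""
    constraints = {}
    lowered = query.lower()

    if re.search(r"\bpublic\b", lowered):
        constraints["is_public"] = True

    match = re.search(r"more than ([\d,]+) employees", lowered)
    if match:
        constraints["employees_more_than"] = int(match.group(1).replace(",", ""))

    match = re.search(r"fewer than ([\d,]+) employees", lowered)
    if match:
        constraints["employees_fewer_than"] = int(match.group(1).replace(",", ""))

    match = re.search(r"founded after (\d{4})", lowered)
    if match:
        constraints["founded_after"] = int(match.group(1))

    return constraints


structured = parse_structured_constraints(
    "Public software companies with more than 1,000 employees."
)
judgment_heavy = parse_structured_constraints(
    "Fast-growing fintech companies competing with traditional banks in Europe."
)
```

The first query yields two field constraints while the second yields none, which is one possible signal for routing a query to a heavier reasoning path.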

Example Queries

Logistics companies in Romania

Public software companies with more than 1,000 employees.

Food and beverage manufacturers in France

Companies that could supply packaging materials for a direct-to-consumer cosmetics brand

Construction companies in the United States with revenue over $50 million

Pharmaceutical companies in Switzerland

B2B SaaS companies providing HR solutions in Europe

Clean energy startups founded after 2018 with fewer than 200 employees

Fast-growing fintech companies competing with traditional banks in Europe.

E-commerce companies using Shopify or similar platforms

Renewable energy equipment manufacturers in Scandinavia

Companies that manufacture or supply critical components for electric vehicle battery production

Look carefully at the differences between these queries.

Some are almost entirely structured filters.

Others require interpreting the role a company plays within a broader ecosystem.

Such a problem requires a flexible system that can adapt to the complexity of the query.

4. Baselines

You may find the following baseline strategies tempting. Each works partially, but each has significant limitations.

BASELINE A — LLM Per Company

Send each company individually to an LLM and ask: “Does this company match the query?”

Pros:

strong semantic understanding
decent accuracy

Cons:

expensive
slow
inconsistent
scales poorly

BASELINE B — Embedding Similarity

Embed the query and each company profile and rank by cosine similarity.

Pros:

cheap
fast

Cons:

poor intent understanding
similarity ≠ relevance

Example failure:

Query: “Companies supplying packaging for cosmetics brands”

Embedding search often ranks cosmetics companies instead of packaging suppliers.
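
The failure can be reproduced in miniature. In the sketch below, word counts stand in for a real embedding model (an assumption made so the example runs without dependencies), and the two company descriptions are invented for illustration. The cosine computation itself is the standard one.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query_vec = embed("Companies supplying packaging for cosmetics brands")

# Invented descriptions, for illustration only.
profiles = {
    "cosmetics brand": "cosmetics brands for women cosmetics and skincare",
    "packaging supplier": "packaging manufacturer of jars and bottles for beauty products",
}

ranked = sorted(
    profiles,
    key=lambda name: cosine_similarity(query_vec, embed(profiles[name])),
    reverse=True,
)
```

Here the cosmetics brand ranks first because it shares more surface vocabulary with the query, even though the packaging supplier is the company the user is asking for.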

Your Goal

Design a system that combines the strengths of these approaches while avoiding their weaknesses.

Your solution should aim to be:

More accurate than naive similarity search
Faster and cheaper than sending every company to an LLM
Scalable to large datasets

Expected Deliverables

Implementation
Provide a working solution that processes the set of queries.
Your solution should produce qualified companies for each query.
You are free to design the architecture however you see fit.
Code
Submit the code implementing your approach.
Your code should demonstrate:
- clear structure
- modular design
- scalability considerations
Organise your submission similar to the following:
your-submission/
├── solution.py
├── WRITEUP.md
├── requirements.txt
└── any supporting files
Writeup

Alongside your implementation, submit a WRITEUP.md explaining your solution.

We care as much about the way you think as the final results.

Your writeup should address the following:

3.1 Approach

Describe your system architecture.

What components does it include?
How do they interact?
Why did you choose this design?

3.2 Tradeoffs

What did you optimize for?

Examples:

speed
cost
accuracy
simplicity
robustness

What trade-offs did you intentionally make?

3.3 Error Analysis

Where does your system struggle?

Show concrete examples of companies it misclassifies and explain why.

3.4 Scaling

If the system needed to handle 100,000 companies per query instead of 500, what would you change?

3.5 Failure Modes

When might your system produce confident but incorrect results?

What would you monitor in production to detect these failures?

Critical Thinking

The strongest submissions show deep reflection about the problem and solution.

Ask yourself questions such as:

Where does my system work extremely well?
Where does it fail?
What assumptions did I make?
How robust is the system to missing data?
How well would this scale to millions of companies?
What improvements would I prioritise next?
What signals does the system rely on most heavily?
When might those signals be misleading?

Understanding the limits of your approach is as important as demonstrating its strengths.

Resources

If you’re ready to begin, start with the following: companies.jsonl

import pandas as pd
df = pd.read_json("data/companies.jsonl", lines=True)
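
Because missing fields are common, a first look at fill rates can show which signals are reliable. The sketch below uses only the standard library and a two-record inline sample in place of the real file; `fill_rates` is a hypothetical helper, and counting `false` and `0` as filled is an assumption about what "present" should mean.

```python
import json
from collections import Counter


def fill_rates(lines):
    """Return {field: share of records where the field holds a non-empty value}."""
    counts = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        for field, value in record.items():
            # None, empty strings, and empty containers count as missing;
            # False and 0 count as filled.
            if value not in (None, "", [], {}):
                counts[field] += 1
    if total == 0:
        return {}
    return {field: count / total for field, count in counts.items()}


# Inline sample; with the real data, pass an open file handle instead,
# for example: with open("data/companies.jsonl") as f: fill_rates(f)
sample_lines = [
    '{"operational_name": "A", "revenue": 100, "is_public": false}',
    '{"operational_name": "B", "revenue": null}',
]
rates = fill_rates(sample_lines)
```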

We’re excited to see the solutions you come up with.

Focus on building a system that is thoughtful, scalable, and well-reasoned.

When you’re finished, please submit your solution as a GitHub repository.

#3 PoC Simulation (Data Analyst Intern)

Role outline

At Veridion, we are on a mission to shape the future of data-driven solutions. Way too many opportunities slip through the cracks because the right people don’t have the right info at the right time. We’ve seen businesses go down during global shake-ups and watched great ideas stall because teams are stuck wrestling with messy, outdated data. We’re here to make data fast, smart, and actually useful.

A little bit of everything

PreSales is where you juggle flaming swords while riding a unicycle… and somehow enjoy it. You’re right at the crossroads of three powerful worlds: tech, product, and client-facing.
Your mission is to absorb the best of each, connect the dots, and craft killer POCs that truly showcase what our data can do.
You’ve got to be a data analysis pro, not just spotting trends and patterns, but also catching the sneaky details that can make or break a POC.
You don’t need to be a full-time dev, but rolling up your sleeves for some coding is key.
All of this with one goal in mind: linking a client’s pain point to how our data can actually make their lives easier and their decisions smarter.

What is a POC at Veridion?

A Proof-of-Concept is a small-scale project used to demonstrate that a solution is feasible and effective in solving a specific problem before a client commits. At Veridion, a POC usually means delivering a dataset of 1k–10k companies and showing exactly how it can drive value in the prospect’s day-to-day operations. Sounds simple? It’s not.

Every sample is different, our data evolves weekly, each client has unique challenges, and we’re constantly crafting creative solutions or prototyping features that are still in development, or haven’t even been built yet. Your mandate is to drive things from point A to point B and fully own the process and its outcome.

Cross-Functional Collaboration

You’ll work closely with Veridion’s internal teams, including Technical and Customer Success, to ensure seamless delivery of solutions that meet customer needs. Your collaborative spirit will help ensure alignment across departments, contributing to the overall success of the business.

Your challenge

Part 1: POC simulation

A large manufacturing company’s Procurement department is kicking off a digitalization journey. Their category managers have hit a wall – they can’t properly analyze spend because their supplier database is cluttered with messy, duplicate, and outdated entries. Meanwhile, leadership is pushing hard for a clear cost-saving strategy for next year. On top of that, there’s interest in exploring sustainability in the supply chain, but they just don’t have the resources to prioritize it right now.

They’re currently piloting solutions with two competitors: a well-known legacy provider and a newcomer. While they’re fairly satisfied with the newcomer’s performance, the legacy player’s strong market reputation and proven value still carry weight. Budget is already allocated, and they’re set to make a decision next quarter.

1. Entity Resolution

You’ve received a sample of companies from the client for the POC. Each entry has been processed through our entity resolution engine, returning up to 5 candidate matches per row. Your task is to select the best match for each input. If none are accurate, you can leave it unmatched or find the correct entity elsewhere. The goal is to resolve every row to a real-world company—if it exists online, you should be able to find it.
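
A first pass over the candidates can be scripted before reviewing the hard cases by hand. The sketch below is one simple heuristic under stated assumptions: the field names (`name`, `country`, `id`), the 0.3 country penalty, and the 0.6 acceptance threshold are all hypothetical, and name similarity comes from the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def pick_best_match(input_row, candidates, min_score=0.6):
    """Return the best-scoring candidate, or None if none reaches min_score."""
    best, best_score = None, 0.0
    for candidate in candidates:
        score = name_similarity(input_row["name"], candidate["name"])
        # Penalize candidates whose country contradicts the input row.
        row_country = input_row.get("country")
        cand_country = candidate.get("country")
        if row_country and cand_country and row_country != cand_country:
            score -= 0.3
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= min_score else None


row = {"name": "Acme Industrial Supply", "country": "US"}
candidates = [
    {"id": 1, "name": "Acme Industrial Supply Inc", "country": "US"},
    {"id": 2, "name": "Acme Industries", "country": "DE"},
    {"id": 3, "name": "Zenith Foods", "country": "US"},
]
best = pick_best_match(row, candidates)
unmatched = pick_best_match(row, [candidates[2]])
```

Rows that come back as `None` are the ones to investigate manually or look up elsewhere, in line with the goal of resolving every row to a real-world company.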

2. Data analysis and QC

Once you’ve picked the correct matches, review the data attributes we provide and look for any inconsistencies. You don’t need to fix them (though you’re free to if you want 😄), but you should think through how to curate the dataset so it’s clean and ready for the client.
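
Some inconsistencies can be surfaced with a few mechanical checks before a closer manual review. The sketch below is illustrative only: the `matched_id` and `year_founded` field names, the `qc_flags` helper, and the 1600 lower bound for a plausible founding year are assumptions, not part of the provided data dictionary.

```python
from collections import Counter
from datetime import date


def qc_flags(rows):
    """Return {row_index: [issue, ...]} for a list of resolved rows."""
    id_counts = Counter(
        row.get("matched_id") for row in rows if row.get("matched_id") is not None
    )
    flags = {}
    for index, row in enumerate(rows):
        issues = []
        matched_id = row.get("matched_id")
        if matched_id is None:
            issues.append("unmatched")
        elif id_counts[matched_id] > 1:
            # Two input rows resolved to the same entity: likely a duplicate.
            issues.append("duplicate_match")
        year = row.get("year_founded")
        if year is not None and not (1600 <= year <= date.today().year):
            issues.append("implausible_year_founded")
        if issues:
            flags[index] = issues
    return flags


rows = [
    {"matched_id": "a", "year_founded": 2003},
    {"matched_id": "a"},
    {"matched_id": None},
    {"matched_id": "b", "year_founded": 3020},
]
flags = qc_flags(rows)
```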

3. Summarize your work

Walk us through everything you did and observed during the project. We care more about understanding your thinking process than the specific tools you used to get there.

4. Publish your work

Choose the format that you think best fits this type of challenge / use case and publish it. Solutions to all other challenges are expected to be published on GitHub, but for this one you can use the GitHub field in the form below to link your work even if it is hosted somewhere other than GitHub.

5. Submit the challenge

Submit your challenge in the form below.

Next steps if you pass this round

Part 2: Roleplay Prep

You’ll need to put together a presentation for the client that highlights our value proposition and clearly walks them through the POC results.

Part 3: In-Person interview

You’ll have 30 minutes during the in-person interview to present your deck – we’ll play the role of the client. After that, there’ll be a 1-hour live task to give you a better feel for the kind of work a PreSales Data Specialist handles.

Resources & broader background

Market context and why we exist
Docs and features
Current Data Dictionary
Search API – video tutorial | walkthrough
Explore the Veridion Universe – metrics, distribution, fill rates, test the API (you will have to create a free account)
One pagers:

At Veridion, we appreciate talent, skill, and a commitment to excellence. We offer and expect a high level of honesty and integrity throughout our professional relationships. If you’re passionate about data, innovative technology, and making a real impact, we look forward to welcoming you to our team.

