Engineering Challenges - Veridion

You’re here because we’re considering your profile for our team and want to see how you approach real problems.

This page explains how we approach engineering challenges at Veridion and what we look for when reviewing candidate submissions. Before starting a challenge, it helps to understand the context in which these problems exist and how we evaluate solutions.

The challenges themselves are designed to simulate the kind of open-ended work we deal with every day. They give you the opportunity to show how you investigate a problem, make decisions, and build something that works in a real-world setting.

If you join Veridion, the effort you invest throughout the recruitment process will be rewarded with a bonus added to your fourth salary (after the three-month probation period).

The way we work

The core principles behind our approach

Understand the problem deeply

Strong solutions usually come from careful investigation of the problem and the data behind it before writing code.

Choose tools based on the problem

The goal is not to showcase a particular tech stack, but to solve the problem as effectively as possible.

Question default approaches

When no established solution exists, don’t assume one. Explore the problem from multiple angles and be willing to define the approach yourself.

Deliver real-world value

Real systems require balancing accuracy, speed, and effort to deliver value to those who rely on them.


Learn and iterate fast

Expect uncertainty and experimentation. Progress comes from testing ideas, learning quickly, and continuously improving the solution.

What makes a great solution

The hardest part is not writing the code. Most solutions fail much earlier.

We review these projects as a signal of the impact you could have if you joined our team.

In practice, that means looking at how you investigate the problem and the data behind it, how you define the actual problem to solve, what decisions and trade-offs you make along the way, how you use tools, and how you turn the idea into a working solution.

Treat it as a small production-ready project rather than a quick exercise. Submissions that consist of a minimal script, a mostly LLM-generated solution, or a solution where the problem is forced to fit a tool rarely demonstrate the depth we’re looking for. We review a large number of projects and are familiar with the patterns these approaches tend to produce.

Problem framing

Did you correctly identify and define the actual problem to solve?

Engineering decisions

Are the key decisions and trade-offs thoughtful and clearly justified?

Use of tools

Are tools used to support the solution, or does the solution appear shaped around the tool instead of the problem?

Execution quality

Does the solution work correctly, respect the requirements and constraints, and cover the relevant cases?

Ownership

Does the solution demonstrate clear ownership of the reasoning, decisions, and implementation, even if AI or other tools were used?

Choose your track

Check out the challenges below and choose the one you think you will do best at. They are similar to the tasks you would work on once you join our team.

Your submission should reflect how you approach real engineering problems.

Task

A company develops an analytics plugin for Shopify stores. Its salespeople want to find websites that use competing products so they can run a marketing campaign convincing the owners to migrate to the company’s solution.

Your goal is to build a tool that identifies technologies used in building a website.

Things to consider

  • Gather as many technologies as possible.
  • Provide evidence for how you concluded that a website uses technology X.
  • From a tech stack perspective, you can use any programming language, toolset or libraries you’re comfortable with or find necessary, especially if you know it would be a better option or a more interesting one (we generally prefer Node, Python, Scala).
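One way to satisfy the evidence requirement is to make every detection carry the exact snippet that triggered it. The sketch below assumes the page HTML and response headers have already been fetched; the `FINGERPRINTS` table, its patterns, and the `Detection`/`detect` names are illustrative assumptions, not a verified signature set.

```python
import re
from dataclasses import dataclass

# Hypothetical, illustrative fingerprints: each technology maps to regexes
# checked against the page HTML and against response headers (lower-cased names).
FINGERPRINTS = {
    "Shopify": {
        "html": [r"cdn\.shopify\.com", r"Shopify\.theme"],
        "headers": {"x-shopid": r".+"},
    },
    "Google Analytics": {
        "html": [r"googletagmanager\.com/gtag/js", r"google-analytics\.com/analytics\.js"],
        "headers": {},
    },
    "Cloudflare": {
        "html": [],
        "headers": {"server": r"cloudflare", "cf-ray": r".+"},
    },
}


@dataclass
class Detection:
    technology: str
    source: str    # "html" or "header:<name>"
    evidence: str  # the exact text that matched, kept as proof


def detect(html: str, headers: dict) -> list:
    """Return one Detection per matching signature, each with its evidence."""
    lowered = {name.lower(): value for name, value in headers.items()}
    found = []
    for tech, rules in FINGERPRINTS.items():
        for pattern in rules["html"]:
            match = re.search(pattern, html, re.IGNORECASE)
            if match:
                found.append(Detection(tech, "html", match.group(0)))
        for name, pattern in rules["headers"].items():
            value = lowered.get(name)
            if value is not None and re.search(pattern, value, re.IGNORECASE):
                found.append(Detection(tech, f"header:{name}", f"{name}: {value}"))
    return found


if __name__ == "__main__":
    page = '<script src="https://cdn.shopify.com/s/files/theme.js"></script>'
    for d in detect(page, {"Server": "cloudflare", "CF-RAY": "8a1b2c"}):
        print(d.technology, "|", d.source, "|", d.evidence)
```

Keeping `evidence` on every result lets the output file cite exactly which script URL or header led to each conclusion. A fuller tool would also inspect script sources, cookies, meta tags and DNS records, and would load signatures from a maintained list rather than a hard-coded dict.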

Debate topics

You do not need to implement these; just write down a few thoughts in README.md:

  • what were the main issues with your current implementation and how would you tackle them?
  • how would you scale this solution for millions of domains crawled in a timely manner (1-2 months)?
  • how would you discover new technologies in the future?

Ready?

Here’s a file which contains 200 different domains. We identified 477 different technologies on them. How many can your tool find?

Expected Deliverables

  1. Solution explanation / presentation

    Provide an explanation or presentation of your solution and results. You have total creative freedom here. Feel free to impress us with your thinking process, the paths you took or decided not to take, the reasoning behind your decisions, and what led to your approach.

  2. Output

    A file containing identified technologies for each of the input domains (JSON, CSV, Parquet, etc.).

  3. Code and Logic

    Include the code that enabled you to achieve this task for the provided list, along with answers to the debate topics.

Submit your project

When you’re finished with the challenge, please submit the link to your GitHub project below.

Task

Your mission is to build a ranking and qualification system that determines whether a company truly matches a user’s request.

Imagine a user asks: “Find logistics companies in Germany.”

A search system retrieves hundreds of companies that might be relevant. But search results are noisy. Among them you might find:

  • A freight forwarding company in Hamburg (perfect match)
  • A German software company that builds logistics management tools (debatable)
  • A Polish company operating a warehouse near the German border (probably not)

Search retrieves candidates, but something still needs to decide which companies actually match the user’s intent.

A naive approach would be to send every candidate company to a large language model and ask: “Does this company match the query?”

This works surprisingly well, but it has serious problems:

  • Expensive — qualifying hundreds of companies per query quickly becomes costly.
  • Slow — sequential API calls can take tens of seconds.
  • Inconsistent — borderline cases may produce different answers across runs.
  • Overkill — simple queries receive the same expensive treatment as complex reasoning tasks.

Your challenge is to design a smarter qualification system that balances:

  • Accuracy
  • Speed
  • Cost
  • Scalability

The strongest solutions will combine multiple techniques and apply them intelligently.


 

1. Data

You will receive a dataset containing a collection of company profiles. Each company represents a potential candidate that may or may not satisfy a given query.

Each company may include some or all of the following fields:

  • website – The company’s primary website domain.
  • operational_name – The commonly used name of the company.
  • year_founded – The year the company was established.
  • address – The company’s primary location.
  • employee_count – Estimated number of employees.
  • revenue – Estimated annual revenue in USD.
  • primary_naics – The company’s main NAICS industry classification.
  • secondary_naics – Additional NAICS industry classifications where applicable.
  • description – A textual description of the company’s activities, products, or services.
  • business_model – The primary business model (e.g., B2B, B2C, marketplace, SaaS, etc.).
  • target_markets – Industries or customer segments the company serves.
  • core_offerings – Key products or services provided by the company.
  • is_public – Indicates whether the company is publicly traded.

Example company record:

{
  "operational_name": "Meridian Logistics GmbH",
  "website": "meridian-logistics.de",
  "year_founded": 2003,
  "address": "Munich, Germany",
  "employee_count": 342,
  "revenue": 48000000,
  "primary_naics": {"code": "488510", "label": "Freight Transportation Arrangement"},
  "secondary_naics": [{"code": "493110", "label": "General Warehousing and Storage"}],
  "description": "Full-service freight forwarding and supply chain management company offering customs brokerage, warehousing, and transportation solutions across Europe.",
  "business_model": ["B2B", "Service Provider"],
  "core_offerings": ["freight forwarding", "customs brokerage", "warehousing"],
  "target_markets": ["automotive", "manufacturing"],
  "is_public": false
}

Important: not every company contains every field.
Missing data is common in real-world company datasets, so your solution should remain effective even when some information is unavailable.
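Because fields are often missing, a structured filter should probably distinguish "fails the constraint" from "cannot tell". A minimal sketch, assuming records are plain dicts shaped like the example record; `field_constraint` and `triage` are hypothetical helper names, and the address check assumes free-text addresses such as "Munich, Germany".

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A constraint returns True (pass), False (fail) or None (unknown: field missing).
Constraint = Callable[[Dict[str, Any]], Optional[bool]]


def field_constraint(field: str, predicate: Callable[[Any], bool]) -> Constraint:
    """Wrap a predicate so that a missing or null field yields None, not False."""
    def check(company: Dict[str, Any]) -> Optional[bool]:
        value = company.get(field)
        if value is None:
            return None
        return bool(predicate(value))
    return check


def triage(companies: List[Dict[str, Any]], constraints: List[Constraint]) -> Tuple[list, list, list]:
    """Split companies into confirmed matches, rejections and unknowns."""
    matched, rejected, unknown = [], [], []
    for company in companies:
        results = [check(company) for check in constraints]
        if any(r is False for r in results):
            rejected.append(company)   # a known field violates the query
        elif all(r is True for r in results):
            matched.append(company)    # every constraint verified from the data
        else:
            unknown.append(company)    # missing data: route to a richer check
    return matched, rejected, unknown


if __name__ == "__main__":
    # Structured part of "Construction companies in the United States with revenue over $50 million".
    companies = [
        {"operational_name": "A", "revenue": 80_000_000, "address": "Austin, United States"},
        {"operational_name": "B", "revenue": 10_000_000, "address": "Denver, United States"},
        {"operational_name": "C", "address": "Boston, United States"},  # no revenue field
    ]
    constraints = [
        field_constraint("revenue", lambda r: r > 50_000_000),
        field_constraint("address", lambda a: "United States" in a),
    ]
    for label, group in zip(("matched", "rejected", "unknown"), triage(companies, constraints)):
        print(label, [c["operational_name"] for c in group])
```

Only rejections are final here; the `unknown` bucket is where a more expensive check (or a human) could spend its budget, and the "construction" part of the query still needs a semantic check.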


 

2. Objective

Build a system that:

  • receives a user query
  • has access to the companies database

and determines which companies truly match the query.

Your solution should return a ranked or filtered list of companies that best satisfy the user’s intent.

The goal is not simply to find companies that are *similar* to the query, but companies that meaningfully satisfy the constraints it implies.

3. Queries

Your system will be tested on 12 queries of varying complexity.

Some queries are highly structured and map directly to specific fields.

Example:

“Public software companies with more than 1,000 employees.”

Others require interpretation and reasoning.

Example:

“Fast-growing fintech companies competing with traditional banks in Europe.”

Some queries may involve:

  • supply chains
  • business relationships
  • inferred industry roles
  • vague or subjective criteria

Your system should attempt to handle both structured and judgment-heavy queries.
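For the structured part of a query, a small parser can turn phrases like "more than 1,000 employees" into field constraints before any model is involved. This is a sketch under narrow assumptions: the regexes cover only a few phrasings from the example list, `extract_constraints` is a hypothetical name, and the semantic part of the query ("software", "clean energy") is deliberately left for a later stage.

```python
import re
from typing import Optional

# Hypothetical patterns covering only a few phrasings from the example queries.
NUM = r"(\d[\d,]*(?:\.\d+)?)\s*(million|billion)?"
SCALE = {None: 1, "million": 1_000_000, "billion": 1_000_000_000}


def _to_number(digits: str, scale: Optional[str]) -> float:
    return float(digits.replace(",", "")) * SCALE[scale]


def extract_constraints(query: str) -> dict:
    """Map structured phrases to (operator, value) pairs keyed by field name."""
    q = query.lower()
    out = {}
    m = re.search(r"(more than|over|fewer than|less than|under)\s+" + NUM + r"\s+employees", q)
    if m:
        op = ">" if m.group(1) in ("more than", "over") else "<"
        out["employee_count"] = (op, _to_number(m.group(2), m.group(3)))
    m = re.search(r"revenue\s+(over|above|under|below)\s+\$?" + NUM, q)
    if m:
        op = ">" if m.group(1) in ("over", "above") else "<"
        out["revenue"] = (op, _to_number(m.group(2), m.group(3)))
    m = re.search(r"founded\s+(after|before)\s+(\d{4})", q)
    if m:
        out["year_founded"] = (">" if m.group(1) == "after" else "<", int(m.group(2)))
    if re.search(r"\bpublic\b", q):
        out["is_public"] = ("==", True)
    return out


if __name__ == "__main__":
    print(extract_constraints("Clean energy startups founded after 2018 with fewer than 200 employees"))
```

Keyword rules like `\bpublic\b` are brittle ("public sector software" would also trigger it), which is one reason to treat parsed constraints as hints that a later stage can override.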


 

Example Queries

  • Logistics companies in Romania
  • Public software companies with more than 1,000 employees.
  • Food and beverage manufacturers in France
  • Companies that could supply packaging materials for a direct-to-consumer cosmetics brand
  • Construction companies in the United States with revenue over $50 million
  • Pharmaceutical companies in Switzerland
  • B2B SaaS companies providing HR solutions in Europe
  • Clean energy startups founded after 2018 with fewer than 200 employees
  • Fast-growing fintech companies competing with traditional banks in Europe.
  • E-commerce companies using Shopify or similar platforms
  • Renewable energy equipment manufacturers in Scandinavia
  • Companies that manufacture or supply critical components for electric vehicle battery production

Look carefully at the differences between these queries.

Some are almost entirely structured filters.

Others require interpreting the role a company plays within a broader ecosystem.

Such a problem requires a flexible system that can adapt to the complexity of the query.


 

4. Baselines

You may find the following baseline strategies tempting. Each works partially, but each has significant limitations.

BASELINE A — LLM Per Company

Send each company individually to an LLM and ask: “Does this company match the query?”

Pros:

  • strong semantic understanding
  • decent accuracy

Cons:

  • expensive
  • slow
  • inconsistent
  • scales poorly

BASELINE B — Embedding Similarity

Embed the query and each company profile and rank by cosine similarity.

Pros:

  • cheap
  • fast

Cons:

  • poor intent understanding
  • similarity ≠ relevance

Example failure:

Query: “Companies supplying packaging for cosmetics brands”

Embedding search often ranks cosmetics companies instead of packaging suppliers.
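Baseline B amounts to very little code, which is part of its appeal. In this sketch the vectors are toy 2-D lists; in practice they would come from an embedding model of your choice (not shown), and `rank_by_similarity` is a hypothetical helper name.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    company_vecs: Dict[str, Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Rank company ids by cosine similarity to the query, highest first."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in company_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real embeddings.
    companies = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(rank_by_similarity([1.0, 0.0], companies))
```

The score only measures how close two texts sit in embedding space. A cosmetics brand and a cosmetics-packaging supplier plausibly share much of their vocabulary, which is one likely reason the example above fails.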


 

Your Goal

Design a system that combines the strengths of these approaches while avoiding their weaknesses.

Your solution should aim to be:

  • More accurate than naive similarity search
  • Faster and cheaper than sending every company to an LLM
  • Scalable to large datasets
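One possible shape for such a system is a cascade: cheap structured checks first, a cheap relevance score next, and the expensive LLM judgement only for the ambiguous middle band. In this sketch `hard_filter`, `score` and `llm_judge` are caller-supplied callables (the LLM call itself is not shown), and the 0.8 / 0.3 thresholds are placeholders that would need calibrating on labelled examples.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Company = Dict[str, Any]


def qualify(
    companies: List[Company],
    hard_filter: Callable[[Company], Optional[bool]],
    score: Callable[[Company], float],
    llm_judge: Callable[[Company], bool],
    accept_at: float = 0.8,
    reject_below: float = 0.3,
) -> Tuple[List[Tuple[Company, float, str]], int]:
    """Cheap checks first; the expensive judge only sees the uncertain middle band.

    Returns accepted companies as (company, score, deciding_stage), sorted by
    score, plus the number of judge calls made (a proxy for cost).
    """
    accepted, judge_calls = [], 0
    for company in companies:
        if hard_filter(company) is False:   # stage 1: structured rejection, no cost
            continue
        s = score(company)                  # stage 2: cheap relevance score
        if s >= accept_at:
            accepted.append((company, s, "score"))
        elif s >= reject_below:             # stage 3: only ambiguous cases reach the judge
            judge_calls += 1
            if llm_judge(company):
                accepted.append((company, s, "judge"))
    accepted.sort(key=lambda item: item[1], reverse=True)
    return accepted, judge_calls
```

For role-based queries such as the packaging example, where similarity is known to mislead, setting `accept_at` above 1.0 disables acceptance on score alone so every surviving candidate is judged. Returning the deciding stage and the number of judge calls makes it possible to audit both errors and cost per stage.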


 

Expected Deliverables

  1. Implementation

    Provide a working solution that processes the set of queries.

    Your solution should produce qualified companies for each query.

    You are free to design the architecture however you see fit.

  2. Code
    Submit the code implementing your approach.

    Your code should demonstrate:

    • clear structure
    • modular design
    • scalability considerations


    Organise your submission similar to the following:

    your-submission/
    ├── solution.py
    ├── WRITEUP.md
    ├── requirements.txt
    └── any supporting files

  3. Writeup

Alongside your implementation, submit a WRITEUP.md explaining your solution.

We care as much about the way you think as the final results.

Your writeup should address the following:

3.1 Approach

Describe your system architecture.

  • What components does it include?
  • How do they interact?
  • Why did you choose this design?

3.2 Trade-offs

What did you optimize for?

Examples:

  • speed
  • cost
  • accuracy
  • simplicity
  • robustness

What trade-offs did you intentionally make?

3.3 Error Analysis

Where does your system struggle?

Show concrete examples of companies it misclassifies and explain why.

3.4 Scaling

If the system needed to handle 100,000 companies per query instead of 500, what would you change?

3.5 Failure Modes

When might your system produce confident but incorrect results?

What would you monitor in production to detect these failures?


 

Critical Thinking

The strongest submissions show deep reflection about the problem and solution.

Ask yourself questions such as:

  • Where does my system work extremely well?
  • Where does it fail?
  • What assumptions did I make?
  • How robust is the system to missing data?
  • How well would this scale to millions of companies?
  • What improvements would I prioritise next?
  • What signals does the system rely on most heavily?
  • When might those signals be misleading?

Understanding the limits of your approach is as important as demonstrating its strengths.


 

Resources

If you’re ready to begin, start with the following: companies.jsonl

import pandas as pd
df = pd.read_json("data/companies.jsonl", lines=True)
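Since missing data is common, it may be worth measuring field coverage before choosing which signals to rely on. This sketch reads the JSONL file with the standard library instead of pandas so that empty strings and empty lists are counted as missing explicitly; `field_coverage` is a hypothetical helper, and the path matches the snippet above.

```python
import json
from collections import Counter
from typing import Dict, Iterable


def field_coverage(lines: Iterable[str]) -> Dict[str, float]:
    """Share of records in which each field is present and non-empty.

    Fields that never appear with a value are absent from the result.
    """
    present = Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue                                # tolerate blank lines in the file
        record = json.loads(line)
        total += 1
        for key, value in record.items():
            if value not in (None, "", [], {}):     # null, empty string/list/object count as missing
                present[key] += 1
    if total == 0:
        return {}
    return {key: present[key] / total for key in sorted(present)}


# Example usage (assumes the file from the Resources section exists locally):
# with open("data/companies.jsonl", encoding="utf-8") as f:
#     for field, share in field_coverage(f).items():
#         print(f"{field:20s} {share:.0%}")
```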

We’re excited to see the solutions you come up with.

Focus on building a system that is thoughtful, scalable, and well-reasoned.

When you’re finished, please submit your solution as a GitHub repository.

Role outline

At Veridion, we are on a mission to shape the future of data-driven solutions. Way too many opportunities slip through the cracks because the right people don’t have the right info at the right time. We’ve seen businesses go down during global shake-ups and watched great ideas stall because teams are stuck wrestling with messy, outdated data. We’re here to make data fast, smart, and actually useful.

A little bit of everything

  • PreSales is where you juggle flaming swords while riding a unicycle… and somehow enjoy it. You’re right at the crossroads of three powerful worlds: tech, product, and client-facing.
  • Your mission is to absorb the best of each, connect the dots, and craft killer POCs that truly showcase what our data can do.
  • You’ve got to be a data analysis pro, not just spotting trends and patterns, but also catching the sneaky details that can make or break a POC.
  • You don’t need to be a full-time dev, but rolling up your sleeves for some coding is key.
  • All of this with one goal in mind: linking a client’s pain point to how our data can actually make their lives easier and their decisions smarter.

What is a POC at Veridion?

A Proof-of-Concept is a small-scale project used to demonstrate that a solution is feasible and effective in solving a specific problem before a client commits. At Veridion, a POC usually means delivering a dataset of 1k–10k companies and showing exactly how it can drive value in the prospect’s day-to-day operations. Sounds simple? It’s not.

Every sample is different, our data evolves weekly, each client has unique challenges, and we’re constantly crafting creative solutions or prototyping features that are still in development or haven’t even been built yet. Your mandate is to drive things from point A to point B and fully own the process and its outcome.

Cross-Functional Collaboration

You’ll work closely with Veridion’s internal teams, including Technical and Customer Success, to ensure seamless delivery of solutions that meet customer needs. Your collaborative spirit will help ensure alignment across departments, contributing to the overall success of the business.


 

Your challenge

Part 1: POC simulation

A large manufacturing company’s Procurement department is kicking off a digitalization journey. Their category managers have hit a wall – they can’t properly analyze spend because their supplier database is cluttered with messy, duplicate, and outdated entries. Meanwhile, leadership is pushing hard for a clear cost-saving strategy for next year. On top of that, there’s interest in exploring sustainability in the supply chain, but they just don’t have the resources to prioritize it right now.

They’re currently piloting solutions with two competitors: a well-known legacy provider and a newcomer. While they’re fairly satisfied with the newcomer’s performance, the legacy player’s strong market reputation and proven value still carry weight. Budget is already allocated, and they’re set to make a decision next quarter.

1. Entity Resolution

You’ve received a sample of companies from the client for the POC. Each entry has been processed through our entity resolution engine, returning up to 5 candidate matches per row. Your task is to select the best match for each input. If none are accurate, you can leave it unmatched or find the correct entity elsewhere. The goal is to resolve every row to a real-world company—if it exists online, you should be able to find it.
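Choosing among the (up to 5) candidates could start from a simple scored comparison. The sketch below assumes each input row and candidate is a dict with hypothetical `name`, `website` and `country` keys; the legal-suffix list, the +0.5/+0.2 bonuses and the 0.75 cut-off are illustrative values, not tuned ones.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional

# Hypothetical list of legal-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"gmbh", "ag", "inc", "ltd", "llc", "sa", "srl", "bv", "plc", "corp"}


def normalize_name(name: str) -> str:
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def domain(url: Optional[str]) -> str:
    """Reduce a URL or bare domain to 'example.com' form."""
    if not url:
        return ""
    host = re.sub(r"^[a-z]+://", "", url.strip().lower()).split("/")[0]
    return host[4:] if host.startswith("www.") else host


def match_score(row: Dict, candidate: Dict) -> float:
    """Name similarity in [0, 1] plus hypothetical bonuses for website and country agreement."""
    score = SequenceMatcher(None, normalize_name(row["name"]), normalize_name(candidate["name"])).ratio()
    row_domain = domain(row.get("website"))
    if row_domain and row_domain == domain(candidate.get("website")):
        score += 0.5
    if row.get("country") and row["country"] == candidate.get("country"):
        score += 0.2
    return score


def best_match(row: Dict, candidates: List[Dict], min_score: float = 0.75) -> Optional[Dict]:
    """Pick the highest-scoring candidate, or None if nothing is convincing."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: match_score(row, c))
    return best if match_score(row, best) >= min_score else None
```

Returning `None` below the cut-off leaves a row unmatched for manual research instead of forcing a wrong match; a confident wrong match is harder to spot later than an empty one.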

2. Data analysis and QC

Once you’ve picked the correct matches, review the data attributes we provide and look for any inconsistencies. You don’t need to fix them (though you’re free to if you want 😄), but you should think through how to curate the dataset so it’s clean and ready for the client.
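Rule-based checks are a cheap first pass for spotting inconsistencies. The field names below are borrowed from the company profile schema in the qualification challenge and are assumptions about what the POC attributes look like; `qc_issues` is a hypothetical helper, and `min_year=1800` is an arbitrary bound (some real companies are older).

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Optional, Tuple


def qc_issues(
    records: List[Dict],
    current_year: Optional[int] = None,
    min_year: int = 1800,
) -> List[Tuple[int, str, str]]:
    """Flag rule violations as (row index, field, message); nothing is modified."""
    year_now = current_year if current_year is not None else date.today().year
    site_counts = Counter(r["website"] for r in records if r.get("website"))
    issues = []
    for i, r in enumerate(records):
        founded = r.get("year_founded")
        if founded is not None and not (min_year <= founded <= year_now):
            issues.append((i, "year_founded", f"implausible year {founded}"))
        employees = r.get("employee_count")
        if employees is not None and employees < 0:
            issues.append((i, "employee_count", "negative employee count"))
        if r.get("revenue") and employees == 0:
            issues.append((i, "revenue", "revenue reported with zero employees"))
        site = r.get("website")
        if site and site_counts[site] > 1:
            # the same website on several rows can point to duplicate entities
            issues.append((i, "website", f"website shared by {site_counts[site]} records"))
    return issues
```

Websites shared by several rows can indicate duplicate entities left over from entity resolution, which ties this QC step back to step 1.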

3. Summarize your work

Walk us through everything you did and observed during the project. We care more about understanding your thinking process than the specific tools you used to get there.

4. Publish your work

Choose the format that you think best fits this type of challenge / use case and publish it. While solutions to all other challenges are expected to be published on GitHub, you can use the GitHub field in the form below to link your work even if it points to a domain other than GitHub.

5. Submit the challenge

Submit your challenge in the form below.


 

Next steps if you pass this round

Part 2: Roleplay Prep

You’ll need to put together a presentation for the client that highlights our value proposition and clearly walks them through the POC results.

Part 3: In-Person interview

You’ll have 30 minutes during the in-person interview to present your deck – we’ll play the role of the client. After that, there’ll be a 1-hour live task to give you a better feel for the kind of work a PreSales Data Specialist handles.


 

Resources & broader background

 

At Veridion, we appreciate talent, skill, and a commitment to excellence. We offer and expect a high level of honesty and integrity throughout our professional relationships. If you’re passionate about data, innovative technology, and making a real impact, we look forward to welcoming you to our team.

 

Thank you and good luck! See you in the next round.