Engineering Challenges - Veridion

👋 Welcome!

You’ve landed here because we’re considering your profile for our team.
This page is designed to give you a better idea of what to expect from working at Veridion and of the kinds of problems we invest our time in. The challenges we prepared will allow you to showcase how you think, solve problems, and approach real-world challenges, and will also give you a glimpse into the types of tasks you will face in your role here.

The way we work

The thought process behind our problem-solving approach

Solve hard, useful, and unresolved problems

The tech we use enables novel solutions, and as such, we tackle the hard stuff that others won’t, and focus on problems that haven’t been solved yet.

Creativity over convention

We don’t follow templates. Most problems we face have no prior solution, so we have to come up with our own ways of getting things done.

Prioritize solving the problem

We find the most effective way to solve the problem rather than defaulting to a specific tool or framework. We employ whatever technique is necessary, without imposing artificial limitations.

Deliver real-world value

We build solutions that genuinely work and provide actual value, effectively balancing development time, accuracy, and speed.

Exponential growth

We scale by continuously learning, iterating, and pushing boundaries for both our product and our team.

What makes a great solution

Here’s what we expect from a remarkable project:

The way you deliver this project reflects your work ethic and how you tackle problems from start to finish. So, please ensure your project is as ‘production-ready’ as possible. If you’re not ready to give it your best, it’s probably not worth doing at all. It’s pointless to submit 40 lines of code and call it a solution to one of the challenges below.

The goal is to show that you can make an impact and approach problems with the right mindset, just as you would as part of our team. While we don’t expect you to have all the answers right now, we do expect you to demonstrate a strong work ethic, the potential for growth, and the ability to tackle complex problems effectively. Because if you don’t prove that, we can’t afford to take you seriously as a potential team member. If hired, your effort during the recruiting process will be rewarded with a bonus—this is not about free work. 

Correctness

Does your solution meet all the requirements and constraints given in the challenge?

Robustness

How well does your solution handle unexpected input or edge cases?

Code quality

Is the code well-organized, readable, and maintainable?

Extra mile

Does your solution only address the bare minimum or does it go beyond the surface?

Presentation

How well does the presentation reflect your reasoning, the decisions you made, and the thought process behind your solution?

Choose your track

Check out the challenges below and choose the one you think you’ll do best at. They’re very similar to the types of tasks you’ll work on once you join our team.

Can’t wait to see what you come up with. Enjoy the process—we’re confident you’ll kick ass!

Task

Match and group websites by the similarity of their logos. 

Context

Logos are instrumental for a company’s identity – they’re the symbol that customers use to recognize your brand. Ideally, you’ll want people to instantly connect the sight of your logo with the memory of what your company does – and, more importantly, how it makes them feel.

Guidelines

  • Take the time to deeply understand the problem before writing code. Even the most sophisticated solution is ineffective if it solves the wrong problem. Misalignment in problem definition leads to incorrect conclusions and wasted effort.
  • We know this is a clustering problem, and so do you. The question is: can you solve it without ML algorithms (like DBSCAN or k-means clustering)?
  • Check that your program extracts each logo correctly and matches the logos properly (a human recognizes them instantly, but this is much harder for a machine).
  • Explore this from as many different angles as you can. It will generate valuable questions.
  • From a tech stack perspective, you can use any programming language, toolset or libraries you’re comfortable with or find necessary, especially if you know it would be a better option or a more interesting one (we generally prefer Node, Python, Scala).
  • At Veridion, we run similar algorithms on billions of records. While your solution doesn’t need to scale to that level, it would be impressive if it does. For now, however, what matters most is your approach to solving the problem—if your solution is exceptional for the given dataset, we trust that you can scale it effectively using the right tools.
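
One possible reading of "without ML algorithms" is to compare compact fingerprints of the logos and merge sites whose fingerprints are close. The sketch below is only an illustration under stated assumptions, not the expected solution: it assumes each logo has already been extracted and downscaled to a small grayscale grid, and the names `average_hash`, `group_by_logo`, and the `max_distance` threshold are hypothetical choices.

```python
from typing import Dict, List

Pixels = List[List[int]]  # assumed input: grayscale logo as rows of 0-255 values


def average_hash(pixels: Pixels) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def group_by_logo(logos: Dict[str, Pixels], max_distance: int = 1) -> List[List[str]]:
    """Group sites whose logo hashes differ by at most max_distance bits."""
    sites = sorted(logos)
    hashes = {site: average_hash(logos[site]) for site in sites}
    parent = {site: site for site in sites}

    def find(site: str) -> str:
        # Follow parent links to the group root, shortening the path as we go.
        while parent[site] != site:
            parent[site] = parent[parent[site]]
            site = parent[site]
        return site

    # Pairwise comparison is quadratic; fine for a sketch, not for billions of records.
    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= max_distance:
                parent[find(b)] = find(a)

    groups: Dict[str, List[str]] = {}
    for site in sites:
        groups.setdefault(find(site), []).append(site)
    return sorted(groups.values())
```

A site whose logo matches no other ends up in a group of its own. Whether merging connected matches like this still counts as "clustering" is a fair question to raise in your presentation.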

Resources

If you’re ready to jump into the problem, please start with the following list of company websites:

Expected Deliverables

  1. Solution explanation / presentation

    Provide an explanation or presentation of your solution and results. You have total creative freedom here—feel free to impress with your thinking process, the paths you took or decided not to take, the reasoning behind your decisions and what led to your approach.

  2. Output
    Your algorithm should be able to extract logos for more than 97% of the websites in the given dataset.
    Your program should output multiple groups, each containing one or more websites (some logos may be unique to a single website). Make sure to upload your results along with the code.
  3. Code and Logic
    Include the code that enabled you to achieve this task for the provided list, and ideally, for any list of any size.

Submit your project

When you’re finished with the challenge, please submit the link to your GitHub project below.

Task

Build a robust company classifier for a new insurance taxonomy.

Objectives

  1. Accept a list of companies with associated data:
    – Company Description
    – Business Tags
    – Sector, Category, Niche Classification
  2. Receive a static taxonomy (a list of labels) relevant to the insurance industry.
  3. Build a solution that accurately classifies these companies, and any similar ones, into one or more labels from the static taxonomy.
  4. Present your results and demonstrate effectiveness.

Guidelines

Since this is an unsolved problem without a predefined ground truth, you’ll need to validate your classifier’s performance through your own methods.

  • Analyze strengths and weaknesses:
    • Explain where your solution excels and where it may need improvement.
    • Discuss scalability and how your solution performs with large datasets.
    • Reflect on any assumptions made and unknown factors that could impact your solution.
  • Ensure your solution truly addresses the problem:
    • Focus on solving the actual problem, not just implementing complex algorithms. Using embeddings, zero-shot models, TF-IDF, clustering, or other techniques is meaningless if companies are misclassified due to a flawed approach. A well-designed solution is more important than an impressive algorithm.
    • Your evaluation should demonstrate that your solution effectively addresses the problem. Simply plotting similarity scores or reporting F1 and accuracy metrics without meaningful validation only measures alignment with your own heuristic, not real-world effectiveness.
    • Take the time to deeply understand the problem before writing code. Even the most sophisticated solution is ineffective if it solves the wrong problem. Misalignment in problem definition leads to incorrect conclusions and wasted effort.
  • Provide insights into your problem-solving process:
    • Explain why you did what you did, what other paths you considered, and especially why you chose not to pursue them.
  • At Veridion, we run similar algorithms on billions of records. While your solution doesn’t need to scale to that level, it would be impressive if it does. For now, however, what matters most is your approach to solving the problem—if your solution is exceptional for the given dataset, we trust that you can scale it effectively using the right tools.
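
To make the validation point concrete, here is a minimal sketch that pairs a deliberately naive baseline with a check against a small hand-reviewed sample. Everything in it is an assumption for illustration: the token-overlap rule, the `min_overlap` threshold, and the function names `classify` and `mean_jaccard` are hypothetical, and the reviewed labels would have to come from your own manual review rather than from the classifier itself.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(company_text: str, taxonomy: List[str], min_overlap: float = 0.5) -> List[str]:
    """Baseline: keep labels whose words are mostly present in the company text."""
    words = tokens(company_text)
    matched = []
    for label in taxonomy:
        label_words = tokens(label)
        if not label_words:
            continue
        coverage = len(label_words & words) / len(label_words)
        if coverage >= min_overlap:
            matched.append(label)
    return matched


def mean_jaccard(predicted: Dict[str, List[str]], reviewed: Dict[str, List[str]]) -> float:
    """Average Jaccard overlap between predicted and hand-reviewed label sets."""
    scores = []
    for company, truth in reviewed.items():
        pred, gold = set(predicted.get(company, [])), set(truth)
        union = pred | gold
        # Two empty label sets count as full agreement.
        scores.append(len(pred & gold) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```

The second function is the part that matters: it scores predictions against labels a person assigned independently, so the number is not just a measure of agreement with the heuristic that produced them.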

Resources

If you’re ready to jump into the problem, please start with the following files:

Expected Deliverables

  1. Solution explanation / presentation

    Provide an explanation or presentation of your solution and results. You have total creative freedom here—feel free to impress with your thinking process, the paths you took or decided not to take, the reasoning behind your decisions and what led to your approach.

  2. Annotated Input List

    Return the input list with a new column titled “insurance_label” where you have correctly classified each company into one or more labels from the insurance taxonomy.

  3. Code and Logic

    Include the code that enabled you to achieve this classification for the provided list, and ideally, for any list of any size.

Submit your project

When you’re finished with the challenge, please submit the link to your GitHub project below.

Task

Identify unique companies and group duplicate records accordingly.

Context

The dataset contains company records imported from multiple systems, leading to duplicate entries with slight variations.

Guidelines

  • Take the time to deeply understand the problem before writing code. Even the most sophisticated solution is ineffective if it solves the wrong problem. Misalignment in problem definition leads to incorrect conclusions and wasted effort.
  • The dataset includes extensive company details, but not all fields are necessary for deduplication.
  • The key challenge is to identify and leverage the most relevant attributes to accurately detect and group duplicate records.
  • Take the time to research and understand what defines a company and which attributes uniquely identify it. This understanding is crucial for accurately detecting and grouping duplicate records.
  • At times, incomplete data may require you to make decisions where there is no clear right or wrong choice. What matters is backing each decision with the reasoning behind it.
  • It’s essential to document your decisions and the reasoning behind them.
  • From a tech stack perspective, you can use any programming language, toolset or libraries you’re comfortable with or find necessary, especially if you know it would be a better option or a more interesting one (we generally prefer Scala, Java, Python).
  • At Veridion, we run similar algorithms on billions of records. While your solution doesn’t need to scale to that level, it would be impressive if it does. For now, however, what matters most is your approach to solving the problem—if your solution is exceptional for the given dataset, we trust that you can scale it effectively using the right tools.
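
As a starting point for thinking about which attributes identify a company, the sketch below groups records by a normalized name plus a country code. It is an assumption-laden illustration, not the intended method: the column names `company_name` and `country_code` are hypothetical, the suffix list is far from exhaustive, and exact-key matching will miss many real duplicates.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Illustrative only: a real list of legal forms is much longer and country-specific.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "gmbh", "srl", "corp", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal-form words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def group_duplicates(records: List[Dict[str, Optional[str]]]) -> List[List[int]]:
    """Return groups of record indices that share a normalized name and country."""
    buckets: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for index, record in enumerate(records):
        name = normalize_name(record.get("company_name") or "")
        country = (record.get("country_code") or "").upper()
        # Records without a usable name get their own bucket instead of colliding.
        key = (name, country) if name else ("", f"row-{index}")
        buckets[key].append(index)
    return sorted(buckets.values())
```

Each decision here (which suffixes to strip, whether country is part of identity, what to do with nameless rows) is the kind of choice worth documenting with its reasoning.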

Resources

If you’re ready to jump into the problem, please start with the following file:

Expected Deliverables

  1. Solution explanation / presentation

    Provide an explanation or presentation of your solution and results. You have total creative freedom here—feel free to impress with your thinking process, the paths you took or decided not to take, the reasoning behind your decisions and what led to your approach.

  2. Output

    Return the updated dataset where you have correctly identified unique companies and grouped duplicate records accordingly.

  3. Code and Logic

    Include the code that enabled you to achieve the required entity resolution for the provided list.

Submit your project

When you’re finished with the challenge, please submit the link to your GitHub project below.

Task

Consolidate duplicate product records into a single, enriched entry per product, maximizing available information while ensuring uniqueness.

Context

The dataset contains product details extracted from various web pages using LLMs, resulting in duplicate entries where the same product appears across different sources. Each row represents partial attributes of a product.

Guidelines

  • Take the time to deeply understand the problem before writing code. Even the most sophisticated solution is ineffective if it solves the wrong problem. Misalignment in problem definition leads to incorrect conclusions and wasted effort.
  • Thoroughly analyze the dataset to understand each attribute clearly.
  • There isn’t always a single solution to this problem. Some decisions may be neither strictly right nor wrong, but they should be supported by as many relevant factors as possible.
  • It’s essential to document your decisions and the reasoning behind them.
  • From a tech stack perspective, you can use any programming language, toolset or libraries you’re comfortable with or find necessary, especially if you know it would be a better option or a more interesting one (we generally prefer Scala, Java, Python).
  • At Veridion, we run similar algorithms on billions of records. While your solution doesn’t need to scale to that level, it would be impressive if it does. For now, however, what matters most is your approach to solving the problem—if your solution is exceptional for the given dataset, we trust that you can scale it effectively using the right tools.
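
Once rows have been identified as the same product, the "enriched entry" step could look like the sketch below. It is a minimal illustration under assumptions: the merge policy (union for list attributes, most frequent non-empty value otherwise) is one defensible choice among several, and the function name `merge_product_rows` and the example field names are hypothetical.

```python
from typing import Any, Dict, List

EMPTY = (None, "", [])  # values treated as "no information"


def merge_product_rows(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge rows describing one product into a single, enriched record."""
    merged: Dict[str, Any] = {}
    # dict.fromkeys keeps each field once, in first-seen order.
    fields = dict.fromkeys(field for row in rows for field in row)
    for field in fields:
        values = [row[field] for row in rows if row.get(field) not in EMPTY]
        if not values:
            merged[field] = None
        elif all(isinstance(value, list) for value in values):
            # List attributes: union of items, preserving first-seen order.
            merged[field] = list(dict.fromkeys(item for value in values for item in value))
        else:
            # Other attributes: most frequent value; ties go to the first seen.
            merged[field] = max(values, key=values.count)
    return merged
```

For example, three partial rows with fields such as `name`, `brand`, and `colors` would collapse into one record that keeps the majority name, the only known brand, and every color seen. Conflicting values are where the interesting decisions are, so a majority vote should be treated as a placeholder policy.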

Resources

If you’re ready to jump into the problem, please start with the following file:

Expected Deliverables

  1. Solution explanation / presentation

    Provide an explanation or presentation of your solution and results. You have total creative freedom here—feel free to impress with your thinking process, the paths you took or decided not to take, the reasoning behind your decisions and what led to your approach.

  2. Output

    Return the updated dataset where you have correctly consolidated duplicates into a single, enriched entry per product, maximizing available information while ensuring uniqueness.

  3. Code and Logic

    Include the code that enabled you to achieve the required product deduplication for the provided list.

Submit your project

When you’re finished with the challenge, please submit the link to your GitHub project below.

Task

Design an algorithm that will group together HTML documents which are similar from the perspective of a user who opens them in a web browser.

Guidelines

Resources

If you’re ready to jump into the problem, please start with the following list of company websites:

Expected Deliverables

  1. Solution explanation / presentation

    Provide an explanation or presentation of your solution and results. You have total creative freedom here—feel free to impress with your thinking process, the paths you took or decided not to take, the reasoning behind your decisions and what led to your approach.

  2. Output
    Your program should take one subdirectory at a time and output the grouped documents, something along the lines of: [A.html, B.html], [C.html], [D.html, E.html, F.html], …

  3. Code and Logic

    Include the code that enabled you to achieve this task for the provided list, and ideally, for any list of any size.
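
As a rough illustration of the expected output shape, the sketch below groups documents by the similarity of their tag structure. It rests on a strong assumption, namely that markup structure is a usable proxy for what a user sees in the browser, which ignores CSS, scripts, and text content. The names `tag_shingles` and `group_documents` and the `threshold` value are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Set, Tuple


class TagCollector(HTMLParser):
    """Records the sequence of opening tags in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_shingles(html: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Set of overlapping runs of `size` consecutive opening tags."""
    collector = TagCollector()
    collector.feed(html)
    tags = collector.tags
    if len(tags) < size:
        return {tuple(tags)}
    return {tuple(tags[i:i + size]) for i in range(len(tags) - size + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Share of shingles two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_documents(documents: Dict[str, str], threshold: float = 0.6) -> List[List[str]]:
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups: List[Tuple[Set, List[str]]] = []
    for name in sorted(documents):
        shingles = tag_shingles(documents[name])
        for representative, members in groups:
            if jaccard(shingles, representative) >= threshold:
                members.append(name)
                break
        else:
            groups.append((shingles, [name]))
    return [members for _, members in groups]
```

Here `documents` maps file names from one subdirectory to their HTML source, and the return value is a list of groups in the same shape as the example above. The greedy pass depends on processing order, which is one of several weaknesses worth discussing.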

Submit your project

When you’re finished with the challenge, please submit the link to your GitHub project below.
