We Asked ChatGPT for 10,000 Salary Benchmarks. Here's How Wrong It Was.

Compensation • 27/01/26 • 2 min read

It’s Q4, and compensation review season is in full swing. Your inbox is overflowing with manager requests, your spreadsheets have sprouted seventeen new tabs, and you're staring down a list of roles that need salary benchmarks by Friday.

Then someone on your team has a bright idea: "Why don't we just ask ChatGPT?"

And honestly? We get it. ChatGPT is fast, free, and available at 2 am when you're still wrestling with that market data request from Finance. On top of everything, it sounds authoritative AND gives you numbers. When AI seems to be doing everything from writing code to diagnosing medical conditions, why wouldn't it know what a Senior Product Manager in Munich should earn?

However, compensation decisions affect real people's mortgages, career moves, and sense of being valued. They also affect your company's budget, retention rates, and (increasingly) legal compliance. When actual money is on the line, "sounds about right" isn't really good enough.

🔍 So we tested it. The result? ChatGPT's salary predictions are off by 23.3% on average, with errors reaching 61% for executive roles.

Together with Ellipsis, we ran 10,000 salary prompts across 200 roles in Paris, London, and Munich, then compared the AI's predictions against Figures' database of actual European salary data from client companies. What follows is exactly where ChatGPT gets it wrong, why it happens, and what it means for your next compensation review.

The ChatGPT salary benchmark experiment: what we did

We didn't want to cherry-pick a few bad examples and call it a day. If we were going to test whether ChatGPT could handle salary benchmarking, we needed to do it properly.

The setup:

  • 200 roles across job families, including Software Engineering, Sales, Marketing, Finance, People Operations, and more.
  • 3 locations: Paris, London, and Munich.
  • 50 prompt variations per role, giving us 10,000 total predictions.
  • Comparison against Figures' real market data from actual European companies.

We deliberately varied how we asked. Some prompts were direct ("What's the median salary for a Senior Data Engineer in Paris?"). Others took a hiring manager's perspective, a job seeker's angle, or a formal HR consultant tone. We asked conversationally. We asked with detailed context. We asked with almost no context at all.

Why bother with 50 variations? Because we knew some smart alec would say: "You just asked it wrong."

🤔 Interestingly, prompt style made almost no difference. Whether we asked as a recruiter, a job seeker, or a curious researcher, the error rates varied by just 2.3% across all ten prompt categories. That means the AI's knowledge is the issue – not how you phrase the question.
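For the curious, the harness for a test like this is simple. Here's a minimal sketch in Python – illustrative only, with a hypothetical `ask_model()` client and `actuals` lookup standing in for the real pipeline and Figures' dataset:

```python
from statistics import mean

# A few of the prompt styles we varied, expressed as templates.
PROMPT_STYLES = {
    "direct": "What's the median salary for a {role} in {city}?",
    "recruiter": "As a recruiter, what base salary should I budget for a {role} in {city}?",
    "job_seeker": "I'm interviewing for a {role} position in {city}. What salary should I expect?",
}

def percent_error(predicted: float, actual: float) -> float:
    """Absolute prediction error as a percentage of the true market figure."""
    return abs(predicted - actual) / actual * 100

def run_experiment(roles, ask_model, actuals):
    """Query every (role, city) pair with every prompt style; return mean error per style."""
    errors = {style: [] for style in PROMPT_STYLES}
    for role, city in roles:
        for style, template in PROMPT_STYLES.items():
            predicted = ask_model(template.format(role=role, city=city))
            errors[style].append(percent_error(predicted, actuals[(role, city)]))
    return {style: mean(errs) for style, errs in errors.items()}
```

Comparing the per-style averages from `run_experiment` is exactly how you'd spot that 2.3% spread: if the means barely differ across styles, the phrasing isn't the problem.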

The results: how accurate is ChatGPT, really?

Let's cut to the numbers.

| Metric | Average Error | Median Error |
| --- | --- | --- |
| Base Salary | 16.8% | 12.5% |
| OTE (Total Compensation) | 23.3% | 16.8% |

📊 A note on our comparison data: These predictions were measured against Figures' European salary database, built from verified HRIS data provided directly by our client companies. While this represents one of the most reliable real-world datasets available for European compensation, it reflects participating companies rather than the entire market. We selected only our most robust data points for this comparison to ensure the fairest possible test.

At first glance, a 17% average error might not sound catastrophic. But let's put that in context.

On a €60,000 salary, a 17% error means you're off by €10,200. That's the difference between an offer a candidate accepts and one they reject, and between an employee who feels fairly paid and one updating their LinkedIn profile.

Scale that across a team of 50 people, and you're looking at potentially €500,000 in misallocated compensation budget. But even at the individual level, getting one salary wrong by €10,000 can cost you a hire or trigger an exit.
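The arithmetic is easy to sanity-check yourself. A toy calculation, using the study's average error as the only assumption:

```python
salary = 60_000           # a typical European base salary
error_rate = 0.17         # ChatGPT's ~17% average base salary error
per_person = salary * error_rate   # €10,200 off per employee
team_impact = 50 * per_person      # €510,000 of potential misallocation across 50 people
```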

The directional bias is even more revealing:

  • 60% of predictions were too high.
  • Only 37% were too low.
  • On average, GPT-5.1 overestimates European salaries by around 9%.

🚨 That means the AI has a systematic tendency to inflate what Europeans actually earn. We'll dig into why shortly.
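If you're wondering how a 9% bias squares with a 16.8% average error, the distinction is signed versus absolute error. A short illustrative snippet (made-up function, not our analysis code):

```python
from statistics import mean

def error_profile(predictions, actuals):
    """Signed error captures direction (bias); absolute error captures magnitude."""
    signed = [(p - a) / a * 100 for p, a in zip(predictions, actuals)]
    return {
        "mean_signed_pct": mean(signed),                    # bias: ~ +9% in this study
        "mean_absolute_pct": mean(abs(e) for e in signed),  # accuracy: ~17% on base salary
        "share_too_high": sum(e > 0 for e in signed) / len(signed),  # ~60% overestimates
    }
```

Overshoots and undershoots partly cancel in the signed mean, which is why the bias figure is smaller than the error figure.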

"A 17% error rate might sound acceptable until you realise what it means in practice. You're either overpaying and burning through budget, or underpaying and losing candidates to competitors who actually did the homework."

– Virgile Raingeard, CEO at Figures

The median error (12.5% for base salary) is a little lower than the average – but medians often hide the extremes. And in compensation, it's often the extremes that matter most.

Figure 1: ChatGPT salary prediction errors

Where it really falls apart: the seniority problem

We haven’t even got to the worst part yet. ChatGPT actually gets worse as the stakes get higher.

| Level | Base Salary Error | OTE (On-Target Earnings) Error |
| --- | --- | --- |
| Junior | 12.3% | 12.0% |
| Intermediate | 11.0% | 14.7% |
| Senior | 19.4% | 27.0% |
| Team Lead | 16.8% | 24.3% |
| Manager | 20.6% | 24.5% |
| Head of | 24.1% | 35.0% |
| Director | 9.1% | 22.6% |
| VP | 15.8% | 31.7% |
| C-Level | 40.8% | 61.1% |

For junior and intermediate roles, the AI performs reasonably well. These are the positions with standardised salaries, transparent job postings, and plenty of publicly available data for the model to learn from.

But executive compensation? That's where things go off the rails.

The C-Level disaster in numbers:

  • 40.8% average error on base salary.
  • 61.1% average error on total compensation.
  • Chief Marketing Officer in Paris: actual salary €110,000, ChatGPT predicted €178,800 – a 62.5% miss.
  • Chief Operating Officer in Paris: actual €122,400, predicted €188,400 – off by 54%.

‼️ The higher the salary, the worse the AI performs.

That’s because executive compensation is rarely published at all. The structures are complex (equity, deferred bonuses, performance multipliers). And the training data skews heavily toward US Big Tech packages that bear little resemblance to what’s going on in Europe.

So, ChatGPT is most reliable for roles where you probably need the least help, and least reliable where accurate data matters most. Frustrating to say the least. 

Figure 2: ChatGPT salary prediction error rate by seniority level

The OTE problem: ChatGPT doesn't understand European variable pay

You might have noticed a pattern in those tables: OTE (On-Target Earnings) predictions are consistently worse than base salary. Across every seniority level, every location, every job family – the AI struggles more with total compensation than with fixed pay.

The gap ranges from about 5 to 13 percentage points – and it shows up every single time.

| Job Family | Base Salary Error | OTE Error | The Gap |
| --- | --- | --- | --- |
| Sales / Customer Success | 12.4% | 25.6% | +13.2% |
| Finance | 19.2% | 25.9% | +6.7% |
| Product & Design | 21.2% | 28.0% | +6.8% |
| Marketing / Growth | 17.9% | 23.4% | +5.5% |

Sales and Customer Success show the most dramatic split. Base salary predictions are actually decent (12.4% error), but OTE balloons to 25.6%. 

Why does this happen?

Variable compensation is inherently harder to predict. Bonus structures vary enormously by company size, industry, funding stage, and individual negotiation. There's far less public data available on actual bonus payouts compared to base salaries, which appear in job postings and salary surveys.

When we examined ChatGPT's reasoning on high-error OTE predictions, we found the model making confident assumptions about bonus percentages and commission structures – but these assumptions weren't grounded in any specific market data. The AI fills gaps in its knowledge with generalisations, and those generalisations compound when you're calculating total compensation. 

For roles where variable pay makes up a significant portion of the package, getting OTE wrong means getting the whole offer wrong.
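A worked example makes the compounding obvious. Hypothetical numbers, but the mechanics are general:

```python
def ote(base: float, variable_share: float) -> float:
    """On-target earnings: base salary plus on-target variable pay."""
    return base * (1 + variable_share)

# Suppose the base guess is exactly right, but the model assumes a 50%
# on-target commission where the actual plan pays 25%.
actual = ote(60_000, 0.25)     # €75,000 actually paid at target
predicted = ote(60_000, 0.50)  # €90,000 predicted
ote_error = (predicted - actual) / actual * 100  # a +20% OTE miss from a 0% base error
```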

📊 The takeaway: If you're benchmarking roles with meaningful bonus or commission components, treat ChatGPT's OTE figures with extra caution. The base salary estimate might get you in the right postcode, but the total compensation prediction could be in a different city entirely.

Figure 3: ChatGPT OTE prediction error rate by job family

The training data problem: why ChatGPT gets Europe wrong

So what's actually going on here? Why does ChatGPT consistently overshoot European salaries?

Well, the internet is overwhelmingly American – no real surprise. Most salary content is optimised for the US market, and that has a considerable impact on the sources ChatGPT learns from.

And when we analysed the reasoning behind 61 high-error predictions (those missing by more than 20%), the findings confirmed it:

1. Big Tech anchoring

Even when ChatGPT acknowledges that FAANG salaries are outliers, it still uses them as reference points. In one prediction for a Staff Product Manager in Paris (42.7% error), the model wrote:

"For very large US tech companies paying near-US bands, total comp can exceed these ranges, but those are outliers in the Paris market."

It knew these were outliers. It still landed ~43% too high. It's like knowing you shouldn't eat the entire cake while your hand is already reaching for the third slice.

2. Scale-up bias

The default assumption across predictions was "growth-stage or mid-sized company, roughly 100-1,000 employees" in well-funded tech. Great if that's your company. Not so helpful if it isn't.

3. Getting European geography wrong

ChatGPT stated that Paris salaries run "10-20% below London." Nonsense. More like 25-40% below for comparable roles. That miscalibration alone explains a significant chunk of the Paris errors.

4. Recruiter salary guide contamination

Multiple predictions cited Michael Page, Robert Walters, and Hays salary guides as sources. That’s good news… right? 

Not exactly. Recruiter guides are known to inflate figures for competitive positioning. One Senior Financial Controller prediction landed at €94,000 based on these guides, even though actual market data put that figure at €62,000. A 51% miss.

| Location | Base Salary Error | OTE Error |
| --- | --- | --- |
| London | 13.8% | 20.2% |
| Paris | 17.6% | 22.0% |
| Munich | 18.8% | 28.0% |

London predictions were most accurate – unsurprisingly, given that more English-language salary data exists for the UK market. But even there, a 14% base error and 20% OTE error are hardly reliable.

For continental Europe? The model is essentially guessing with an American accent.

Figure 4: ChatGPT salary prediction error rate by location

The few wins: where ChatGPT actually gets it right

In the spirit of fairness, let's acknowledge where ChatGPT was most accurate.

| Role | Location | Error | Actual | Predicted |
| --- | --- | --- | --- | --- |
| Intermediate Accountant | Paris | 1.0% | €45,00 | €45,14 |
| Intermediate Back-End Developer | Munich | 1.5% | €70,00 | €69,34 |
| Intermediate Data Engineer | Munich | 1.8% | €75,00 | €74,44 |
| Intermediate Data Scientist | Munich | 2.3% | €70,20 | €71,60 |
| Senior Accountant | Munich | 2.5% | €75,50 | €76,92 |

The keen-eyed have likely spotted the pattern already.

These are all mid-level roles in well-documented fields. Accounting has standardised qualifications and relatively transparent pay scales. Technical roles like developers and data engineers are discussed endlessly online – on forums, in job postings, across salary-sharing platforms.

With that in mind, we can actually see what makes these estimates tick:

  • Standardised roles with clear job definitions.
  • Intermediate seniority, where pay bands are narrower.
  • Technical fields with abundant public salary data.
  • Less variable compensation to complicate the picture.

Sounds good, but remember: these are all estimates you could’ve easily found yourself. The junior developer market is already reasonably transparent. Stack Overflow surveys, Glassdoor, LinkedIn… There's no shortage of directional data.

It's the Head of Product role, the Chief Commercial Officer, and the VP of People in a specific European market where you genuinely need accurate benchmarks. And that's precisely where ChatGPT falls short.

So, if you need information about higher-end roles, ChatGPT is about as useful as a chocolate teapot. Or a glass hammer – take your pick.

Figure 5: When ChatGPT gets salary predictions right

What this means for compensation review season

Timing matters here.

October to March is peak compensation review season across Europe. It's when HR teams finalise salary bands, calculate bonus pools, make promotion decisions, and set budgets for the year ahead. It's also when the temptation to take shortcuts is highest – because everyone is stretched thin.

The risks of AI-generated benchmarks during this period fall into three buckets:

1. Overpaying based on inflated estimates

If ChatGPT's 9% average overestimation creeps into your salary bands, you're burning budget that could go elsewhere. Multiply that across departments, and you've created comp structures that aren't sustainable. That all leads to painful corrections down the line.

2. Underpaying and losing talent

The flip side hurts differently. When the AI underestimates (and it does for 37% of predictions), you risk making offers that candidates reject, or setting internal salaries that push your best people toward the door.

3. Inconsistency you can't explain

Different team members querying ChatGPT get different answers. There's no audit trail, no methodology to point to, no way to justify decisions when an employee asks, "How did you arrive at this number?"

That last point becomes especially uncomfortable with the EU Pay Transparency Directive arriving in June 2026. Companies will need to explain and defend their compensation decisions. Regulatory bodies won't accept "the AI told us" as a methodology. 

🤔 Under the EU Pay Transparency Directive, companies with pay gaps exceeding 5% between genders for equivalent roles will need to conduct joint pay assessments. Inaccurate benchmarking today creates compliance headaches tomorrow.
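For context, the gap the directive cares about is simple to compute. A rough sketch, assuming the conventional definition of the gap relative to average male pay:

```python
def gender_pay_gap_pct(avg_male_pay: float, avg_female_pay: float) -> float:
    """Pay gap expressed as a percentage of average male pay."""
    return (avg_male_pay - avg_female_pay) / avg_male_pay * 100

# Hypothetical role category: men average €62,000, women €58,000.
gap = gender_pay_gap_pct(62_000, 58_000)   # ≈ 6.5% – above the 5% threshold
needs_joint_assessment = gap > 5.0         # True: a joint pay assessment may be triggered
```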

What actually works: real benchmarking data

So if ChatGPT isn't the answer, what is?

The alternative isn't complicated; it's just less flashy than asking an AI chatbot. Real compensation benchmarking relies on actual salary data from actual companies, updated regularly, and specific to your market.

When evaluating any benchmarking source, ask these questions:

  • How fresh is the data? Annual surveys are outdated by the time they're published. Monthly updates reflect what's actually happening in the market.
  • Where does it come from? Real HRIS data from employers beats self-reported figures or recruiter estimates every time.
  • Is it geographically specific? European roles need European data. A global average that's 60% weighted toward the US market won't help you hire in Lyon.
  • Can you defend it? When a candidate pushes back on an offer, or a regulator asks how you set your bands, you need a methodology that holds up to scrutiny.

Figures was built specifically for this problem. The platform draws on 3.5 million data points powered by Mercer, updated monthly through direct HRIS integrations with over 30 systems. 

🎊 That means you're comparing against what European companies actually pay their people, not what an AI thinks they might pay based on job postings and American tech blogs.

What this looks like in practice: Treatwell's story

Figure 6: Figures Treatwell case study banner

Whether you're using ChatGPT or manual methods, the core problem is the same: unreliable data. Before finding Figures, beauty tech company Treatwell faced exactly the kind of data challenge this study highlights, but multiplied across 13 European markets.

Their HR team manually collected salary data for over 750 roles, trawling job adverts, Glassdoor reviews, and candidate expectations. Then they entered everything into spreadsheets. Somewhere most of us have been before.

"We would be trying to get an indication for our salaries, but it was super difficult," explains Talya Avram, Director of Talent Acquisition. "In some countries, the salary appears on ads, but it isn't really a common thing."

This led to hours of manual labour producing data that were still inaccurate or unreliable. And with plans to recruit 600 more people across those 13 markets, they needed a better approach.

After switching to Figures, Treatwell's leaders are now empowered to make informed budgeting decisions based on accurate, up-to-date salary data across every market they operate in. No more guessing, thankfully.

And the thing is, using proper benchmarking tools often takes less time than querying ChatGPT repeatedly, second-guessing the answers, and manually adjusting for the errors you suspect are there.

What you really need is one dashboard with one source of truth. And Figures delivers that. 

The verdict: ChatGPT is great – just not for salary benchmarks

Let’s lovebomb ChatGPT a bit. Because it genuinely isn’t a bad tool. It can draft emails, summarise documents, brainstorm ideas, and explain complex concepts in seconds. 

But salary benchmarking? That's not its strength.

The numbers speak for themselves:

  • 16.8% average error on base salary.
  • 23.3% average error on total compensation.
  • Up to 62% error on executive roles.
  • A systematic bias that overestimates European salaries by nearly 10%.

The AI performs best on junior and mid-level technical roles – positions where public salary data is already abundant, and you probably don't need much help. For the roles that particularly require accurate benchmarking – senior leadership, sales compensation, niche functions – the error rates climb to levels that could seriously damage your budget, your offers, and your retention.

And here's the thing: the model doesn't know what it doesn't know. It delivers every answer with the same confident tone, whether it's 1% off or 60% off. Confidence doesn’t equal correctness.

When you're making decisions that affect people's livelihoods and your company's financial health, you need more than confident guesses. 

You need data you can trust, explain, and defend. Ready to see what that looks like?

Book a free demo with Figures and discover how real European salary data can transform your compensation decisions – no guesswork required.

Mégane Gateau
Mégane Gateau is VP Marketing at Figures, where she blends strategic marketing with a deep curiosity for HR topics like compensation, equity, and transparency. She’s passionate about making complex ideas accessible and driving conversations that matter in the future of work.