Anthropic claims its new AI chatbot models beat OpenAI’s GPT-4

11:50 AM PST • March 4, 2024

Image Credits: Anthropic

AI startup Anthropic, backed by Google and hundreds of millions in venture capital (and perhaps soon hundreds of millions more), today announced the latest version of its GenAI tech, Claude. And the company claims that the AI chatbot beats OpenAI’s GPT-4 in terms of performance.

Claude 3, as Anthropic’s new GenAI is called, is a family of models — Claude 3 Haiku, Claude 3 Sonnet and Claude 3 Opus, Opus being the most powerful. All show “increased capabilities” in analysis and forecasting, Anthropic claims, as well as enhanced performance on specific benchmarks versus models like ChatGPT and GPT-4 and Google’s Gemini 1.0 Ultra (but not Gemini 1.5 Pro).

Notably, Claude 3 is Anthropic’s first multimodal GenAI, meaning that it can analyze text as well as images — similar to some flavors of GPT-4 and Gemini. Claude 3 can process photos, charts, graphs and technical diagrams, drawing from PDFs, slideshows and other document types.

Going one better than some GenAI rivals, Claude 3 can analyze multiple images in a single request (up to a maximum of 20). This allows it to compare and contrast images, notes Anthropic.

But there are limits to Claude 3’s image processing.

Anthropic has blocked the models from identifying people, no doubt wary of the ethical and legal implications. And the company admits that Claude 3 is prone to making mistakes with “low-quality” images (under 200 pixels) and struggles with tasks involving spatial reasoning (e.g. reading an analog clock face) and object counting (Claude 3 can’t give exact counts of objects in images).

Anthropic Claude 3. Image Credits: Anthropic

Claude 3 also won’t generate artwork. The models are strictly image-analyzing — at least for now.

Whether fielding text or images, Anthropic says that customers can generally expect Claude 3 to follow multi-step instructions, produce structured output in formats like JSON and converse in languages other than English better than its predecessors did. Claude 3 should also refuse to answer questions less often thanks to a “more nuanced understanding of requests,” Anthropic says. And soon, the models will cite the source of their answers to questions so users can verify them.

“Claude 3 tends to generate more expressive and engaging responses,” Anthropic writes in a support article. “[It’s] easier to prompt and steer compared to our legacy models. Users should find that they can achieve the desired results with shorter and more concise prompts.”

Some of those improvements stem from Claude 3’s expanded context.

A model’s context, or context window, refers to input data (e.g. text) that the model considers before generating output. Models with small context windows tend to “forget” the content of even very recent conversations, leading them to veer off topic — often in problematic ways. As an added upside, large-context models can better grasp the narrative flow of data they take in and generate more contextually rich responses (hypothetically, at least).

Anthropic says that Claude 3 will initially support a 200,000-token context window, equivalent to about 150,000 words, with select customers getting up to a 1-million-token context window (~700,000 words). That’s on par with Google’s newest GenAI model, the above-mentioned Gemini 1.5 Pro, which also offers up to a million-token context window.
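To put those figures in perspective, here is a rough sketch of the token-to-word arithmetic. The 0.75 words-per-token ratio is an assumption derived from the 200,000-token, 150,000-word figure above; the 1-million-token figure implies a ratio closer to 0.7, so treat the results as ballpark estimates. Real token counts depend on the tokenizer and the text, and the function names here are illustrative, not part of any Anthropic tooling.

```python
# Rough words-per-token ratio implied by "200,000 tokens ~ 150,000 words".
# This is an assumption for estimation only, not a tokenizer.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * words_per_token)


def fits_in_context(word_count: int, context_tokens: int = 200_000,
                    words_per_token: float = WORDS_PER_TOKEN) -> bool:
    """Estimate whether a document of word_count words fits in the window."""
    estimated_tokens = word_count / words_per_token
    return estimated_tokens <= context_tokens


if __name__ == "__main__":
    print(tokens_to_words(200_000))         # standard window
    print(tokens_to_words(1_000_000, 0.7))  # larger window, using the ~0.7 ratio
    print(fits_in_context(160_000))         # a 160,000-word manuscript
```

By this estimate, a 160,000-word manuscript would not fit in the standard 200,000-token window but would fit comfortably in the million-token one.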

Now, just because Claude 3 is an upgrade over what came before it doesn’t mean it’s perfect.

In a technical whitepaper, Anthropic admits that Claude 3 isn’t immune to the issues plaguing other GenAI models, namely bias and hallucinations (i.e. making stuff up). Unlike some GenAI models, Claude 3 can’t search the web; the models can only answer questions using data from before August 2023. And while Claude is multilingual, it’s not as fluent in certain “low-resource” languages as it is in English.

But Anthropic is promising frequent updates to Claude 3 in the months to come.

“We don’t believe that model intelligence is anywhere near its limits, and we plan to release [enhancements] to the Claude 3 model family over the next few months,” the company writes in a blog post.

Opus and Sonnet are available now on the web and via Anthropic’s dev console and API, Amazon’s Bedrock platform and Google’s Vertex AI. Haiku will follow later this year.

Here’s the pricing breakdown:

Opus: $15 per million input tokens, $75 per million output tokens
Sonnet: $3 per million input tokens, $15 per million output tokens
Haiku: $0.25 per million input tokens, $1.25 per million output tokens
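
For a sense of what those rates mean per request, here is a minimal cost estimator built on the prices listed above. The dictionary keys and the function name are illustrative choices, not Anthropic identifiers, and the sketch assumes billing scales linearly with token counts, with no discounts or minimums.

```python
# List prices from the article, in USD per million tokens: (input, output).
PRICES_PER_MILLION = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.25, 1.25),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / 1_000_000


if __name__ == "__main__":
    # A full 200,000-token prompt with a 1,000-token reply on each model.
    for name in PRICES_PER_MILLION:
        print(f"{name}: ${estimate_cost(name, 200_000, 1_000):.4f}")
```

At these rates, filling the full 200,000-token context window costs about $3 in input tokens on Opus, versus 5 cents on Haiku.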

So that’s Claude 3. But what’s the 30,000-foot view of all this?

Well, as we’ve reported previously, Anthropic’s ambition is to create a next-gen algorithm for “AI self-teaching.” Such an algorithm could be used to build virtual assistants that can answer emails, perform research and generate art, books and more — some of which we’ve already gotten a taste of with the likes of GPT-4 and other large language models.

Anthropic hints at this in the aforementioned blog post, saying that it plans to add features to Claude 3 that enhance its out-of-the-gate capabilities by allowing Claude to interact with other systems, code “interactively” and deliver “advanced agentic capabilities.”

That last bit calls to mind OpenAI’s reported ambitions to build a software agent to automate complex tasks, like transferring data from a document to a spreadsheet or automatically filling out expense reports and entering them in accounting software. OpenAI already offers an API that allows developers to build “agent-like experiences” into their apps, and Anthropic, it seems, is intent on delivering functionality that’s comparable.

Could we see an image generator from Anthropic next? It’d surprise me, frankly. Image generators are the subject of much controversy these days, mainly for copyright- and bias-related reasons. Google was recently forced to disable its image generator after it injected diversity into pictures with a farcical disregard for historical context. And a number of image generator vendors are in legal battles with artists who accuse them of profiting off of their work by training GenAI on that work without providing compensation or even credit.

I’m curious to see the evolution of Anthropic’s technique for training GenAI, “constitutional AI,” which the company claims makes the behavior of its GenAI easier to understand, more predictable and simpler to adjust as needed. Constitutional AI aims to provide a way to align AI with human intentions, having models respond to questions and perform tasks using a simple set of guiding principles. For example, for Claude 3, Anthropic said that it added a principle — informed by crowdsourced feedback — that instructs the models to be understanding of and accessible to people with disabilities.

Whatever Anthropic’s endgame, it’s in it for the long haul. According to a pitch deck leaked in May of last year, the company aims to raise as much as $5 billion over the next 12 months or so — which might just be the baseline it needs to remain competitive with OpenAI. (Training models isn’t cheap, after all.) It’s well on its way, with $2 billion and $4 billion in committed capital and pledges from Google and Amazon, respectively, and well over a billion combined from other backers.
