A gold rush of NLP startups is on the way – here’s why – Meczyki.Net

Remember natural language processing? NLP was born several years ago, but it was only in 2018 that AI researchers proved it is possible to train a neural network once on a large amount of data and then use it repeatedly for different tasks. In 2019, GPT-2 from OpenAI and T5 from Google appeared and proved surprisingly good (it’s now included in Google Duplex). Concerns were also raised about their possible misuse.

But since then, things have gone, well, pretty exponential.

2021 saw a ‘Cambrian explosion’ of NLP start-ups and large language models.

This year, Google released LaMDA, a large language model for chatbot applications. Then DeepMind released AlphaCode and later Flamingo – a language model capable of visual comprehension. In July of this year alone, the BigScience project released BLOOM, a massive open source language model, and Meta announced that it had trained a single language model capable of translating between 200 languages.

We are now approaching a critical point where we will see many more commercial applications of NLP – some built on these open source, publicly available platforms – come to market. You could almost say that a gold rush of start-ups trying to build on this technology has begun, with an arms race developing between the big language model providers.

One of those startups is Humanloop, a University College London AI spinout that claims to make it “significantly” easier for companies to adopt this new wave of NLP technology through a suite of tools that help humans ‘teach’ AI algorithms. This means that a lawyer, doctor or banker can put a piece of knowledge into the platform, which the software then applies at scale, allowing for widespread application of AI across a variety of industries.

It has now raised a $2.6m seed funding round led by Index Ventures, with participation from Y Combinator, LocalGlobe and Albion.

Humanloop was founded in 2020 by a team of leading computer scientists from UCL and Cambridge, and alumni of Google and Amazon. Its applications could include building a picture of a national real estate market from unstructured data on the Internet; reading through electronic health records to identify people who may be candidates to try new treatments; and even moderating comments on Facebook groups.

“People will be shocked if they know what language-based AI is now capable of,” CEO Raza Habib said in a statement. “But getting the data into a form that algorithms can use is the biggest challenge. With Humanloop, we want to democratize access to AI and enable the next generation of intelligent, self-service applications – by allowing any company to take its domain expertise and deliver it efficiently in a machine learning model.”

Humanloop claims that its success is a result of the growth of ‘probabilistic deep learning’, where algorithms can work out what they don’t know by tuning out the noise in data sets, finding the good stuff and asking humans for help with the parts they don’t understand.
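The idea of a model working out what it doesn't know can be illustrated with a small sketch. This is not Humanloop's implementation, which the article does not describe. It is a generic example of uncertainty-based routing: each prediction's entropy is measured and the least certain items are sent to a human. The function names, the item ids and the 0.5 threshold are all hypothetical, chosen only for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a list of class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route_for_review(predictions, threshold=0.5):
    """Split items into confident ones and ones to send to a human.

    predictions: dict mapping an item id to its class probabilities.
    threshold: hypothetical entropy cutoff; higher entropy = less certain.
    """
    confident, needs_human = [], []
    for item_id, probs in predictions.items():
        if entropy(probs) > threshold:
            needs_human.append(item_id)   # model is unsure: ask a person
        else:
            confident.append(item_id)     # model is sure: keep its answer
    return confident, needs_human


# Made-up model outputs for three documents (two classes each).
predictions = {
    "doc-1": [0.98, 0.02],
    "doc-2": [0.55, 0.45],
    "doc-3": [0.10, 0.90],
}
confident, needs_human = route_for_review(predictions)
```

In this toy data only the near coin-flip prediction for "doc-2" crosses the cutoff, so it is the one item a domain expert would be asked to label.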

Other start-ups are building their own large language models and putting them behind APIs: Cohere AI ($164.9 million in funding) and OpenAI with GPT-3. Snorkel AI ($135.3 million in funding) is also a newer startup in this area.

However, Humanloop says it is less focused on developing the models and more on the tools needed to adapt them to specific use cases.

Erin Price-Wright, Partner at Index Ventures, says, “What many people don’t realize is that it isn’t a lack of appropriate algorithms that is stopping AI from becoming ubiquitous in every workplace – it’s the absence of properly labeled data,” she said of the investment. “Indeed, machine learning itself is becoming increasingly commoditized and off-the-shelf, but it’s really hard for non-technical people to pass their knowledge on to the machine and help algorithms refine its models.” That is why Humanloop lets people tweak the data.

If the NLP gold rush is indeed on its way, expect a whole bunch of other startups to appear soon…