Cloudflare Powers One-Click-Simple Global Deployment For AI Applications With Hugging Face

Cloudflare has announced that developers can now deploy AI applications on Cloudflare’s global network in one simple click directly from Hugging Face, the leading open and collaborative platform for AI builders. With Workers AI now generally available, Cloudflare is the first serverless inference partner integrated on the Hugging Face Hub for deploying models, enabling developers to quickly, easily, and affordably deploy AI globally, without managing infrastructure or paying for unused compute capacity.

Despite significant strides in AI innovation, there is still a disconnect between its potential and the value it brings businesses. Organisations and their developers need to be able to experiment and iterate quickly and affordably, without having to set up, manage, or maintain GPUs or infrastructure. Businesses need a straightforward platform that unlocks speed, security, performance, observability, and compliance to bring innovative, production-ready applications to their customers faster.

“The recent generative AI boom has companies across industries investing massive amounts of time and money into AI. Some of it will work, but the real challenge of AI is that the demo is easy, but putting it into production is incredibly hard,” said Matthew Prince, CEO and co-founder, Cloudflare. “We can solve this by abstracting away the cost and complexity of building AI-powered apps. Workers AI is one of the most affordable and accessible solutions to run inference. And with Hugging Face and Cloudflare both deeply aligned in our efforts to democratise AI in a simple, affordable way, we’re giving developers the freedom and agility to choose a model and scale their AI apps from zero to global in an instant.”

Workers AI is generally available with GPUs now deployed in more than 150 cities globally

Today, Workers AI is generally available, providing the end-to-end infrastructure needed to scale and deploy AI models efficiently and affordably for the next era of AI applications. Cloudflare now has GPUs deployed across more than 150 cities globally, most recently launching in Cape Town, Durban, Johannesburg, and Lagos as its first locations in Africa, as well as in Amman, Buenos Aires, Mexico City, Mumbai, New Delhi, and Seoul, to provide low-latency inference around the world. Workers AI is also expanding to support fine-tuned model weights, enabling organisations to build and deploy more specialised, domain-specific applications.

In addition to Workers AI, Cloudflare’s AI Gateway offers a control plane for AI applications, allowing developers to dynamically evaluate and route requests to different models and providers. Eventually, it will also enable developers to use their data to create fine-tunes and run fine-tuning jobs directly on the Workers AI platform.

Cloudflare powers one-click deployment with Hugging Face

With Workers AI generally available, developers can now deploy AI models in one click directly from Hugging Face, the fastest way to access a variety of models and run inference requests on Cloudflare’s global network of GPUs. Developers can choose one of the popular open source models and then simply click “Deploy to Cloudflare Workers AI” to deploy it instantly. There are 14 curated Hugging Face models now optimised for Cloudflare’s global serverless inference platform, supporting three task categories: text generation, embeddings, and sentence similarity.

“We are excited to work with Cloudflare to make AI more accessible to developers,” said Julien Chaumond, co-founder and chief technology officer, Hugging Face. “Offering the most popular open models with a serverless API, powered by a global fleet of GPUs, is an amazing proposition for the Hugging Face community, and I can’t wait to see what they build with it.”

AI-first companies are building with Workers AI

Companies around the world trust Workers AI and Cloudflare’s global network to power their AI applications, including:

  • “Talkmap helps customers uncover and surface real-time conversational intelligence and insights. With millions of customer conversations daily and the need for a fast turnaround for CX & EX outcomes, Cloudflare’s developer platform has helped us keep storage costs and latency low. We’ve selected Cloudflare to help us scale our generative AI service and simplify our runtime architecture so that we can stay focused on adding customer value for conversation insights in the contact centre.” – Jonathan Eisenzopf, Founder and Chief Strategy & Research Officer, Talkmap
  • “ChainFuse transforms unstructured data chaos into actionable insights, ensuring every piece of customer feedback, issue, and opportunity is heard and valued. Using products such as Workers AI, AI Gateway, and Vectorize, we have successfully analysed and categorised over 50,000 unique conversations from places like Discord, Discourse, Twitter, G2, and more. Having access to 28 AI models for any task, and swapping them on the fly, allows us to be accurate and efficient at scale.” – George Portillo, co-founder, ChainFuse.com
  • “Discourse.org is a modern, open-source discussion platform powering over 20,000 online communities from small hobby groups to forums for some of the largest companies worldwide. Discourse leverages Cloudflare’s Workers AI to run embedding models to power our popular ‘Related Topics’ feature. This produces relevant results within communities, giving community members new opportunities to find and engage with topics they are interested in. Workers AI is currently one of the affordable, open-source ways we can provide Related Topics using a high-performing embeddings model to give our customers an avenue to provide their community members with a new way to discover more relevant content and improve engagement.” – Saif Murtaza, AI Product Manager, Discourse.org
  • “Simmer brings the swiping of dating apps to the recipe and cooking world, to bring couples together over a meal they both enjoy. Simmer has continually adopted Cloudflare products as the platform expands, and Workers AI was no exception; we use Workers AI embeddings and large language models, such as Mistral 7B, to help us create a personalised experience for users on the app, including curated recipes based on preferences. We go to Cloudflare first to explore if their products fit our use case since they’re so easy to work with. Using Cloudflare products also helps us save a lot on costs as we grow our startup.” – Ben Ankiel, CTO, Simmer
  • “Audioflare uses AI to convert, examine, condense, and translate brief audio files into various languages. We heavily count on Workers AI for streamlining AI-related tasks including audio file processing, sentiment evaluation, language translation, and maintaining AI’s overall efficiency and dependability. We’re impressed with Cloudflare’s ability to simplify the backend operations of our app. We believe in Cloudflare’s consistent improvements and dedication, and feel confident about growing with their platform.” – Sean Oliver, creator of the open-source LLM repository, Audioflare