How open-source AI models can help you take control of your privacy

Post date

November 28, 2024

What are open-source AI models?

As more people become aware of how the data economy works, earning users’ trust becomes more challenging for AI providers. To appeal to more privacy-conscious users, many AI vendors label their technology “open” or “open-source”. But without consensus on what open-source AI is, such open-washing could raise many risks: surveillance, information bias, effects on the labor market and creative industries, and excessive energy usage, to name a few.

In an attempt to counter this potentially harmful trend, the Open Source Initiative released its official Open Source Artificial Intelligence Definition (OSAID). According to the OSAID, an Open Source AI is an AI system made available under terms and in a way that grant the freedoms to:

Use the system for any purpose and without having to ask for permission.
Study how the system works and inspect its components.
Modify the system for any purpose, including to change its output.
Share the system for others to use with or without modifications, for any purpose.

Even though the work on this definition has been endorsed by many companies and individuals, there is no consensus in the community on whether to accept it in its current state, or whether open-source AI is even ready to be defined.

We too think there’s room for improvement. But with or without a concrete definition of open-source AI, we can already see how the four main freedoms of open source (to use, to study, to modify, and to share) can affect the lives of AI users.

How do users benefit from open-source AI?

Open source promotes freedom and empowers users to control the software they rely on, in this case AI. Here are several examples of how open-source AI systems can directly benefit you as a user:

Proprietary AI can include hidden features like data harvesting for advertising and user profiling, embedded biases, and the use of your private data to refine products without your consent. Open source lets you verify that there are no hidden features or data backdoors, because you (and thousands, or even millions, of others) can inspect the source code and understand how the software operates.
If your software is no longer supported by its provider, you may lose valuable data such as your AI’s memory of past interactions. And even though you can usually export this data, there’s no guarantee it will be compatible with alternative models. With open source, you don’t depend on a single vendor: the software continues to exist, and you retain full ownership. Moreover, you are not locked into the provider’s ecosystem of applications in order to fully benefit from their AI, and no one else controls your costs, updates, or maintenance.
You can host your AI yourself or with a provider you trust, which means you fully control your data, make your own privacy decisions, and maintain your compliance yourself instead of relying on third-party services.

Nextcloud Ethical AI Rating

At Nextcloud, we use and integrate AI tools ourselves, and we recognize the risks that come with rapidly evolving AI. That’s why we created the Ethical AI Rating, which helps our users decide which AI models to use in their work. It is more granular than the OSAID and stricter in areas like training data availability. For now, we will keep using it, because it reflects the specific needs of Nextcloud users.

Nextcloud Ethical AI Rating is based on three principles:

Whether the training data is available and is free to use
Whether the software (both for inferencing and training) is open-source
Whether the trained model is freely available for self-hosting

Based on how many of the criteria are met, a model receives Red 🔴, Orange 🟠, Yellow 🟡 or Green 🟢 rating.
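The mapping from criteria to color can be pictured as a simple count. The sketch below is only an illustration, not Nextcloud’s own implementation. It assumes that each criterion met moves a model one step up the scale, so a model meeting none of them gets Red and a model meeting all three gets Green. This is consistent with the examples further down, where models that lack only open training data are rated Yellow. The class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelAssessment:
    """Hypothetical record of the three Ethical AI Rating criteria for one model."""
    name: str
    training_data_open: bool    # training data available and free to use
    software_open_source: bool  # inference and training software is open source
    model_self_hostable: bool   # trained model freely available for self-hosting


# Assumption: one color per number of criteria met, from 0 (Red) to 3 (Green).
RATINGS = ("Red", "Orange", "Yellow", "Green")


def ethical_ai_rating(model: ModelAssessment) -> str:
    """Count how many criteria are met (booleans sum as 0/1) and map to a color."""
    met = sum((
        model.training_data_open,
        model.software_open_source,
        model.model_self_hostable,
    ))
    return RATINGS[met]


if __name__ == "__main__":
    # Whisper, as described in this article: open software, self-hostable,
    # but training data not freely available.
    whisper = ModelAssessment(
        "Whisper",
        training_data_open=False,
        software_open_source=True,
        model_self_hostable=True,
    )
    print(whisper.name, ethical_ai_rating(whisper))  # prints: Whisper Yellow
```

Treating the criteria as equally weighted keeps the sketch simple; the real rating may weigh or interpret them differently.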

Looking for open-source AI models to try? Consider these!

Nextcloud is not primarily an AI developer or vendor, so we don’t promote any AI products. Instead, we want to help our users choose the right tools to protect their privacy, and where needed, help them start using those tools through Nextcloud Hub. From the perspective of our Ethical AI Rating, here are several AI models that help you take control of your privacy.

Open-source LLMs via LocalAI 🟢

With LocalAI, you can run several open-source models, such as GPT-Neo or GPT-J, locally on your own server, with the ability to modify and interact with them. With such models, the app gets a Green rating in our Ethical AI Rating, fully meeting its criteria.

OpenNMT 🟢

OpenNMT is a fully open-source AI translation framework available under the MIT license. In Nextcloud, it can be used via the LibreTranslate app, which offers translation through your own server or API key.

Opus 🟢

Opus is a machine translation provider by the University of Helsinki that runs locally on CPU and ensures no private data leaves your servers. It meets all criteria of our rating.

GPT4All Falcon by Nomic AI 🟢

GPT4All Falcon is an LLM that combines the Falcon language model with the GPT4All interface. It is available under Apache 2.0 license and runs locally on CPU.

Some models do not meet all the criteria for a Green rating but can still be worth considering, since they can be self-hosted and have relatively permissive licenses, making them mostly safe to run locally:

Self-hosted Stable Diffusion 🟡

Stable Diffusion is an open-source model licensed under the CreativeML Open RAIL-M License. With an on-premises version, users have full access to the model and its code and can modify and redistribute it. In Nextcloud, self-hosted Stable Diffusion can be used via LocalAI. We give it a Yellow rating, as the training data is not freely available in this case.

Whisper 🟡

In Nextcloud, you can use Whisper to enable speech-to-text conversion features. Whisper is available under the MIT License and has a Yellow rating due to limited availability of the training data.

Llama 2 🟡

Llama 2 is a free LLM by Meta that can be run on premises, and the software for training and inference of this model is open source. However, the training data is not freely available. It is offered under the Llama 2 Community License.

We don’t limit our users’ choice of AI tools, and there are other models and apps that you can integrate into Nextcloud Hub and use with Nextcloud Assistant, including self-hosted models like NeuralBeagle14 7B, Smaug-72B and Meta’s Llama 3, as well as well-known services like ChatGPT, DALL·E or Aleph Alpha. The latter are far from meeting all the criteria of our rating, but you can still easily integrate them into Nextcloud if you wish.

Build your own private AI platform – with Nextcloud Hub

Nextcloud Assistant is the first local AI assistant built into a collaboration platform. It is integrated everywhere across Nextcloud Hub, helping you with your daily tasks while taking care of your privacy. The tools supported by Nextcloud Assistant include chat with AI, text and image generation, email summaries, translation, dictation, querying your own data with Context Chat, automated document creation with Context Write, and much more.

You can integrate AI models of your choice to enable those tools, creating a custom AI-powered environment that fits your privacy and security needs. And if needed, you can host your AI with one of the trusted SaaS providers in Europe, staying fully compliant with privacy regulations such as the GDPR.