AI-Jungle Guide #3: Age of the Agents

Moshi: The Chatbot That Sounds Smarter Than It Is / Gemma 2: Google's Open Source Overachiever / SillyTavern: Where Your Code Dreams Come True

Welcome back to the AI-Jungle Guide!
In this issue, I'm trying something new: I let a local LLM write the news section. I'll show you the prompt and what the model wrote for each section. Today’s host is Gemma 2 9B by Google [1].

[Me]: Write a 1-2 sentences introduction to the newest issue of the AI-Jungle Guide newsletter. It's about AI agents. make it funny!
[Gemma 2]: Step into the jungle, folks, where the vines are made of code and the monkeys type Shakespeare! This week's AI-Jungle Guide is all about those wisecracking, world-dominating (okay, maybe just helpful) AI agents.  

Totally not Tony Stark being helped by AI-Agents. by leonardo.ai 

News

  • Moshi: The Chatbot That Sounds Smarter Than It Is.
    [Me]: Write a title and a 1-2 sentence paragraph about Moshi:
    - Moshi is a LLM that takes voice input and responds in real time with voice output.
    - model itself is only 7B and rather stupid
    - made by a small team at Kyutai
    - impressive to really talk to AI
    - OpenAI already demoed similar feature based on GPT-4o
    - make it funny!

    [Gemma 2]: Meet Moshi, the 7B parameter LLM that proves even small brains can hold a conversation (sometimes!).  While its knowledge base might be as vast as a goldfish's memory, this voice-activated wonder from Kyutai will chat with you in real time, making it feel like you're talking to a surprisingly eloquent Roomba. [2]

  • Gemma 2: Google's Open Source Overachiever.
    [Me]: Write a title and a 1-2 sentence paragraph about Gemma 2:
    - Open source AI model by google
    - Impressive performance at just 9B and 27B parameters.
    - bigger model almost equal to llama 3 70B
    - possible to run locally
    - make it funny!

    [Gemma 2]: Don't let the "just" 9B and 27B parameters fool you, Gemma 2 is Google's open-source AI proving that size isn't everything (sometimes). This surprisingly spry language model can hold its own against bigger siblings, like Llama 3, even if it doesn't have the bragging rights to run your toaster.  [1]

  • SillyTavern: Where Your Code Dreams Come True (And Maybe Go Slightly Off the Rails).
    [Me]: Write a title and a 1-2 sentence paragraph about SillyTavern:
    - A multi agent system initially used for fantasy role-play.
    - Can also be used to simulate a dev team.
    - can use local or cloud model via API
    - define your agents and let AI work for you
    - make it funny

    [Gemma 2]: Forget boring old code sprints! SillyTavern is a multi-agent system that lets you build an AI dev team straight out of your wildest fantasy (or most chaotic coding session). Define your agents, throw them into a virtual tavern, and watch as they argue, collaborate, and hopefully churn out some working code – or at least hilarious debugging logs. [3]

At least Moshi knows its limits. [2]

Background

  1. What is a multimodal model?
    Moshi, like GPT-4o, is a multimodal model. This means you can feed text as well as audio/voice or images into the same LLM, and it can also output audio/voice or images.

    There is a difference between architectures with a translation layer, which would first label images and feed that text description into the main LLM, and true multimodal models that encode text, audio, and images into the same latent space (using the same weights and network to process them).

  2. What are open source models and who made them?
    There is a huge number of open-source LLMs available. Meta made an early move by releasing the Llama models (after the first one was leaked). Now all the big players are releasing some open models, like Microsoft’s Phi 3, Meta’s Llama 3, xAI’s Grok, or Gemma 2 by Google. Except for Llama, these are not the “main” models of their respective companies but smaller versions. There are also many models made by smaller AI labs and individual researchers that achieve impressive results.

    These models come in different sizes, given in billions of parameters (like Llama 3 8B or 70B). At full 16-bit precision, 1B parameters translates to roughly 2 GB of video RAM, so manage your expectations about running the big ones at home. You can also use them in different quantizations to save RAM, like Q8 or Q4 (the former is roughly half the size and “resolution” of the full 16-bit model, the latter a quarter). A back-of-the-envelope calculation is sketched right after this list.

  3. What is a multi-agent system and what can I use it for?
    A multi-agent system is a setup where multiple AI agents can interact with each other. This allows you to easily assign different tasks to different agents, enabling more complex interactions than a single prompt can provide.

    For example, it can be as simple as having a writer agent and a researcher agent, where one is dedicated to finding information and the other to summarizing it in a compelling way. A minimal code sketch of this setup also follows right after this list.
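
If you want to sanity-check the size numbers from point 2 before downloading anything, here is a rough back-of-the-envelope sketch in Python. It assumes about 2 bytes per parameter at full 16-bit precision (1 byte at Q8, 0.5 bytes at Q4) and ignores the extra memory needed for the context window, so treat the results as optimistic lower bounds.

    # Rough (V)RAM estimate for running a local LLM.
    # Assumption: ~2 bytes per parameter at 16-bit precision,
    # ~1 byte at Q8 and ~0.5 bytes at Q4. Context and activations add overhead.
    BYTES_PER_PARAM = {"fp16": 2.0, "q8": 1.0, "q4": 0.5}

    def vram_gb(params_billions: float, quant: str = "fp16") -> float:
        """Approximate GB of memory needed just for the model weights."""
        return params_billions * BYTES_PER_PARAM[quant]

    for name, size in [("Gemma 2 9B", 9), ("Gemma 2 27B", 27), ("Llama 3 70B", 70)]:
        print(f"{name}: ~{vram_gb(size, 'q4'):.0f} GB at Q4, ~{vram_gb(size):.0f} GB at fp16")

On a typical 8-12 GB consumer GPU, that puts the 9B model comfortably in reach at Q4, while 27B and 70B are already a stretch.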
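
And to make the writer/researcher example from point 3 concrete, here is a minimal two-agent sketch in Python. The two “agents” are simply two different system prompts sent to the same local model; it assumes you have Ollama running with Gemma 2 pulled (see “Try it yourself” below) and the official ollama Python package installed (pip install ollama). The roles and prompts are just illustrative.

    # Two-agent sketch: a "researcher" gathers facts, a "writer" turns them
    # into a blurb. Both are just system prompts over the same local model.
    import ollama

    MODEL = "gemma2:9b"  # assumes the model was already pulled via Ollama

    def run_agent(system_prompt: str, user_message: str) -> str:
        """Ask the model to answer in the persona defined by system_prompt."""
        response = ollama.chat(
            model=MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message},
            ],
        )
        return response["message"]["content"]

    topic = "open-source language models you can run at home"

    # Agent 1: the researcher collects raw bullet points.
    notes = run_agent(
        "You are a research agent. List five short, factual bullet points about the topic.",
        topic,
    )

    # Agent 2: the writer turns the researcher's notes into prose.
    blurb = run_agent(
        "You are a writer agent. Turn these notes into a funny three-sentence newsletter blurb.",
        notes,
    )

    print(blurb)

Tools like SillyTavern wrap this same idea in a chat interface, with many agents sharing context instead of a hard-coded two-step pipeline.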

My AI-Development tavern getting ready for action, powered by Gemma 2 [1]

Try it yourself:

  1. Talk to Moshi!
    To try Moshi, simply head to moshi.chat and join the queue. Usually, you don’t have to wait long. You need a microphone and speakers, of course, and you must give your browser permission to use them.

    The model is supposed to be released as open source in the future, so one day you might be able to try it on your own device.

  2. Run Gemma 2 on your own PC or Mac!
    An easy way to try Gemma 2, or any open-source model, on your own device is through Ollama. Simply install Ollama on your system (it works on Windows, Mac, and Linux), then search for Gemma in the model library on the Ollama website or go straight to [4]. There you will find the instructions to run the desired variant of the model (example: “ollama run gemma2:9b”). If you’d rather query the model from your own scripts than from the terminal, see the short sketch after this list.

  3. Create a chatroom full of AI-agents!
    Go to sillytavernai.com and follow the instructions to install it locally [5]. SillyTavern works on Windows, Mac, and Linux. You need to connect it to an AI model, which can be ChatGPT via API or a local model like Gemma 2 (see above) running via Ollama. Then define a set of agents for whatever project you’re tackling.
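
One more thing to try once Ollama is running (step 2): the server exposes a small HTTP API, by default on port 11434, which is also the kind of endpoint front-ends connect to when you point them at a local model. Below is a minimal Python sketch that sends a single prompt to Gemma 2 over that API; it assumes the default local endpoint and that gemma2:9b has already been pulled.

    # Send one prompt to the local Ollama server over its HTTP API.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",   # Ollama's default local endpoint
        json={
            "model": "gemma2:9b",                # must already be pulled
            "prompt": "Explain in one sentence what an AI agent is.",
            "stream": False,                     # return one JSON object instead of a stream
        },
        timeout=120,
    )
    print(resp.json()["response"])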

Spread the Word!

Learned something new? Please do me a favour and forward this newsletter to someone else who could benefit from it! Or invite them to subscribe at: https://aijungle.guide!