TIKTOK

My entire locally hosted llm set-up explained for dummies in 5 minutes. 🤗 #llm #greenscreen #chatgpt #aitips #aicontent #artificialintelligence #automation #ollama hugging face gemma3 ollama3.3 qwen qwq abliterated

3:47 Jun 08, 2025 27,900 879
@prestonrho
663 words
Just a few days ago, I bought the brand new M3 Mac Studio so that I could run the most powerful AI models. Keep in mind, I don't need the craziest of the craziest LLMs; I'm just doing simple business backend operations consulting.

The first LLM we have is Qwen's QwQ 32B: best for coding, writing scripts, debugging, and backend ops. Great for dev work, and this size fits perfectly. Gemma 3, 27 billion parameters: ideal for direct response copywriting, emails, ads, and technical writing like white papers. Snappy, and multilingual. This is Google's newest and most powerful LLM. Meta's Llama 3.3 70B: their state-of-the-art LLM, perfect for polished long-form client stuff, proposals, marketing decks. Slower, but deep and smooth. 70 billion does fit, but it's going to run much slower, and it is going to be much more powerful. We also have Qwen (Q-W-E-N, my bad) 2.5 Coder: lightweight and quick for coding, algorithms, and small scripts. Niche, but zippy.

For image generation, we have the state-of-the-art FLUX.1, top pick for fast, versatile image generation, concept art, and marketing visuals. No server wait, literally just pure local compute speed. And InfiniteYou by ByteDance (which is TikTok, pretty much), niche for identity-preserving images, consistent selfies and portraits. Slightly slower, but it excels at that. I'm also using the same four models from Hugging Face in their "abliterated" versions, which are pretty much uncensored; you can really get creative with some of the output.

Now, because of my computer's memory, I am running all of these LLMs at 4-bit quantization. QwQ is going to run at around 20 to 30 tokens per second. Gemma 3, around 20 to 25 tokens per second. Llama 3.3 70B, like I said, is much slower; the file size, as you can see, is much greater, so that's going to be around 10 to 15 tokens per second. Qwen 2.5 Coder, once again, is snappy and fast, a little smaller, at 25 to 30 tokens per second.
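A rough way to sanity-check which of these models fit in memory at 4-bit: a model's weight footprint is roughly its parameter count times bits per weight, divided by 8 bits per byte. This ignores the KV cache and runtime overhead, so treat it as a lower bound, not the exact file size Ollama reports. A minimal sketch using the sizes quoted in the video (Qwen 2.5 Coder's exact parameter count isn't stated, so it is left out):

```python
def quantized_weight_gb(params_billion: float, bits: int = 4) -> float:
    """Approximate weight footprint in GB for a quantized model.

    params_billion * 1e9 weights, each bits/8 bytes, divided by 1e9
    to get GB. Ignores KV cache, activations, and runtime overhead.
    """
    return params_billion * bits / 8

# The three text models whose sizes are quoted in the video, all at 4-bit:
for name, size_b in [("QwQ", 32), ("Gemma 3", 27), ("Llama 3.3", 70)]:
    print(f"{name} ({size_b}B): ~{quantized_weight_gb(size_b):.1f} GB of weights")
```

This is why the 70B model "does fit, but runs much slower": at 4-bit it needs roughly 35 GB just for weights, more than double the 32B model's roughly 16 GB.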
Then we have our image generation models, and then we have our abliterated versions of them. (This breakdown was given to me by ChatGPT. I think the output shouldn't change; these should all be about the same.) All of these models I was able to download locally through Ollama, and the abliterated models are where users in the active Hugging Face community take these open-source LLMs, strip out some of their instructions or fine-tune them, and then upload them as well. It's state-of-the-art open-source LLMs that people are fine-tuning for specific use cases, then uploading, sharing, and vibing out with.

Once I install these locally on my computer, I have a front-end interface that looks exactly like ChatGPT, and it's even better, which is why I love locally hosted Open WebUI. Similar to ChatGPT's search functions and the different tools it has, Open WebUI gives me the ability to visualize data, run code in the actual interface, do live search, and use a mixture of agents: take multiple LLMs and get them to solve one problem together, or chat with an n8n AI agent workflow within my locally hosted UI. Even cooler are the tools it has: local image generation from a text prompt, a button I can just click and it'll generate an image; web scraping; getting the weather; a YouTube transcript provider. All of these tools plug into the front-end interface that's connected to your locally hosted LLM, and it uses them together.

I think that this is the future of AI, especially because OpenAI says GPT-5 will potentially be open-source. MCP is something that everyone is going all in on right now. The ability to host these privately and not break any NDAs means local is going to be the move as soon as they start fitting these locally hosted LLMs into smaller sizes that all of us have access to. If you want to learn more about AI inside of your business, follow for more.
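Under the hood, a front end like Open WebUI talks to the Ollama server over a local HTTP API, which by default listens on port 11434. A minimal sketch of hitting that API directly from Python with only the standard library; this assumes the Ollama server is already running and the model tag (here `qwen2.5-coder`) has already been pulled, both of which are assumptions about your setup:

```python
import json
import urllib.request

# Ollama's default local endpoint; change if you run it elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False asks for one complete JSON response instead of
    a stream of partial tokens.
    """
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally hosted model and return its reply."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires a running Ollama server with the model pulled):
#   print(generate("qwen2.5-coder", "Write a one-liner to count files."))
```

Everything stays on your machine: no API keys, no data leaving the box, which is the whole privacy/NDA argument the video makes.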
