In the previous video, I mentioned that almost all of my workflows are online, but each has a local backup solution. Over time, however, I have used those local solutions less and less. The last time my SSD failed, it completely destroyed my local setup, yet my work was only mildly affected, so I did not mind much. Recently, though, two things happened that suddenly made me feel that building a local AI solution is still necessary.
First, although the major AI companies keep shipping new versions and topping leaderboards, from my practical perspective most large AI models are getting dumber. The Grok I have used long-term worked stably for a year, and my control scripts rarely had issues. But recently I found that it often inexplicably modifies my content, and without noticing, I pass the generated results on to the AI voice service; only during editing do I discover the content has been altered, forcing me to redo the entire process. The most typical example: Grok used to search the web first, then clearly tell me which parts contained common-sense errors and what it had modified according to my standards. Now it judges based on its own knowledge base rather than the latest search results, and it does not tell me what it changed. Only when I come back with the errors and scold it does it admit it did something wrong. But getting angry at an AI is meaningless. I can only keep tweaking my control scripts, and even that is useless, because the model really has gotten dumber: it seems to change frequently and hallucinates more than before.
Actually, the tasks I want Grok to handle are simple: checking my manuscripts for common-sense errors, translating them into English according to my requirements, and outputting my audio-generation scripts. Originally, DeepSeek could handle all of this easily, but its online service often censors content and occasionally refuses service. Fortunately, after some effort, I have managed to adapt reasonably well to the new Grok. I have also tested running quantized 30B and 70B parameter models locally, and I feel they can fully meet my work needs. Moreover, I can pin a fixed version to keep the workflow stable. For a high-frequency content creator like me, overly powerful AI features are unnecessary; what I need is a stable workflow.
Right after that, the Microsoft speech synthesis I use regularly developed a serious bug. The audio I generated over the past few days contains many sentence-breaking errors, and the video you are watching now faces the same problem. I do not know whether Microsoft will discover and fix this issue. If it remains unfixed when I generate audio, I can only correct it manually and release flawed audio. Logically, AI speech synthesis is a technically undemanding task, and Microsoft's pricing is not cheap: at least five times more expensive than comparable Chinese AI services. I chose foreign AI services precisely because Chinese content censorship felt inconvenient. Moreover, I previously had a high-quality local AI speech service. Because cloud services make the production environment more convenient, I abandoned the free local AI speech that cost nothing but electricity and chose Microsoft's.
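One partial workaround for sentence-breaking errors, at least while the bug persists, is to stop relying on the engine's own sentence detection and mark every pause explicitly in SSML, which Microsoft's speech service accepts. Below is a minimal sketch; the voice name and pause length are illustrative assumptions, not values from my actual scripts:

```python
import html

def to_ssml(sentences, voice="en-US-JennyNeural", pause_ms=400):
    """Join pre-split sentences into an SSML document with explicit
    <break> tags, so the TTS engine does not guess sentence boundaries.
    The voice name and 400 ms pause are hypothetical defaults."""
    body = f'<break time="{pause_ms}ms"/>'.join(
        html.escape(s.strip()) for s in sentences
    )
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{voice}">{body}</voice></speak>'
    )
```

You would still need to split the manuscript into sentences yourself, but that moves the error-prone step into your own script, where it stays stable.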
Honestly, the situation these past two days has left me very frustrated. Perhaps I never should have trusted Microsoft. The Windows 11 they spent huge sums on is now just a Steam launcher for me; except for gaming and AI image generation, I barely turn on my PC anymore. I have learned an important lesson: do not tie your workflow to Microsoft. This company cannot provide stable services, and whenever you trust it, it bites you back. My PC with 40G of memory and 16G of dedicated VRAM runs large AI models worse than the 24G M4 Mac mini I bought for just over 500 US dollars. Microsoft's and Nvidia's joint scam has completely exhausted my patience: crashes always come when you least expect them.
Over the past two days I have spent a lot of energy rebuilding my local AI workflow. Its quality is very good, but I cannot switch rashly yet, because my audience is already accustomed to Microsoft's AI voice characters. I need to find a better AI model that can closely imitate the tone of Microsoft's AI speech and that I can control with scripts rather than letting the AI freestyle. I have found candidate solutions, but I have been too busy to try them. I will wait another two days and hope Microsoft fixes this ugly regression. I also hope that when they change their AI services, they test strictly before release, or at least let users lock software versions.
Actually, I have always been a strong supporter of cloud AI services. Except for AI image generation, where I think the cloud's cost-performance is poor, in areas like writing and coding I have always believed that the free quotas from Grok and Gemini are fully sufficient, while DeepSeek's paid service is so cheap that I hardly need to think about usage costs. At the end of 2025, with hardware prices soaring, I also strongly advise users without urgent needs not to bother with local AI setups. However, if you are a producer like me, troubled by the instability of online AI services, I think setting up a local AI environment is still worth researching.
Let me briefly explain my setup: my main work machine is the 24G M4 Mac mini, and my backup is an M1 machine with 16G of memory. Both handle my current 4K editing needs very well. The 24G Mac mini smoothly runs the Q4-quantized 30B Qwen model; the community also provides a Coder version specialized for programming, and in my testing it fully meets all of my current work demands. However, the model occupies 18G of unified memory, which leaves just enough for the system alone, yet I still need to run browsers, chat apps, network proxies, and a pile of other software. When the context length reaches around 35k, its response speed drops significantly, but few people use contexts that large. My workaround is to close the display interface and all other software and use the Mac mini as an AI server accessed over the local network, while the backup M1 handles video editing. This noticeably improves its effective capacity. If you were an early buyer of the 32G M4, congratulations: you can treat it as an all-purpose workstation.
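To make the server idea concrete: most popular local runtimes (llama.cpp's server, Ollama, LM Studio) expose an OpenAI-compatible chat endpoint, so a small client on the editing machine is enough. This is a minimal sketch under that assumption; the hostname, port, and model name below are hypothetical placeholders for your own LAN setup:

```python
import json
import urllib.request

# Hypothetical address of the Mac mini acting as the LAN AI server;
# assumes an OpenAI-compatible /v1/chat/completions endpoint.
SERVER_URL = "http://mac-mini.local:8080/v1/chat/completions"

def build_request(prompt: str, model: str = "qwen-30b-q4") -> dict:
    """Build an OpenAI-style chat-completion payload for the local server."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Check the manuscript for common-sense errors "
                        "and report every change you make."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.2,  # low temperature for repeatable edits
    }

def ask_local_model(prompt: str) -> str:
    """Send the prompt over the LAN and return the model's reply text."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the server speaks the same protocol as the cloud APIs, switching between the local model and an online one is mostly a matter of changing the URL and model name.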
Due to video length limits, this is all I will share in this episode. In the next one, I will use real cases to analyze how to set up a local AI environment, the pros and cons of PC versus Mac, why 30B and 70B are two sweet-spot model sizes, what hardware is needed to run them, what they can do, and more. As a heavy AI user, a working producer, and a self-media creator who understands programming, I think my experience is a useful reference for most people. Alright, this episode ends here. See you next time.


