Simon Willison's Weblog
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those - unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI - and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
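The scraped version of this post lost the command itself. Fetching and running a GGUF straight off Hugging Face is a single Ollama command; the model path below is the Unsloth build named above, while the `:Q8_0` quantization suffix is an assumption (pick whichever quantization suits your hardware):

```shell
# Pull the quantized distill from Hugging Face and open an interactive chat.
# The :Q8_0 suffix selects a quantization level and is an assumption here.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```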
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
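The commands were also lost in scraping. Assuming the model has already been pulled with Ollama as above, the plugin install plus prompt would look something like this (the exact model name passed to `-m` is an assumption; `ollama list` shows the tag on your machine):

```shell
# Install the plugin that exposes Ollama models to LLM
llm install llm-ollama

# Run a prompt against the distilled model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```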
<br>Each action starts with a ... [pseudo-XML](http://www.hirlevel.wawona.hu) tag containing the chain of idea utilized to [assist generate](https://yuri-needlework.com) the action. For "a joke about a pelican and a walrus who run a tea space together" that started like this:<br>
|
||||||
|
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
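One nice consequence of that `<think>` convention is that it's trivial to separate the chain of thought from the final answer yourself. A minimal sketch, assuming the reasoning is wrapped in `<think>...</think>` at the start of the response as described above:

```python
def split_reasoning(response: str) -> tuple[str, str]:
    """Split a model response into (chain_of_thought, final_answer).

    Assumes the reasoning is wrapped in <think>...</think> at the start
    of the response, as DeepSeek-R1's distilled models emit it.
    """
    start, end = "<think>", "</think>"
    if start in response and end in response:
        head, _, tail = response.partition(end)
        thought = head.replace(start, "", 1).strip()
        return thought, tail.strip()
    # No tags found: treat the whole response as the answer
    return "", response.strip()

sample = "<think>Okay, so I need a joke about a pelican...</think>The Joke: ..."
thought, answer = split_reasoning(sample)
```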
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B - currently my favourite GPT-4 class model that I've run on my own machine - I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this - a 34GB download:
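The command itself was lost in scraping; it would follow the same pattern as the 8B one. The repository name and quantization tag below are hypothetical - check which quantized 70B GGUF builds actually exist before running this:

```shell
# A ~34GB quantized build of the 70B distill
# (repository path and :Q3_K_M tag are assumptions, not from the post)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```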
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything at all you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-imitating endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
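The configuration itself was lost in scraping. A sketch of what that entry might look like - the field names follow LLM's extra-openai-models format as I understand it, and the api_base URL is DeepSeek's endpoint; treat the details as assumptions and check the LLM documentation:

```yaml
# extra-openai-models.yaml (run `llm keys path` to locate LLM's config dir)
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```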
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
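Spelled out as they'd be typed, those two commands from the sentence above are:

```shell
# Store the DeepSeek API key under the name "deepseek"
llm keys set deepseek

# Run a prompt through the API-hosted reasoning model
llm -m deepseek-reasoner 'prompt'
```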
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.
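Since the API does serve the reasoning in a separate field alongside the answer, you can pull both out of the raw response yourself. A minimal sketch - the `reasoning_content` field name is my understanding of DeepSeek's OpenAI-imitating response shape, so treat it as an assumption:

```python
def extract_from_response(payload: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat-completion style payload.

    Assumes DeepSeek's OpenAI-imitating shape, where the message carries
    a `reasoning_content` field next to the usual `content`.
    """
    message = payload["choices"][0]["message"]
    return message.get("reasoning_content", ""), message.get("content", "")

# A hand-written example payload, not a real API response:
sample = {
    "choices": [
        {
            "message": {
                "reasoning_content": "Okay, so I need to...",
                "content": "The Joke: ...",
            }
        }
    ]
}
reasoning, answer = extract_from_response(sample)
```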