# Simon Willison's Weblog
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1 - which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those - unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI - and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

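A sketch of that command, assuming the Unsloth GGUF is pulled directly from Hugging Face via Ollama's `hf.co/` support (the `Q8_0` quantization tag is an assumption on my part, chosen to match the ~8.5GB download):

```shell
# Pull the distilled 8B model from Hugging Face and start an interactive chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
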
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

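Because everything is logged, past prompts and responses can be pulled back out later - for example (output shape varies by version):

```shell
# Show where LLM's SQLite log database lives on disk
llm logs path
# Show the three most recent logged prompts and responses
llm logs -n 3
```
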
I installed the plugin and ran it like this:

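A sketch of those two steps - the model alias passed to `-m` is an assumption, matching whatever name Ollama registered for the fetched model:

```shell
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama
# Run a prompt against the distilled model
llm -m 'DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' 'a joke about a pelican and a walrus who run a tea room together'
```
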
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a funny situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up line introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it funny while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

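Since the reasoning arrives inline in the response text, it's easy to separate from the final answer programmatically. A minimal sketch, assuming the `<think>...</think>` tag format described above (the helper function name is mine):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1 style response into (chain_of_thought, answer).

    Assumes the reasoning is wrapped in a <think>...</think> pseudo-XML tag;
    returns an empty chain of thought if no tag is found.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>Okay, so I need to come up with a joke...</think>\nThe Joke: ..."
)
```

This is handy for logging or displaying the two parts separately, since (as noted below) not every client surfaces the reasoning tokens on its own.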
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B - currently my favourite GPT-4 class model that I've run on my own machine - I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this - a 34GB download:

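A sketch of that command, assuming Unsloth's 70B GGUF on Hugging Face - the quantization tag is an assumption; a ~34GB file is consistent with a roughly 3-bit quant of a 70B model:

```shell
# Pull and run the distilled 70B model (about a 34GB download)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
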
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It didn't do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the final result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then pick the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

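A sketch of that YAML, assuming the `deepseek-reasoner` model ID from DeepSeek's API and LLM's `extra-openai-models.yaml` schema:

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
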
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

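Those two steps look like this (the prompt is just the earlier example reused):

```shell
# Store the DeepSeek API key under the name the YAML config references
llm keys set deepseek
# Run a prompt against the hosted reasoning model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
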
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.