Simon Willison's Weblog

That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

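```sh
# Assuming Unsloth's Q8_0 GGUF tag on Hugging Face; Ollama can pull
# GGUF builds directly from hf.co paths like this.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
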
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

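That logging is easy to poke at from the same CLI. A minimal sketch (not part of the post itself) using LLM's built-in logs commands:

```sh
# Print the location of the SQLite database that holds logged prompts
llm logs path
# Show the most recent logged prompt and response
llm logs -n 1
```
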
I installed the plugin and ran it like this:

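```sh
llm install llm-ollama
# Same Unsloth model tag as in the ollama run step above
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
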
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

> [...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

> The Joke:

> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

> After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

> Explanation:

> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

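```sh
# Assuming Unsloth's Q3_K_M GGUF build of the 70B distill, whose
# roughly 3-bit quantization matches the 34GB download size:
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
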
Can it draw a pelican?

I tried my classic *Generate an SVG of a pelican riding a bicycle* prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

> the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your `extra-openai-models.yaml` configuration file:

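```yaml
# A sketch of the entry, following the format LLM expects in
# extra-openai-models.yaml and DeepSeek's published API base URL;
# the key name matches the llm keys set deepseek step below.
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
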
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

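Spelled out as a terminal session (the pelican prompt here is just an example):

```sh
llm keys set deepseek
# paste your DeepSeek API key when prompted
llm -m deepseek-reasoner 'Generate an SVG of a pelican riding a bicycle'
```
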
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.

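If you want to see them anyway, one hedged workaround (a sketch assuming the reasoning_content field that DeepSeek's API docs describe, not something covered above) is to call the OpenAI-compatible endpoint directly:

```sh
# Requires jq; DEEPSEEK_API_KEY holds the same key set above.
curl -s https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
      }' | jq -r '.choices[0].message.reasoning_content'
```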