That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly, those have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed, so maybe MIT is OK?

(I also just spotted that the MIT license files say "Copyright (c) 2023 DeepSeek", so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

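A sketch of that command (Ollama can pull GGUF builds directly from Hugging Face; the `:Q8_0` quantization tag is an assumption, consistent with the roughly 8.5GB download mentioned below):

```sh
# Fetch the Unsloth GGUF build from Hugging Face and start an interactive chat.
# The :Q8_0 tag is an assumed quantization level; the repo offers several.
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
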
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

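That log is easy to dig back through afterwards; a quick sketch using LLM's logs subcommand:

```sh
# Show the most recent logged prompt and response from LLM's SQLite database
llm logs -n 1
# Print the location of the SQLite database file itself
llm logs path
```
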
I installed the plugin and ran it like this:

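A sketch of those two steps (the model ID is assumed to match the name Ollama registered for the Hugging Face pull above):

```sh
# Install the plugin that teaches LLM to talk to local Ollama models
llm install llm-ollama
# Prompt the local model through LLM so the response gets logged to SQLite
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
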
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:

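A sketch of that command, where the `:Q3_K_M` quantization tag is an assumption consistent with a 34GB file:

```sh
# Fetch and run the 70B distill; an aggressive quantization keeps it around 34GB
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
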
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

Just like the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

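A minimal sketch of that entry, assuming DeepSeek's documented api.deepseek.com base URL:

```yaml
# Registers deepseek-reasoner as an OpenAI-compatible model for LLM.
# "deepseek" below refers to the API key stored via `llm keys set deepseek`.
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
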
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

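Putting those together (the prompt here is just an example):

```sh
# Store the DeepSeek API key under the name referenced in the YAML above
llm keys set deepseek
# Run a prompt through DeepSeek's hosted R1 model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```
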
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.

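In the meantime you can see them by calling the API directly. A sketch, assuming DeepSeek's OpenAI-compatible endpoint and the separate reasoning_content field described in their API docs:

```sh
# The reasoning tokens come back in a reasoning_content field on the message,
# separate from the regular content (an assumption based on DeepSeek's API docs).
curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d '{
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": "a joke about a pelican and a walrus"}]
      }'
```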