From f19686e38fe7369df8f22789e765ff6dd51a50e4 Mon Sep 17 00:00:00 2001
From: Adela Dewitt
Date: Mon, 17 Feb 2025 10:00:12 +0100
Subject: [PATCH] Add Simon Willison's Weblog

---
 Simon-Willison%27s-Weblog.md | 42 ++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 Simon-Willison%27s-Weblog.md

diff --git a/Simon-Willison%27s-Weblog.md b/Simon-Willison%27s-Weblog.md
new file mode 100644
index 0000000..b218f59
--- /dev/null
+++ b/Simon-Willison%27s-Weblog.md
@@ -0,0 +1,42 @@
That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
+
There's a whole lot of stuff in the new release.
+
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
+
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
+
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.
+
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
+
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
+
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
+
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those over.)
+
Licensing aside, these distilled models are fascinating beasts.
+
Running DeepSeek-R1-Distill-Llama-8B-GGUF
+
Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
+
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
+
First I fetched and ran the model using Ollama itself:
+
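Along these lines, assuming the Q8_0 quantization tag for Unsloth's GGUF on Hugging Face (the exact tag may differ):
+
```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
+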
This fetches around 8.5GB of model data and starts an interactive chat interface.
+
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
+
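That exploration can use LLM's own log commands, for example:
+
```bash
# show the most recent logged prompt and response
llm logs -n 1

# print the path to the SQLite database that holds the logs
llm logs path
```
+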
I installed the plugin and ran it like this:
+
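Roughly the following, where the model ID passed to `-m` should match whatever tag Ollama used for the fetch above:
+
```bash
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
+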
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
+
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
+
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're generally found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
+
[...]
+
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
+
The Joke:
+
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.
+
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
+
Explanation:
+
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
+
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
+
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
+
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
+
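The exact tag here is my assumption; a Q3-class quantization is about the right size for a 34GB download of a 70B model:
+
```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
+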
Can it draw a pelican?
+
I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:
+
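Assuming the 70B distill fetched above, that prompt looks something like:
+
```bash
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```
+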
It looked to me like it got the order of the elements wrong, so I followed up with:
+
the background ended up covering the rest of the image
+
It thought some more and gave me this:
+
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
+
Other ways to try DeepSeek-R1
+
If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
+
DeepSeek offer the model via their API, using an OpenAI-imitating endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
+
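Something along these lines, where `api_key_name` ties the model to the key set in the next step:
+
```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
+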
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
+
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.
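+
In the meantime the raw API can be called directly. A minimal sketch, assuming the `openai` Python package and an API key in a `DEEPSEEK_API_KEY` environment variable, where `reasoning_content` is the field DeepSeek use to carry the chain of thought:
+
```python
import os

from openai import OpenAI

# DeepSeek's endpoint imitates OpenAI's, so the standard client
# works when pointed at their base URL.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {
            "role": "user",
            "content": "a joke about a pelican and a walrus who run a tea room together",
        }
    ],
)

message = response.choices[0].message
print(message.reasoning_content)  # the chain of thought
print(message.content)            # the final answer
```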
\ No newline at end of file