That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:

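Something like the following, using Ollama's support for pulling GGUF models directly from Hugging Face (the Q8_0 quantization tag here is an assumption):

```bash
# Fetch and run the 8B distill via Ollama (quantization tag assumed)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```
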
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran it like this:

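Plausibly along these lines, with the model ID assumed to match the Ollama pull above:

```bash
# Install the plugin, then prompt the Ollama-hosted model through LLM
# (model ID assumed to match the earlier Ollama pull)
llm install llm-ollama
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```
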
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it amusing while fitting the theme.

So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.

This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):

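Presumably something like this; a Q3_K_M quantization of the 70B model is consistent with the 34GB size mentioned (the tag is an assumption):

```bash
# 70B distill; Q3_K_M is an assumption consistent with a ~34GB download
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```
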
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:

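As a sketch, the prompt would run through LLM like this (the model ID is assumed):

```bash
# Model ID assumed; any of the distills pulled above would work here
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```
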
It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything at all you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-imitating endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:

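An entry along these lines fits LLM's extra-openai-models.yaml format; the api_base value is an assumption:

```yaml
# extra-openai-models.yaml entry (api_base is an assumption)
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```
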
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.

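Spelled out as shell commands (the prompt is a placeholder):

```bash
llm keys set deepseek   # paste in your API key when prompted
llm -m deepseek-reasoner 'your prompt here'   # placeholder prompt
```
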
This won't show you the reasoning tokens, unfortunately. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.