M4 Mac Mini CLUSTER

Alex Ziskind

5 days ago

306,560 views

Comments:

@MarkWatsonY
@MarkWatsonY - 27.11.2024 21:14

What if you got a bunch of Thunderbolt-to-10GbE adapters and plugged them all into a big switch, then used link aggregation? If you set it up right you could get up to 40Gbps between any two machines, if you had Mac minis with a 10Gbps port plus three Thunderbolt ports. Probably super expensive though, and there might be Thunderbolt network adapters that can use more than 10Gbps of the bandwidth.
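A rough back-of-the-envelope check on that idea, assuming LACP/802.3ad-style link aggregation, where each individual flow is hashed onto a single member link (the 10Gbps and four-link figures come from the comment above):

```python
# Back-of-the-envelope for aggregating four 10GbE links per Mac mini.
# Assumes LACP/802.3ad-style bonding: flows are hashed across member
# links, so one TCP stream still tops out at a single link's speed.
LINK_GBPS = 10          # one built-in 10GbE port or Thunderbolt adapter
LINKS_PER_MACHINE = 4   # 10GbE port + three Thunderbolt-to-10GbE adapters

aggregate_gbps = LINK_GBPS * LINKS_PER_MACHINE
single_flow_gbps = LINK_GBPS

print(f"Aggregate between two machines (many flows): up to {aggregate_gbps} Gbps")
print(f"Any single TCP flow:                         ~{single_flow_gbps} Gbps")
```

So the 40Gbps figure is plausible only for many parallel flows; a single transfer would still see roughly 10Gbps.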

Reply
@JeremyAndersonBoise
@JeremyAndersonBoise - 27.11.2024 21:29

The thumbnail got me instantly; I used to run a grip of Intel Mac Minis at home.

Reply
@davidguerrero6375
@davidguerrero6375 - 27.11.2024 21:32

This kind of investigation is great and truly valuable. It’s a significant contribution to the community. This could become a rabbit hole once you start running tests. Thanks for sharing!

Reply
@klaymoon1
@klaymoon1 - 27.11.2024 23:13

Amazing test. 4 x 64GB Mac Mini M4 Pro. Would that run Llama 405B?
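For a rough sense of scale, a minimal sketch of the weight memory a 405B-parameter model needs at common quantization levels versus 4 x 64 GB of pooled memory; it ignores KV cache, activations, and the share of unified memory macOS keeps for itself, so even the case that "fits" would be very tight:

```python
# Rough weight-memory estimate for a 405B-parameter model, compared
# with 4 x 64 GB of pooled unified memory. Overhead (KV cache,
# activations, OS reservation) is deliberately ignored here.
PARAMS = 405e9
CLUSTER_GB = 4 * 64

for name, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    weight_gb = PARAMS * bits / 8 / 1e9
    verdict = "fits" if weight_gb < CLUSTER_GB else "does not fit"
    print(f"{name:>5}: ~{weight_gb:.0f} GB of weights -> {verdict} in {CLUSTER_GB} GB")
```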

Reply
@gregf1299
@gregf1299 - 27.11.2024 23:26

Great video, Alex! You mention setting up jumbo MTU sizes... did that make any difference? In theory you could get 9K packets, but I have my doubts.
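For what it's worth, the on-paper gain from jumbo frames is small; the sketch below counts only fixed Ethernet/IPv4/TCP header overhead per frame, and the larger real-world benefit (fewer packets per second for the CPU and NIC to process) isn't modelled:

```python
# Wire efficiency of standard (1500) vs jumbo (9000) MTU, counting only
# fixed per-frame overhead: MAC header, FCS, preamble+SFD, inter-frame gap,
# plus IPv4 and TCP headers without options.
ETH_OVERHEAD = 14 + 4 + 8 + 12
IP_TCP_HEADERS = 20 + 20

for mtu in (1500, 9000):
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + ETH_OVERHEAD
    print(f"MTU {mtu}: {payload / on_wire:.1%} of bytes on the wire are payload")
```

That works out to roughly 95% vs 99% wire efficiency, so only a few percent from framing alone.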

Reply
@nusch_pl
@nusch_pl - 28.11.2024 00:05

Wasn't Thunderbolt daisy chaining designed to avoid the issues you're having with the hub? It should handle everything in hardware, so it shouldn't introduce much delay even if traffic passes through three computers on its way.

Reply
@TheNameIsForty
@TheNameIsForty - 28.11.2024 00:06

This like button is for those who agree his play button is 100% level, relative to flat-earther theory.

Reply
@Kaula_ai
@Kaula_ai - 28.11.2024 00:24

Can Apple just make proprietary Apple silicon GPUs for the Mac Pro already?

Reply
@LanceBryantGrigg
@LanceBryantGrigg - 28.11.2024 00:36

Thank you for this video, Alex; it's what I needed to get it. Coles Notes for other folks:

If you want top performance on the highest-parameter models, you need to use CUDA-capable cards, aka NVIDIA.

If you want to save energy and you have some Mac minis hanging around, they do an OK job and will be the most cost-effective.

Cheers!

Reply
@max0x7ba
@max0x7ba - 28.11.2024 00:41

Fun fact: WiFi Ethernet packets are up to 64 kB by default with my 2020 WiFi 6 router. When streaming videos with the DLNA protocol it uses maximum-length WiFi Ethernet packets.

Reply
@chexi
@chexi - 28.11.2024 00:44

Not bad English, with practically no accent. You either moved as a teenager or worked very hard at it.

Reply
@peterlarsen4809
@peterlarsen4809 - 28.11.2024 01:26

Turn the bottom machine upside down to make sure it can breathe.

Reply
@peterlarsen4809
@peterlarsen4809 - 28.11.2024 01:33

I would like to see a comparison of the system over 10Gb LAN versus Thunderbolt.

Reply
@spiceyfrenchtoast9421
@spiceyfrenchtoast9421 - 28.11.2024 01:50

Is no one going to mention how his play button is super crooked?

Reply
@guillermogarciamanjarrez8934
@guillermogarciamanjarrez8934 - 28.11.2024 01:53

I would have liked to see it running a model that can't fit on any single computer, like a 180B-parameter model. It might be super slow, but it would be a real use case for all that RAM.

Reply
@Mr.Smalleys
@Mr.Smalleys - 28.11.2024 02:26

I thought I'd try and test TPS with an i9-13900K and RTX 3060 Ti, 12GB VRAM, 128GB DDR5 4000MT/s (slow RAM, bleep quad channel).

root@226408baa9c8:~# ollama run llama3.2-vision "tell me a story about M4 Mac Mini cluster and RTX 3060ti comparison, do we beat 280 tps" --verbose
Here's a story for you:

In the heart of Silicon Valley, there was a team of developers at NovaTech, a cutting-edge AI research firm. They were on a mission to break through the boundaries of machine learning and deep learning performance. The team leader, Alex, had assembled an elite
squad of experts: John, a seasoned software engineer; Rachel, a brilliant data scientist; and Mark, a wizard with hardware.

Their challenge was to optimize a specific workload that required processing vast amounts of data in parallel. They were determined to beat the current benchmark of 280 transactions per second (tps) on a standard dataset.

The team decided to compare two systems: an M4 Mac Mini cluster and a single high-end PC equipped with an NVIDIA RTX 3060 Ti GPU.

*M4 Mac Mini Cluster*

Alex's team set up four M4 Mac Minis, each equipped with:

* Intel Core i9 processor (up to 3.6 GHz)
* 32 GB of DDR4 RAM
* Four 1 TB SSDs in RAID 10 configuration

The team installed a custom-built cluster management system, allowing them to distribute the workload evenly across all four nodes.

*RTX 3060 Ti PC*

John, being the hardware aficionado, opted for a high-end PC:

* Intel Core i9 processor (up to 3.6 GHz)
* 64 GB of DDR4 RAM
* NVIDIA GeForce RTX 3060 Ti GPU with 8 GB GDDR6 memory

The team ran the benchmark on both systems, using the same dataset and workload configuration.

*Results*

After hours of intense computing, the results were in:

M4 Mac Mini Cluster: *310 tps*
RTX 3060 Ti PC: *285 tps*

Alex's team was thrilled – they had beaten the 280 tps benchmark by a significant margin! The M4 Mac Mini cluster performed 30% better than the single high-end PC.

*What made it possible?*

Several factors contributed to this success:

1. **Parallel processing**: By distributing the workload across four nodes, the team was able to take advantage of multi-core processors and significantly improve overall performance.
2. **Custom-built cluster management**: The custom-built system allowed for efficient distribution of tasks, reducing communication overhead and ensuring that all nodes were utilized optimally.
3. **Optimized software**: The team had carefully optimized the software to minimize bottlenecks and maximize throughput.

*Conclusion*

Alex's team at NovaTech had successfully pushed the boundaries of machine learning performance. By leveraging the power of parallel processing and custom-built cluster management, they achieved a remarkable 30% improvement over the single high-end PC equipped with
an RTX 3060 Ti GPU.

The M4 Mac Mini cluster stood as a testament to the power of distributed computing and the potential for innovation in AI research. As Alex's team celebrated their victory, they knew that this was only the beginning – there were still many challenges to overcome,
but they were ready to take on the next frontier!

total duration: 30.436800181s
load duration: 4.443920349s
prompt eval count: 36 token(s)
prompt eval duration: 188ms
prompt eval rate: 191.49 tokens/s
eval count: 618 token(s)
eval duration: 25.803s
eval rate: 23.95 tokens/s
root@226408baa9c8:~# ;) Just a comparison at the end of the day; try open-webui and pipelines, all in Docker.
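For anyone squinting at the --verbose output above, the reported rates are just token counts divided by durations; a quick check against the numbers in this run:

```python
# Reproducing ollama's reported rates from the --verbose counters above.
prompt_tokens, prompt_seconds = 36, 0.188
eval_tokens, eval_seconds = 618, 25.803

print(f"prompt eval rate: {prompt_tokens / prompt_seconds:.2f} tokens/s")  # ~191.49
print(f"eval rate:        {eval_tokens / eval_seconds:.2f} tokens/s")      # ~23.95
```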

Reply
@gazollajr
@gazollajr - 28.11.2024 03:14

Hi Alex, nice job! It's so cool to see those setups. I was wondering if it would be viable to use a LattePanda Sigma as a larger model server. Could that be a potential next project for you?

Reply
@mattmedium2388
@mattmedium2388 - 28.11.2024 03:18

The M4 Mac mini looks nice and fast, but I'll wait for a new one next year without the power button on the bottom.

Reply
@tommytse23
@tommytse23 - 28.11.2024 03:45

I literally taught myself how to run AI on my Mac because I want to make study materials with it for myself. And now I've seen the videos you made.

Reply
@DougWCosta
@DougWCosta - 28.11.2024 04:33

Great video! And I was wondering why my M2 Air (8GB) stopped when trying to run Qwen 2.5 Coder 14B, kkkkkk. It just broke... ChatGPT is still a good deal if you're not putting in sensitive data.

Reply
@max0x7ba
@max0x7ba - 28.11.2024 04:47

You need GPUs and RAM for training; that's what people buy them for. Inference can run on your wristwatch if it has enough RAM to hold the model.

Reply
@CedroCron
@CedroCron - 28.11.2024 08:01

Great video, thank you!

Reply
@JohnnoWaldmann
@JohnnoWaldmann - 28.11.2024 10:16

I want a cluster to run DaVinci Resolve on for colour grading. Too often Resolve chokes when it runs out of GPU and CPU cores to allocate, and waiting for nodes and clips to render interrupts focus. Blackmagic does not facilitate cluster computing in Resolve. Surely someone can figure out how to throw dozens of lower-cost Macs at Resolve.

Reply
@ThomasPleasance
@ThomasPleasance - 28.11.2024 11:10

Could you daisy-chain the Macs (1-2, 2-3, 3-4, 4-5) for networking, or could you do a ring?

Reply
@battleforevermore
@battleforevermore - 28.11.2024 13:34

How about the 8700G?

Reply
@tristan_chx
@tristan_chx - 28.11.2024 13:43

Would it be possible to daisy-chain Macs via Thunderbolt? Wouldn't this network topology allow more machines in the network, and at greater speed?

Reply
@nicolamodena8632
@nicolamodena8632 - 28.11.2024 14:05

Would you use an M4 Mini stack to mine crypto?

Reply
@mohamedshuaau632
@mohamedshuaau632 - 28.11.2024 14:28

So I can only imagine how much Sonnet is using for something like Cursor AI.

Reply
@luthfibadri
@luthfibadri - 28.11.2024 15:25

tiktok ahh edit

Reply
@RyanF470
@RyanF470 - 28.11.2024 15:50

You could have created a connection using Thunderbolt in a mesh:
a -> b, c
b -> c, d
c -> d, e
d -> e, a
e -> a, b

15 Thunderbolt cables, set up as a bridge, with each device given its own IP address; you could set that up. It does mean that to get from a to d it has to go through b or c.
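A quick sanity check on that topology: a full mesh (every pair of the five minis directly linked, which is what the a-to-e listing above describes) works out as below, with the three-Thunderbolt-ports-per-mini figure being an assumption about the base M4 mini:

```python
# Cable count and per-node port demand for a full Thunderbolt mesh.
from math import comb

NODES = 5
TB_PORTS_PER_MINI = 3  # assumed Thunderbolt ports on an M4 mini

cables = comb(NODES, 2)   # every pair of machines gets its own cable
degree = NODES - 1        # cables terminating at each machine

print(f"Full mesh of {NODES}: {cables} cables, {degree} ports per node "
      f"(vs {TB_PORTS_PER_MINI} Thunderbolt ports available)")
```

So a true full mesh of five needs 10 cables but 4 ports per machine, one more than the minis have; any practical layout ends up a partial mesh where some pairs do hop through a neighbour.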

Reply
@El.Duder-ino
@El.Duder-ino - 28.11.2024 17:38

Nice vid, there are some interesting conclusions. However, I'm missing a direct comparison at the end with Nvidia cards like the top 4090 you showed in the video. I think making a single video comparing AI performance per watt between an M4 Mac Mini (cluster), an MBP with M4 Max, and a 4090 would be much more informative and interesting.

Reply
@naizondj
@naizondj - 28.11.2024 18:01

My god, what a geek! 😂😂😂 I didn't understand f*ck all! Hahaha, well, I'll just get one base model with a 512GB SSD, that's all 😂😂😂😂

Reply
@wizardenji
@wizardenji - 28.11.2024 19:41

Hi Alex, thank you for your excellent sharing. I'm wondering whether it's possible to apply such clustering to bioinformatics, which needs more CPU (threads) and RAM rather than GPU. If yes, how can I set it up for two Mac minis?
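One hedged illustration of how CPU/RAM-bound work is usually spread over a couple of machines: MPI rather than the GPU-oriented tooling in the video. A minimal mpi4py sketch, assuming MPI and mpi4py are installed on both minis, passwordless SSH works between them, and a hostfile named hosts lists both machines (all of that setup is not shown here):

```python
# scatter_work.py - minimal mpi4py sketch for splitting CPU-bound chunks
# across two Mac minis. Hypothetical launch command:
#   mpirun -n 2 --hostfile hosts python scatter_work.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Rank 0 splits the work into one chunk per process.
chunks = [list(range(i, 1000, size)) for i in range(size)] if rank == 0 else None
my_chunk = comm.scatter(chunks, root=0)

partial = sum(x * x for x in my_chunk)   # stand-in for the real per-chunk analysis
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"combined result from {size} processes: {total}")
```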

Reply
@shApYT
@shApYT - 28.11.2024 20:40

You can either buy one M4 Pro mini with 64GB RAM and 8TB storage for $5000, or nine M4 minis with a combined 144GB RAM and 2.3TB storage for $5400. Apple's bean counters compute the maths.
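Taking the comment's own numbers (and assuming the base M4 mini configuration of 16 GB RAM, 256 GB SSD, roughly $599 each), the per-machine arithmetic behind the nine-mini side, leaving aside the fact that unified memory only pools across machines with clustering software like exo:

```python
# The nine-mini side of the comparison, per base-model machine
# (assumed figures: 16 GB RAM, 256 GB SSD, ~$599 each).
minis = 9
ram_gb, ssd_gb, price_usd = 16, 256, 599

print(f"{minis} base minis: {minis * ram_gb} GB RAM, "
      f"{minis * ssd_gb / 1000:.2f} TB SSD, ${minis * price_usd}")
# vs. the single maxed-out M4 Pro mini quoted above: 64 GB / 8 TB for ~$5000
```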

Reply
@nahawand
@nahawand - 28.11.2024 23:44

Excuse me, I didn't get it; did you mention the same performance on a 4090 card? Otherwise the comparison is pointless without knowing performance per watt.

Reply
@awkward--salad7565
@awkward--salad7565 - 28.11.2024 23:52

How did you get exo running!? I followed all the instructions on the Git page. I have an M3 Max MBP; any tips would be great!

Reply
@youcrew
@youcrew - 29.11.2024 12:18

Try the CalDigit Thunderbolt hub: only Thunderbolt ports, no display out. I have noticed a much better data rate on it.

Reply
@avaviel
@avaviel - 29.11.2024 12:34

The audio transitions in your video are odd and jarring.

Like this isn't CSI ha

Reply
@DemocracyDecoded
@DemocracyDecoded - 29.11.2024 13:25

This guy takes every chance to flash his 4090 lol

Reply
@simon8988
@simon8988 - 29.11.2024 15:11

Do you have an air conditioner?

Reply
@macauleyp1950
@macauleyp1950 - 29.11.2024 17:49

What is that stackable tower you’re using for 5 minis? Is it custom?

Reply
@princeamori
@princeamori - 29.11.2024 17:56

This video, the explanation and the tests were so well done. First time on your channel, now I have subscribed. Thank you.

Reply
@jakayboy
@jakayboy - 29.11.2024 19:40

What's to stop the 4090 from using the 1TB of DDR5?

Reply
@SomeTechGuy666
@SomeTechGuy666 - 29.11.2024 21:37

I do a lot of CFD work using OpenFOAM. It is insanely memory-access intensive. Any chance you could run the OpenFOAM benchmark on your cluster?

Reply
@dnegrisolli
@dnegrisolli - 30.11.2024 04:21

Would this configuration be suitable for an LLM if the machines were merged using Kubernetes?

Reply
@Hoopergames
@Hoopergames - 30.11.2024 07:15

I wish I had $2000 to spend to ask a computer what its name is and to tell me a story.

Reply
@tusenkopparkaffe
@tusenkopparkaffe - 30.11.2024 09:16

This was interesting. Thanks 👍🏻

Reply