submitted 1 week ago by yogthos@lemmy.ml to c/technology@lemmy.ml
[-] yogthos@lemmy.ml 16 points 1 week ago

When you definitely know the difference between what a CPU and a GPU do.

[-] Aria@lemmygrad.ml 1 points 5 days ago* (last edited 5 days ago)

You can run llama.cpp on a CPU. LLM inference doesn't need any features that only GPUs have; that's why it's possible to build even simpler NPUs that can still run the same models. GPUs just tend to be faster. If the GPU in question is not faster than an equally priced CPU, you should use the CPU (better OS support).
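
For anyone who wants to try it, here's a minimal sketch of CPU-only inference using llama-cpp-python (the Python bindings for llama.cpp); the model path and thread count are placeholders you'd swap for your own:

```python
# Minimal CPU-only inference sketch with llama-cpp-python.
# The model path is a placeholder; any GGUF model works.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=0,   # 0 = offload nothing to a GPU, run entirely on CPU
    n_ctx=4096,       # context window
    n_threads=16,     # roughly match your physical core count
)

out = llm("Explain the difference between a CPU and a GPU.", max_tokens=128)
print(out["choices"][0]["text"])
```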

Edit: I looked at a bunch of real-world prices and benchmarks, read the manual from Huawei, and my new conclusion is that this is the best product on the market if you want to run, at modest speed, a model that doesn't fit in 32GB but does fit in 96GB. Running multiple in parallel seems to range from unsupported to working poorly, so you should only expect to use one.

Original rest of the comment, made with the assumption that this was slower than it is, but had better drivers:
~~The only benefit to this product over a CPU is that you can slot in multiple of them and they parallelise without needing to coordinate anything with the OS. The cost also scales almost linearly as long as you have the PCIe lanes for them. A home user with enough money for one or two of these would be much better served spending it on a fast CPU and 256GB of system RAM.~~

~~If not AI, then what use case do you think this serves better?~~

[-] yogthos@lemmy.ml 1 points 5 days ago

The point is that the GPU is designed for parallel computation. That happens to be useful for graphics, AI, and any other problem that can be expressed as a lot of independent calculations executed in parallel. It's a completely different architecture from a traditional CPU. This particular card is meant for running LLMs, and it will do so orders of magnitude faster than a CPU.
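
To illustrate the parallelism point with a toy sketch: the core of LLM inference is matrix-vector products, and every output element is an independent dot product, which is exactly the shape of work a GPU spreads across thousands of cores (the sizes below are made up):

```python
# Toy sketch: every output element of a matrix-vector product is an
# independent dot product -- no iteration depends on another, so they
# could all run simultaneously on a GPU's many cores.
import numpy as np

W = np.random.rand(4096, 4096)   # a weight matrix (illustrative size)
x = np.random.rand(4096)         # an activation vector

# Sequential view: one dot product per output element.
y = np.array([W[i] @ x for i in range(W.shape[0])])

assert np.allclose(y, W @ x)     # same result as the fused matmul
```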

[-] Aria@lemmygrad.ml 1 points 5 days ago

300i https://www.bilibili.com/video/BV15NKJzVEuU/
M4 https://github.com/itsmostafa/inference-speed-tests
It's comparable to an M4, and maybe a single order of magnitude faster than a ~1000 euro 9960X at most, not multiple orders. And if we're open to buying used, then since this is a brand-new product that's less available in Western markets, a CPU-only build with an EPYC and more RAM will probably be a better local LLM machine for the cost of two of these plus a basic computer.

[-] yogthos@lemmy.ml 1 points 5 days ago

The M4 is an SoC, so it's not directly comparable: it combines CPU and GPU cores that share unified memory on a single chip.

[-] interdimensionalmeme@lemmy.ml 1 points 1 week ago

For $2,000 it "claims" to do 140 TOPS of INT8,
when an Intel Core Ultra 7 265K does 33 TOPS of INT8 for $284.

Don't get me wrong, I would LOVE to buy a Chinese GPU at a reasonable price, but this isn't even price competitive with CPUs, let alone GPUs.
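
Putting those quoted figures side by side as naive TOPS per dollar (claimed numbers from above, not independent benchmarks):

```python
# Back-of-the-envelope TOPS-per-dollar from the figures quoted above
# (claimed numbers, not independently benchmarked).
chips = {
    "Huawei card (claimed)":   {"tops": 140, "price": 2000},
    "Intel Core Ultra 7 265K": {"tops": 33,  "price": 284},
}

for name, c in chips.items():
    print(f"{name}: {c['tops'] / c['price']:.3f} INT8 TOPS per $")
# Huawei card (claimed):   0.070 INT8 TOPS per $
# Intel Core Ultra 7 265K: 0.116 INT8 TOPS per $
```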

[-] yogthos@lemmy.ml 11 points 1 week ago

Again, completely different purposes here.

[-] LeLachs@lemmy.ml 2 points 1 week ago* (last edited 1 week ago)

Alright, let's compare it to another GPU.

According to this source, the RTX 4070 costs about $500 and does 466 TOPS of INT8.

I don't know if TOPS is a good measurement though (I don't have any experience with AI benchmarking).

[-] yogthos@lemmy.ml 8 points 6 days ago

Now go look at the amount of VRAM it has.
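
To make the VRAM point concrete, here's a rough sketch of weight memory alone (params × bytes per weight, ignoring KV cache and runtime overhead; the 70B size is just illustrative):

```python
# Rough weight-memory footprint: parameters * bytes-per-parameter.
# The quantization widths are typical llama.cpp options; the 70B
# model is an illustrative size, not a specific product claim.
def weights_gb(params_billion, bits_per_weight):
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"70B model at {bits}-bit: ~{weights_gb(70, bits):.0f} GB")
# 70B at 16-bit: ~140 GB -- doesn't fit either card
# 70B at  8-bit:  ~70 GB -- fits in 96 GB, far beyond a 4070's 12 GB
# 70B at  4-bit:  ~35 GB -- still roughly 3x a 4070's VRAM
```

A 4070's 12GB only holds small or heavily quantized models, while a 96GB card can hold a 4-bit or even 8-bit 70B model entirely on-device.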
