‘In awe’: scientists impressed by latest ChatGPT model o1 (www.nature.com)

submitted 1 week ago by QuillcrestFalconer@hexbear.net to c/technology@hexbear.net


I know people here are very skeptical of AI in general, and there is definitely a lot of hype, but I think the progress in the last decade has been incredible.

Here are some quotes:

“In my field of quantum physics, it gives significantly more detailed and coherent responses” than did the company’s last model, GPT-4o, says Mario Krenn, leader of the Artificial Scientist Lab at the Max Planck Institute for the Science of Light in Erlangen, Germany.

Strikingly, o1 has become the first large language model to beat PhD-level scholars on the hardest series of questions, the ‘diamond’ set, in a test called the Graduate-Level Google-Proof Q&A Benchmark (GPQA). OpenAI says that its scholars scored just under 70% on GPQA Diamond, and o1 scored 78% overall, with a particularly high score of 93% in physics.

OpenAI also tested o1 on a qualifying exam for the International Mathematics Olympiad. Its previous best model, GPT-4o, correctly solved only 13% of the problems, whereas o1 scored 83%.

Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

Catherine Brownstein, a geneticist at Boston Children’s Hospital in Massachusetts, says the hospital is currently testing several AI systems, including o1-preview, for applications such as connecting the dots between patient characteristics and genes for rare diseases. She says o1 “is more accurate and gives options I didn’t think were possible from a chatbot”.

[-] FrogPrincess@lemmy.ml -1 points 1 week ago* (last edited 1 week ago)

we have consistently given other reasons. our criticisms of AI are salient and based both within the fundamental structure of the technology and the ways in which it’s employed.

Super. Link to the best critique of AI on here?

[-] RedWizard@hexbear.net 11 points 1 week ago* (last edited 1 week ago)

Its energy consumption is absolutely unacceptable; it puts the crypto market to utter shame in terms of ecological impact. I mean, Three Mile Island Unit 1 is being recommissioned to serve Microsoft datacenters instead of the 800,000 homes its 835-megawatt output could supply. This is being made possible by taxpayer-backed loans from the federal government. So Americans' tax dollars are being funneled into a private energy company to provide a private tech company with 835 megawatts of power for a service it is attempting to profit from, instead of providing clean, reliable energy to their households.

Power consumption is only one half of the ecological impact that AI brings to the table, too. The cooling required for AI text generation has been found to consume just over one bottle of water (519 milliliters) per 100 words, or the equivalent of a brief email. In areas where electricity costs are high, datacenters consume an insane amount of water from the local supply. In one case, The Dalles, Google's datacenters were using nearly a quarter of all the water available in the town. Some of these datacenters use cooling towers where outside air travels across a wet medium so that the water evaporates, which means the cooling water is not recycled; it is consumed and removed from whatever water supply they are drawing from.

These datacenters consume resources but often bring no economic advantages to the people living in the areas where they are built. Instead, those people are subjected to the noise of the cooling systems (if electrically cooled), a hit to their property values, strain on their local electric grid, and often massive consumption of local water (if liquid cooled).

Models need to be trained, and that training happens in datacenters and can take months to complete. The training is an expense the company pays just to get these systems off the ground. So before any productive benefit can be gained from these AI systems, a massive amount of resources has to be consumed just to train the models. Microsoft’s data center used 700,000 liters of water while training GPT-3, according to the Washington Post. Meta used 22 million liters of water training its LLaMA-3 open source AI model.

And for what, exactly? As others in this thread, and more broadly outside this community, have pointed out, these models only succeed wildly when placed in a bounded test scenario. As commenters on this NYT article point out:

Major problem with this article: competition math problems use a standardized collection of solution techniques, it is known in advance that a solution exists, and that the solution can be obtained by a prepared competitor within a few hours.

“Applying known solutions to problems of bounded complexity” is exactly what machines always do and doesn’t compete with the frontier in any discipline.

Note in the caption of the figure that the problem had to be translated into a formalized statement in AlphaGeometry's own language (presumably by people). This is often the hardest part of solving one of these problems.

These systems are only capable of performing within the bounds of existing content. They are incapable of producing anything new or unexplored. When one data scientist looked at the o1 model, he had this to say about the speed at which the o1 model constructed code that took him months to complete:

Kyle Kabasares, a data scientist at the Bay Area Environmental Research Institute in Moffett Field, California, used o1 to replicate some coding from his PhD project that calculated the mass of black holes. “I was just in awe,” he says, noting that it took o1 about an hour to accomplish what took him many months.

He makes these remarks with almost no self-awareness. The likelihood that this model was trained on his very own research is very high, so naturally the system was able to provide him with a solution. The data scientist labored for months creating a solution that presumably did not exist beforehand, and the o1 model simply internalized it. When asked to provide that solution, it did so. This isn't an astonishing accomplishment; it's a complicated, expensive, and damaging search engine that will hallucinate an answer when you ask it to produce something that sits outside the bounds of its training.

The vast majority of the public's use cases for these systems are not cutting-edge research. They're writing the next 100-word email you don't want to write, and sacrificing a bottle of water every time. They're replacing jobs held by working people with a system that is often exploitable, costly, and inefficient at the job. These systems are a parlor trick at best, and a demon whose hunger for electricity and water is insatiable at worst.

[-] UlyssesT@hexbear.net 8 points 1 week ago

You didn't get a reply to your effortpost because the treat printer proselytizer already assumes very smartness and correctness by default and any challenge to that gets no reply.

I hate that shit so much. It's a plague across the techbro world.

[-] FrogPrincess@lemmy.ml 0 points 1 week ago

You didn’t get a reply to your effortpost because

You said this less than 15 minutes after the good comment.


this post was submitted on 04 Oct 2024
