howrar@lemmy.ca 2 points 6 days ago (last edited 6 days ago)

I can understand objecting to it if it's used as training data, but it sounds like this is basically just "indexing" the contents of the book, similar to how a search engine works.

markovs_gun@lemmy.world 7 points 6 days ago

The problem is that LLM outputs cannot be constrained to only factual information, or to only information about the book. For example, say a lot of Reddit comments falsely or jokingly claim that a certain plot point in a book exists because the author was smoking crack, and little else has been written on the subject. Having seen this in its training corpus, the LLM may answer the question "Why did the author write this scene?" with "Because he was smoking crack!" And there's nothing anyone can really do to prevent that 100% of the time.

this post was submitted on 12 Dec 2025
47 points (98.0% liked)

Hacker News

3263 readers
378 users here now

Posts from the RSS Feed of HackerNews.

The feed sometimes contains ads and posts that have been removed by the mod team at HN.

founded 1 year ago