
cross-posted from: https://lemdro.id/post/11955

TL;DR

  • Google has updated its privacy policy.
  • The new policy adds that Google can use publicly available data to train its AI products.
  • As worded, the policy sounds as if the company is reserving the right to harvest and use data posted anywhere on the web.

You probably didn’t notice, but Google quietly updated its privacy policy over the weekend. While the wording of the policy is only slightly different from before, the change is enough to be concerning.

As Gizmodo discovered, most of the updated policy is unremarkable, but one section now sticks out: the research and development section. That section explains how Google can use your information, and it now reads:

Google uses information to improve our services and to develop new products, features and technologies that benefit our users and the public. For example, we use publicly available information to help train Google’s AI models and build products and features like Google Translate, Bard, and Cloud AI capabilities.

Before the update, this section said “for language models” instead of “AI models.” It also mentioned only Google Translate, and it now adds Bard and Cloud AI.

As the outlet points out, this is a peculiar clause for a company to add, because its wording makes it sound as if the tech giant reserves the right to harvest and use data from any part of the public internet. Usually, a policy like this covers only how the company will use data posted on its own services.

While most people likely realize that whatever they put online will be publicly available, this development adds a new twist: use. It’s not just about others being able to see what you write online, but also about how that data will be used.

Bard, ChatGPT, Bing Chat, and other AI models that provide real-time information work by scraping information from the internet, and much of that information is other people’s intellectual property. Right now, there are lawsuits accusing these AI tools of theft, and more are likely to follow.

agitatedpotato@lemmy.world 39 points 1 year ago

There's no situation in which I can envision scraping the open internet being a good way to train AI. Stop doing things cheaply and curate it yourself, or you're gonna get what you paid for, which in this case is mostly free trash content.

FlyingSquid@lemmy.world 22 points 1 year ago

What's going to be fun is when they start scraping their own output and it becomes a recursive nightmare.

agitatedpotato@lemmy.world 8 points 1 year ago

As if the internet isn't full of regurgitated lies already, can't wait for the AI to reinforce them into itself.

Zarxrax@lemmy.world 11 points 1 year ago

They need massive amounts of data. There is simply no way to manually curate data on that scale, short of hiring like a million people. It's very likely that they do use some sort of automated filtering to curate the data though.

aspensmonster@lemmygrad.ml 2 points 1 year ago

If we can throw tens of millions of soldiers into meat grinders for wars, then I think hiring a few million people to curate data is table stakes by comparison.

dystop@lemmy.world 29 points 1 year ago

To be fair everything on the web is already being used to train AI.

Chainweasel@lemmy.world 13 points 1 year ago

Sounds like a good way to speed up sample collapse. Eventually, AI training data will include enough AI-written comments and articles that the training data, or "sample," will be contaminated by it. The result is training data that makes the AI worse, kind of like inbreeding in humans bringing out recessive genetic disorders such as diseases and birth defects. Once that happens, they'll need a way to accurately detect and remove AI-generated data from the sample to keep improving the AI. But even with the early generations of AI we're using now, it's incredibly hard to automate that with any real accuracy; attempts have flagged a lot of false positives and let a lot of genuinely AI-generated text through.
It's interesting that the limiting factor on AI development may be that it was released to the public before further training could be done.

Greenskye@lemmy.world 4 points 1 year ago

A big hurdle for AI is that it really can't 'learn', at least not the way humans can, filtering out bad data or going back to correct previous assumptions (not that we do this perfectly). It seems like anyone who truly figures out how to teach AI without needing super-clean data sets will have basically unlocked something pretty close to the singularity. Which makes me assume that we're honestly nowhere close to figuring that out, and that sample collapse is much more likely (with the internet as a whole possibly being effectively ruined, the same way voice calls have been effectively ruined by rampant spam).

grte@lemmy.ca 5 points 1 year ago

As AI-generated content proliferates across the internet, won't this lead to (I don't know, I'm not an expert on this) a worse and worse environment to actually learn from human input? Like, more and more spoiled data inputs should lead to some weird results, wouldn't you think?

this post was submitted on 05 Jul 2023
162 points (98.8% liked)

Android