[-] Deceptichum@kbin.social 32 points 1 year ago* (last edited 1 year ago)

I’m only vaguely familiar with ML datasets and have only trained on local data, but I’ve never heard of this. Can anyone provide some evidence that this is the case?

Edit: Looking further, I can still only find datasets that contain the image files themselves, e.g.:

https://www.lvisdataset.org/dataset

https://www.v7labs.com/open-datasets

[-] Ymmelbackwards@lemmy.world 17 points 1 year ago

LAION is one of the big dogs (https://laion.ai/). Their datasets consist of URLs and metadata, not the image files themselves.
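
To make "URLs and metadata" concrete, here is a rough sketch of what working with that kind of dataset might look like. The column names, sample rows, and the `local_name` helper are all made up for illustration and are not taken from any real LAION release. The sketch only builds a download plan; it does not fetch anything.

```python
import csv
import hashlib
import io

# Hypothetical sample in the "URLs plus metadata" style. The column names
# and rows are invented for illustration, not copied from a real dataset.
SAMPLE_CSV = """url,caption,width,height
https://example.com/cat.jpg,A cat on a sofa,640,480
https://example.com/dog.png,A dog in a park,800,600
"""


def load_records(text):
    """Parse CSV text into a list of dicts, one per image reference."""
    return list(csv.DictReader(io.StringIO(text)))


def local_name(url):
    """Derive a stable local filename: short URL hash plus the original extension."""
    last_segment = url.rsplit("/", 1)[-1]
    ext = last_segment.rsplit(".", 1)[-1] if "." in last_segment else "bin"
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + "." + ext


def download_plan(records):
    """Pair each URL with the file it would be saved to and its caption."""
    return [(r["url"], local_name(r["url"]), r["caption"]) for r in records]


records = load_records(SAMPLE_CSV)
plan = download_plan(records)
for url, filename, caption in plan:
    print(url, "->", filename, "|", caption)
```

The point is that the dataset itself is just a table of pointers. Anyone who wants the actual images has to download them from the listed URLs as a separate step.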

[-] Deceptichum@kbin.social 3 points 1 year ago

Ah perfect, thank you so much!

https://github.com/rom1504/img2dataset

That seems to be the main tool. I’ll have something new to explore this weekend.

this post was submitted on 20 Jan 2024
203 points (92.1% liked)
