Deceptichum@kbin.social 32 points 10 months ago* (last edited 10 months ago)

I’m only vaguely familiar with ML datasets and have only trained on local data, but I’ve never heard of this? Can anyone provide some evidence this is the case?

Edit: Looking further, I can still only find datasets that contain the image files themselves, e.g.:

https://www.lvisdataset.org/dataset

https://www.v7labs.com/open-datasets

Ymmelbackwards@lemmy.world 17 points 10 months ago

LAION is one of the big dogs (https://laion.ai/). Their datasets consist of URLs and metadata, not the images themselves.

Deceptichum@kbin.social 3 points 10 months ago

Ah perfect, thank you so much!

https://github.com/rom1504/img2dataset

Seems to be the main tool; I'll have something new to explore this weekend.
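To make the "URLs and metadata" point concrete: a LAION-style dataset is essentially a table of links plus captions, and whoever trains on it has to fetch the images themselves, skipping the links that have since died. The sketch below is a minimal, stdlib-only illustration of that idea. The column names (`url`, `caption`), the sample rows, and the helper names (`read_metadata`, `plan_downloads`, `fetch`, `download_all`) are all hypothetical. This is not how img2dataset works internally; that tool layers parallel downloading, resizing, and sharded output formats on top of the same basic loop.

```python
import csv
import io
import urllib.error
import urllib.request
from typing import Any, Callable, Iterable, Iterator, Optional

# Hypothetical metadata shard: rows of (url, caption) instead of image bytes.
# Column names and URLs are placeholders for illustration only.
SAMPLE_METADATA = """url,caption
https://example.com/cat.jpg,a cat sitting on a windowsill
https://example.com/dog.png,a dog running on a beach
"""


def read_metadata(text: str) -> Iterator[dict]:
    """Parse CSV metadata into dicts keyed by the header row."""
    yield from csv.DictReader(io.StringIO(text))


def plan_downloads(rows: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Give each row a zero-padded key so an image and its caption stay paired."""
    return [(f"{i:09d}", row["url"], row["caption"]) for i, row in enumerate(rows)]


def fetch(url: str, timeout: float = 10.0,
          opener: Callable[..., Any] = urllib.request.urlopen) -> Optional[bytes]:
    """Download one image; return None on failure (dead links are common)."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return None


def download_all(plan: Iterable[tuple[str, str, str]],
                 opener: Callable[..., Any] = urllib.request.urlopen
                 ) -> dict[str, tuple[bytes, str]]:
    """Fetch every planned URL, keeping (image_bytes, caption) for successes only."""
    results = {}
    for key, url, caption in plan:
        data = fetch(url, opener=opener)
        if data is not None:
            results[key] = (data, caption)
    return results
```

The `opener` parameter exists only so the download step can be swapped out or tested offline. The placeholder `example.com` URLs will not return real images, so point `SAMPLE_METADATA` at actual metadata before calling `download_all(plan_downloads(read_metadata(...)))`.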

this post was submitted on 20 Jan 2024
203 points (92.1% liked)

People Twitter


People tweeting stuff. We allow tweets from anyone.

RULES:

  1. Mark NSFW content.
  2. No doxxing people.
  3. Must be a tweet or similar
  4. No bullying or international politics
  5. Be excellent to each other.

founded 1 year ago