this post was submitted on 20 Jan 2024
People Twitter
I’m only vaguely familiar with ML datasets and have only trained on local data, but I’ve never heard of this? Can anyone provide some evidence this is the case?
Edit: Looking further, I can still only find datasets that contain the image files themselves, e.g.:
https://www.lvisdataset.org/dataset
https://www.v7labs.com/open-datasets
LAION is one of the big dogs (https://laion.ai/). Their datasets consist of URLs and metadata rather than the image files themselves.
Ah perfect, thank you so much!
https://github.com/rom1504/img2dataset
Seems to be the main tool, I’ll have something new to explore this weekend.
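For anyone else wondering what "URLs and metadata" means in practice, here's a rough stdlib-only sketch of the idea: read rows of (URL, caption), fetch each image, and save it next to its caption. This is not how img2dataset is implemented, just the basic concept. The `url` and `caption` column names and the `read_rows` / `materialize` helpers are made up for illustration, so check the actual dataset's schema before relying on them.

```python
import csv
import io
import urllib.request
from pathlib import Path


def read_rows(csv_text):
    """Parse CSV text with hypothetical 'url' and 'caption' columns."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def fetch_bytes(url, timeout=10):
    """Download the raw bytes behind a URL (this does real network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def materialize(rows, out_dir, fetch=fetch_bytes):
    """Turn URL + caption rows into files on disk.

    Returns how many rows were saved. Rows whose download fails are skipped,
    since dead links are expected in a dataset that only stores URLs.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = 0
    for i, row in enumerate(rows):
        try:
            data = fetch(row["url"])
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip this row
        # Image bytes are written as-is; no decoding or resizing in this sketch.
        (out / f"{i:06d}.img").write_bytes(data)
        (out / f"{i:06d}.txt").write_text(row["caption"], encoding="utf-8")
        saved += 1
    return saved
```

The `fetch` argument is only there so the download step can be swapped out (for a stub, a cache, etc.). A real tool also handles parallel downloads, resizing, and sharded output, which this sketch leaves out entirely.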