https://mastodon.social/@tess/111784134039294048
I’m only vaguely familiar with ML datasets and have only trained on local data, but I’ve never heard of this. Can anyone provide some evidence that this is the case?
Edit: Looking further, I can still only find datasets that contain the image files themselves, e.g.:
https://www.lvisdataset.org/dataset
https://www.v7labs.com/open-datasets
LAION (https://laion.ai/) is one of the big dogs. Their datasets consist of URLs and metadata rather than the images themselves.
Ah perfect, thank you so much!
https://github.com/rom1504/img2dataset
That seems to be the main tool for downloading them; I’ll have something new to explore this weekend.
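To make the "URLs plus metadata" idea concrete, here is a minimal sketch of what a downloader does with such a file: read each row, fetch the image from its URL, and save it next to its caption. This is not img2dataset's code or API. The function names, the `URL`/`TEXT` column names, and the zero-padded file naming are assumptions for illustration, so check the columns in whatever metadata file you actually have.

```python
import os
import urllib.request
from typing import Callable, Iterable


def default_fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes at `url` (no retries, no resizing)."""
    req = urllib.request.Request(url, headers={"User-Agent": "toy-downloader/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def download_from_metadata(
    rows: Iterable[dict],
    output_dir: str,
    url_col: str = "URL",        # assumed column name for the image URL
    caption_col: str = "TEXT",   # assumed column name for the caption
    fetch: Callable[[str], bytes] = default_fetch,
) -> dict:
    """Fetch each row's image and write it beside its caption.

    Returns counts of successes and failures; dead links are expected,
    since the dataset only stores URLs, not the images.
    """
    os.makedirs(output_dir, exist_ok=True)
    stats = {"ok": 0, "failed": 0}
    for i, row in enumerate(rows):
        key = f"{i:09d}"  # zero-padded sample id, one per metadata row
        try:
            data = fetch(row[url_col])
        except Exception:
            # The URL may be dead, blocked, or time out; skip the sample.
            stats["failed"] += 1
            continue
        # Assumes JPEG; a real tool would detect or re-encode the format.
        with open(os.path.join(output_dir, key + ".jpg"), "wb") as f:
            f.write(data)
        with open(os.path.join(output_dir, key + ".txt"), "w", encoding="utf-8") as f:
            f.write(str(row.get(caption_col) or ""))
        stats["ok"] += 1
    return stats
```

In practice you would feed it rows from something like `csv.DictReader` over a metadata file. img2dataset does the same job at scale, adding parallel downloads, resizing, and output formats such as webdataset and parquet. Expect some fraction of the URLs to be dead by the time you download them.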