A German non-profit that has scraped the internet and built several large-scale datasets from it, which are made available for training machine learning models. In one example dataset, the text tags attached to images were paired with the image URLs, producing a database that matches each text tag to the corresponding image on a webpage.
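The pairing of text tags with image URLs can be illustrated with a minimal sketch. It assumes the text tag is the `alt` attribute of an HTML `<img>` element and uses only Python's standard library; the names `ImageTextPairParser` and `extract_pairs` and the example URLs are hypothetical, and this is not the organization's actual pipeline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageTextPairParser(HTMLParser):
    """Collects (image URL, text tag) pairs from <img> elements.

    Assumption: the "text tag" is the image's alt attribute.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        alt = (attr_map.get("alt") or "").strip()
        # Keep only images that have both a URL and a non-empty text tag.
        if src and alt:
            # Resolve relative image paths against the page URL.
            self.pairs.append((urljoin(self.base_url, src), alt))


def extract_pairs(html, base_url):
    """Return a list of (absolute image URL, text tag) tuples for one page."""
    parser = ImageTextPairParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    page = '<p><img src="/cat.jpg" alt="A cat on a sofa"><img src="logo.png"></p>'
    for url, text in extract_pairs(page, "https://example.com/page/"):
        print(url, "|", text)
```

Images without a text tag are skipped in this sketch, since they contribute no text to match against the image.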