I sometimes download data onto my computer for fun or as a possible entrepreneurial opportunity (data is sometimes withdrawn later, so it's best to keep a personal copy).
AOL search data they released a few years ago (248mb)
MusicBrainz (4.2gb)
Census data
US Business Census data probably from Bureau of Labor Statistics
establishments, employees by NAICS classification (3.3mb)
some other BLS (531mb)
some US government employee stats (2.7mb)
Canada census data (321mb)
UK census partly (202kb)
household expenditures (96kb)
retail survey (489kb)
susb naics (2mb)
UN world occupation data (49mb)
Time Magazine covers (253mb)
USDA nutrients (54mb)
O*NET skills from the US Department of Labor - data about job types and their duties (31mb)
guardian data sets csv (416k)
harvard library metadata (2.3gb)
Computer vision data
CIFAR 100 (200mb)
imagenet urls (338mb)
Faces dataset (from somewhere) (553mb)
mirflickr 1 million images (48gb)
mirflickr 25 (3gb)
poselets (1gb)
trecvid (4.8mb)
berkeley 3d kinect (800mb)
caltech256 (1.7gb)
microsoft kinect gestures (541mb)
pascal vision challenge (2.5gb)
Corp Watch (72kb)
various kinds of finance data (1gb)
daily summaries from NASDAQ, AMEX, NYSE
prices pulled from yahoo api
Kenneth French research
Robert Shiller data
sec data
DBpedia (4.4gb)
dmoz (400mb)
Freebase (38gb)
Movie/tv data sets
IMDB (1.2gb)
netflix prize (700mb)
tvtropes (1gb)
various kaggle data sets (8.7gb)
sherlock holmes stories (14mb)
from google
wikilinks (1.8gb)
wikipedia crosslinks (8.5gb)
wikipedia page counts (112gb)
wikipedia (10gb)