Thursday 14 November 2013

Data sets on my hard drive

I sometimes download data onto my computer for fun or as a possible entrepreneurial opportunity (data is sometimes withdrawn later, so it's best to keep a personal copy).

AOL search data they released a few years ago (248mb)

music brainz (4.2gb)

Census data
  US Business Census data probably from Bureau of Labor Statistics
    establishments, employees by NAICS classification (3.3mb)
  some other BLS (531mb)
  some US government employee stats (2.7mb)
  Canada census data (321mb)
  UK census partly (202kb)
  household expenditures (96kb)
  retail survey (489kb)
  susb naics (2mb)
  UN world occupation data (49mb)
 
Time Magazine covers (253mb)

USDA nutrients (54mb)

O*NET skills from the US Department of Labor - data about job types and their duties (31mb)

guardian data sets csv (416kb)

harvard library metadata (2.3gb)

Computer vision data
  CIFAR 100 (200mb)
  imagenet urls (338mb)
  Faces dataset (from somewhere) (553mb)
  mirflickr 1 million images (48gb)
  mirflickr 25 (3gb)
  poselets (1gb)
  trecvid (4.8mb)
  berkeley 3d kinect (800mb)
  caltech256 (1.7gb)
  microsoft kinect gestures (541mb)
  pascal vision challenge (2.5gb)

Corp Watch (72kb)

various kinds of finance data (1gb)
  daily summaries from NASDAQ, AMEX, NYSE
  prices pulled from yahoo api
  kenneth french research
  robert shiller data
  sec data

DBpedia (4.4gb)

dmoz (400mb)

Freebase (38gb)

Movie/tv data sets
  IMDB (1.2gb)
  netflix prize (700mb)
  tvtropes (1gb)

various kaggle data sets (8.7gb)

sherlock holmes stories (14mb)

from google
  wikilinks (1.8gb)
  wikipedia crosslinks (8.5gb)

wikipedia page counts (112gb)
wikipedia (10gb)
