Economics: Databases for Data


About finding data

Strategies:

Below is a link to a great infographic describing the issues involved in "finding data".  In brief, the steps are:

  1. Assess your data needs.
  2. Ask who cares about the data.
  3. Search through different paths (be flexible and persistent ...)

Google Dataset Search:

Google also has a neat Dataset Search engine!  Be aware that it does not necessarily give you the data itself (though it can); rather, it tells you where the data are available, sometimes with a teaser snippet of the data. After you enter one or more keywords, use the filters on the results: the Free filter is especially useful.

Advice: keep a separate browser tab or window open to your library's list of data databases.  Then, when Google Dataset Search tells you the data are available in database XYZ, and XYZ is available through your library, you're set!

Time series data databases

Some of these databases can also be used to find cross-sectional data ...

See here for more information about classification systems used to organize trade and industrial production data.

Top Databases:
Additional Databases:

Cross-sectional data sources

In addition to the databases above, there is the ICPSR archive of datasets ...

The Inter-University Consortium for Political and Social Research (ICPSR) holds a wealth of data from a number of sources.  Here are some highlights:

  • National Archive of Computerized Data on Aging
  • National Archive of Criminal Justice
  • Health and Medical Care Archive
  • Substance Abuse and Mental Health Data Archive
  • International Archive of Education Data
  • 2000 Census
  • Collaborative Psychiatric Epidemiology Surveys
  • Crimestat
  • Data Preservation Alliance for the Social Sciences
  • Data Sharing for Demographic Research
  • General Social Survey Data and Retrieval Information System
  • Homicide Research Working Group
  • Human Subject Protection and Disclosure Risk Analysis
  • Mexican American Trajectories: Family, Geography, and Intermarriage across a Century
  • Minority Data Resource Center
  • Population and Environment in the U.S. Great Plains
  • Project on Human Development in Chicago Neighborhoods
  • Terrorism & Preparedness Data Resource Center

Please be aware that downloading the data is not immediate (you must create an account and get approval), and there is a substantial learning curve for some of these datasets.

Statistical compendia or compilations

Ok, so you've tried the databases above, and had no luck ... Either there were no data at all, or the data didn't fit the model you're building.

Finding the source: 

The first thing to do is try to find out who (i.e. what organization) might collect data similar to what you need.  The following statistical compilations can be very helpful for this:

Look for footnotes and bibliographies:

Assuming that you find some relevant data in one of these compendia, make note of the source of the data:  usually you'll find this in a footnote.  With that information, we can look further.  Please contact me for individual help on this ...

Related variables:

Be aware that sometimes, because the data you want simply are not collected in a systematic fashion, you'll have to find a related variable (e.g. instead of using sales volume or value for athletic shoes, using the consumer price index (CPI) for clothing).

Wrong interval or periodicity?

Scenario 1:  You find monthly data (small interval), but want quarterly (larger interval)
Scenario 2:  You find annual data (large interval), but want monthly data (small interval) 

Converting a small interval dataset to a larger interval is possible, and some of the database interfaces can even do it for you (e.g. Global Insight).  However, if you found a large interval dataset (say, annual), but wanted a smaller interval dataset (e.g. monthly or quarterly), check first with your professor to see if they think this will be acceptable for your model.  Often it will not ...

For this problem (finding large interval data when you want small interval data), there may not be a solution.  Whoever collects the data decides how often they'll collect it, and that's that.  Sometimes if you can contact the source you will learn if they have the data in other intervals ... but it may not be published, or may not be available for free.
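If your database interface cannot do the small-to-large conversion for you, it is easy to do yourself. The following is a minimal sketch in Python, not a definitive method: the function name `monthly_to_quarterly` and the sample values are made up for illustration. Whether to average or to sum depends on the variable: averaging usually suits rates and index levels, while summing usually suits flows such as sales. Check with your professor if you are unsure.

```python
from collections import defaultdict
from statistics import mean


def monthly_to_quarterly(monthly, how="mean"):
    """Aggregate {(year, month): value} into {(year, quarter): value}.

    how="mean" averages the three months; how="sum" adds them up.
    Quarters with fewer than three months of data are skipped.
    """
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        buckets[(year, quarter)].append(value)

    combine = {"mean": mean, "sum": sum}[how]
    return {
        key: combine(values)
        for key, values in sorted(buckets.items())
        if len(values) == 3
    }


# Made-up monthly values, for illustration only (not real data).
monthly = {
    (2020, 1): 100.0, (2020, 2): 102.0, (2020, 3): 104.0,
    (2020, 4): 106.0, (2020, 5): 108.0, (2020, 6): 110.0,
    (2020, 7): 112.0,  # only one month of Q3, so Q3 is dropped
}

quarterly_mean = monthly_to_quarterly(monthly, how="mean")
quarterly_sum = monthly_to_quarterly(monthly, how="sum")
print(quarterly_mean)
print(quarterly_sum)
```

Note that nothing comparable exists for the opposite direction: annual values do not contain the information needed to recover the monthly ones.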

Free sources:

Click here for links to sources for free time series ...

What are time series data?

Census year    Population
2010           308,745,538
2000           281,424,603
1990           248,709,873
1980           226,542,199
1970           203,302,031
1960           179,323,175
1950           151,325,798
Data source: US Census Bureau. Statistical Abstract of the United States, 2012

Differences (change) through time

A time series is a set of data for one variable collected through time.  An easy example is the US population census:  every ten years, the Census Bureau counts (or tries to!) all people living in the United States.  So there is a data point every ten years, going all the way back to 1790, that tells us how many people were in the country.  A small sample of the time series is shown in the table above.

Of course, the census is an unusual time series in that it is only collected every ten years (there are population estimates done every year, but the collection methodology is different). More commonly, time series data are collected on a monthly, quarterly, or annual basis. The important things to know about time series are:

 –   the data are collected on a regular schedule
 –   the data are always collected in the same way
        (or there is lots of documentation for why, how and when the collection method was changed!)
 –   the data are usually collected by the same people (or institution)
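
To make "differences through time" concrete, here is a small Python sketch using the census counts from the population table in this guide. It checks that the observations fall on a regular schedule and then computes the change from one census to the next. The variable names are just illustrative choices.

```python
# Census counts copied from the population table in this guide.
census = {
    1950: 151_325_798,
    1960: 179_323_175,
    1970: 203_302_031,
    1980: 226_542_199,
    1990: 248_709_873,
    2000: 281_424_603,
    2010: 308_745_538,
}

years = sorted(census)

# A regular schedule means every gap between observations is the same.
gaps = {later - earlier for earlier, later in zip(years, years[1:])}
is_regular = len(gaps) == 1

# Change from one census to the next: (absolute change, percent change).
changes = {}
for earlier, later in zip(years, years[1:]):
    diff = census[later] - census[earlier]
    changes[later] = (diff, round(100 * diff / census[earlier], 1))

print("Regular ten-year schedule:", is_regular and gaps == {10})
for year, (diff, pct) in changes.items():
    print(f"{year}: {diff:+,} people ({pct:+.1f}%) since the previous census")
```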

What are cross-sectional data?

Differences within a group

Whereas time series data allow the researcher to look at one variable through time (the infant mortality rate in the US from 1960 to the present), cross-sectional data let you look at one or more variables across many units (countries, states, firms, people) at a single point in time.  For example, what were the infant mortality rates in North and South American countries in 2005 (or the latest available year)?  Cross-sectional data allow the researcher to compare differences within a larger group at one point in time.

Below is an example of data you might collect in order to examine how the states in the Midwest fund their highways.

State                                               IL       IN       IA       MI       MN       OH       WI
Population (thousands)                          12,910    6,423    3,008    9,970    5,266   11,543    5,655
Highway mileage                                139,577   95,679  114,347  121,651  137,932  123,024  114,910
State government highway spending (million $)    5,385    3,280    1,721    3,178    2,365    4,852    2,549
Federal highway trust funds (million $)          1,370      942      453    1,100      563    1,181      806
Federal highway trust funds ($ per capita)         106      147      151      110      107      102      143
Gas tax rate (cents per gallon)                     19       18       21       19     27.1       28     30.9
All data are for 2009, and from the 2012 Statistical Abstract of the United States.
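
A cross-section like this often needs a derived variable before the states can be compared fairly. As a small Python sketch, the per capita row in the table can be reproduced from the population and federal funds rows. The values are copied from the table; the variable names are illustrative. Watch the units: population is in thousands and funds are in millions of dollars.

```python
states = ["IL", "IN", "IA", "MI", "MN", "OH", "WI"]
population_thousands = [12_910, 6_423, 3_008, 9_970, 5_266, 11_543, 5_655]
fed_funds_millions = [1_370, 942, 453, 1_100, 563, 1_181, 806]

# Convert both series to base units (dollars and people) before dividing.
per_capita = {
    state: round(funds * 1_000_000 / (pop * 1_000))
    for state, pop, funds in zip(states, population_thousands, fed_funds_millions)
}

# Rank the states from most to least federal highway funding per person.
ranked = sorted(per_capita, key=per_capita.get, reverse=True)

for state in ranked:
    print(f"{state}: ${per_capita[state]} per capita")
```

Comparing the per capita figures rather than the raw totals shows, for example, that Iowa receives the most federal highway money per person in this group even though its total is the smallest.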