Below is a link to a great infographic describing the issues involved in "finding data". In brief, the steps are:
Google also has a neat Dataset Search engine! Be aware that it does not necessarily GIVE you the data (though it can); rather, it tells you where the data are available, sometimes with a teaser snippet of data. Once you enter one or more keywords, use the filters on the results: the Free filter is especially useful.
Advice: have a separate browser tab/window open for your library's data database offerings: then when Google Dataset Search tells you the data are available in database XYZ, a database available in your library, you're set!
Some of these databases can also be used to find cross-sectional data ...
In addition to the databases above, there is the ICPSR archive of datasets ...
The Inter-University Consortium for Political and Social Research (ICPSR) holds a wealth of data from a number of sources. Here are some highlights:
Please be aware that downloading the data is not immediate (you must create an account and get approval), and there is a substantial learning curve for some of these datasets.
OK, so you've tried the databases above, and had no luck ... Either there were no data at all, or the data didn't fit the model you're building.
The first thing to do is try to find out who (i.e. what organization) might collect data similar to what you need. The following statistical compilations can be very helpful for this:
Assuming that you find some relevant data in one of these compendia, make note of the source of the data: usually you'll find this in a footnote. With that information, we can look further. Please contact me for individual help on this ...
Be aware that sometimes, because the data you want simply are not collected in a systematic fashion, you'll have to find a related variable (e.g. instead of using sales volume or value for athletic shoes, use the consumer price index (CPI) for clothing).
Scenario 1: You find monthly data (small interval), but want quarterly (larger interval)
Scenario 2: You find annual data (large interval), but want monthly data (small interval)
Converting a small interval dataset to a larger interval is possible, and some of the database interfaces can even do it for you (e.g. Global Insight). However, if you found a large interval dataset (say, annual), but wanted a smaller interval dataset (e.g. monthly or quarterly), check first with your professor to see if they think this will be acceptable for your model. Often it will not ...
For this problem (finding large interval data when you want small interval data), there may not be a solution. Whoever collects the data decides how often they'll collect it, and that's that. Sometimes if you can contact the source you will learn if they have the data in other intervals ... but it may not be published, or may not be available for free.
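If your database interface cannot do the conversion for you, Scenario 1 (small interval to larger interval) can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the function name `monthly_to_quarterly` and the monthly values are hypothetical, made up for the example. It assumes you sum flow variables (such as sales) and average level variables (such as an index), and that quarters with missing months should be dropped rather than estimated.

```python
from collections import defaultdict


def monthly_to_quarterly(monthly, how="sum"):
    """Aggregate {(year, month): value} into {(year, quarter): value}.

    how="sum"  -> add the three months (suits flows such as sales)
    how="mean" -> average the three months (suits levels such as an index)
    """
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        buckets[(year, quarter)].append(value)

    quarterly = {}
    for key in sorted(buckets):
        values = buckets[key]
        if len(values) != 3:
            continue  # assumption: skip quarters with missing months
        quarterly[key] = sum(values) if how == "sum" else sum(values) / 3
    return quarterly


# Hypothetical monthly figures for one year (not real data).
monthly_sales = {(2009, m): 100 + m for m in range(1, 13)}

quarterly_sales = monthly_to_quarterly(monthly_sales, how="sum")
quarterly_average = monthly_to_quarterly(monthly_sales, how="mean")

for (year, quarter), total in quarterly_sales.items():
    print(f"{year} Q{quarter}: {total}")
```

Note that the sketch only works in one direction: there is no equivalent function that recovers true monthly values from annual totals, which is why Scenario 2 usually has no clean solution.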
Census year | Population |
---|---|
2010 | 308,745,538 |
2000 | 281,424,603 |
1990 | 248,709,873 |
1980 | 226,542,199 |
1970 | 203,302,031 |
1960 | 179,323,175 |
1950 | 151,325,798 |

Data source: US Census Bureau, Statistical Abstract of the United States, 2012.
A time series is a set of data for one variable collected through time. An easy example is the US population census: every ten years, the Census Bureau counts (or tries to!) all people living in the United States. So there's a data point every ten years, going all the way back to 1790, that tells us how many people were in the country. A small sample of the time series looks like the table above.
Of course, the census is an unusual time series in that it is only collected every ten years (there are population estimates done every year, but the collection methodology is different). More commonly, time series data are collected on a monthly, quarterly, or annual basis. The important things to know about time series are:
– the data are collected on a regular schedule
– the data are always collected in the same way (or there is lots of documentation for why, how and when the collection method was changed!)
– the data are usually collected by the same people (or institution)
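The first point, a regular collection schedule, is easy to check once you have a series in hand. The sketch below uses the census figures from the table above; the variable names are just illustrative choices. It confirms that the observations are evenly spaced and then computes the percent change from one census to the next, a typical first calculation on a time series.

```python
# Census counts copied from the table above (US Census Bureau,
# Statistical Abstract of the United States, 2012).
census = {
    1950: 151_325_798,
    1960: 179_323_175,
    1970: 203_302_031,
    1980: 226_542_199,
    1990: 248_709_873,
    2000: 281_424_603,
    2010: 308_745_538,
}

years = sorted(census)

# A regular schedule means every gap between observations is the same size.
gaps = {later - earlier for earlier, later in zip(years, years[1:])}
is_regular = len(gaps) == 1

# Percent change in population between consecutive censuses.
growth_pct = {
    later: (census[later] - census[earlier]) / census[earlier] * 100
    for earlier, later in zip(years, years[1:])
}

print("Regular schedule:", is_regular, "| gap(s) in years:", sorted(gaps))
for year, pct in growth_pct.items():
    print(f"{year - 10}-{year}: {pct:.1f}%")
```

If `gaps` contained more than one value, that would be a signal to look for missing observations or a change in the collection schedule before using the series in a model.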
Whereas time series data allow the researcher to look at one variable through time (e.g. the infant mortality rate in the US from 1960 to the present), cross-sectional data let you look at many variables at a single point in time. For example, what were the infant mortality rates in North and South American countries in 2005 (or the latest available year)? Cross-sectional data allow the researcher to compare differences within a larger group at one point in time.
Below is an example of data you might collect in order to examine how the states in the Midwest fund their highways.
State | IL | IN | IA | MI | MN | OH | WI |
---|---|---|---|---|---|---|---|
Population, in 000s | 12,910 | 6,423 | 3,008 | 9,970 | 5,266 | 11,543 | 5,655 |
Highway mileage | 139,577 | 95,679 | 114,347 | 121,651 | 137,932 | 123,024 | 114,910 |
State govt. hwy spending, in mil. $ | 5,385 | 3,280 | 1,721 | 3,178 | 2,365 | 4,852 | 2,549 |
Fed hwy trust funds, in mil. $ | 1,370 | 942 | 453 | 1,100 | 563 | 1,181 | 806 |
Fed hwy trust funds, $ per capita | 106 | 147 | 151 | 110 | 107 | 102 | 143 |
Gas tax rate (cents/gal.) | 19 | 18 | 21 | 19 | 27.1 | 28 | 30.9 |

All data are for 2009, and from the 2012 Statistical Abstract of the United States.
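Cross-sectional tables often mix units, so derived rows deserve a quick check. As a small sketch (the variable names are illustrative), the per capita row above can be reproduced from the population row (in thousands) and the federal trust fund row (in millions of dollars):

```python
# Values copied from the table above (2009 data, 2012 Statistical Abstract).
states = ["IL", "IN", "IA", "MI", "MN", "OH", "WI"]
population_thousands = [12_910, 6_423, 3_008, 9_970, 5_266, 11_543, 5_655]
fed_funds_millions = [1_370, 942, 453, 1_100, 563, 1_181, 806]

# Convert both rows to base units (people, dollars) before dividing,
# then round to whole dollars as the table does.
per_capita = {
    state: round((funds * 1_000_000) / (pop * 1_000))
    for state, pop, funds in zip(states, population_thousands, fed_funds_millions)
}

for state, dollars in per_capita.items():
    print(f"{state}: ${dollars} per capita")
```

Each computed value matches the "Fed hwy trust funds, $ per capita" row, which is a useful confirmation that the units in the row labels have been read correctly.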