Popular secondary data sources that you can use for your dissertation research


Introduction

Data. If you are writing a thesis or doing scientific research and want to perform some type of statistical analysis, you will need data to answer your research question. However, it’s not always obvious where to find the right data for the job. That is why students often ask me: “Where can I find data for my thesis?” In this post, I have collected a list of secondary data sources that may help you find the right data for your own research. I have divided them into the following categories:

Surveys
Databases
Project data
Search engines
Journal data
Researcher data
Other lists

Not familiar with secondary data? Let’s start with a quick introduction!

Primary and secondary data

Let’s first distinguish between two types of data. Primary data is data that you collect yourself, for example by running an experiment or administering surveys. Secondary data is data that has already been collected by someone else, for example by the national statistics bureau of a certain country, or by an existing survey such as the World Values Survey. Secondary data can also be data stored in an existing database, such as the genetic variation data in the European Variation Archive.

Okay, but what if I am looking for primary data?

In the remainder of this post, I will talk about secondary data. However, I would like to emphasize that nowadays (primary) data can be found everywhere. In fact, I encourage you to take advantage of the many sources of information that are available, whether you collect data yourself by crawling Twitter, designing a simple online survey in Google Forms, using computer game statistics, or analyzing YouTube videos.1 Data is everywhere!

Surveys

A survey is a set of questions that aims to measure one or more topics. Usually, surveys are designed to gauge individuals’ opinions or preferences on a certain matter, but more objective questions (such as “How much do you earn per hour?”) are often included as well.2

Although surveys are increasingly administered digitally, many global surveys still rely on researchers who go ‘out in the field’ to interview respondents.3 After a survey has been administered, the responses are compiled into a dataset. This makes survey data convenient to use, because most of the hard work of compiling and cleaning it has already been done for you.


Surveys: examples

Below is a list of popular surveys that may help you with your thesis research. Most datasets are freely accessible; for others, you have to register an account on the survey’s website to access the data.

  • World Values Survey (WVS): one of the first global surveys, it mostly contains information on people’s perceptions and values with regard to politics, the economy, and life in general.
  • European Values Study (EVS): similar to the WVS, but it only includes European countries.
  • Panel Study of Income Dynamics (PSID): an ongoing survey of families from the United States that started in 1968 with about 5,000 families. As the name suggests, it contains data on the economic background of individuals, but also much more: child rearing, marriage, education, and so forth.
  • German Socio-Economic Panel (G-SOEP): similar to the PSID, but it focuses exclusively on Germany.
  • Understanding Society: similar to the PSID, but it focuses exclusively on the United Kingdom.
  • Pew Research Center Surveys: includes surveys on United States politics and public opinion.
  • The DHS Program: the Demographic and Health Surveys (DHS) Program collects data on health, diseases, and related topics (such as nutrition) for many countries in the world.
  • The Gender and Generations Program: focuses on studying family and partner relationships. It surveys individuals from multiple countries, predominantly in Europe.

Tip: national surveys

For some countries, multiple surveys are available. For example, in the United States alone there are about 15 different surveys on households!


Databases

Databases collect and store data from other sources. This may be data from national statistics offices, surveys, scientific articles, and so forth. Because most databases offer large amounts of data, and because you can often easily search this data, they are a great source to use for your dissertation.

Example: the Grid-Enabled Measures (GEM) database, which contains many datasets, mostly from the medical sciences and psychology.

Databases: examples

Below is a list of popular databases that are worth browsing for your thesis. Most datasets are freely accessible; for others, you have to register an account on the database’s website to access the data.

  • World Bank: offers country data for the period 1975-today on social, political and economic variables such as the literacy rate, quality of government and GDP per capita.
  • UN Data: similar to the World Bank database.
  • Quality of Government (QoG): the Quality of Government dataset from the University of Gothenburg offers national (country) and regional data on various governance indicators. Furthermore, the national data file includes popular metrics from other surveys (such as the World Values Survey) and variables from popular scientific papers.
  • Eurostat: offers European data (national and regional) on social, political and economic indicators.
  • Historical Statistics: a database with historical financial, economic, and social datasets.
  • Correlates of War: extensive database on the history of war and related topics, such as military disputes, formal alliances, diplomatic connections and bilateral trade.
  • Economagic: economic and financial time-series data for many countries in the world, on various topics (unemployment, GDP growth, interest rates, and so forth).
  • Federal Reserve Economic Data (FRED): similar to Economagic.
  • Penn World Table: similar to Economagic, but focuses on productivity.
  • Luxembourg Income Study Database: a database of household- and person-level income data for various countries. Also check the Luxembourg Wealth Study Database, which is similar to the Income Study but focuses on wealth (i.e., assets and debt) instead.
  • CIA World Factbook: a database with various country characteristics (e.g., level of democracy, geography, military status). Tip: the Factbook itself is not distributed as a ready-made dataset, but datasets containing CIA World Factbook data have been compiled by others. Part of the data is also included in the previously mentioned QoG dataset.
  • Comparative Political Data Set (CPDS): country-level data on various political and institutional indicators.

Tip: national databases

Most countries have national statistics offices. Usually, these collect data on various topics for a specific country, which makes them a great source if you are interested in studying a single country. A list of national statistics offices is available online.


Project data

Some scientific problems are too difficult to tackle all alone. This is why researchers collaborate in scientific projects. Often, the data from these projects is freely accessible. Below are some examples.

Tip: re3data

The Registry of Research Data Repositories (re3data) collects information on project data for studies from all disciplines. Definitely worth a look!



Search engines

Increasingly, publicly available datasets are being indexed by search engines. These allow you to quickly find interesting data. Rather than being a full-fledged database, these search engines direct you to the place where the data can be retrieved. Some examples:

  • Google Dataset Search: Launched in 2018, this Google service aims to index as many datasets as possible. Combined with the ‘traditional’ Google search functionality which most of us are familiar with, this is a good place to start.
  • Quandl: a search engine connected to millions of financial, economic, and social datasets. Not all data is freely accessible.
  • Datahub.io: similar to Quandl.
  • Plenar.io: similar to Quandl, although most data is freely accessible.
  • Data.world: similar to Quandl, but with an added bonus: it categorizes data by academic discipline, so you can, for example, browse only psychology datasets.
  • Socrata Open Data: a search engine that focuses exclusively on (open) government data.
  • Nation Master: a site that aggregates data on a multitude of ‘popular’ topics, ranging from birth rates to crime statistics and alcohol consumption. Covers many countries in the world.
  • Statista: similar to Nation Master, but it focuses on market and industry data (e.g., number of smartphones sold, market size of the cosmetics industry, etc.).

Journal data

Scientific journals often require researchers to make their dataset publicly available before an article is accepted and published. Usually, this dataset is posted on the journal’s website, so it may pay off to check a journal’s website to see whether an interesting dataset is available.

Did you read an interesting article? Check the article’s appendix to see whether the data has been made available online. If so, the article will usually say something along the lines of: “Our data is available from the online appendix of Journal A.”

Example: articles published in the Journal of Marketing Research (JMR) often contain links to the used datasets (called ‘supplemental materials’ or ‘online appendix’).
Tip: journals

Additionally, scientific journals often maintain lists of various (publicly available) datasets, for example:

Tip: Open data

Over the last decade, many journals have started to focus on sharing research data; these are called ‘open data’ journals. Examples are the Journal of Open Psychology Data and the Open Data Journal for Agricultural Research.


Researcher data

Many researchers share data that they use for scientific publications on their personal (University) web page. I will list just a few examples here.

  • Prof. Enrico Spolaore: a professor of economics who has researched the economic consequences of genetic and cultural distances between countries.5
  • Prof. Robert Putnam: world-renowned sociologist Robert Putnam shares his research data (for example, on social capital) on his website.
  • Vanderbilt Biostatistics Data: some university research groups, such as the Biostatistics group at Vanderbilt University, post datasets on their website.

Tip: visit researcher website

I personally think it is always a good idea to visit the website of a researcher or professor whose article you found interesting: besides links to other useful articles and perhaps some data, you may also find out whether they have made any recent advances in the study of the topic!

Other lists

Obviously, I am not the first to compile a list of datasets. Below are some other examples that you may find helpful.

Tables & figures

You have reached the end of this post, so by now I’m pretty convinced that you will use data in your thesis. If so, you also need to present this data in tables and figures. I have written posts on both topics:

How to format tables in your dissertation

How to format figures and plots in your thesis

Concluding words

I sincerely hope that this post has helped you discover new and exciting datasets for your thesis research. Thanks for reading, and good luck with your research!


Join ThesisCore