"C:/Users/User/Documents/Recon/EOSE09/stata_files/" # set your directory
cd
use regional_dataset, clear
12.0fc # 12 numbers left of the decimal point; 0 to the right; commas to denote thousands format regional_gdp_cap_1990 %
Lab 1 Exercises
Purpose
π Stata is a powerful software for data analysis and visualization. This exercise set aims to showcase the capabilities of Stata in creating informative graphics π. We will be using data on the UK and Sweden π¬π§πΈπͺ to demonstrate the process of making a map πΊοΈ, a combined plot π, and a summary table π. The purpose of this accompanying post is to provide a step-by-step guide π‘ on how to use Stataβs built-in tools π οΈ to create effective visualizations π¨ that help communicate insights from your data π.
Get started
As before, we want to load our data from the regional_dataset.dta
file.
Question 1
• Map of GDP per capita in Sweden and the United Kingdom in 1990
There are two ways to do this.
Method 1: use their locations on a map
First we create a variable that flags whether the country is Sweden or the UK.
Then we use this variable to create the map with an if qualifier.
Note that you need to set the breaks so that they cover the full range of the data.
gen uk_sv = .
replace uk_sv = 1 if country == "Sweden" | country == "United Kingdom"
spmap regional_gdp_cap_1990 using "nutscoord.dta" if year == 1990 & uk_sv == 1, id(_ID) fcolor(Blues2) legend(pos(9)) legstyle(2) ///
    title("Regional GDP per Capita - 1990", size(medium)) ///
    clmethod(custom) clbreaks(0 12000 15000 16000 17000 18000 21000)
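One way to decide where the breaks should start and end is to look at the range of the variable before drawing the map. The lines below are a minimal sketch, assuming regional_dataset.dta is in the working directory and contains the country, year and regional_gdp_cap_1990 variables used in this lab. inlist() is used here only so that the snippet does not depend on any helper variable.

```stata
* Sketch only: assumes regional_dataset.dta is in the working directory
use regional_dataset, clear

* Summary statistics for 1990 GDP per capita in the two countries
summarize regional_gdp_cap_1990 if year == 1990 & inlist(country, "Sweden", "United Kingdom"), detail

* summarize stores the extremes in r(min) and r(max);
* the first and last values in clbreaks() should bracket them
display "min = " r(min) "   max = " r(max)
```

If the maximum is above the last break, or the minimum is below the first, widen the breaks accordingly.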
Method 2: graph combine
Here we make two graphs with the same scale, but only the graph for Sweden includes a legend.
The UK map has the legend suppressed with the leg(off) option.
Then we name each graph and combine them under a shared title in the final step.
spmap regional_gdp_cap_1990 using "nutscoord.dta" if year == 1990 & country == "United Kingdom", id(_ID) fcolor(Blues2) ///
    title("United Kingdom", size(medium)) ///
    name(UK_GDP_1990, replace) ///
    leg(off)
spmap regional_gdp_cap_1990 using "nutscoord.dta" if year == 1990 & country == "Sweden", id(_ID) fcolor(Blues2) legend(pos(10)) legstyle(2) ///
    title("Sweden", size(medium)) ///
    clmethod(custom) clbreaks(0 12000 15000 16000 17000 18000 22000) ///
    name(SV_GDP_1990, replace)
graph combine UK_GDP_1990 SV_GDP_1990, title("Regional GDP per Capita - 1990") ///
    graphregion(color(white)) ///
    note("Source: Rosés-Wolf (2020)", size(small) position(5)) ///
    scheme(s2mono)
Question 2
• Map of share of employment in industry in 2010 across the whole dataset
First we must ask where the cutoffs should be in the legend.
We can draw a density plot to find out.
kdensity employment_share_industry if year == 2010
Next we use this information to set breaks every 10 percentage points from 0 to 50 percent, which covers the 10 to 40 percent range where the bulk of the distribution lies.
spmap employment_share_industry using "nutscoord.dta" if year == 2010, id(_ID) fcolor(Greens) legstyle(2) ///
    title("Employment Share Industry - 2010", size(large)) ///
    osize(0.02 ..) ocolor(white ..) ///
    clmethod(custom) clbreaks(0(0.1)0.5) ///
    legend(pos(9) size(medium) rowgap(1.5) label(6 "40-50 %") label(5 "30-40 %") ///
    label(4 "20-30 %") label(3 "10-20 %") label(2 "0-10 %") label(1 "No Data")) ///
    ndfcolor(gray) ndocolor(white ..) ndsize(0.02 ..)
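A numeric summary can complement the density plot when choosing cutoffs. The following is a sketch rather than part of the original exercise; it assumes the same regional_dataset.dta file and the employment_share_industry variable used above, and uses tabstat to list a few percentiles.

```stata
* Sketch only: assumes regional_dataset.dta is in the working directory
use regional_dataset, clear

* Minimum, selected percentiles and maximum of the 2010 industry share
tabstat employment_share_industry if year == 2010, statistics(min p10 p25 p50 p75 p90 max)
```

If most of the percentiles fall between 0.1 and 0.4, breaks at 0.1 intervals will split the bulk of the regions across several colour classes.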
Question 3
• Map of share of employment in services in 1950 in Scandinavia
Generate a variable for the Scandinavian countries
gen scandinavia = .
replace scandinavia = 1 if country == "Sweden" | country == "Denmark" | country == "Norway"
Check where the weight of the distribution is
kdensity employment_share_services if year == 1950 & scandinavia == 1
spmap employment_share_services using "nutscoord.dta" if year == 1950 & scandinavia == 1, id(_ID) fcolor(Reds) legstyle(2) ///
    title("Employment Share Services - 1950", size(large)) ///
    osize(0.02 ..) ocolor(white ..) ///
    clmethod(custom) clbreaks(0(0.1)0.7) ///
    legend(pos(9) size(medium) rowgap(1.5) label(8 "60-70 %") label(7 "50-60 %") ///
    label(6 "40-50 %") label(5 "30-40 %") ///
    label(4 "20-30 %") label(3 "10-20 %") label(2 "0-10 %") label(1 "No Data")) ///
    ndfcolor(gray) ndocolor(white ..) ndsize(0.02 ..)
Question 4
• Scatterplot of GDP per capita in 1990 dollars on the x-axis and share of employment in services on the y-axis for the year 2015. Colour the points by country.
Here we need an additional package to help us.
ssc install sepscatter
Basic scatter plot (the y-variable comes first, then the x-variable)
sepscatter employment_share_services regional_gdp_cap_1990 if year == 2015, separate(country)
Add a linear regression line
sepscatter employment_share_services regional_gdp_cap_1990 if year == 2015, separate(country) addplot(lfit employment_share_services regional_gdp_cap_1990 if year == 2015 & regional_gdp_cap_1990 > 0.5)
Question 5
• Make a table of the mean GDP per capita by country in 2000. Export it with the outreg2 command.
Remember how we made a national GDP per capita variable:
bysort country year: egen national_gdp_1990 = total(regional_gdp_1990)
bysort country year: egen national_population = total(regional_population)
gen national_gdp_cap_1990 = national_gdp_1990 / national_population
Now format this number
format national_gdp_cap_1990 %12.0fc // total width of 12 characters; 0 digits after the decimal point; commas as thousands separators
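To see what a display format does before attaching it to a variable, it can be tried on a literal number with display. This is a small illustration with a made-up value, not data from the lab.

```stata
* Illustrative value only, not taken from the dataset
* %12.0fc: total width 12, no decimals, comma thousands separators
display %12.0fc 1234567.89

* %12.2fc: same width, but two digits after the decimal point
display %12.2fc 1234567.89
```

The format only changes how the number is shown; the stored value keeps its full precision.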
We could do this with outreg2. We can also just keep the data that we want and export it as a CSV file. This is quick and dirty, but it works.
keep country national_gdp_cap_1990 year
keep if year == 2000
duplicates drop
export delimited using national_gdp_cap_1990.csv, replace
Open this file in Excel. Go to the Data tab, select Text to Columns, choose Delimited with a comma separator, and then format it.
We can make a nice little bar graph in Excel too.
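Because keep discards the regional observations from memory, a possible variation is to wrap the export in preserve and restore. The sketch below is one way this could look, assuming the same regional_dataset.dta file and variable names used in this lab; the output file name national_gdp_cap_2000.csv is a hypothetical choice for illustration.

```stata
* Sketch only: assumes regional_dataset.dta is in the working directory
use regional_dataset, clear
bysort country year: egen national_gdp_1990 = total(regional_gdp_1990)
bysort country year: egen national_population = total(regional_population)
gen national_gdp_cap_1990 = national_gdp_1990 / national_population

* Take a snapshot of the full dataset before trimming it
preserve
keep if year == 2000

* One row per country; the value is constant within a country-year,
* so the mean simply returns it
collapse (mean) national_gdp_cap_1990, by(country)

* Hypothetical output file name
export delimited using national_gdp_cap_2000.csv, replace

* Bring the full regional dataset back into memory
restore
```

After restore, the regional data are available again for further maps or plots without reloading the file.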