Cloning existing variables
I prefer to keep the original dataset untouched, so I usually create a copy of the variables I’m interested in and work with the copy. There are two ways to do this:
clonevar clone_varName = original_varName (preferred) - Exact clone, including data values, labels, etc.
gen clone_varName = original_varName (or generate) - Copies only the data values, not the labels
Let’s try this with the World Values Survey (Wave 6) data by making a copy of V10, a question about subjective happiness.
use WV6_Data.dta, clear
gen happiness = V10
codebook happiness V10, compact
Variable      Obs Unique      Mean  Min  Max  Label
-------------------------------------------------------------------
happiness   89565      7  1.827209   -5    4
V10         89565      7  1.827209   -5    4  Feeling of happiness
-------------------------------------------------------------------
We see that the values of happiness (our copy) and V10 are the same, but happiness does not have a variable label. Of course, we can always create labels for new variables manually.
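As a rough sketch of what labelling the copy manually might look like: the variable label can be typed in directly or read from V10, and the value label set attached to V10 (if it has one) can be reused. This assumes the WV6 data are in memory and that happiness was created with gen as above; the local macro names varlab and vallab are illustrative only.

```stata
* Attach a variable label by hand
label variable happiness "Feeling of happiness"

* Or copy whatever variable label V10 currently carries
local varlab : variable label V10
label variable happiness `"`varlab'"'

* Reuse V10's value label set, if one is attached
local vallab : value label V10
if "`vallab'" != "" {
    label values happiness `vallab'
}

* The Label column should now be filled in for both variables
codebook happiness V10, compact
```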
Now let’s try clonevar.
drop happiness
clonevar happiness = V10
codebook happiness V10, compact
Variable      Obs Unique      Mean  Min  Max  Label
-------------------------------------------------------------------
happiness   89565      7  1.827209   -5    4  Feeling of happiness
V10         89565      7  1.827209   -5    4  Feeling of happiness
-------------------------------------------------------------------
Both values and labels are preserved in our cloned copy of V10.
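Beyond comparing summary statistics, the clone can be checked against the original observation by observation. The following is a small sketch, assuming happiness was created with clonevar as above.

```stata
* Stops with an error if any observation differs between the two variables
assert happiness == V10

* Reports how many observations are equal, lower, or higher
compare happiness V10

* Shows storage type, display format, and any attached value label
describe happiness V10
```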
Creating a categorical variable
Let’s create a dichotomous variable for having children (Yes/No) from the original variable that shows how many children someone has.
We can do this by recoding a copy of the original variable.
gen have_children = V58
recode have_children (-5/-1 = .) (1/8 = 1)
Always check that the recoding was done correctly.
tab V58 have_children, missing
How many children do  |          have_children
you have              |         0          1          . |     Total
----------------------+---------------------------------+----------
-5                    |         0          0         29 |        29
-4                    |         0          0      1,000 |     1,000
-2                    |         0          0        529 |       529
-1                    |         0          0        109 |       109
No children           |    26,142          0          0 |    26,142
1 child               |         0     14,297          0 |    14,297
2 children            |         0     21,579          0 |    21,579
3 children            |         0     12,356          0 |    12,356
4 children            |         0      6,292          0 |     6,292
5 children            |         0      3,230          0 |     3,230
6 children            |         0      1,775          0 |     1,775
7                     |         0        991          0 |       991
8                     |         0      1,236          0 |     1,236
----------------------+---------------------------------+----------
Total                 |    26,142     61,756      1,667 |    89,565
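The copy-then-recode steps can also be collapsed into one command, since recode accepts a generate() option and lets value labels be defined inside the rules. This is a sketch rather than the approach used above; have_children_lbl and yesno are hypothetical names chosen so that nothing already in memory is overwritten.

```stata
* Recode V58 into a new, labelled variable; V58 itself is left unchanged
recode V58 (-5/-1 = .) (0 = 0 "No") (1/8 = 1 "Yes"), generate(have_children_lbl) label(yesno)

* Cross-check the new variable against the original coding
tab V58 have_children_lbl, missing
```

With the value labels attached, the columns of the cross-tabulation read No and Yes instead of 0 and 1.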
Alternatively, we can build the variable with gen and replace.
drop have_children
gen have_children = .
replace have_children = 1 if V58 > 1
replace have_children = 0 if V58 == 0
Again, check whether the new variable was created correctly.
tab V58 have_children, missing
How many children do  |          have_children
you have              |         0          1          . |     Total
----------------------+---------------------------------+----------
-5                    |         0          0         29 |        29
-4                    |         0          0      1,000 |     1,000
-2                    |         0          0        529 |       529
-1                    |         0          0        109 |       109
No children           |    26,142          0          0 |    26,142
1 child               |         0          0     14,297 |    14,297
2 children            |         0     21,579          0 |    21,579
3 children            |         0     12,356          0 |    12,356
4 children            |         0      6,292          0 |     6,292
5 children            |         0      3,230          0 |     3,230
6 children            |         0      1,775          0 |     1,775
7                     |         0        991          0 |       991
8                     |         0      1,236          0 |     1,236
----------------------+---------------------------------+----------
Total                 |    26,142     47,459     15,964 |    89,565
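In this cross-tabulation the 14,297 respondents with exactly one child land in the missing column, which is what a condition of V58 > 1 would produce, since it excludes the value 1. Below is a minimal sketch of a variant that counts them as having children. The name have_children2 is hypothetical, used so the variable can sit alongside the one above.

```stata
* have_children2 is a hypothetical name used only for this sketch
gen have_children2 = .

* One or more children. The !missing() guard matters in general because
* Stata treats missing values as larger than any number.
replace have_children2 = 1 if V58 >= 1 & !missing(V58)

* No children
replace have_children2 = 0 if V58 == 0

* Negative codes (-5 to -1) stay missing; check the one-child row in particular
tab V58 have_children2, missing
assert have_children2 == 1 if V58 == 1
```

If the condition is right, this table should match the one produced by the recode approach.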