Generating variables

Cloning existing variables

I prefer to keep the original dataset untouched, so I usually create a copy of the variables I’m interested in and work with the copy. There are two ways to do this:

  • clonevar clone_varName = original_varName (preferred)
    • Exact clone, including data values, variable label, value labels, etc.
  • gen clone_varName = original_varName (gen is short for generate)
    • Copies only the data values, not the labels

Let’s try this with the World Values Survey (Wave 6) data by making a copy of V10, a question about subjective happiness.

use WV6_Data.dta, clear

gen happiness = V10
codebook happiness V10, compact


Variable     Obs Unique      Mean  Min  Max  Label
------------------------------------------------------------------
happiness  89565      7  1.827209   -5    4
V10        89565      7  1.827209   -5    4  Feeling of happiness
------------------------------------------------------------------

We see that the values of happiness (our copy) and V10 are the same, but happiness does not have a variable label. Of course, we can always create labels for new variables manually.
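As a minimal sketch of labelling the copy by hand: the label text below is copied from the codebook output for V10, and the macro name lbl is only an illustrative choice. The second step assumes V10 has a value label attached, which is not shown above.

```stata
* Attach a variable label to the copy made with gen
label variable happiness "Feeling of happiness"

* Look up the name of the value label attached to V10 (if any)
* and attach the same value label to the copy
local lbl : value label V10
label values happiness `lbl'

* Confirm that the label now appears for both variables
codebook happiness V10, compact
```

This takes more typing than clonevar, which is one reason clonevar is the preferred route when an exact copy is wanted.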

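Now let’s try clonevar. Because happiness already exists from the previous step, we need to drop it first; otherwise Stata will refuse to create a variable with the same name.

drop happiness
clonevar happiness = V10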
codebook happiness V10, compact


Variable     Obs Unique      Mean  Min  Max  Label
------------------------------------------------------------------
happiness  89565      7  1.827209   -5    4  Feeling of happiness
V10        89565      7  1.827209   -5    4  Feeling of happiness
------------------------------------------------------------------

Both values and labels are preserved in our cloned copy of V10.

Creating a categorical variable

Let’s create a dichotomous variable for having children (Yes/No) from the original variable that shows how many children someone has.

We can do this by recoding a copy of the original variable.

gen have_children = V58
recode have_children (-5/-1 = .) (1/8 = 1)

Always check that the recoding was done correctly.

tab V58 have_children, missing

 How many children do |          have_children
             you have |         0          1          . |     Total
----------------------+---------------------------------+----------
                   -5 |         0          0         29 |        29 
                   -4 |         0          0      1,000 |     1,000 
                   -2 |         0          0        529 |       529 
                   -1 |         0          0        109 |       109 
          No children |    26,142          0          0 |    26,142 
              1 child |         0     14,297          0 |    14,297 
           2 children |         0     21,579          0 |    21,579 
           3 children |         0     12,356          0 |    12,356 
           4 children |         0      6,292          0 |     6,292 
           5 children |         0      3,230          0 |     3,230 
           6 children |         0      1,775          0 |     1,775 
                    7 |         0        991          0 |       991 
                    8 |         0      1,236          0 |     1,236 
----------------------+---------------------------------+----------
                Total |    26,142     61,756      1,667 |    89,565 
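The copy-then-recode steps can also be collapsed into a single command with recode's generate() option, which leaves V58 untouched and can attach Yes/No value labels at the same time. This is a sketch rather than the approach used above, and the variable name has_kids is a hypothetical choice for illustration.

```stata
* Recode V58 into a new variable in one step; V58 itself is not modified.
* Negative codes become missing, 0 stays 0 ("No"), 1 through 8 become 1 ("Yes").
recode V58 (-5/-1 = .) (0 = 0 "No") (1/8 = 1 "Yes"), generate(has_kids)

* Check the result against the original variable
tab V58 has_kids, missing
```

The text in quotes becomes a value label on the new variable, so tabulations show No and Yes instead of 0 and 1.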

Or, we can do the same with replace. Since have_children already exists, we drop it first. Note the >= in the condition: using > 1 would leave respondents with exactly one child coded as missing. Stata also treats missing values as larger than any number, so !missing(V58) keeps them from being coded as 1.

drop have_children
gen have_children = .
replace have_children = 1 if V58 >= 1 & !missing(V58)
replace have_children = 0 if V58 == 0

Again, check that the new variable was created correctly.

tab V58 have_children, missing

 How many children do |          have_children
             you have |         0          1          . |     Total
----------------------+---------------------------------+----------
                   -5 |         0          0         29 |        29 
                   -4 |         0          0      1,000 |     1,000 
                   -2 |         0          0        529 |       529 
                   -1 |         0          0        109 |       109 
          No children |    26,142          0          0 |    26,142 
              1 child |         0     14,297          0 |    14,297 
           2 children |         0     21,579          0 |    21,579 
           3 children |         0     12,356          0 |    12,356 
           4 children |         0      6,292          0 |     6,292 
           5 children |         0      3,230          0 |     3,230 
           6 children |         0      1,775          0 |     1,775 
                    7 |         0        991          0 |       991 
                    8 |         0      1,236          0 |     1,236 
----------------------+---------------------------------+----------
                Total |    26,142     61,756      1,667 |    89,565
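Reading a cross-tabulation by eye can miss a misplaced cell, so it may help to state the intended coding as assert commands. The sketch below assumes the intended rule described earlier: negative codes are missing, 0 means no children, and 1 through 8 mean at least one child.

```stata
* Each assert stops with an error if any observation violates the condition
assert have_children == 1 if inrange(V58, 1, 8)
assert have_children == 0 if V58 == 0
assert missing(have_children) if V58 < 0
```

If all three commands run silently, the new variable matches the intended coding. An "assertion is false" error means that some group was coded differently from the rule.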