<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Political Science Research Methods on Political Science Research Methods</title>
    <link>https://fanghuiz.github.io/ps0700/</link>
    <description>Recent content in Political Science Research Methods on Political Science Research Methods</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <copyright>&amp;copy; Fanghui Zhao 2019 &lt;i class=&#34;fas fa-tree&#34; style=&#34;color: #40a990;&#34;&gt;&lt;/i&gt;</copyright>
    <lastBuildDate>Sun, 15 Oct 2017 00:00:00 -0400</lastBuildDate>
    <atom:link href="/ps0700/" rel="self" type="application/rss+xml" />
    
    <item>
      <title>When Does Box Plot Hide Information?</title>
      <link>https://fanghuiz.github.io/ps0700/post/2019-03-23-boxplot/</link>
      <pubDate>Fri, 22 Mar 2019 00:00:00 -0400</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/post/2019-03-23-boxplot/</guid>
      <description>&lt;!-- ```stata
histogram Fox_dnone
``` --&gt;

&lt;!-- Histogram and box plots are both good visual ways to quickly get a feel of the distribution of our data.

For histograms, we can overlay a normal density curve on top of the hisorgram to see if the distribution is approximatley symmetrical or skewed. --&gt;

&lt;p&gt;A box plot is a powerful way to visualize the distribution of a continuous variable. However, it hides crucial information when our data is not uni-modal (i.e. when the distribution has more than one peak).&lt;/p&gt;

&lt;p&gt;A box plot is very information-rich. From the graph, we can see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The median value, as shown by the bar in the middle.&lt;/li&gt;
&lt;li&gt;The inter-quartile range, shown by the total length of the box.&lt;/li&gt;
&lt;li&gt;The 1st quartile (25th percentile) and the 3rd quartile (75th percentile), indicated respectively by the lower boundary and the upper boundary of the box.&lt;/li&gt;
&lt;li&gt;Outlier values, as indicated by individual dots plotted outside of the whiskers range.&lt;/li&gt;
&lt;li&gt;The approximate degree of dispersion in the data, shown by the length of the box

&lt;ul&gt;
&lt;li&gt;A shorter box indicates a smaller variance in the data, and a longer box indicates a larger variance.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;Whether the distribution is symmetrical or skewed

&lt;ul&gt;
&lt;li&gt;If the median bar sits near the middle of the box, then the distribution is approximately symmetrical; if the bar sits towards one end of the box, then the distribution is skewed.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/govtExp_box.svg&#34; alt=&#34;Symmetrical distribution; Relatively low variance; Outlier value&#34; /&gt;



&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  
  &lt;p&gt;
    Symmetrical distribution; Relatively low variance; Outlier value
    
    
    
  &lt;/p&gt; 
&lt;/figcaption&gt;

&lt;/figure&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/lifeExp_female_box.svg&#34; alt=&#34;Skewed distribution; Slightly higher variance; No outlier&#34; /&gt;



&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  
  &lt;p&gt;
    Skewed distribution; Slightly higher variance; No outlier
    
    
    
  &lt;/p&gt; 
&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;p&gt;However, a box plot has one drawback &amp;ndash; it hides the shape of the distribution if our data is &lt;strong&gt;bi-modal&lt;/strong&gt; (or multi-modal).&lt;/p&gt;

&lt;p&gt;For example, here we have some data that has a bi-modal distribution &amp;ndash; the size of the Christian population as a percentage of a country&amp;rsquo;s total population.&lt;/p&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/pctChristian_hist.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;If we draw a box plot for this data, this bi-modal property is completely hidden.&lt;/p&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/pctChristian_box.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;So if our data has more than one peak, a box plot is not the most appropriate graph for displaying the shape of the distribution. A good old histogram is a better choice in this context.&lt;/p&gt;
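
&lt;p&gt;To see this with data we control, here is a minimal sketch that simulates a bi-modal variable and draws both graphs. The variable &lt;code&gt;x&lt;/code&gt;, the seed, the two cluster centres and the graph names are all made up for illustration; they are not from the dataset used above.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Hypothetical simulated data: two clusters centred at 20 and 80
clear
set seed 12345
set obs 1000
generate x = cond(runiform() &amp;lt; 0.5, rnormal(20, 5), rnormal(80, 5))

// The histogram shows two separate peaks
histogram x, percent name(x_hist, replace)

// The box plot of the same variable shows one wide box and no sign of the two peaks
graph box x, name(x_box, replace)
&lt;/code&gt;&lt;/pre&gt;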
</description>
    </item>
    
    <item>
      <title>Generating variables</title>
      <link>https://fanghuiz.github.io/ps0700/tutorial_stata/2_2_data_manipulation/</link>
      <pubDate>Wed, 20 Mar 2019 00:00:00 +0000</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/tutorial_stata/2_2_data_manipulation/</guid>
      <description>

&lt;!-- ```
use anes_timeseries_2016.dta, clear
```


&lt;!-- The goal of data manipulation/processing is to get the data ready for analysis. This stage could take up a of time, depending on how &#34;processed&#34; your dataset is when you gained accessed to it.  --&gt;

&lt;!-- ```
. 
. 
``` --&gt;

&lt;h2 id=&#34;cloning-existing-variables&#34;&gt;Cloning existing variables&lt;/h2&gt;

&lt;p&gt;I prefer to keep the original dataset untouched, so I usually create a copy of the variables that I&amp;rsquo;m interested in, and work with the copy. There are two ways to do this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;clonevar clone_varName = original_varName&lt;/code&gt; (preferred)

&lt;ul&gt;
&lt;li&gt;Exact clone, including data values, labels etc.&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gen clone_varName = original_varName&lt;/code&gt; (&lt;code&gt;gen&lt;/code&gt; is short for &lt;code&gt;generate&lt;/code&gt;)

&lt;ul&gt;
&lt;li&gt;Only clones the data, not labels&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let&amp;rsquo;s try this with the World Values Survey (Wave 6) data, and make a copy of &lt;code&gt;V10&lt;/code&gt;, a question about subjective happiness.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;use WV6_Data.dta, clear

gen happiness = V10
codebook happiness V10, compact
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;

Variable     Obs Unique      Mean  Min  Max  Label
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
happiness  89565      7  1.827209   -5    4  
V10        89565      7  1.827209   -5    4  Feeling of happiness
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We see that the values for &lt;code&gt;happiness&lt;/code&gt; (our copy) and &lt;code&gt;V10&lt;/code&gt; are the same, but &lt;code&gt;happiness&lt;/code&gt; does not have any variable labels. Of course, we can always &lt;a href=&#34;https://fanghuiz.github.io/ps0700/tutorial_stata/2_3_data_manipulation/&#34; target=&#34;_blank&#34;&gt;manually create labels&lt;/a&gt; for the new variables.&lt;/p&gt;

&lt;p&gt;Now let&amp;rsquo;s try &lt;code&gt;clonevar&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// remove the copy made with gen first, since the name is already taken
drop happiness
clonevar happiness = V10
codebook happiness V10, compact
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;

Variable     Obs Unique      Mean  Min  Max  Label
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
happiness  89565      7  1.827209   -5    4  Feeling of happiness
V10        89565      7  1.827209   -5    4  Feeling of happiness
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Both values and labels are preserved in our cloned copy of &lt;code&gt;V10&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&#34;creating-categorical-variable&#34;&gt;Creating categorical variable&lt;/h2&gt;

&lt;p&gt;Let&amp;rsquo;s create a dichotomous variable for having children (Yes/No) from the original variable that shows how many children someone has.&lt;/p&gt;

&lt;p&gt;We can do this by recoding a copy of the original variable with &lt;code&gt;recode&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;gen have_children = V58
recode have_children (-5/-1 = .) (1/8 = 1)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Always check that the recoding was done correctly.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;tab V58 have_children, missing
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;
 How many children do |          have_children
             you have |         0          1          . |     Total
----------------------+---------------------------------+----------
                   -5 |         0          0         29 |        29 
                   -4 |         0          0      1,000 |     1,000 
                   -2 |         0          0        529 |       529 
                   -1 |         0          0        109 |       109 
          No children |    26,142          0          0 |    26,142 
              1 child |         0     14,297          0 |    14,297 
           2 children |         0     21,579          0 |    21,579 
           3 children |         0     12,356          0 |    12,356 
           4 children |         0      6,292          0 |     6,292 
           5 children |         0      3,230          0 |     3,230 
           6 children |         0      1,775          0 |     1,775 
                    7 |         0        991          0 |       991 
                    8 |         0      1,236          0 |     1,236 
----------------------+---------------------------------+----------
                Total |    26,142     61,756      1,667 |    89,565 

&lt;/code&gt;&lt;/pre&gt;
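
&lt;p&gt;As a side note, &lt;code&gt;recode&lt;/code&gt; can also write its result straight into a new variable through its &lt;code&gt;generate()&lt;/code&gt; option, which skips the separate &lt;code&gt;gen&lt;/code&gt; step. The sketch below assumes the same &lt;code&gt;V58&lt;/code&gt; coding as in the table above; &lt;code&gt;have_children_v2&lt;/code&gt; is a made-up name.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Sketch: recode V58 into a new variable in a single step.
// have_children_v2 is a hypothetical name; V58 itself is left unchanged.
recode V58 (-5/-1 = .) (0 = 0) (1/8 = 1), generate(have_children_v2)
tab V58 have_children_v2, missing
&lt;/code&gt;&lt;/pre&gt;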

&lt;p&gt;Or, we can do the same by using &lt;code&gt;replace&lt;/code&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// drop the version created with recode, then start from all-missing
drop have_children
gen have_children = .
// 1 to 8 children (note: V58 &amp;gt; 1 would wrongly leave out &amp;quot;1 child&amp;quot;)
replace have_children = 1 if inrange(V58, 1, 8)
replace have_children = 0 if V58 == 0
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Again, check to see if the new variable was created correctly.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;tab V58 have_children, missing
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;
 How many children do |          have_children
             you have |         0          1          . |     Total
----------------------+---------------------------------+----------
                   -5 |         0          0         29 |        29 
                   -4 |         0          0      1,000 |     1,000 
                   -2 |         0          0        529 |       529 
                   -1 |         0          0        109 |       109 
          No children |    26,142          0          0 |    26,142 
              1 child |         0     14,297          0 |    14,297 
           2 children |         0     21,579          0 |    21,579 
           3 children |         0     12,356          0 |    12,356 
           4 children |         0      6,292          0 |     6,292 
           5 children |         0      3,230          0 |     3,230 
           6 children |         0      1,775          0 |     1,775 
                    7 |         0        991          0 |       991 
                    8 |         0      1,236          0 |     1,236 
----------------------+---------------------------------+----------
                Total |    26,142     61,756      1,667 |    89,565 

&lt;/code&gt;&lt;/pre&gt;

&lt;!-- ## Additional resources --&gt;
</description>
    </item>
    
    <item>
      <title>Labeling variables</title>
      <link>https://fanghuiz.github.io/ps0700/tutorial_stata/2_3_data_manipulation/</link>
      <pubDate>Wed, 20 Mar 2019 00:00:00 +0000</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/tutorial_stata/2_3_data_manipulation/</guid>
      <description>

&lt;!-- ```
use anes_timeseries_2016.dta, clear
```


&lt;!-- The goal of data manipulation/processing is to get the data ready for analysis. This stage could take up a of time, depending on how &#34;processed&#34; your dataset is when you gained accessed to it.  --&gt;

&lt;!-- ```
. 
. 
``` --&gt;

&lt;h2 id=&#34;variable-label&#34;&gt;Variable label&lt;/h2&gt;

&lt;p&gt;A variable label tells us what the variable is about. This label also conveniently shows up as the axis title if we draw a graph.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;describe happiness
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;
              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
happiness       float   %9.0g                 

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can create a label to describe what the variable is measuring using &lt;code&gt;label variable var_name &amp;quot;label text&amp;quot;&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;label variable happiness &amp;quot;Feelings of happiness&amp;quot;
describe happiness
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;

              storage   display    value
variable name   type    format     label      variable label
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
happiness       float   %9.0g                 Feelings of happiness

&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;value-label&#34;&gt;Value label&lt;/h2&gt;

&lt;p&gt;For categorical variables, we can create labels to show what each level of the variable represents. This is helpful when we make a frequency table or draw a graph.&lt;/p&gt;

&lt;p&gt;To define the labels, we first use the command &lt;code&gt;label define label_name&lt;/code&gt; to create a new label and give it a name. Then we specify the numerical value representing the category/level, then specify the label using a character string enclosed in &lt;code&gt;&amp;quot; &amp;quot;&lt;/code&gt; double quotes.&lt;/p&gt;

&lt;p&gt;Lastly, we need to apply the &lt;em&gt;label&lt;/em&gt; we have created (&lt;code&gt;happiness_label&lt;/code&gt;) to the corresponding &lt;em&gt;variable&lt;/em&gt; (&lt;code&gt;happiness&lt;/code&gt;).&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// first define the label
label define happiness_label 1 &amp;quot;Very Happy&amp;quot; 2 &amp;quot;Rather Happy&amp;quot; 3 &amp;quot;Not very happy&amp;quot; 4 &amp;quot;Not at all happy&amp;quot;

// then apply the label to the variable
label values happiness happiness_label

tab happiness
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;


     Feelings of |
       happiness |      Freq.     Percent        Cum.
-----------------+-----------------------------------
              -5 |          6        0.01        0.01
              -2 |        238        0.27        0.27
              -1 |        514        0.57        0.85
      Very Happy |     29,256       32.66       33.51
    Rather Happy |     45,786       51.12       84.63
  Not very happy |     11,214       12.52       97.15
Not at all happy |      2,551        2.85      100.00
-----------------+-----------------------------------
           Total |     89,565      100.00

&lt;/code&gt;&lt;/pre&gt;
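
&lt;p&gt;Two quick checks are handy after attaching a value label. The sketch below assumes the &lt;code&gt;happiness_label&lt;/code&gt; defined above: it lists the mapping stored in the label, then tabulates the underlying numeric codes.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// List the value-to-text mapping stored in happiness_label
label list happiness_label

// Show the underlying numeric codes instead of the labels
tab happiness, nolabel
&lt;/code&gt;&lt;/pre&gt;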
</description>
    </item>
    
    <item>
      <title>Recoding variables</title>
      <link>https://fanghuiz.github.io/ps0700/tutorial_stata/2_1_data_manipulation/</link>
      <pubDate>Wed, 20 Mar 2019 00:00:00 +0000</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/tutorial_stata/2_1_data_manipulation/</guid>
      <description>

&lt;!-- ```
use anes_timeseries_2016.dta, clear
```


&lt;!-- The goal of data manipulation/processing is to get the data ready for analysis. This stage could take up a of time, depending on how &#34;processed&#34; your dataset is when you gained accessed to it.  --&gt;

&lt;!-- ```
. 
. 
``` --&gt;

&lt;h2 id=&#34;using-recode&#34;&gt;Using &lt;code&gt;recode&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;The most frequent use of &lt;code&gt;recode&lt;/code&gt; is to recode the numbers that represent missing values to proper &amp;ldquo;missing value&amp;rdquo; as understood by Stata.&lt;/p&gt;

&lt;p&gt;Very often at the coding stage, missing values (e.g. non-response, no available data) are coded as extreme numbers such as &lt;code&gt;99&lt;/code&gt;, &lt;code&gt;-99&lt;/code&gt;. However, without telling Stata those numbers represent missing data, Stata will treat them as numerical values, which will create problems in analysis. So we need to recode those values as &lt;code&gt;.&lt;/code&gt;, which tells Stata to treat those observations as &amp;ldquo;missing&amp;rdquo;.&lt;/p&gt;

&lt;p&gt;Different datasets follow different conventions in how they initially code missing data, so we need to examine the data first to determine which numbers represent missing data.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;codebook female
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
female                                                                                                                                                                                                                                                      Sex
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (int)
                 label:  V240, but 3 nonmissing values are not labeled

                 range:  [-5,2]                       units:  1
         unique values:  4                        missing .:  0/89,565

            tabulation:  Freq.   Numeric  Label
                            40        -5  
                            51        -2  No answer
                        42,723         1  
                        46,751         2  

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this case, we have missing values coded as &lt;code&gt;-5&lt;/code&gt; and &lt;code&gt;-2&lt;/code&gt;, and there are 91 observations that have missing data.&lt;/p&gt;

&lt;p&gt;To recode the values of a variable, we can use &lt;code&gt;recode var rule&lt;/code&gt;, or &lt;code&gt;recode var (rule) (rule)&lt;/code&gt;, where the syntax for &lt;code&gt;rule&lt;/code&gt; takes the form &lt;code&gt;original value = recoded value&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Recode -5 and -2 to missing value
recode female (-5 -2 = .)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Always check to see if recoding was done correctly. Use &lt;code&gt;tab var, missing&lt;/code&gt; to display a frequency table that includes the missing values (&lt;code&gt;.&lt;/code&gt;).&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;tab female, missing
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;
                Sex |      Freq.     Percent        Cum.
--------------------+-----------------------------------
                  1 |     42,723       47.70       47.70
                  2 |     46,751       52.20       99.90
                  . |         91        0.10      100.00
--------------------+-----------------------------------
              Total |     89,565      100.00

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Here we see that we no longer have &lt;code&gt;-5&lt;/code&gt; and &lt;code&gt;-2&lt;/code&gt; in the data, and all 91 missing values have been properly recoded to &lt;code&gt;.&lt;/code&gt;&lt;/p&gt;
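
&lt;p&gt;If we want to keep track of &lt;em&gt;why&lt;/em&gt; a value is missing, Stata also offers extended missing values, &lt;code&gt;.a&lt;/code&gt; through &lt;code&gt;.z&lt;/code&gt;, which are treated as missing in analysis but remain distinguishable in a table. A minimal sketch, assuming the original survey variable is &lt;code&gt;V240&lt;/code&gt; (the label name shown by &lt;code&gt;codebook&lt;/code&gt;) and using a made-up copy called &lt;code&gt;female_v2&lt;/code&gt;:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Sketch: work on a hypothetical fresh copy so that female stays as it is
clonevar female_v2 = V240

// Give each missing-data code its own extended missing value
recode female_v2 (-5 = .a) (-2 = .b)

// .a and .b are listed separately, but both count as missing
tab female_v2, missing
&lt;/code&gt;&lt;/pre&gt;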

&lt;p&gt;We can also choose to recode the variable to something that makes more intuitive sense, or something we prefer, if the recoding does not change what the value represents.&lt;/p&gt;

&lt;p&gt;One such case is when we have a nominal variable. Since a nominal variable has categories with no inherent order or ranking, we can freely change the value that represents each category without affecting the substantive meaning.&lt;/p&gt;

&lt;p&gt;For example, the variable &lt;code&gt;female&lt;/code&gt; initially has &lt;code&gt;1&lt;/code&gt; representing category &amp;ldquo;Male&amp;rdquo;, and &lt;code&gt;2&lt;/code&gt; representing category &amp;ldquo;Female&amp;rdquo;. Very often, it is more intuitive to code a dichotomous variable &amp;ldquo;Yes/No&amp;rdquo; as &lt;code&gt;1/0&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Recode 2 to 1, 1 to 0
recode female (2 = 1) (1 = 0)
tab female, missing
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;(female: 89474 changes made)


                Sex |      Freq.     Percent        Cum.
--------------------+-----------------------------------
                  0 |     42,723       47.70       47.70
                  1 |     46,751       52.20       99.90
                  . |         91        0.10      100.00
--------------------+-----------------------------------
              Total |     89,565      100.00

&lt;/code&gt;&lt;/pre&gt;

&lt;h2 id=&#34;using-replace&#34;&gt;Using &lt;code&gt;replace&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;Another way to recode a variable is to use the &lt;code&gt;replace&lt;/code&gt; command, combined with logical operators to select the observations to change.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Recode all negative values to missing values
replace female = . if female &amp;lt;=0

// Recode 1 to 0
replace female = 0 if female == 1

// Recode 2 to 1
replace female = 1 if female == 2

tab female, missing
&lt;/code&gt;&lt;/pre&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;(91 real changes made, 91 to missing)

(42,723 real changes made)

(46,751 real changes made)


                Sex |      Freq.     Percent        Cum.
--------------------+-----------------------------------
                  0 |     42,723       47.70       47.70
                  1 |     46,751       52.20       99.90
                  . |         91        0.10      100.00
--------------------+-----------------------------------
              Total |     89,565      100.00

&lt;/code&gt;&lt;/pre&gt;

&lt;!-- ## Additional resources --&gt;
</description>
    </item>
    
    <item>
      <title>Univariate Distribution</title>
      <link>https://fanghuiz.github.io/ps0700/tutorial_stata/3_2_eda_univariate_graph/</link>
      <pubDate>Tue, 19 Mar 2019 00:00:00 +0000</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/tutorial_stata/3_2_eda_univariate_graph/</guid>
      <description>

&lt;!-- ```
use Norris_Democracy_CrossNational_09092015.dta, clear
``` --&gt;

&lt;h2 id=&#34;bar-plot&#34;&gt;Bar plot&lt;/h2&gt;

&lt;p&gt;To draw a bar plot of a categorical variable, we use the command &lt;code&gt;graph bar, over(var)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;When no variable is listed before the comma, the default setting for &lt;code&gt;graph bar&lt;/code&gt; is to set the y-axis as percent. The full command behind the scenes is in fact &lt;code&gt;graph bar (percent), over(var)&lt;/code&gt;, where the &lt;code&gt;(percent)&lt;/code&gt; statistic is assumed when we omit it.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Default bar plot, percent
graph bar, over(Cheibub4Type)
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_barplot1.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;We can change the default setting, and change the y-axis to frequency / count.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Frequency bar plot
graph bar (count), over(Cheibub4Type)
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We can also rotate the graph to display horizontal bars, using &lt;code&gt;graph hbar&lt;/code&gt;. This is helpful when we want to plot a variable with many categories. If we have too many categories, the category names tend to get crowded in a vertical bar plot, whereas the horizontal display gives us enough space to display the category names properly.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Horizontal bar plot
graph hbar, over(Cheibub6Type)
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_barplot3.svg&#34; /&gt;


&lt;/figure&gt;

&lt;h2 id=&#34;histogram&#34;&gt;Histogram&lt;/h2&gt;

&lt;p&gt;To draw a histogram, we can use &lt;code&gt;histogram&lt;/code&gt; or the abbreviated &lt;code&gt;hist&lt;/code&gt; command.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Life expectancy at birth, 2014 (UNDP 2014)
hist UNDP_Life2014
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_hist1.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;The default is density. We can change it to &lt;code&gt;frequency&lt;/code&gt;, &lt;code&gt;fraction&lt;/code&gt;, or &lt;code&gt;percent&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;hist UNDP_Life2014, freq
&lt;/code&gt;&lt;/pre&gt;
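
&lt;p&gt;The number of bins also shapes what a histogram shows. As a small sketch, the &lt;code&gt;bin()&lt;/code&gt; option sets it directly; the value 20 here is an arbitrary choice for illustration, not a recommendation.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Percent on the y-axis, with 20 bins (an arbitrary choice for illustration)
hist UNDP_Life2014, percent bin(20)
&lt;/code&gt;&lt;/pre&gt;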

&lt;p&gt;Some prefer to draw a histogram with an overlaid normal density curve to see if the observed distribution is approximately symmetrical.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;hist UNDP_Life2014, normal
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_hist2.svg&#34; /&gt;


&lt;/figure&gt;

&lt;h2 id=&#34;density-plot&#34;&gt;Density plot&lt;/h2&gt;

&lt;p&gt;A density plot is similar to a histogram, but is more &amp;ldquo;smoothed over&amp;rdquo;. To draw this, we use &lt;code&gt;kdensity&lt;/code&gt;, which stands for &amp;ldquo;kernel density&amp;rdquo;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;kdensity UNDP_Life2014
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_density1.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;Similarly, we can overlay a normal density plot over the kernel density plot.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;kdensity UNDP_Life2014, normal
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_density2.svg&#34; /&gt;


&lt;/figure&gt;

&lt;h2 id=&#34;box-plot&#34;&gt;Box plot&lt;/h2&gt;

&lt;p&gt;See this &lt;a href=&#34;https://fanghuiz.github.io/ps0700/post/2019-03-23-boxplot/&#34; target=&#34;_blank&#34;&gt;post&lt;/a&gt; for more discussion of how to read a box plot, and of its main drawback.&lt;/p&gt;

&lt;p&gt;We can use &lt;code&gt;graph box&lt;/code&gt; to draw a box plot.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Life expectancy at birth, 2014 (UNDP 2014)
graph box UNDP_Life2014
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_box1.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;For a one variable box plot, the default graph does not look very nice. There are various aesthetic changes we can make. For example, we can use &lt;code&gt;outergap()&lt;/code&gt; to increase the gap between the box and the margin (i.e. makes the box narrower), and use &lt;code&gt;intensity()&lt;/code&gt; to change the intensity/transparency of the fill color of the box.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Life expectancy at birth, 2014 (UNDP 2014)
graph box UNDP_Life2014, outergap(100) intensity(50)
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_box2.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;We can also rotate the box plot horizontally by telling Stata to draw &lt;code&gt;graph hbox&lt;/code&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;graph hbox UNDP_Life2014, outergap(100) intensity(50)
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_box3.svg&#34; /&gt;


&lt;/figure&gt;
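
&lt;p&gt;To read off the exact numbers that the box summarizes, a table of the same statistics can help. A minimal sketch using &lt;code&gt;tabstat&lt;/code&gt; on the same variable:&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;// Numbers behind the box plot: quartiles, median and inter-quartile range
tabstat UNDP_Life2014, statistics(p25 p50 p75 iqr min max)
&lt;/code&gt;&lt;/pre&gt;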

&lt;h2 id=&#34;dot-plot&#34;&gt;Dot plot&lt;/h2&gt;

&lt;p&gt;We can think of a uni-variate dot plot as a one-way scatter plot, where each observation is represented as a dot and plotted individually.&lt;/p&gt;

&lt;pre&gt;&lt;code class=&#34;language-stata&#34;&gt;dotplot UNDP_Life2014
&lt;/code&gt;&lt;/pre&gt;




  

&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/ps0700/img/stata/eda/eda_dot1.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;While a dot plot is a good way to display all the data (we can see each observation individually), it tends to get cluttered when we have a large sample size.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Q&amp;A Week 8: Sampling and Survey Research</title>
      <link>https://fanghuiz.github.io/ps0700/post/2019-03-01-survey/</link>
      <pubDate>Fri, 01 Mar 2019 00:00:00 -0500</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/post/2019-03-01-survey/</guid>
      <description>

&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;nav id=&#34;TableOfContents&#34;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#in-the-class-we-talked-about-surveys-having-high-external-validity-but-weak-in-internal-validity-does-external-validity-take-precedence-over-internal-validity-in-terms-of-importance-or-vice-versa&#34;&gt;In the class we talked about surveys having high external validity, but weak in internal validity. Does external validity take precedence (over internal validity) in terms of importance, or vice versa?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#is-random-sampling-and-randomization-the-same-thing&#34;&gt;Is random sampling and randomization the same thing?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#how-can-we-account-for-coverage-error-in-experimental-studies&#34;&gt;How can we account for coverage error in experimental studies?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/nav&gt;


&lt;h2 id=&#34;in-the-class-we-talked-about-surveys-having-high-external-validity-but-weak-in-internal-validity-does-external-validity-take-precedence-over-internal-validity-in-terms-of-importance-or-vice-versa&#34;&gt;In the class we talked about surveys having high external validity, but weak in internal validity. Does external validity take precedence (over internal validity) in terms of importance, or vice versa?&lt;/h2&gt;

&lt;p&gt;I would say that in general, it is more important to establish internal validity than external validity. If we can ensure internal validity, at the very least, we can claim to have gained some localized knowledge ($X$ causes $Y$ in the sample we have studied), even if this knowledge might not hold in another context.&lt;/p&gt;

&lt;p&gt;However, if we cannot be sure that the findings in our current study are internally valid (i.e. if we are unable to establish a credible claim that it is indeed $X$ that caused a change in $Y$, rather than other confounding factors), then what’s the point of generalizing this invalid claim? External validity is useful (it lets us expand on what we know) only when we have confidence in the internal validity of a study (the more localized knowledge). Otherwise, generalizing a wrong-headed conclusion only compounds the initial error, like adding more floors to a building with a faulty foundation.&lt;/p&gt;




&lt;figure&gt;

&lt;img src=&#34;https://media.giphy.com/media/gXF3P4m5CMPTy/giphy.gif&#34; /&gt;


&lt;/figure&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;is-random-sampling-and-randomization-the-same-thing&#34;&gt;Is random sampling and randomization the same thing?&lt;/h2&gt;

&lt;p&gt;A random sample is a sample (i.e. the subset of the population that we include in the study) in which each unit is chosen randomly. This concerns which cases or subjects are in the study.&lt;/p&gt;

&lt;p&gt;Randomization (a.k.a. random assignment) refers to the process of randomly assigning each unit in our study to receive the treatment or not. This concerns whether the units/subjects (who are already included in the study) receive the treatment or are placed in the control group.&lt;/p&gt;
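
&lt;p&gt;To make the distinction concrete, here is a minimal Python sketch (not from the lecture). The population of 1,000 numbered units, the sample size of 100, and the 50/50 split are all made-up numbers for illustration. Sampling decides who is in the study; assignment decides which of those units get the treatment.&lt;/p&gt;

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 unit IDs
population = list(range(1000))

# Random sampling: decide WHO is in the study
sample = rng.sample(population, k=100)

# Random assignment: decide WHICH sampled units receive the treatment
shuffled = sample[:]
rng.shuffle(shuffled)
treatment = shuffled[:50]
control = shuffled[50:]

print(len(sample), len(treatment), len(control))
```

&lt;p&gt;The two steps are independent of each other: a study can have random assignment without random sampling (as in many lab experiments), or random sampling without random assignment (as in many surveys).&lt;/p&gt;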

&lt;hr /&gt;

&lt;h2 id=&#34;how-can-we-account-for-coverage-error-in-experimental-studies&#34;&gt;How can we account for coverage error in experimental studies?&lt;/h2&gt;

&lt;p&gt;Depending on how the subjects are recruited into the experiment, coverage errors in experimental studies can be difficult to avoid. Recall that many experiments, especially lab experiments, rely on convenience samples, which usually leave part of the population uncovered in the sampling process. If subjects are recruited from among the undergraduates at one university, then anyone who is not an undergraduate at that university is excluded from the sample.&lt;/p&gt;

&lt;p&gt;This problem can be difficult to “account for” if we are using a convenience sample, since it is built into the sampling process. However, other types of non-laboratory experiments (e.g. survey experiments or field experiments) often have better coverage, which mitigates (though does not completely eliminate) the problem of a non-representative sample that comes with coverage error.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Q&amp;A Week 7: Comparative Studies</title>
      <link>https://fanghuiz.github.io/ps0700/post/2019-02-22-comparative/</link>
      <pubDate>Fri, 22 Feb 2019 00:00:00 -0500</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/post/2019-02-22-comparative/</guid>
      <description>

&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;nav id=&#34;TableOfContents&#34;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#can-you-explain-more-about-the-connection-between-mill-s-method-of-difference-and-experiments&#34;&gt;Can you explain more about the connection between Mill’s method of difference and experiments?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#is-the-concern-for-external-validity-problems-only-apply-to-method-of-difference-or-method-of-agreement-as-well&#34;&gt;Is the concern for external validity problems only apply to method of difference, or method of agreement as well?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#is-selecting-on-dependent-variable-only-a-problem-for-method-of-agreement&#34;&gt;Is selecting on dependent variable only a problem for method of agreement?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#the-lecture-mentioned-that-method-of-difference-has-trouble-estimating-multiple-causes-what-are-some-examples-of-multiple-causes-cases&#34;&gt;The lecture mentioned that method of difference has trouble estimating “multiple causes”. What are some examples of “multiple causes” cases?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/nav&gt;


&lt;h2 id=&#34;can-you-explain-more-about-the-connection-between-mill-s-method-of-difference-and-experiments&#34;&gt;Can you explain more about the connection between Mill’s method of difference and experiments?&lt;/h2&gt;

&lt;p&gt;The two have very similar causal logic. They both try to establish a causal claim (a difference in $Y$ can be attributed to changes in $X$) by leveraging the fact that the treatment group (&lt;code&gt;X = 1&lt;/code&gt;) and the control group (&lt;code&gt;X = 0&lt;/code&gt;) are similar on the other, potentially confounding variables ($Z$) and differ only on the treatment variable ($X$). Since the two groups are similar in every respect except $X$, any observed difference in $Y$ must be caused by the difference in $X$.&lt;/p&gt;

&lt;p&gt;The method of difference tries to approximate comparable treatment and control groups by selecting cases with similar attributes except $X$ (mostly based on theory and domain knowledge about which factors could be confounding variables). Experiments try to achieve this by randomly assigning the treatment.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;is-the-concern-for-external-validity-problems-only-apply-to-method-of-difference-or-method-of-agreement-as-well&#34;&gt;Is the concern for external validity problems only apply to method of difference, or method of agreement as well?&lt;/h2&gt;

&lt;p&gt;It is a problem in both types of designs. External validity is a concern in any study where we only have a small number of non-randomly selected cases.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;is-selecting-on-dependent-variable-only-a-problem-for-method-of-agreement&#34;&gt;Is selecting on dependent variable only a problem for method of agreement?&lt;/h2&gt;

&lt;p&gt;Yes, it is only a problem for comparative designs that select cases using the method of agreement.&lt;/p&gt;

&lt;p&gt;We say a study is “selecting on the dependent variable” when the decision criterion for including certain units in (or excluding them from) the study sample is correlated with the value of the dependent variable.&lt;/p&gt;

&lt;p&gt;For the method of agreement, we are comparing cases that have the same outcome but differ in the values of the independent variables. In other words, the reason we are including these cases in the comparison is &lt;strong&gt;because&lt;/strong&gt; they share the same outcome, and other cases are excluded because they have a different outcome. The decision criterion for sample selection is directly related to the status of the dependent variable.&lt;/p&gt;

&lt;p&gt;Designs using the method of difference for case selection are not selecting on the dependent variable. In this method, the criterion for including cases in the sample is not related to what the outcomes are. Instead, we are selecting cases based on the independent variables: we compare cases that are similar on all the independent variables, except for the one crucial explanatory factor that we are interested in.&lt;/p&gt;

&lt;hr /&gt;

&lt;h2 id=&#34;the-lecture-mentioned-that-method-of-difference-has-trouble-estimating-multiple-causes-what-are-some-examples-of-multiple-causes-cases&#34;&gt;The lecture mentioned that method of difference has trouble estimating “multiple causes”. What are some examples of “multiple causes” cases?&lt;/h2&gt;

&lt;p&gt;An event or outcome has multiple causes when more than one factor could have led to the outcome. For example, why does the U.S. have low voter turnout? There could be multiple factors: no compulsory voting; low interest in politics; election day not being a national holiday; the two-party system; the winner-take-all system, etc.&lt;/p&gt;

&lt;p&gt;Most of the phenomena we are interested in are quite complex, so we should expect there to be multiple causes most of the time.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Q&amp;A Week 6: Formal Models and Game Theory</title>
      <link>https://fanghuiz.github.io/ps0700/post/2019-02-15-rational-choice/</link>
      <pubDate>Fri, 15 Feb 2019 00:00:00 -0500</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/post/2019-02-15-rational-choice/</guid>
      <description>

&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;nav id=&#34;TableOfContents&#34;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#costs-of-voting&#34;&gt;Costs of Voting&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#can-you-explain-a-bit-more-about-the-table-for-costs-of-voting&#34;&gt;Can you explain a bit more about the table for costs of voting?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#game-theory-and-government-shutdown&#34;&gt;Game Theory and Government Shutdown&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#trump-was-prolonging-the-shutdown-in-order-to-get-funding-for-the-wall-is-that-a-game-theory-strategic-interaction-scenario&#34;&gt;Trump was prolonging the shutdown in order to get funding for the wall, is that a game theory/strategic interaction scenario?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/nav&gt;


&lt;h2 id=&#34;costs-of-voting&#34;&gt;Costs of Voting&lt;/h2&gt;

&lt;h3 id=&#34;can-you-explain-a-bit-more-about-the-table-for-costs-of-voting&#34;&gt;Can you explain a bit more about the table for costs of voting?&lt;/h3&gt;

&lt;p&gt;This is the table I have in the recitation slides:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Voted = Yes&lt;/th&gt;
&lt;th&gt;Voted = No&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Election outcome = &lt;br&gt; Preferred candidate won&lt;/td&gt;
&lt;td&gt;Benefits - Costs of Voting&lt;/td&gt;
&lt;td&gt;Benefits&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;Election outcome = &lt;br&gt; Preferred candidate lost&lt;/td&gt;
&lt;td&gt;- Costs of Voting&lt;/td&gt;
&lt;td&gt;Zero&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;First, we make a few assumptions when analyzing the decision to vote within a rational choice framework:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Voting has a cost, which enters our payoff as a &lt;em&gt;negative&lt;/em&gt; number if we vote; the cost is &lt;em&gt;zero&lt;/em&gt; if we do not vote&lt;/li&gt;
&lt;li&gt;Benefit is &lt;em&gt;positive&lt;/em&gt; if our preferred candidate wins; and is &lt;em&gt;zero&lt;/em&gt; if our preferred candidate loses&lt;/li&gt;
&lt;li&gt;Chance of any individual vote changing the outcome is very low (close to zero)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The intuition behind the table is that no matter what the election outcome is (&lt;em&gt;preferred candidate wins or loses&lt;/em&gt;), for us personally, the net benefit is always &lt;strong&gt;higher&lt;/strong&gt; if we do not vote than if we vote.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If our preferred candidate wins (&lt;em&gt;first row&lt;/em&gt;), &lt;code&gt;Benefits &amp;gt; Benefits - Costs of Voting&lt;/code&gt;. Net benefit is higher if we do not vote.&lt;/li&gt;
&lt;li&gt;If our preferred candidate loses (&lt;em&gt;second row&lt;/em&gt;), &lt;code&gt;Zero &amp;gt; - Costs of Voting&lt;/code&gt;. Net benefit is higher if we do not vote.&lt;/li&gt;
&lt;/ul&gt;
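
&lt;p&gt;The comparison in the table can also be checked with a few lines of Python. This is only a sketch: the values of &lt;code&gt;benefit&lt;/code&gt; and &lt;code&gt;cost&lt;/code&gt; and the helper &lt;code&gt;net_payoff&lt;/code&gt; are made up for illustration, and the code leans on the third assumption above, i.e. it treats the election outcome as fixed regardless of whether we vote.&lt;/p&gt;

```python
# Hypothetical numbers, chosen only for illustration
benefit = 10.0  # payoff if our preferred candidate wins
cost = 1.0      # cost of voting (time, effort, and so on)


def net_payoff(voted, candidate_won):
    """Net payoff for one voter, treating the election outcome as fixed."""
    payoff = benefit if candidate_won else 0.0
    if voted:
        payoff -= cost
    return payoff


for candidate_won in (True, False):
    vote = net_payoff(True, candidate_won)
    abstain = net_payoff(False, candidate_won)
    # The gain from abstaining equals the cost of voting in both rows
    print(candidate_won, vote, abstain, abstain - vote)
```

&lt;p&gt;Whatever positive numbers we plug in, the gain from abstaining is exactly the cost of voting, which is why not voting comes out ahead in both rows.&lt;/p&gt;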

&lt;h2 id=&#34;game-theory-and-government-shutdown&#34;&gt;Game Theory and Government Shutdown&lt;/h2&gt;

&lt;h3 id=&#34;trump-was-prolonging-the-shutdown-in-order-to-get-funding-for-the-wall-is-that-a-game-theory-strategic-interaction-scenario&#34;&gt;Trump was prolonging the shutdown in order to get funding for the wall, is that a game theory/strategic interaction scenario?&lt;/h3&gt;

&lt;p&gt;Yes! Threatening or prolonging a government shutdown in order to leverage a “better deal”, when viewed through a strategic interaction lens, is quite similar to the &lt;a href=&#34;https://en.wikipedia.org/wiki/Chicken_(game)&#34; target=&#34;_blank&#34;&gt;game of chicken&lt;/a&gt; (a sort of brinksmanship).&lt;/p&gt;

&lt;p&gt;This is a situation where both players will benefit if both sides yield (&lt;em&gt;take a compromise budget deal&lt;/em&gt;), both players will lose if neither side yields (&lt;em&gt;government shutdown&lt;/em&gt;), but if only one player yields and the other doesn’t (&lt;em&gt;Trump gives up, Congressional Democrats do not&lt;/em&gt;), then the player that yields loses and the other player benefits (&lt;em&gt;no funding for wall, government shutdown ends&lt;/em&gt;).&lt;/p&gt;
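
&lt;p&gt;One way to see the strategic structure is to write the game down as a payoff table and look for outcomes where neither player wants to change their move on their own (a pure-strategy Nash equilibrium). The sketch below is my own illustration: the payoff numbers are hypothetical (only their ordering matters), and player 1 and player 2 stand in for Trump and Congressional Democrats.&lt;/p&gt;

```python
from itertools import product

ACTIONS = ("yield", "hold")

# payoffs[(a1, a2)] = (payoff to player 1, payoff to player 2)
# Hypothetical numbers; only their ordering matters.
payoffs = {
    ("yield", "yield"): (2, 2),    # compromise budget deal
    ("yield", "hold"): (-1, 3),    # player 1 backs down
    ("hold", "yield"): (3, -1),    # player 2 backs down
    ("hold", "hold"): (-10, -10),  # prolonged shutdown
}


def is_nash(a1, a2):
    """True if neither player gains by unilaterally switching actions."""
    u1, u2 = payoffs[(a1, a2)]
    best1 = u1 == max(payoffs[(alt, a2)][0] for alt in ACTIONS)
    best2 = u2 == max(payoffs[(a1, alt)][1] for alt in ACTIONS)
    return best1 and best2


equilibria = [pair for pair in product(ACTIONS, ACTIONS) if is_nash(*pair)]
print(equilibria)
```

&lt;p&gt;With these payoffs, the only stable outcomes are the two where exactly one side backs down. Each side would rather be the one that holds out, which is what makes the standoff hard to end.&lt;/p&gt;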

&lt;p&gt;See this &lt;a href=&#34;https://www.pbs.org/newshour/science/how-the-shutdown-might-end-according-to-game-theory&#34; target=&#34;_blank&#34;&gt;PBS NewsHour article&lt;/a&gt; for a more in-depth discussion of the incentives both sides faced that shaped this negotiation into political brinksmanship, and this &lt;a href=&#34;https://fivethirtyeight.com/features/were-all-to-blame-for-the-shutdown/&#34; target=&#34;_blank&#34;&gt;FiveThirtyEight article&lt;/a&gt; on why we (the voters) are partly to blame for this.&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Q&amp;A Week 4: Natural Experiments and Observational Studies</title>
      <link>https://fanghuiz.github.io/ps0700/post/2019-02-01-natural-experiment/</link>
      <pubDate>Fri, 01 Feb 2019 00:00:00 -0500</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/post/2019-02-01-natural-experiment/</guid>
      <description>

&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;nav id=&#34;TableOfContents&#34;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#natural-experiments&#34;&gt;Natural Experiments&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#in-class-you-mentioned-natural-experiments-based-on-geographical-boundaries-can-be-complicated-by-human-factors-can-you-explain-a-bit-more-what-this-means&#34;&gt;In class you mentioned “Natural experiments based on geographical boundaries can be complicated by human factors”. Can you explain a bit more what this means?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#how-would-we-know-if-the-as-if-randomization-assumption-is-valid&#34;&gt;How would we know if the “as-if randomization” assumption is valid?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#observational-studies&#34;&gt;Observational studies&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#is-there-any-way-to-get-rid-of-confounding-variables-in-observational-studies&#34;&gt;Is there any way to get rid of confounding variables in observational studies?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#how-are-longitudinal-studies-and-cross-sectional-studies-different&#34;&gt;How are longitudinal studies and cross-sectional studies different?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/nav&gt;


&lt;h2 id=&#34;natural-experiments&#34;&gt;Natural Experiments&lt;/h2&gt;

&lt;h3 id=&#34;in-class-you-mentioned-natural-experiments-based-on-geographical-boundaries-can-be-complicated-by-human-factors-can-you-explain-a-bit-more-what-this-means&#34;&gt;In class you mentioned “Natural experiments based on geographical boundaries can be complicated by human factors”. Can you explain a bit more what this means?&lt;/h3&gt;

&lt;p&gt;Recall that the key assumption in a natural experiment design that ensures internal validity is that the treatment assignment is random or &lt;em&gt;“as-if”&lt;/em&gt; random. In other words, we have to ask: is the treatment assignment correlated with any other factors that could potentially cause the observed difference between the treatment and control groups? If yes, then the assumption does not hold and the study’s internal validity is weakened. If no, then the assumption of &lt;em&gt;“as-if”&lt;/em&gt; randomization holds.&lt;/p&gt;

&lt;p&gt;In the &lt;a href=&#34;https://www.nytimes.com/2018/08/24/business/money-satisfaction-lottery-study.html&#34; target=&#34;_blank&#34;&gt;study&lt;/a&gt; on whether money from the lottery increases happiness, the assumption is that the treatment (&lt;em&gt;winning money from lottery&lt;/em&gt;) is randomly assigned among lottery buyers, hence whether someone is in the treatment group (&lt;em&gt;lottery winners&lt;/em&gt;) or the control group (&lt;em&gt;lottery losers&lt;/em&gt;) is &lt;strong&gt;not correlated&lt;/strong&gt; with other factors that affect their happiness. In other words, treatment assignment (&lt;em&gt;whether someone gets money&lt;/em&gt;) is &lt;strong&gt;independent&lt;/strong&gt; of other confounding factors that could have affected the outcome (&lt;em&gt;happiness&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;In studies that leverage geographical boundaries for natural experiment opportunities, the generic set-up is to compare Area A (&lt;em&gt;treatment group&lt;/em&gt;) on one side of the geographical boundary, which has received the treatment, with Area B (&lt;em&gt;control group&lt;/em&gt;) on the other side of the boundary, which has not received the treatment. This means that we have to ask: is the treatment assignment (&lt;em&gt;being on one side of the boundary vs the other side&lt;/em&gt;) correlated with any other factors that could explain the difference in outcomes between Area A and Area B?&lt;/p&gt;

&lt;p&gt;So what I meant by “natural experiments based on geographical boundaries can be complicated by human factors” was that sometimes how the geographical boundaries are drawn &lt;strong&gt;is not independent&lt;/strong&gt; of the characteristics of the humans/political actors that draw these boundaries (i.e. the division introduced by the boundary is not random). If the reasons for how boundaries are drawn correlate with reasons that could explain the outcome, then the &lt;em&gt;“as-if”&lt;/em&gt; randomization assumption would not hold.&lt;/p&gt;

&lt;p&gt;Think about &lt;a href=&#34;http://danielnposner.com/wp-content/uploads/2015/11/Posner-2004b.pdf&#34; target=&#34;_blank&#34;&gt;Posner (2004)&lt;/a&gt;, which we read for class, where Posner found that the relative size of the two ethnic groups (&lt;em&gt;treatment&lt;/em&gt;) within each country explained why the cultural differences between the Chewa and Tumbuka ethnic groups are politically salient in Malawi but not in Zambia (&lt;em&gt;outcome&lt;/em&gt;). He argued that the treatment assignment (&lt;em&gt;being in a country where the two ethnic groups are relatively large vs relatively small&lt;/em&gt;) is “as-if” random (&lt;em&gt;assignment is uncorrelated with other factors that could explain the outcome&lt;/em&gt;), because “like many African borders, the one that separates Zambia and Malawi was drawn purely for [colonial] administrative purposes, with no attention to the distribution of groups on the ground” (Posner 2004: 530).&lt;/p&gt;

&lt;p&gt;If, however, the boundary that separates Zambia and Malawi was drawn for reasons that potentially correlate with factors affecting inter-group interaction (say, for example, natural resource availability), then the treatment assignment is no longer &lt;em&gt;“as-if”&lt;/em&gt; random.&lt;/p&gt;

&lt;h3 id=&#34;how-would-we-know-if-the-as-if-randomization-assumption-is-valid&#34;&gt;How would we know if the “as-if randomization” assumption is valid?&lt;/h3&gt;

&lt;p&gt;Since we have no control over the treatment assignment process in natural experiments, we cannot really &lt;em&gt;“prove”&lt;/em&gt; whether this &lt;em&gt;“as-if”&lt;/em&gt; randomization assumption is valid. All we can do is provide evidence to show that this assumption is plausible.&lt;/p&gt;

&lt;p&gt;For example, we can rely on theory and background knowledge to make the case: assignment through a lottery is plausibly random because we know how the winners are chosen.&lt;/p&gt;

&lt;p&gt;And for the Posner (2004) study, if there were some qualitative evidence (e.g. written records of how boundaries were decided) showing that the boundary was indeed “drawn purely for [colonial] administrative purposes, with no attention to the distribution of groups on the ground”, then that would be an important piece of evidence to support the “as-if” randomization claim.&lt;/p&gt;

&lt;p&gt;We can also provide empirical evidence. Recall that randomly assigning the treatment will give us comparable treatment and control groups, i.e. the groups, on average, would be similar to each other in terms of any potential confounding variables. So we should expect an “as-if” randomization process to give us such comparable groups as well.&lt;/p&gt;

&lt;p&gt;Researchers can measure the potential confounding variables and empirically test if the treatment and control groups are similar in those aspects. If we do not find any significant difference between the two groups in terms of those potential confounders, then that would be a piece of evidence supporting the “as-if” randomization assumption.&lt;/p&gt;
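
&lt;p&gt;As a rough illustration of what such a balance check can look like, here is a small Python sketch. Everything in it is hypothetical: the covariate (age), the 200 simulated units, and the helper &lt;code&gt;standardized_difference&lt;/code&gt;. It uses true random assignment to show what balance looks like; in a real natural experiment we would run the same comparison on the groups that nature (or history) handed us. The standardized difference in means is just one of several ways to summarize balance.&lt;/p&gt;

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical pre-treatment covariate (say, age) for 200 units
ages = [rng.gauss(40, 10) for _ in range(200)]

# Randomly assign half of the units to treatment
ids = list(range(200))
rng.shuffle(ids)
treated = [ages[i] for i in ids[:100]]
control = [ages[i] for i in ids[100:]]


def standardized_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


print(round(standardized_difference(treated, control), 3))
```

&lt;p&gt;A value close to zero says the two groups look alike on this covariate. A large value would be a warning sign that the groups differ in ways other than the treatment.&lt;/p&gt;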

&lt;h2 id=&#34;observational-studies&#34;&gt;Observational studies&lt;/h2&gt;

&lt;h3 id=&#34;is-there-any-way-to-get-rid-of-confounding-variables-in-observational-studies&#34;&gt;Is there any way to get rid of confounding variables in observational studies?&lt;/h3&gt;

&lt;p&gt;Confounding &lt;em&gt;variables&lt;/em&gt; will always be present (we cannot &amp;ldquo;get rid of them&amp;rdquo; per se), but we can reduce the &lt;em&gt;bias&lt;/em&gt; to our inference/conclusion introduced by any confounding variables.&lt;/p&gt;

&lt;p&gt;Whenever we want to investigate if $X \rightarrow Y$, there will be confounding variables $Z$ lurking behind the scenes, that’s just the feature of the world we live in. These confounding variables will introduce &lt;strong&gt;bias&lt;/strong&gt; to our inference, if we mistakenly conclude that the change in $Y$ is caused by $X$, while in fact the change in $Y$ was caused by $X$ and $Z$ (or $Z$ alone). This bias is often known by the jargon &lt;a href=&#34;https://en.wikipedia.org/wiki/Omitted-variable_bias&#34; target=&#34;_blank&#34;&gt;omitted variable bias&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;When designing a study to investigate if $X \rightarrow Y$, one of our goals is to reduce any potential bias introduced by confounding variables, in order to isolate the effects of $X$ on $Y$ (&lt;em&gt;how much of the change in $Y$ can be attributed to $X$, instead of $Z$&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;Two common ways to reduce this bias in observational studies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Statistically adjusting/controlling for observable confounding variables (i.e. include the “omitted” confounding variables in the statistical model, at least for those we have the data for).&lt;/li&gt;
&lt;li&gt;If our data has multiple time points (i.e. panel data or time series data), statistically adjusting/controlling for observable and unobservable confounding variables by leveraging the temporal nature of the data.&lt;/li&gt;
&lt;/ol&gt;
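
&lt;p&gt;The first strategy can be illustrated with a toy example in Python. The data below are entirely made up: a binary confounder $Z$ makes treatment ($X = 1$) more likely and also raises $Y$, and the true effect of $X$ on $Y$ is set to 2 by construction. Comparing treated and control units within each level of $Z$ is a very simple stand-in for “controlling for $Z$” in a statistical model.&lt;/p&gt;

```python
from statistics import mean

# Hypothetical data: Z makes treatment more likely AND raises the outcome.
# The true effect of X on Y is set to 2 by construction.
rows = []
for z, n_control, n_treated in [(0, 80, 20), (1, 20, 80)]:
    rows += [(0, z)] * n_control + [(1, z)] * n_treated
data = [(x, z, 2 * x + 5 * z) for x, z in rows]  # records of (x, z, y)


def diff_in_means(records):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for x, _, y in records if x == 1]
    control = [y for x, _, y in records if x == 0]
    return mean(treated) - mean(control)


# Naive comparison ignores Z; the adjusted one compares within levels of Z
naive = diff_in_means(data)
adjusted = mean(diff_in_means([r for r in data if r[1] == z]) for z in (0, 1))
print(naive, adjusted)
```

&lt;p&gt;With these made-up numbers the naive comparison gives 5, while the comparison within levels of $Z$ recovers the true effect of 2. The gap between the two is the omitted variable bias.&lt;/p&gt;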

&lt;p&gt;The jargon for these different techniques to isolate the effects of $X$ on $Y$ is “identification strategy”: a strategy that helps us &lt;em&gt;identify&lt;/em&gt; the effects of $X$ on $Y$. Randomized experiments, natural experiments, and statistical adjustment for confounders are different types of identification strategies we can use.&lt;/p&gt;

&lt;h3 id=&#34;how-are-longitudinal-studies-and-cross-sectional-studies-different&#34;&gt;How are longitudinal studies and cross-sectional studies different?&lt;/h3&gt;

&lt;p&gt;We have a longitudinal study if we have data for each unit at multiple time points, i.e. every unit is measured more than once. For example, a study on whether emergency events boost presidential approval ratings (i.e. rally-round-the-flag effects) would be a longitudinal study (or more specifically, a time series study): the unit of analysis is the presidential approval rating, and we have measures for this unit at multiple time points, before and after the emergency events.&lt;/p&gt;

&lt;p&gt;A cross-sectional study is one where we only have data for each unit at one time point. If we were to examine whether partisanship affects how individuals evaluate the president’s response to an emergency event, say a devastating hurricane, using a survey conducted after the hurricane, then that would be a cross-sectional study: the unit of analysis is the individual survey respondent, and we only have measures for each person at one point in time (the time they responded to the survey).&lt;/p&gt;
</description>
    </item>
    
    <item>
      <title>Q&amp;A Week 3: Experiments and Ethics</title>
      <link>https://fanghuiz.github.io/ps0700/post/2019-01-25-experiment/</link>
      <pubDate>Fri, 25 Jan 2019 00:00:00 -0500</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/post/2019-01-25-experiment/</guid>
      <description>

&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;nav id=&#34;TableOfContents&#34;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#about-informed-consent&#34;&gt;About Informed Consent&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#is-informed-consent-always-necessary-when-considering-the-ethics-of-social-science-experiments-for-some-experiments-obtaining-informed-consent-could-affect-the-results-if-people-are-aware-of-what-the-researchers-are-trying-to-do-exactly-how-much-are-the-researchers-required-to-inform-the-participants-about-the-experiment&#34;&gt;Is informed consent always necessary when considering the ethics of social science experiments? For some experiments, obtaining informed consent could affect the results (if people are aware of what the researchers are trying to do). Exactly how much are the researchers required to inform the participants about the experiment?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#does-knowing-you-are-part-of-an-experiment-affect-how-they-respond-is-there-a-way-to-minimize-the-effects-of-this-on-the-outcome&#34;&gt;Does knowing you are part of an experiment affect how they respond? Is there a way to minimize the effects of this on the outcome?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#about-the-montana-gotv-experiment&#34;&gt;About the Montana GOTV Experiment&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#the-montana-experiment-https-thewpsa-wordpress-com-2014-10-25-messing-with-montana-get-out-the-vote-experiment-raises-ethics-questions-misled-the-people-by-using-official-seal-how-did-they-get-the-project-approved-in-the-first-place&#34;&gt;The &lt;a href=&#34;https://thewpsa.wordpress.com/2014/10/25/messing-with-montana-get-out-the-vote-experiment-raises-ethics-questions/&#34; target=&#34;_blank&#34;&gt;Montana experiment&lt;/a&gt; misled the people by using official seal. How did they get the project approved in the first place?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#how-did-the-montana-experiment-affected-people-s-decision-i-don-t-see-a-discussion-on-how-it-actually-influenced-the-turnout-or-election-outcome&#34;&gt;How did the Montana experiment affected people’s decision? I don’t see a discussion on how it actually influenced the turnout or election outcome.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#about-experiments-on-development-programs&#34;&gt;About Experiments on Development Programs&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#are-there-examples-of-ethical-and-effective-anti-poverty-experiments&#34;&gt;Are there examples of ethical and effective anti-poverty experiments?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#not-a-question-just-an-interesting-observation-in-the-us-during-the-1960s-70s-there-was-a-similar-program-to-universal-basic-income-it-was-ended-after-there-was-an-increase-in-divorce-rate&#34;&gt;Not a question, just an interesting observation &amp;ndash; in the US during the 1960s-70s, there was a similar program to Universal Basic Income. It was ended after there was an increase in divorce rate.&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/nav&gt;


&lt;h2 id=&#34;about-informed-consent&#34;&gt;About Informed Consent&lt;/h2&gt;

&lt;h4 id=&#34;is-informed-consent-always-necessary-when-considering-the-ethics-of-social-science-experiments-for-some-experiments-obtaining-informed-consent-could-affect-the-results-if-people-are-aware-of-what-the-researchers-are-trying-to-do-exactly-how-much-are-the-researchers-required-to-inform-the-participants-about-the-experiment&#34;&gt;Is informed consent always necessary when considering the ethics of social science experiments? For some experiments, obtaining informed consent could affect the results (if people are aware of what the researchers are trying to do). Exactly how much are the researchers required to inform the participants about the experiment?&lt;/h4&gt;

&lt;p&gt;Yes, informed consent is an essential element of research ethics.&lt;/p&gt;

&lt;p&gt;Generally speaking, we have to inform the participants of the purpose of our research (e.g. &lt;em&gt;“This is a study about attitudes towards political candidates”&lt;/em&gt;), though we do not have to tell them the exact hypothesis of the study.&lt;/p&gt;

&lt;p&gt;It is also important that the informed consent form lets the participants know about any potential benefits or harms of taking part in the study, any compensation or incentives, the confidentiality or privacy of the data, and their rights to decline and to withdraw, so they can make an informed decision about participating. In most cases, political science experiments only involve &amp;ldquo;minimal risks&amp;rdquo;, i.e. about the same probability and magnitude of harm we would experience in daily life.&lt;/p&gt;

&lt;p&gt;For more details on the important elements to include when obtaining informed consent, see &lt;a href=&#34;https://www.irb.pitt.edu/content/chapter-13-informed-consent-and-documentation&#34; target=&#34;_blank&#34;&gt;this guide&lt;/a&gt; from the Pitt IRB, or the American Psychological Association (APA) &lt;a href=&#34;https://www.apa.org/ethics/code/&#34; target=&#34;_blank&#34;&gt;ethics code&lt;/a&gt; (Section 8.02). It is possible to request waivers with adequate justification (see &lt;a href=&#34;http://www.irb.pitt.edu/sites/default/files/waivers%20presentation%204.10.18.pdf&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt; for an overview of the requirements).&lt;/p&gt;

&lt;h4 id=&#34;does-knowing-you-are-part-of-an-experiment-affect-how-they-respond-is-there-a-way-to-minimize-the-effects-of-this-on-the-outcome&#34;&gt;Does knowing you are part of an experiment affect how they respond? Is there a way to minimize the effects of this on the outcome?&lt;/h4&gt;

&lt;p&gt;Quite likely! One possibility is the &lt;a href=&#34;https://en.wikipedia.org/wiki/Hawthorne_effect&#34; target=&#34;_blank&#34;&gt;Hawthorne effect&lt;/a&gt;: simply being part of the experiment and knowing that you are being observed might change your behavior or how you respond, compared to an everyday life scenario.&lt;/p&gt;

&lt;p&gt;A more general phenomenon (which some argue subsumes the Hawthorne effect) is called &lt;a href=&#34;https://en.wikipedia.org/wiki/Demand_characteristics&#34; target=&#34;_blank&#34;&gt;demand characteristics&lt;/a&gt; (also see textbook p.178), referring to how participants&amp;rsquo; interpretation of the experiment&amp;rsquo;s purpose could potentially change their behaviors (e.g. they might behave in ways &lt;em&gt;conforming&lt;/em&gt; to what they &lt;em&gt;think&lt;/em&gt; the researchers want to observe, or in ways &lt;em&gt;contradicting&lt;/em&gt; what they perceive as the researchers&amp;rsquo; hypothesis).&lt;/p&gt;

&lt;p&gt;It is worth noting, however, that not all experiments are equally affected by this potential problem. We might expect that experiments looking at behaviors that are more susceptible to social desirability bias are more vulnerable to bias introduced by demand characteristics than those looking at more benign phenomena.&lt;/p&gt;

&lt;p&gt;While it is difficult to eliminate this effect completely in most experiments, some strategies exist. For example, researchers can devise a design that uses covert or unobtrusive treatments, so the participants are unaware that they are part of an experiment (e.g. &lt;a href=&#34;https://scholar.harvard.edu/files/renos/files/enostrains.pdf&#34; target=&#34;_blank&#34;&gt;Enos 2014&lt;/a&gt;, &lt;a href=&#34;https://www.pnas.org/content/114/4/663.full&#34; target=&#34;_blank&#34;&gt;Sands 2017&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Deception is another common, though &lt;a href=&#34;https://www.tandfonline.com/doi/full/10.1080/10508420701712990&#34; target=&#34;_blank&#34;&gt;deeply&lt;/a&gt; &lt;a href=&#34;https://thepsychologist.bps.org.uk/volume-24/edition-8/deception-psychological-research-necessary-evil&#34; target=&#34;_blank&#34;&gt;controversial&lt;/a&gt;, strategy. For example, &lt;a href=&#34;https://en.wikipedia.org/wiki/Audit_study&#34; target=&#34;_blank&#34;&gt;audit experiments&lt;/a&gt; often rely on deception to examine socially undesirable behaviors such as discrimination (e.g. &lt;a href=&#34;https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1540-5907.2011.00515.x&#34; target=&#34;_blank&#34;&gt;Butler and Broockman 2011&lt;/a&gt;) or violations of norms and rules (e.g. &lt;a href=&#34;http://www.michael-findley.com/uploads/2/0/4/5/20455799/findley_et_al.causes_of_non-compliance.ajps.10sep14.earlyview.pdf&#34; target=&#34;_blank&#34;&gt;Findley, Neilson and Sharman 2014&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Of course, the use of deception always has to be justified in the ethics review process. See this &lt;a href=&#34;http://scholar.harvard.edu/files/dtingley/files/spring2012.pdf&#34; target=&#34;_blank&#34;&gt;newsletter&lt;/a&gt; (p.13-19) for further discussion of the ethics of using deception in field experiments involving public officials as subjects.&lt;/p&gt;

&lt;h2 id=&#34;about-the-montana-gotv-experiment&#34;&gt;About the Montana GOTV Experiment&lt;/h2&gt;

&lt;h4 id=&#34;the-montana-experiment-https-thewpsa-wordpress-com-2014-10-25-messing-with-montana-get-out-the-vote-experiment-raises-ethics-questions-misled-the-people-by-using-official-seal-how-did-they-get-the-project-approved-in-the-first-place&#34;&gt;The &lt;a href=&#34;https://thewpsa.wordpress.com/2014/10/25/messing-with-montana-get-out-the-vote-experiment-raises-ethics-questions/&#34; target=&#34;_blank&#34;&gt;Montana experiment&lt;/a&gt; misled people by using an official seal. How did they get the project approved in the first place?&lt;/h4&gt;

&lt;p&gt;Only the people involved in the process would ever know! If I were to hazard a guess (take it with many, many grains of salt), it is possible that the review process did not see the mailer as being intentionally misleading. Amid the commotion that followed the controversy, one detail about the mailer did not get much attention: there was in fact a disclaimer line disclosing that the mailer was part of an academic study (below the boxes indicating candidate ideology).&lt;/p&gt;




&lt;figure&gt;

&lt;img src=&#34;montana_mailer.png&#34; alt=&#34;Mailer from the experiment. Squint a little to see the disclaimer.&#34; /&gt;



&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  
  &lt;p&gt;
    Mailer from the experiment. Squint a little to see the disclaimer. Retrieved from
    &lt;a href=&#34;https://web.archive.org/web/20170701171915/http://politicalpractices.mt.gov/content/2recentdecisions/McCullochvStanfordandDartmouthComplaint&#34;&gt; 
    Internet Archive
    &lt;/a&gt; 
  &lt;/p&gt; 
&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;p&gt;Maybe the print is too fine, but it’s there. So you could argue that the researchers were not actively trying to deceive recipients about who was sending the mailer, and this might be part of the reason why the proposal was approved. Again, I have to emphasize that this is all speculation on my part.&lt;/p&gt;

&lt;h4 id=&#34;how-did-the-montana-experiment-affected-people-s-decision-i-don-t-see-a-discussion-on-how-it-actually-influenced-the-turnout-or-election-outcome&#34;&gt;How did the Montana experiment affect people’s decisions? I don’t see a discussion on how it actually influenced the turnout or election outcome.&lt;/h4&gt;

&lt;p&gt;We might never know! After the whole debacle, the study is unpublishable, partly because of the ethical issues and partly because the data are likely unusable, given the contamination and spillover caused by the news coverage. Once news outlets reported the story, those in the treatment group who received the mailer would have learned where it came from and why they received it (the treatment is contaminated by extraneous factors that the researchers did not intend to provide), and those in the control group would have learned about the information in the mailer despite not receiving one (treatment spillover).&lt;/p&gt;

&lt;h2 id=&#34;about-experiments-on-development-programs&#34;&gt;About Experiments on Development Programs&lt;/h2&gt;

&lt;h4 id=&#34;are-there-examples-of-ethical-and-effective-anti-poverty-experiments&#34;&gt;Are there examples of ethical and effective anti-poverty experiments?&lt;/h4&gt;

&lt;p&gt;There are many examples of using randomized experiments to evaluate the impacts of anti-poverty programs. Some good places to look for them: &lt;a href=&#34;https://www.povertyactionlab.org/&#34; target=&#34;_blank&#34;&gt;Poverty Action Lab (J-PAL)&lt;/a&gt; (research center at MIT), &lt;a href=&#34;https://www.givewell.org/&#34; target=&#34;_blank&#34;&gt;GiveWell&lt;/a&gt; (nonprofit focused on effective charities).&lt;/p&gt;

&lt;h4 id=&#34;not-a-question-just-an-interesting-observation-in-the-us-during-the-1960s-70s-there-was-a-similar-program-to-universal-basic-income-it-was-ended-after-there-was-an-increase-in-divorce-rate&#34;&gt;Not a question, just an interesting observation &amp;ndash; in the US during the 1960s-70s, there was a similar program to Universal Basic Income. It was ended after there was an increase in divorce rate.&lt;/h4&gt;

&lt;p&gt;Hmm, this is really interesting to know. So if the experiment shows that UBI improves some aspects of quality of life (e.g. household income, children&amp;rsquo;s education), but also has other &amp;ldquo;side-effects&amp;rdquo; such as an increased divorce rate, what should policy-makers make of this? What kind of &amp;ldquo;side-effects&amp;rdquo;, or how much of them, would be considered a &amp;ldquo;reasonable&amp;rdquo; trade-off? Going back to what we discussed at the beginning of the course, empirical evidence does not always lead to a neat solution to normative questions.&lt;/p&gt;

&lt;!-- ## About Ethics and Experiments in General

#### If the studies were called out for being unethical, do researchers try fix the ethical concerns then re-do the experiment?


#### Facebook is a private corporation, people choose to use its service and agreed to the terms of services. Why do they get so much bad press for it? --&gt;
</description>
    </item>
    
    <item>
      <title>Example: Testing for measurement validity and reliability</title>
      <link>https://fanghuiz.github.io/ps0700/post/2019-01-19-example-measurement-stata/</link>
      <pubDate>Sat, 19 Jan 2019 00:00:00 -0500</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/post/2019-01-19-example-measurement-stata/</guid>
      <description>

&lt;h2 id=&#34;example-racial-resentment-scale&#34;&gt;Example: Racial Resentment Scale&lt;/h2&gt;

&lt;p&gt;The racial resentment scale is commonly used to measure &lt;a href=&#34;https://en.wikipedia.org/wiki/Symbolic_racism&#34; target=&#34;_blank&#34;&gt;symbolic racism&lt;/a&gt;. The scale contains four items. For each item, respondents indicate how strongly they agree or disagree with the statement on a five-point scale. The question wording and the corresponding variable names in the 2016 American National Election Studies (ANES) are given below:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;code&gt;V162211&lt;/code&gt;: &amp;lsquo;Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors.&amp;rsquo;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;V162212&lt;/code&gt;: &amp;lsquo;Generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class.&amp;rsquo;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;V162213&lt;/code&gt;: &amp;lsquo;Over the past few years, blacks have gotten less than they deserve.&amp;rsquo;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;V162214&lt;/code&gt;: &amp;lsquo;It’s really a matter of some people not trying hard enough, if blacks would only try harder they could be just as well off as whites.&amp;rsquo;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The assumption is that agreeing with statements 1 and 4 (or disagreeing with statements 2 and 3) indicates resentment towards African Americans.&lt;/p&gt;

&lt;h2 id=&#34;validity-test&#34;&gt;Validity Test&lt;/h2&gt;

&lt;h3 id=&#34;construct-validity&#34;&gt;Construct validity&lt;/h3&gt;

&lt;p&gt;To test for construct validity, we need to demonstrate that the indicator predicts what it is supposed to predict.&lt;/p&gt;

&lt;p&gt;One aspect of construct validity is &lt;strong&gt;convergent validity&lt;/strong&gt;: if theoretically we expect X and Y to be positively related, do we see a positive correlation between the indicators for X and Y?&lt;/p&gt;

&lt;p&gt;In this case, theoretically we might expect that feelings of resentment towards African Americans would correlate with negative affective attitudes towards the group.&lt;/p&gt;

&lt;p&gt;For illustration purposes, let&amp;rsquo;s just use a single statement from the resentment scale, statement 2 (&lt;code&gt;V162212&lt;/code&gt;) about the effects of slavery, and see if people&amp;rsquo;s answers to this question correlate with their feeling thermometer scores towards Blacks (&lt;code&gt;V162312&lt;/code&gt;).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;. twoway (scatter V162312 V162212) (lfit V162312 V162212)

&lt;/code&gt;&lt;/pre&gt;




&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/img/stata/validity_1.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;We see that higher disagreement with the statement (&lt;code&gt;1 = Strongly Agree&lt;/code&gt;, &lt;code&gt;5 = Strongly Disagree&lt;/code&gt;) correlates with lower scores on the feeling thermometer (higher value means warmer feeling towards the group, lower value means colder feeling). Resentment towards African Americans (as indicated by denying the effects of slavery on their present-day hardship) indeed predicts more negative attitudes towards them (as indicated by expressing less warm feelings).&lt;/p&gt;

&lt;p&gt;Another way to demonstrate construct validity is to show &lt;strong&gt;divergent/discriminant validity&lt;/strong&gt;: if theoretically we &lt;em&gt;do not&lt;/em&gt; expect X and Y to be related, do we then see a low or weak correlation between them?&lt;/p&gt;

&lt;p&gt;For instance, perhaps we do not expect feelings of racial resentment to be correlated with feelings towards the Supreme Court (&lt;code&gt;V162102&lt;/code&gt;).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;. graph twoway (scatter V162102 V162212) (lfit V162102 V162212)

&lt;/code&gt;&lt;/pre&gt;




&lt;figure&gt;

&lt;img src=&#34;https://fanghuiz.github.io/ps0700/img/stata/validity_2.svg&#34; /&gt;


&lt;/figure&gt;

&lt;p&gt;We see that there is no discernible correlation between responses to statement 2 and feelings towards the Supreme Court.&lt;/p&gt;
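
&lt;p&gt;The two scatter plots can be complemented with correlation coefficients. A minimal sketch in do-file form (without the &lt;code&gt;.&lt;/code&gt; prompt), assuming the same ANES 2016 data are loaded and that the negative missing-data codes have already been recoded to missing (neither step is shown in this post):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;* Assumes the ANES 2016 data are in memory and missing codes are recoded

* Convergent validity: statement 2 vs feeling thermometer towards Blacks
pwcorr V162312 V162212, sig obs

* Divergent validity: statement 2 vs feelings towards the Supreme Court
pwcorr V162102 V162212, sig obs
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;If the measure behaves as described above, we would expect a clearly negative coefficient from the first command and a coefficient close to zero from the second.&lt;/p&gt;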

&lt;h2 id=&#34;reliability-test&#34;&gt;Reliability Test&lt;/h2&gt;

&lt;p&gt;One common way to quantify the reliability of a multiple-indicator scale is to calculate &lt;a href=&#34;https://en.wikipedia.org/wiki/Cronbach%27s_alpha&#34; target=&#34;_blank&#34;&gt;Cronbach&amp;rsquo;s alpha&lt;/a&gt; ($\alpha$).&lt;/p&gt;

&lt;p&gt;This can be done in Stata using a simple command &lt;code&gt;alpha&lt;/code&gt;, followed by the list of variables used in the scale.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;. alpha V162211 V162212 V162213 V162214

Test scale = mean(unstandardized items)
Reversed items:  V162211 V162214

Average interitem covariance:     1.090995
Number of items in the scale:            4
Scale reliability coefficient:      0.8451

&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the output, we can see a &amp;ldquo;Scale reliability coefficient&amp;rdquo;, which is about 0.85 in this case. (The &amp;ldquo;Reversed items&amp;rdquo; line shows that Stata automatically reversed items 1 and 4, so that all four items point in the same direction.) A general rule of thumb is that a value &amp;gt; 0.8 indicates rather high reliability, and anything below 0.7 is a sign of an unreliable scale. So in this case, the four-item Racial Resentment Scale has rather high reliability.&lt;/p&gt;
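
&lt;p&gt;For reference, here is a sketch of what the coefficient summarizes. For an unstandardized scale like the one above, Cronbach&amp;rsquo;s alpha can be written in terms of the number of items $k$, the average inter-item covariance $\bar{c}$ (reported in the output), and the average item variance $\bar{v}$ (not shown in the output). The symbols are introduced here for illustration:&lt;/p&gt;

&lt;p&gt;$\alpha = \frac{k \, \bar{c}}{\bar{v} + (k - 1) \, \bar{c}}$&lt;/p&gt;

&lt;p&gt;Holding $\bar{v}$ fixed and assuming $\bar{c}$ is positive, $\alpha$ rises when the items covary more strongly with each other (larger $\bar{c}$) or when more comparable items are added (larger $k$).&lt;/p&gt;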
</description>
    </item>
    
    <item>
      <title>Q&amp;A Week 2: Measurement</title>
      <link>https://fanghuiz.github.io/ps0700/post/2019-01-18-measurement/</link>
      <pubDate>Fri, 18 Jan 2019 00:00:00 -0500</pubDate>
      
      <guid>https://fanghuiz.github.io/ps0700/post/2019-01-18-measurement/</guid>
      <description>

&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;nav id=&#34;TableOfContents&#34;&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#types-of-measurement-errors&#34;&gt;Types of Measurement Errors&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#how-to-differentiate-between-systematic-vs-random-error-do-you-have-any-examples-of-systematic-errors-in-measurement&#34;&gt;How to differentiate between systematic vs random error? Do you have any examples of systematic errors in measurement?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#which-error-systematic-vs-random-is-worse-which-one-should-we-try-to-avoid-more&#34;&gt;Which error (systematic vs random) is worse? Which one should we try to avoid more?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#about-the-true-score-theory-t-x-epsilon-how-do-we-know-how-close-our-measured-value-x-is-close-to-the-true-score-t-if-we-cannot-truly-know-t&#34;&gt;About the True Score Theory $T = X + \epsilon$, how do we know how close our measured value $X$ is close to the true score $T$, if we cannot truly know $T$?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#if-all-indicators-are-measured-with-some-degrees-of-random-errors-can-too-many-indicators-introduce-more-random-errors&#34;&gt;If all indicators are measured with some degrees of random errors, can too many indicators introduce more random errors?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#measurement-reliability-and-validity&#34;&gt;Measurement Reliability and Validity&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#is-there-a-good-analogy-to-help-remembering-the-difference-between-validity-and-reliability&#34;&gt;Is there a good analogy to help remembering the difference between validity and reliability?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#what-are-some-examples-of-face-validity&#34;&gt;What are some examples of face validity?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#when-do-we-test-for-construct-vs-face-validity&#34;&gt;When do we test for construct vs face validity?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#about-the-article-https-www-nytimes-com-2014-08-28-opinion-nicholas-kristof-is-everyone-a-little-bit-racist-html-we-read-on-using-iat-video-games-to-measure-implicit-racial-bias-how-is-the-reliability-of-the-measure-determined-if-the-same-respondent-takes-the-test-twice-and-gets-different-scores-but-in-the-same-direction-e-g-at-first-longer-then-shorter-time-is-the-measure-considered-reliable&#34;&gt;About the &lt;a href=&#34;https://www.nytimes.com/2014/08/28/opinion/nicholas-kristof-is-everyone-a-little-bit-racist.html&#34; target=&#34;_blank&#34;&gt;article&lt;/a&gt; we read on using IAT/video games to measure implicit racial bias, how is the reliability of the measure determined? If the same respondent takes the test twice and gets different scores but in the same direction (e.g at first longer, then shorter time), is the measure considered reliable?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#levels-of-measurement&#34;&gt;Levels of Measurement&lt;/a&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&#34;#can-you-elaborate-more-on-meaningful-vs-relative-arbitrary-zero-point-and-how-that-relates-to-interval-and-ratio-measures&#34;&gt;Can you elaborate more on meaningful vs relative/arbitrary zero point, and how that relates to interval and ratio measures?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&#34;#can-a-measure-be-both-interval-and-ratio&#34;&gt;Can a measure be both interval and ratio?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/nav&gt;


&lt;h2 id=&#34;types-of-measurement-errors&#34;&gt;Types of Measurement Errors&lt;/h2&gt;

&lt;h4 id=&#34;how-to-differentiate-between-systematic-vs-random-error-do-you-have-any-examples-of-systematic-errors-in-measurement&#34;&gt;How to differentiate between systematic vs random error? Do you have any examples of systematic errors in measurement?&lt;/h4&gt;

&lt;p&gt;Systematic errors affect all units in the sample in the same direction (all measured values are consistently more positive or more negative than the true value). For example, a self-reported measure of turnout is likely to have positive systematic error &amp;ndash; (most) people tend to over-report, saying that they have voted even though they have not.&lt;/p&gt;

&lt;p&gt;Random errors do not affect all units in the sample in a consistent way &amp;ndash; some units will be more positive than the true value, some units will be more negative than the true value. Let&amp;rsquo;s use self-reported turnout as an example again. Perhaps people&amp;rsquo;s transient feelings about the current election affect whether they are likely to say they have voted or not &amp;ndash; those who happened to read a positive news story about the election are more likely to over-report having voted, and others who happened to read a negative news story are more likely to under-report.&lt;/p&gt;

&lt;p&gt;Very often both types of errors could be present, so we need to think carefully about the sources of potential errors. For example, crime statistics can be very noisy, with a lot of random errors introduced at various stages of collecting such data. Furthermore, statistics on certain &lt;em&gt;types&lt;/em&gt; of crimes might additionally have systematic errors: for example, domestic abuse might be systematically biased downwards if victims under-report due to fear of retaliation.&lt;/p&gt;
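
&lt;p&gt;To make the distinction concrete, here is a small simulation sketch in Stata (do-file form). Everything in it is made up for illustration: the variable names &lt;code&gt;t_score&lt;/code&gt;, &lt;code&gt;x_random&lt;/code&gt; and &lt;code&gt;x_system&lt;/code&gt; are hypothetical, and so are the means, standard deviations and the size of the bias.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;* Simulated data; every number here is an arbitrary assumption
clear
set seed 700
set obs 10000

* True value of the concept for each unit
generate t_score = rnormal(50, 10)

* Random error only: noise centered on zero
generate x_random = t_score + rnormal(0, 10)

* Systematic error only: every unit is over-reported by 5
generate x_system = t_score + 5

* Compare means and standard deviations of the three variables
summarize t_score x_random x_system
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;We would expect &lt;code&gt;x_random&lt;/code&gt; to have roughly the same mean as &lt;code&gt;t_score&lt;/code&gt; but a larger standard deviation (more noise), while &lt;code&gt;x_system&lt;/code&gt; keeps the same spread but has a mean 5 points higher (bias).&lt;/p&gt;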

&lt;!-- As the sample size increases, we are likely to see that there are as many positive errors as negative ones, and all the random errors would sum to 0. This means that random errors add *variability/noise* to the data, but will not affect the average value at the *group* level. --&gt;

&lt;h4 id=&#34;which-error-systematic-vs-random-is-worse-which-one-should-we-try-to-avoid-more&#34;&gt;Which error (systematic vs random) is worse? Which one should we try to avoid more?&lt;/h4&gt;

&lt;p&gt;Both types of errors are bad news! But they affect our analysis in different ways.&lt;/p&gt;

&lt;p&gt;High random errors will add more &lt;em&gt;noise/variability&lt;/em&gt; to our data, which will make it harder to detect the presence of a significant correlation between X and Y. In other words, noisy measures are bad because they increase the likelihood of a &lt;em&gt;false negative&lt;/em&gt; &amp;ndash; we are likely to mistakenly infer there is no relationship between X and Y, when in fact there is.&lt;/p&gt;

&lt;p&gt;For systematic errors, recall that indicators with high systematic errors are &lt;em&gt;invalid&lt;/em&gt;, i.e. they are not capturing the concept of interest accurately. In such a case, an invalid indicator will &lt;em&gt;never&lt;/em&gt; lead us to the right conclusion (think of a road sign that points in the wrong direction), even if the indicator is measured with zero random error.&lt;/p&gt;




&lt;figure&gt;

&lt;img src=&#34;https://vignette.wikia.nocookie.net/thebige/images/d/da/Funny_road_signs_5.jpg/revision/latest/scale-to-width-down/180?cb=20130807005626&#34; alt=&#34;One of them has got to be invalid..&#34; /&gt;



&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  
  &lt;p&gt;
    One of them has got to be invalid..
    
    
    
  &lt;/p&gt; 
&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;p&gt;In terms of which type of error is &lt;em&gt;worse&lt;/em&gt;, one way I think about this is that an invalid indicator is like a fatal disease, and an unreliable indicator is like a non-fatal but chronic disease that requires lots of care. So if a study is using invalid indicators, we cannot draw any meaningful inferences about the phenomenon we are investigating (the study is &amp;ldquo;dead&amp;rdquo;), while unreliable indicators make it harder to detect a &lt;em&gt;true positive&lt;/em&gt; (they increase uncertainty, but do not spell doom).&lt;/p&gt;

&lt;p&gt;The textbook also has a good discussion on the different problems associated with measurement reliability and validity in political science (p.143-145).&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Systematic error = High&lt;/th&gt;
&lt;th&gt;Systematic error = Low&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;

&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Random error = High&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Very very bad &lt;br&gt;&amp;bull; Invalid and unreliable measure &lt;br&gt;&amp;bull; Lots of noise, and signal is pointing in the wrong direction&lt;/td&gt;
&lt;td&gt;Problematic, but can live with &lt;br&gt;&amp;bull; Valid, but unreliable measure &lt;br&gt;&amp;bull; Lots of noise, harder to detect the signal; More likely to get false negative&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Random error = Low&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Problematic &lt;br&gt;&amp;bull; Invalid, but reliable measure &lt;br&gt;&amp;bull; Measure does not capture the concept of interest; Conclusion does not bear on the actual phenomenon of interest&lt;/td&gt;
&lt;td&gt;Awesome! &lt;br&gt;&amp;bull; Valid and reliable measure &lt;br&gt;&amp;bull; Move along&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Ideally, we should try to minimize both types of measurement errors. The degree of random error can be empirically assessed (e.g. using Cronbach&amp;rsquo;s alpha, see example &lt;a href=&#34;https://fanghuiz.github.io/ps0700/post/2019-01-19-example-measurement-stata/&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;), and can be reduced (e.g. using multiple indicators). Systematic error, however, is harder to detect, harder to quantify, and harder to correct for.&lt;/p&gt;

&lt;h4 id=&#34;about-the-true-score-theory-t-x-epsilon-how-do-we-know-how-close-our-measured-value-x-is-close-to-the-true-score-t-if-we-cannot-truly-know-t&#34;&gt;About the True Score Theory $T = X + \epsilon$, how do we know how close our measured value $X$ is close to the true score $T$, if we cannot truly know $T$?&lt;/h4&gt;

&lt;p&gt;Unfortunately, we can never be 100% sure what the value of $T$ is. As mentioned above, while we can detect and correct for random errors, systematic errors cannot be corrected using statistical procedures. After we’ve done our best to minimize random error, it is up to the strength of our theory, clarity of conceptualization, and a small leap of faith to convince others (and ourselves), that our measures are indeed valid ones. This is also part of the reason why social science research can only establish a probabilistic relationship (confident within a certain range) and never a deterministic relationship. Embrace the uncertainty!&lt;/p&gt;




&lt;figure&gt;

&lt;img src=&#34;https://ih1.redbubble.net/image.184209277.1539/flat,1000x1000,075,f.jpg&#34; width=&#34;400&#34; /&gt;


&lt;/figure&gt;

&lt;h4 id=&#34;if-all-indicators-are-measured-with-some-degrees-of-random-errors-can-too-many-indicators-introduce-more-random-errors&#34;&gt;If all indicators are measured with some degrees of random errors, can too many indicators introduce more random errors?&lt;/h4&gt;

&lt;p&gt;Although every single indicator is measured with some random error, if we combine multiple indicators into an index, or take their average value, we should end up with lower random error than if we used a single indicator, because independent random errors partly cancel each other out.&lt;/p&gt;
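
&lt;p&gt;A sketch of why averaging helps, under simplifying assumptions that go beyond what was covered in class: each of the $k$ indicators $X_j$ follows the True Score formula above, $T = X_j + \epsilon_j$, and the random errors $\epsilon_j$ are independent of each other, with mean zero and the same variance $\sigma^2$. Averaging over the indicators then gives:&lt;/p&gt;

&lt;p&gt;$T = \bar{X} + \bar{\epsilon}, \qquad \bar{\epsilon} = \frac{1}{k} \sum_{j=1}^{k} \epsilon_j, \qquad \mathrm{Var}(\bar{\epsilon}) = \frac{\sigma^2}{k}$&lt;/p&gt;

&lt;p&gt;So with $k = 4$ indicators, the error variance of the average is a quarter of that of a single indicator (the error standard deviation is halved). If the errors are correlated across indicators, the reduction is smaller.&lt;/p&gt;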

&lt;h2 id=&#34;measurement-reliability-and-validity&#34;&gt;Measurement Reliability and Validity&lt;/h2&gt;

&lt;h4 id=&#34;is-there-a-good-analogy-to-help-remembering-the-difference-between-validity-and-reliability&#34;&gt;Is there a good analogy to help remembering the difference between validity and reliability?&lt;/h4&gt;

&lt;p&gt;In class, I&amp;rsquo;ve made the analogy comparing a valid indicator to a correct label (indicator) matching the content of a box (concept) that you want to buy but cannot see inside.&lt;/p&gt;




&lt;figure&gt;

&lt;img src=&#34;label_box.png&#34; alt=&#34;Houston, we have an invalid indicator problem.&#34; /&gt;



&lt;figcaption data-pre=&#34;Figure &#34; data-post=&#34;:&#34; &gt;
  
  &lt;p&gt;
    Houston, we have an invalid indicator problem.
    
    
    
  &lt;/p&gt; 
&lt;/figcaption&gt;

&lt;/figure&gt;

&lt;p&gt;I don&amp;rsquo;t really have a good one for reliability, so let&amp;rsquo;s stretch the same label-on-a-box analogy a bit further. Suppose we have a machine printing the labels for the boxes. Although the label correctly matches the box content (valid indicator), the machine sometimes misprints a letter or two, so not all labels look the same (unreliable). And if Machine A produces 5% misprinted labels and Machine B produces 15% misprinted labels, then we can say that B is &lt;em&gt;less reliable&lt;/em&gt; (produces less consistent outcomes).&lt;/p&gt;

&lt;h4 id=&#34;what-are-some-examples-of-face-validity&#34;&gt;What are some examples of face validity?&lt;/h4&gt;

&lt;p&gt;Whenever you see an indicator used to measure a particular concept, simply ask yourself: does the measure appear to capture the concept you care about? If yes, then the measure has high face validity; if not, then it has low face validity.&lt;/p&gt;

&lt;p&gt;Say I want to measure a country&amp;rsquo;s level of human rights protection. Which of the following indicators has higher face validity?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gini coefficient&lt;/li&gt;
&lt;li&gt;Number of political prisoners&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You probably have an answer in your mind. Let&amp;rsquo;s try another one. Now I want to measure a country&amp;rsquo;s income inequality. Which indicator has higher face validity?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gini coefficient&lt;/li&gt;
&lt;li&gt;Number of political prisoners&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Again, you have an answer, and you are probably right.&lt;/p&gt;

&lt;p&gt;A few things I&amp;rsquo;d like to highlight from this example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assessing face validity relies on domain knowledge. We need to first know what &amp;ldquo;human rights protection&amp;rdquo; means; only then can we see that more political imprisonment indicates a lower level of human rights protection.&lt;/li&gt;
&lt;li&gt;Assessing face validity is largely a matter of judgment informed by domain knowledge, rather than empirical demonstration.&lt;/li&gt;
&lt;li&gt;Indicator validity is always assessed &lt;em&gt;relative&lt;/em&gt; to the concept we are trying to capture; it is not something inherent to the indicator itself. The Gini coefficient is a valid indicator of income inequality, but not of human rights protection.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&#34;when-do-we-test-for-construct-vs-face-validity&#34;&gt;When do we test for construct vs face validity?&lt;/h4&gt;

&lt;p&gt;Ideally both, and more if possible. Since invalid measures are really bad news, assessing the validity of a measure in multiple ways increases our confidence in it.&lt;/p&gt;

&lt;p&gt;Face validity is rarely explicitly tested for &amp;ndash; we already &lt;em&gt;implicitly&lt;/em&gt; test for face validity when we choose which indicators to use to measure the concept. Although having face validity is important, high face validity alone is rather weak evidence.&lt;/p&gt;

&lt;p&gt;Construct validity can be empirically assessed in two ways: &lt;em&gt;convergent validity&lt;/em&gt; and &lt;em&gt;divergent validity&lt;/em&gt;. See &lt;a href=&#34;https://fanghuiz.github.io/ps0700/post/2019-01-19-example-measurement-stata/&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt; for an example.&lt;/p&gt;

&lt;p&gt;Generally, if we are using measures that have been used in published literature, we do not have to conduct a separate validity test. The assumption is that they have already been validated (though we should still remain critical). If we are using new measures in our study, instead of established ones from published literature, then it is recommended to first conduct a pilot study to test the measures&amp;rsquo; validity and reliability. Use the measures as part of the actual study only after we know they are valid and reliable.&lt;/p&gt;

&lt;h4 id=&#34;about-the-article-https-www-nytimes-com-2014-08-28-opinion-nicholas-kristof-is-everyone-a-little-bit-racist-html-we-read-on-using-iat-video-games-to-measure-implicit-racial-bias-how-is-the-reliability-of-the-measure-determined-if-the-same-respondent-takes-the-test-twice-and-gets-different-scores-but-in-the-same-direction-e-g-at-first-longer-then-shorter-time-is-the-measure-considered-reliable&#34;&gt;About the &lt;a href=&#34;https://www.nytimes.com/2014/08/28/opinion/nicholas-kristof-is-everyone-a-little-bit-racist.html&#34; target=&#34;_blank&#34;&gt;article&lt;/a&gt; we read on using IAT/video games to measure implicit racial bias, how is the reliability of the measure determined? If the same respondent takes the test twice and gets different scores but in the same direction (e.g at first longer, then shorter time), is the measure considered reliable?&lt;/h4&gt;

&lt;p&gt;For the IAT, the actual computation of the score takes quite a few steps, but to simplify it a bit, it is the &lt;em&gt;reaction time differential&lt;/em&gt; that is used as a measure of implicit racial bias (see the test procedure &lt;a href=&#34;https://en.wikipedia.org/wiki/Implicit-association_test#Procedure&#34; target=&#34;_blank&#34;&gt;here&lt;/a&gt;). So in this case, the test can be considered reliable if a respondent shows the same directional preference (e.g. consistently faster at the White-Pleasant association than at the Black-Pleasant association) when taking the test multiple times.&lt;/p&gt;

&lt;p&gt;For other tests, however, it could be that the time difference itself, rather than the directional difference, is used as the measure.&lt;/p&gt;

&lt;p&gt;In general, &lt;a href=&#34;https://en.wikipedia.org/wiki/Repeatability&#34; target=&#34;_blank&#34;&gt;test-retest reliability&lt;/a&gt; is measured as the degree of correlation between the different test scores, rather than their absolute difference. In psychology, the rule of thumb is that a test-retest reliability &amp;gt; 0.7 is acceptable, though this is no more than a convention used by researchers. The IAT has a test-retest reliability of about &lt;a href=&#34;https://en.wikipedia.org/wiki/Implicit-association_test#Reliability&#34; target=&#34;_blank&#34;&gt;0.6&lt;/a&gt;.&lt;/p&gt;
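
&lt;p&gt;To see why correlation rather than absolute difference is the relevant quantity, write $X_1$ and $X_2$ for the scores from the first and second test (notation introduced here for illustration). If the reliability is computed as a Pearson correlation, which is a common choice, it is:&lt;/p&gt;

&lt;p&gt;$r = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_{X_1} \, \sigma_{X_2}}$&lt;/p&gt;

&lt;p&gt;In the hypothetical case where every respondent scores exactly $c$ points higher the second time, $X_2 = X_1 + c$, we get $\mathrm{Cov}(X_1, X_2) = \mathrm{Var}(X_1)$ and $\sigma_{X_2} = \sigma_{X_1}$, so $r = 1$ even though no one got the same score twice. What lowers test-retest reliability is respondents changing their position &lt;em&gt;relative to each other&lt;/em&gt; between the two tests.&lt;/p&gt;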

&lt;p&gt;This &lt;a href=&#34;https://fourbeers.fireside.fm/13&#34; target=&#34;_blank&#34;&gt;Podcast&lt;/a&gt; has a pretty interesting discussion on the use and critique of IAT.&lt;/p&gt;

&lt;h2 id=&#34;levels-of-measurement&#34;&gt;Levels of Measurement&lt;/h2&gt;

&lt;h4 id=&#34;can-you-elaborate-more-on-meaningful-vs-relative-arbitrary-zero-point-and-how-that-relates-to-interval-and-ratio-measures&#34;&gt;Can you elaborate more on meaningful vs relative/arbitrary zero point, and how that relates to interval and ratio measures?&lt;/h4&gt;

&lt;p&gt;A variable with a meaningful zero point is one where we can interpret the zero value as the &lt;em&gt;absence&lt;/em&gt; of that variable. For example, income measured in dollars has a meaningful zero: we can interpret &lt;code&gt;income = 0&lt;/code&gt; to mean an absence of income. So if someone reported zero on this measure, we know this person has no income.&lt;/p&gt;

&lt;p&gt;On the other hand, if the variable has a relative, or arbitrary, zero point, we cannot interpret the zero value on that variable as the &lt;em&gt;absence&lt;/em&gt; of that variable. Say we have a set of 5 questions to measure people’s political knowledge. Every correct answer gets you 1 point, and every wrong answer gets you 0 points, which gives us a range of possible scores from 0 to 5. If Ann gets &lt;code&gt;score = 0&lt;/code&gt; on this scale, we cannot say that Ann has no political knowledge at all. The zero here is simply an arbitrary point that signals a very low level of political knowledge.&lt;/p&gt;

&lt;p&gt;So how does this relate to interval vs ratio measures? Interval measures have relative/arbitrary zero points, and ratio measures have absolute/meaningful zero points. For the most part, the difference is only apparent (or we only need to pay attention to the difference) when we analyze and interpret the data.&lt;/p&gt;

&lt;p&gt;For interval measures, since the zero point is arbitrary and lacks any meaningful interpretation, we cannot compare differences in terms of proportion. It only makes sense to compare differences in magnitude. Going back to the political knowledge example, if Beth gets &lt;code&gt;score = 2&lt;/code&gt; on the political knowledge scale, and Cathy gets &lt;code&gt;score = 4&lt;/code&gt; on the same scale, we know that: 1) Cathy is more knowledgeable than Beth, and 2) the magnitude of the difference is 2 more correct answers. However, since the zero point is arbitrary in this case, we &lt;em&gt;cannot&lt;/em&gt; say Cathy is twice as knowledgeable as Beth. Or if we observe that Beth&amp;rsquo;s score increased from 2 to 3 after attending a civics education workshop, we &lt;em&gt;cannot&lt;/em&gt; say that Beth&amp;rsquo;s political knowledge increased by 50%.&lt;/p&gt;

&lt;p&gt;Let&amp;rsquo;s compare this to a ratio scale, income, which has a meaningful zero point. If Abe reported &lt;code&gt;income = 20k&lt;/code&gt;, and Ben reported &lt;code&gt;income = 40k&lt;/code&gt;, we know that 1) Ben has a higher income than Abe, 2) Ben&amp;rsquo;s income is 20k higher than Abe&amp;rsquo;s, and 3) Ben&amp;rsquo;s income is twice as much as Abe&amp;rsquo;s.&lt;/p&gt;
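
&lt;p&gt;One way to see why ratios are not meaningful on an interval scale is to move the arbitrary zero and check what changes. The sketch below is only an illustration: the re-scoring of the same 5-question quiz as 1 to 6 instead of 0 to 5 (the &lt;code&gt;shift&lt;/code&gt; variable) is a hypothetical assumption, not something from the course data.&lt;/p&gt;

```python
# Political knowledge scores on the original 0-5 scale.
beth, cathy = 2, 4

# Hypothetical re-scoring: report the same quiz on a 1-6 scale instead.
shift = 1

# Differences in magnitude survive the change of zero point.
diff_original = cathy - beth
diff_shifted = (cathy + shift) - (beth + shift)

# Ratios do not: they depend on where the arbitrary zero sits.
ratio_original = cathy / beth
ratio_shifted = (cathy + shift) / (beth + shift)

print("difference:", diff_original, "vs", diff_shifted)
print("ratio:", ratio_original, "vs", round(ratio_shifted, 2))
```

&lt;p&gt;The difference stays at 2 correct answers either way, but the ratio changes with the choice of zero, which is why a statement such as &amp;ldquo;twice as knowledgeable&amp;rdquo; has no stable meaning here. For income, zero dollars is not an arbitrary choice, so the ratio can be interpreted.&lt;/p&gt;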

&lt;!-- In fact, for most latent constructs / concepts, the indicators only have relative/arbitrary zero points, although we can construct a measure with a meaningful zero. For example, we can measure legislator ideology on left-right dimension using their voting record. Some legislators might be voting 100% left or 100% right. This indicator itself is a ratio measure — A who voted 80% left has twice as much liberal voting record as B who voted 40% left, and 0% liberal votes means someone has never voted for a liberal position on any bills (absence of liberal vote). Can we then interpret that A is twice as liberal as B? --&gt;

&lt;h4 id=&#34;can-a-measure-be-both-interval-and-ratio&#34;&gt;Can a measure be both interval and ratio?&lt;/h4&gt;

&lt;p&gt;The four levels of measurement are &lt;em&gt;mutually exclusive&lt;/em&gt; categories. The flow chart below should help you to distinguish the four categories.&lt;/p&gt;




&lt;figure&gt;

&lt;img src=&#34;measure_levels.png&#34; /&gt;


&lt;/figure&gt;
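
&lt;p&gt;The same decision logic can be written as a small function. This is only a sketch of the flow chart: the function name and its arguments are made up for illustration, and the nominal and ordinal examples in the calls are illustrative additions.&lt;/p&gt;

```python
def level_of_measurement(numeric, ordered=False, meaningful_zero=False):
    """Walk the flow chart: each variable ends up in exactly one level.

    numeric: True if the numbers carry meaningful numerical values,
             False if they are just placeholders for categories.
    ordered: only consulted for categories.
    meaningful_zero: only consulted for numerical values.
    """
    if not numeric:
        return "ordinal" if ordered else "nominal"
    return "ratio" if meaningful_zero else "interval"


# Party affiliation: unordered categories.
print(level_of_measurement(numeric=False, ordered=False))
# Education level (e.g. high school, college, graduate): ordered categories.
print(level_of_measurement(numeric=False, ordered=True))
# Political knowledge score from 0 to 5: numerical, arbitrary zero.
print(level_of_measurement(numeric=True, meaningful_zero=False))
# Income in dollars: numerical, meaningful zero.
print(level_of_measurement(numeric=True, meaningful_zero=True))
```

&lt;p&gt;Because each question has only one answer for a given variable, every path through the function ends at a single level, which is what makes the categories mutually exclusive.&lt;/p&gt;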

&lt;!--
```mermaid
graph TB
  A[Are the numbers just placeholders for categories &lt;br&gt; or do they have meaningful numerical values?] -- categories ---
  A1[Are the categories ordered&lt;br&gt; or unordered?]

  A1 -- unordered --- B1[Nominal]
  A1 -- ordered --- B2[Ordinal]

  A -- numerical --- A2[Is the zero value arbitrary&lt;br&gt; or meaningful?]

  A2 -- arbitrary --- C1[Interval]
  A2 -- meaningful --- C2[Ratio]
``` --&gt;

&lt;!-- ```mermaid
graph TB
  A[Are the researchers directly &lt;br&gt; manipulating the treatment?] -- yes --- A1
  A -- no --- A2[Is there an active change in the &lt;br&gt; treatment due to &#39;nature&#39;?]

  A1[Is the treatment &lt;br&gt; randomly assigned?] -- yes --- B1[Randomized experiment]
  A1 -- no --- B2[Quasi-experiment]

  A2 -- yes --- C1[Natural experiment]
  A2 -- no --- C2[Observational design: Do we have single &lt;br&gt; or repeated measures for the same unit?]

  C2 -- single --- D1[Cross-sectional]
  C2 -- repeated --- D2[Longitudinal: &lt;br&gt; Panel / Time-series]
``` --&gt;

&lt;!-- ```mermaid
graph TB
  A[&#34;Is the treatment assignment &lt;br&gt; random or &lt;i&gt; &#39;&#39;as-if&#39;&#39; &lt;/i&gt; random?&#34;] -- yes --- A1
  A -- no --- A2[Are the researchers directly &lt;br&gt; manipulating the treatment?]

  A1[Are the researchers directly &lt;br&gt; manipulating the treatment?] -- yes --- B1[Randomized experiment]
  A1 -- no --- B2[Natural experiment]

  A2 -- yes --- C1[Quasi-experiment]
  A2 -- no --- C2[Observational design: Do we have single &lt;br&gt; or repeated measures for the same unit?]

  C2 -- single --- D1[Cross-sectional]
  C2 -- repeated --- D2[Longitudinal: &lt;br&gt; Panel / Time-series]
``` --&gt;
</description>
    </item>
    
  </channel>
</rss>
