IF YOU HOPE PEOPLE WILL DO WHAT YOU MEASURE
MAKE SURE YOU MEASURE WHAT YOU WANT THEM TO DO
I was asked to give some thoughts on construct validity in governance-type indicators. Here they are, with references to sources that have helped me develop them.
I would start with Melissa Thomas's critique of the Worldwide Governance Indicators (WGI). She asks simply, 'What do they measure?', and in so doing raises a major construct validity question with two dimensions: (1) is there an underlying theory of how the things measured by the indicator affect the outcomes its designers care about? (2) is there evidence that the things being measured are actually related to these outcomes?
See also the response by Kaufmann, Kraay and Mastruzzi: http://info.worldbank.org/governance/wgi/pdf/kkmresponseejdr2final.pdf Their argument blends 'construct validity isn't relevant' with 'you don't prove we have a construct validity problem anyway.'
Whatever you think of Thomas's paper and the response, the core issue is simply whether the thing being measured correlates with the thing it is meant to indicate. As with many simple ideas, however, there is more complexity here than first appears, and construct validity can be tested along several dimensions.
1. Construct validity requires content validity: does the measure reflect all facets of the construct? Lawshe proposed a way of testing this. Panels of experts are given a list of all the questions one intends to ask about a construct and asked whether each question relates to an essential, useful but not essential, or unnecessary aspect of the construct. They can also propose questions that have been left out.
With enough feedback, one can then get an idea of the core content of a construct. If a question is roundly endorsed as 'essential', for example, it belongs in the non-negotiable core of the indicator. If it gets anything less, one has to decide whether the question is a negotiable, cutting-edge or unnecessary element of the indicator.
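Lawshe's procedure is usually summarized with his content validity ratio (CVR) for each question: CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating the question 'essential' and N is the panel size. It runs from -1 (nobody says essential) through 0 (exactly half do) to +1 (everyone does). The Python sketch below computes the CVR and splits questions into a core set and a set left for deliberation. The question labels, the ratings and the 0.6 cutoff are all hypothetical; Lawshe published minimum CVR values that depend on panel size, and those should be consulted rather than this placeholder.

```python
from collections import Counter

# Rating categories from Lawshe's three-way question to the panel.
ESSENTIAL = "essential"
USEFUL = "useful but not essential"
UNNECESSARY = "unnecessary"


def content_validity_ratio(ratings: list[str]) -> float:
    """Lawshe's CVR: (n_e - N/2) / (N/2), with n_e the number of 'essential' ratings."""
    n = len(ratings)
    if n == 0:
        raise ValueError("need at least one rating")
    n_essential = Counter(ratings)[ESSENTIAL]
    half = n / 2
    return (n_essential - half) / half


def sort_questions(panel: dict[str, list[str]], cutoff: float) -> tuple[list[str], list[str]]:
    """Split questions into a core set (CVR >= cutoff) and a set left for deliberation."""
    core, review = [], []
    for question, ratings in panel.items():
        if content_validity_ratio(ratings) >= cutoff:
            core.append(question)
        else:
            review.append(question)
    return core, review


# Hypothetical ratings from a hypothetical 10-member panel.
panel = {
    "q1_judicial_independence": [ESSENTIAL] * 9 + [USEFUL],
    "q2_contract_enforcement": [ESSENTIAL] * 6 + [USEFUL] * 3 + [UNNECESSARY],
    "q3_business_licensing": [ESSENTIAL] * 2 + [USEFUL] * 4 + [UNNECESSARY] * 4,
}

for question, ratings in panel.items():
    print(question, round(content_validity_ratio(ratings), 2))  # 0.8, 0.2, -0.6

core, review = sort_questions(panel, cutoff=0.6)  # hypothetical cutoff, not Lawshe's table
print("core:", core)
print("review:", review)
```

The CVR only tells you how far the panel agrees. For questions in the review set, the judgment about whether they are negotiable, cutting-edge or unnecessary still has to be made.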
I have some cautionary notes. First, make sure the experts explain their ratings, providing theoretical reasoning to back them, so that they are not just making a normatively biased call. Second, make sure your panel of experts has a good understanding of the construct you are trying to measure and why you are measuring it (see yesterday's discussion on indicators and outcomes). Third, make sure your panel is not biased, reflecting only one normative view of the construct. I think this is one of the problems the Worldwide Governance Indicators and other governance indicators have encountered. Kurtz and Schrank, among others, argue (with justification) that these indicators reflect a neoliberal bias and the perspectives of private sector operators. Fourth, I think you need to go beyond Lawshe and actually test the correlation between questions and outcomes, continually, so that over time you build evidence in support of your content decisions.
Here are links to a few studies that test how indicators relate to outcomes (in areas like education and health care).
- Koopmans and Lamers study the construct validity of three indicators of psychological distress in relation to perceived health. http://www.mendeley.com/research/assessing-construct-validity-three-indicators-psychological-distress-relation-perceived-health-physical-illness/ The study asks how each indicator relates to the outcome in question and uses regression analysis to test the relationship. Combining theoretical and empirical work like this helps foster content validity.
- See also Willis, Stoelwinder and Harris. Their work focuses on interpreting indicators in health care; the authors test the assumption that quality indicators reflect the quality of care in hospitals. http://www.ncbi.nlm.nih.gov/pubmed/18603538
- See also Mathews, Hackett and Pennell on the construct validity of accountability measures and performance indicators in schools. http://www.learningpt.org/sipsig/mathews.html
2. Convergent validity is another aspect of construct validity. It asserts that a measure parading as an indicator of x should converge with other indicators of x. The tests here are quite simple: look at indicators that purport to reflect the same construct and see whether they are correlated. An example is Townsend and Kaiser, who looked at indicators of fruit intake and psychosocial health. http://www.ncbi.nlm.nih.gov/pubmed/16029687
The people behind the Worldwide Governance Indicators have done this, and argue emphatically that their indicators correlate highly with other indicators of constructs implied in the broad idea of good governance. There are some problems with this, though. First, the 'good governance' concept is presented so broadly in this work that one is bound to find overlap. Second, the Worldwide Governance Indicators are calculated from many of these 'related measures', so one should expect correlation. But this is mathematical convergence, not theoretical convergence.
Another approach to testing the construct validity of a new indicator involves testing it side by side with an earlier indicator that is already acknowledged as having some construct validity. I like Burke, Burke and Crowder in this regard. http://aei.sagepub.com/content/31/4/1.abstract
The article looks at how students' scores on a new indicator of early literacy skills correlated with scores on an older indicator of a similar construct that was already known to predict later reading efficiency.
3. Discriminant validity is the third aspect I will draw on now. It holds that a measure meant to indicate x should not also indicate y, z and other constructs. It is roughly the mirror image of convergent validity: you should be able to show that the measure does not correlate with (and instead diverges from) indicators of other things.
This is a dimension where the Worldwide Governance Indicators have struggled. The argument of people like me is that these indicators correlate so closely with measures of economic well-being that it is hard to say they are not just another way of measuring it. Neumann and Graeff suggest this is one of the questionable aspects of the good governance indicators they examine. http://www.springerlink.com/content/t417158753238u2q/
An empirical consequence of this problem is that governance indicators typically correlate with other economic, social and political indicators. This means that regressions using governance indicator scores as explanatory variables alongside democracy, GDP level and similar variables could have measurement problems: because the explanatory variables overlap, it is hard to separate the effect of governance from theirs.