General Information
Project description
This project uses an online survey experiment among federal employees involved in translating information and evidence about a program into actionable policies and programs. The experiment's primary aim is to test ways of presenting information that improve sensitivity to impact-relevant information about a program. The impact-relevant features tested are: the number of people impacted, whether the outcome is intermediate or final, and the persistence of effects. In addition, the experiment seeks to improve our understanding of what the process of making decisions about programs and policies looks like across the federal agencies involved in the experiment.
Federal employees will be invited to take part in a short online survey via the General Services Administration Qualtrics platform.
Participants will see five hypothetical program descriptions and will provide a “valuation”, defined as an assessment of the maximum cost at which they think it would be worth funding the program.
Detailed information
Hypothesis
1- Can particular modes of presenting evidence (Impact Calculator and Side-by-Side) improve sensitivity to impact-relevant features?
2- How do sensitivity and the impact of our treatments depend on the particular impact-relevant feature? For instance, are people differentially sensitive to a change in scope compared to a change in the outcome?
3- Is higher certainty correlated with increased sensitivity to impact?
4- Do the impact calculator and/or side-by-side presentations increase certainty in responses?
5- Does measured sensitivity look comparable across incentivized and unincentivized responses?
6- Is more experience with evidence and evaluation correlated with both increased sensitivity and higher certainty?
7- Is the time spent on the program valuation screens predictive of increased sensitivity?
How hypothesis was tested
The survey will be broken down into three sections, which each respondent sees in random order:
1. Baseline - 2 valuations, 2 unique programs: Document baseline insensitivities in “valuations”; to what extent are policymakers attending to features like scope or outcome type when assessing a program?
2. Treatment 1 - Impact Calculator - 2 valuations, 2 unique programs: Introduce an “Impact calculator” to increase sensitivity to impact-relevant features of a program. If respondents are presented with a calculation of impact (based only on the program description on the page), does sensitivity increase?
3. Treatment 2 - Side-by-Side - 2 valuations, 1 unique program: Show program descriptions side-by-side rather than sequentially. Does the direct comparison increase sensitivity to impact-relevant features?
After respondents answer each baseline question, they will be asked to assess their certainty in their valuation. Specifically, they will answer the following question: How certain are you in your answer?
After responding to these six core questions, respondents will again be shown one of the baseline program descriptions. This time, they will be asked to predict the modal cost selected by other respondents. Because this belief-based question has a verifiably correct answer, we are able to incentivize responses.
The survey will conclude with questions to understand the additional barriers policymakers confront when translating evidence into policies and programs, as well as what this process looks like at different agencies. The survey will also ask for information about respondents’ work in government as well as demographic characteristics.
Each hypothetical program description will include randomly assigned variations corresponding to different hypothesized impact-relevant features of a program.
These features can be multiplied together to estimate the impact of a program. Specifically, multiplying the number of people a program affects, the rate at which a “final” outcome is achieved, and the persistence of effects gives an estimate of the number of people per year who would not achieve the outcome in question in the absence of the program (see the sketch after the list below). Each respondent will see one of four combinations of these variations for each program:
1. High scope, Final outcome, Long persistence
2. Low scope, Final outcome, Long persistence
3. High scope, Intermediate outcome, Long persistence
4. High scope, Final outcome, Short persistence
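As a rough illustration of the multiplication described above, here is a short sketch; the variable names and numbers are our own placeholders, not values shown to respondents.

# Illustrative only: names and numbers below are assumptions, not survey values.
scope = 10_000              # number of people the program affects each year
final_outcome_rate = 0.10   # share achieving the "final" outcome because of the program
persistence_years = 2       # years the effect persists

# Estimated people per year who would not achieve the outcome absent the
# program, accumulated over the years the effect lasts.
estimated_impact = scope * final_outcome_rate * persistence_years
print(estimated_impact)  # 2000.0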
Dependent variables
1- The primary outcome for this analysis is the “valuation” or perceived program value, which is defined as the maximum cost at which the respondent is willing to fund the program, as identified in the survey. We log-transform this outcome given that the costs are presented on a semi-logarithmic scale. We then normalize the program valuations around the mean perceived value ascribed to the lowest-impact combination for the program.
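A minimal sketch of the outcome construction just described, assuming a long-format data frame; the column names, toy values, and the label for the lowest-impact combination are our own assumptions, not the actual data layout.

import numpy as np
import pandas as pd

# Toy data standing in for the survey export; one row per valuation.
df = pd.DataFrame({
    'respondent':  [1, 1, 2, 2],
    'program':     ['A', 'A', 'A', 'A'],
    'combination': ['lowest_impact', 'high_impact', 'lowest_impact', 'high_impact'],
    'valuation':   [10_000, 100_000, 30_000, 300_000],
})

# Log-transform, since costs are presented on a semi-logarithmic scale.
df['log_valuation'] = np.log(df['valuation'])

# Normalize around the mean (log) valuation of the lowest-impact combination,
# computed separately for each program.
reference = (df[df['combination'] == 'lowest_impact']
             .groupby('program')['log_valuation'].mean())
df['valuation_norm'] = df['log_valuation'] - df['program'].map(reference)
print(df[['program', 'combination', 'valuation_norm']])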
The only new outcome measure that questions 2 to 7 introduce is certainty, used when exploring heterogeneities. The baseline certainty measure will be coded as a continuous variable ranging from 0% to 100% certain. The certainty measure relevant to treatment effects will be coded as -1 if respondents report less certainty under a treatment, 0 if respondents are equally certain, and 1 if respondents report more certainty.
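A possible coding of the two certainty measures, again with placeholder column names; constructing the -1/0/+1 measure by comparing numeric reports, as below, is an assumption on our part.

import numpy as np
import pandas as pd

# Toy responses; certainty reported on a 0-100% scale.
cert = pd.DataFrame({
    'certainty_baseline':  [40, 70, 90],
    'certainty_treatment': [60, 70, 50],
})

# Baseline certainty: kept as a continuous 0-100% measure.
# Treatment-relevant certainty: -1 less certain, 0 equally certain, +1 more certain.
cert['certainty_change'] = np.sign(
    cert['certainty_treatment'] - cert['certainty_baseline']
).astype(int)
print(cert['certainty_change'].tolist())  # [1, 0, -1]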
Analyses
1- Regression of valuation on the condition and the program's impact, controlling for program and participant fixed effects (see the sketch following this list)
2- Relative sensitivity of the impact features
3- Heterogeneous treatment effects by our three impact features
4- The relationship between certainty and sensitivity to impact
5- The impact of our treatments on certainty
6- The relationship between incentivized and unincentivized responses
7- The relationship between respondent characteristics, notably experience with evidence and evaluation, and sensitivity as well as certainty
8- The relationship between time spent on the program valuations and sensitivity
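A minimal sketch of the specification in Analysis 1, using statsmodels with our own placeholder column names and simulated data; the registered specification may differ in functional form and standard-error treatment.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated long-format data standing in for the cleaned survey responses.
rng = np.random.default_rng(0)
n_resp, n_prog = 30, 6
df = pd.DataFrame({
    'respondent': np.repeat(np.arange(n_resp), n_prog),
    'program':    np.tile(['A', 'B', 'C', 'D', 'E', 'F'], n_resp),
    'condition':  np.tile(['baseline', 'baseline', 'calculator',
                           'calculator', 'side_by_side', 'side_by_side'], n_resp),
    'impact':     rng.uniform(0, 1, n_resp * n_prog),
})
df['valuation_norm'] = 0.5 * df['impact'] + rng.normal(0, 0.2, len(df))

# Normalized (log) valuation on condition and impact, with program and
# participant fixed effects, per Analysis 1.
model = smf.ols(
    'valuation_norm ~ C(condition) + impact + C(program) + C(respondent)',
    data=df,
).fit()
print(model.params.loc[['impact']])

# A condition-by-impact interaction (C(condition):impact) would be the natural
# term for testing whether the treatments change sensitivity to impact.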
Data Exclusion
We will exclude respondents who did not complete the main portion of the survey. We will also keep only the first complete response received via a single personalized survey link; that is, if the survey was taken multiple times using the same survey ID, we will not analyze the additional complete responses. Finally, we will run robustness checks restricted to respondents who spent at least 4 minutes on the survey in total.
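A sketch of how these exclusions could be applied with pandas; the column names ('survey_id', 'finished', 'start_time', 'duration_seconds') are placeholders of our own, not fields guaranteed to exist in the export.

import pandas as pd

# Toy data standing in for the Qualtrics export.
responses = pd.DataFrame({
    'survey_id':        ['a1', 'a1', 'b2', 'c3'],
    'finished':         [True, True, True, False],
    'start_time':       pd.to_datetime(['2021-05-01 09:00', '2021-05-02 10:00',
                                        '2021-05-01 11:00', '2021-05-01 12:00']),
    'duration_seconds': [300, 500, 180, 90],
})

# Keep only respondents who completed the main portion of the survey.
complete = responses[responses['finished']]

# Keep only the first complete response per personalized survey link.
deduped = (complete.sort_values('start_time')
                   .drop_duplicates('survey_id', keep='first'))

# Robustness-check sample: at least 4 minutes spent on the survey in total.
robust = deduped[deduped['duration_seconds'] >= 4 * 60]
print(len(deduped), len(robust))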
Treatment of Missing Data
We will require responses to all core program valuation and certainty questions. Any skipped
end-of-survey questions will be dropped, without imputing the data.
Location: Washington DC, United States
Project status: Completed
Date published: 4 June 2021