10 Opportunities for Impact Measurement in Civic Tech
Last May, Civicist published an essay by Matt Stempeck called “10 Problems With Impact Measurement in Civic Tech.” As I read the well-researched and thoughtful piece, I was surprised to find that while I agreed with many of Matt’s insights, I totally disagreed with his conclusions.
I’ve been working in civic tech since 2008, when I served as New Media Operations Manager on President Obama’s first campaign. After a stint as a globe-trotting digital activism consultant, a tour through academia and activism data, and a book on digital activism called Digital Activism Decoded: The New Mechanics of Change, I have come to see tech as a tool rather than a solution. I’m excited by its affordances in particular strategic contexts.
I still see value in civic tech, and I believe we can do much to demonstrate and measure that value. Tracking the impact (social benefit) of civic tech is not as difficult as Matt fears. In this post, I’ll respond point-by-point to each of the ten problems Matt outlined, sketching a more hopeful prognosis for impact measurement in civic tech.
Impact Problems (or Not)
The ten problems that Matt described are italicized below, followed by my rejoinders.
- We’re all using different metrics. True! And this is good. We should value metric diversity.
- Sharing is irregular. That’s also true, but it’s very understandable. Civic tech organizations need to guard the privacy of their data.
- Most projects don’t reach most people. This is not a problem, unless a tool’s theory of change expressly requires huge scale. It’s far more important to ask whether a tool is reaching the right people.
- Different constituencies want different impacts… but the internal strategist is the most important.
- We don’t evaluate relative to investment made…. and we should.
- We don’t evaluate relative to the macro environment… so let’s!
- Quantitative metrics can miss the objective…. so use qualitative measures when needed.
- Case studies are too often biased… and they’re plain lazy as far as impact measurement goes.
- Causality is hard to prove in social environments… but theory of change analysis can help.
- Our externalities may eat us all… but externalities can be beneficial as well as harmful and truly catastrophic ones are rare.
1. Metric Diversity is a Strength, not a Weakness
Matt’s first point is that it’s difficult to measure the impact of civic tech systematically because “there’s a lot of variation in the types of metrics groups collect and share.” While I agree that a lack of shared metrics makes it difficult to compare impact across projects, it makes perfect strategic sense for each individual tool.
Allow me to make an analogy: track and field. I was not a jock in high school (big surprise), but I do recall that we had a track and field team at my New Jersey high school and this team had different athletes that competed in different events: sprints, middle and long-distance running, hurdles, pole vault, javelin throw, shot put…. I could go on, but I won’t.
Now, let’s assume that the coach wants to evaluate the athletic ability of his team comparatively. He decides to have them all run a timed race of 800 meters. This is great news for the middle-distance runner. It’s even okay for the sprinter and the long-distance runner. But it’s not great for the shot-putter, who’s a fairly bulky guy: great at shot put, but not so quick on his feet. People would think the coach was nuts.
It makes sense for civic technologists to measure impact according to their own unique goals and theory of change (more on that later) rather than to measure impact according to a least common denominator metric that may not be relevant or possible.
In the worst case, measuring impact by an irrelevant metric can be worse than useless: it can divert valuable time and resources as a project team works to maximize that irrelevant metric, making real impact less likely. The impact of civic tech should be defined by individual performance, not by expecting all tools to measure impact in the same way.
2. Refusal to Share Isn’t the Problem
Matt writes that groups “share their impact numbers when those numbers are going up, and then become silent when the trends are less exciting.” While this is frustrating from a research perspective, the stakes of failure are high. It’s understandable if groups prefer to address disappointing results internally rather than broadcasting them.
Matt is interested in tool failure and in assessing the impact of civic tech as a field. The first research aim is undoubtedly stymied by the reluctance to share bad news, and I sympathize. The fix here is to change the incentives around what constitutes bad news and how it is shared. Focusing on adaptation (responses to failure) rather than failure itself can help. With this change in focus, narratives of single-cycle failure become ongoing narratives of multi-cycle adaptation. (See diagram below.)
The second aim, to comparatively assess the impact of the entire field of civic tech tools, is likely not possible for reasons laid out in the previous section. Tools are diverse and their impacts may not be comparable. An assessment of the impact of the civic tech field would look more like the results of a track meet (seconds of speed, feet of height, pounds of weight) than the result of a single race, where all projects can measure their effects by the same stopwatch.
3. Don’t Reach Everyone, Reach the Right Ones
Matt continues to seek a common means of impact comparison when he observes that scale is “one common impact metric” used by many civic tech projects. You should already be suspicious of a claim that there is any one impact metric that is relevant to all civic tech projects. Let’s talk about the particular problems with using scale as this metric.
Briefly put, it’s who you reach, not how many. Any organizer will tell you that change happens by persuading a specific target with power over a specific social problem to take action to remedy that problem. Maybe the target is a Senator with the power to pass a law. Maybe the target is a constituent with the power to call that Senator and ask her to pass the law. Maybe the target is a person with friends in swing districts who has the power to remind those friends to vote, as the Vote with Me app* did.
Let’s say Vote with Me had the option of measuring its impact in two ways: first, by counting downloads, a measure of user scale; second, by counting the messages sent to voters in swing districts, its actual goal. The first is only vaguely connected to the goal of mobilizing swing state voters. The second measures it precisely.
Despite writing that lack of scale is a problem, I think Matt and I agree here. He notes that scale is a necessity of venture capitalism, not advocacy. “[R]eaching a large number of users does not inherently equal ‘impact,’” Matt writes, before quoting internet researcher Stefan Baack’s distinction that “[o]ften, what matters in terms of impact is how relatively small groups are using civic tech applications.” In civic tech and social change, it’s not how many, it’s who.
4. The Most Important Constituency is the Internal Decision-maker
In his next section, Matt rightly notes that “different constituencies may want to see completely different results before they’ll consider your work effective.” He then lays out an example:
Your actual theory of change might dictate: more sponsors of this legislation will increase the chance it passes, and that will achieve our goal. But funders might instead require you to score and track media appearances. And the public, your members, might expect you to offer compelling social media updates and respond to the news cycle.
Different constituencies, yes, but not all equally important to impact. Ultimately, the first audience of any impact metric (co-sponsors, media appearances, social media posts) must be the internal decision-makers responsible for ensuring that the civic tech tool they have made achieves its desired impact. This means identifying key performance indicators (KPIs) that act as the best feasible measure of impact achievement and then focusing resources toward their maximization. (More on those later as well.)
If members or funders question your choice of KPI, explain your rationale to them. Explain why you think that specific metric is the best way for you to track impact and make the tactical decisions necessary for adaptation and ultimate success. Members will likely appreciate the inside view into how your organization works.
Funders are a more powerful constituency because they hold the purse strings. The best way to avoid disagreement over impact metrics with a funder is to state your strategy and KPIs early in the grantmaking process, so any disagreements can be aired and negotiated early. If a funder insists on acting like a bad track coach, asking all grantees to run a metaphorical 800 meters to be evaluated, they are not the right funder for you.
A compromise may also be possible. Perhaps they are willing to create a mix of metrics that includes your internal KPIs and a few of their own metrics that will not cause you to drastically divert resources and reduce your impact, but still let them feel they have a sense of your progress.
5. Let’s Measure Impact ROI
Matt and I agree that “[e]ven when we adequately measure outcomes, we often fail to do so in relation to the resources that were invested….” He cites the 92nd Street Y’s free invention of Giving Tuesday and the $3.5M invested in the social good social network Jumo (which, as Micah Sifry noted, came to nothing) as examples of how the ratio of investment to impact can vary substantially in the civic tech field.
I fully agree that we should measure impact return on investment (ROI). That is, we should measure the amount of impact derived from a given financial investment. The caveat is that the lack of impact comparability (see #1) still holds. Different impacts will have different metrics. One app may generate 400K vote reminders for $300K; another may generate 300K petition signatures for $200K. It will be up to funders to decide which tool to fund based on their own strategic priorities.
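To make the idea concrete, here is a minimal sketch of an impact-ROI calculation using the hypothetical figures above (the apps, counts, and dollar amounts are illustrative, not real data):

```python
# Impact ROI as cost per unit of outcome, for two hypothetical tools.
# Note: the resulting unit costs are NOT directly comparable across tools --
# a vote reminder and a petition signature are different impacts.

tools = [
    {"name": "Vote reminder app", "outcome": "vote reminders",
     "count": 400_000, "investment": 300_000},
    {"name": "Petition app", "outcome": "petition signatures",
     "count": 300_000, "investment": 200_000},
]

for tool in tools:
    # Dollars invested per unit of the tool's own impact metric.
    cost_per_outcome = tool["investment"] / tool["count"]
    print(f"{tool['name']}: ${cost_per_outcome:.2f} per outcome ({tool['outcome']})")
```

The caveat holds in code as in prose: the two unit costs share a dollar numerator but not a comparable denominator, so choosing between them is a strategic judgment, not an arithmetic one.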
6. Account for Context When Measuring Impact
Matt also makes the excellent point that “[w]e inadvertently penalize groups working in very difficult contexts, and…ascribe prowess to groups that may have benefited from fortunate timing.” He then references how the legislator contact app Countable (among many possible examples) has seen massive user growth not only through their own solid design decisions but also due to increased interest in political activism following President Trump’s election.
The problem Matt is referring to is omitted-variable bias, a situation in which the exclusion of a causal factor from a statistical model results in the model “attributing the effect of the missing variables to the… effects of the included variables.” In the Countable example, the omitted causal factor is that Trump is President and could be represented by a simple binary variable called, say, TRUMP, with values: 1 = yes and 0 = no.
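A toy regression makes the bias visible. In this sketch, every number is invented for illustration: user growth depends on both a tool-related variable (here, marketing spend) and the TRUMP context dummy, and fitting a slope without the dummy wrongly credits the tool.

```python
# Toy illustration of omitted-variable bias (all numbers invented).
# True model: new_users = 5 * spend + 40 * TRUMP, exactly.
# The tool's real effect per unit of spend is 5; the omitted context
# dummy adds 40 users on its own and is correlated with spend.
data = [
    # (spend, TRUMP, new_users)
    (1, 0, 5), (2, 0, 10), (3, 0, 15),
    (4, 1, 60), (5, 1, 65), (6, 1, 70),
]

def slope(points):
    """Ordinary least-squares slope of y on x for (x, y) pairs."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in points)
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Naive model: regress users on spend, omitting TRUMP.
naive = slope([(s, u) for s, t, u in data])

# Control for TRUMP by estimating the slope within each group.
pre = slope([(s, u) for s, t, u in data if t == 0])
post = slope([(s, u) for s, t, u in data if t == 1])

print(f"slope with TRUMP omitted: {naive:.1f}")  # ~15.3, badly inflated
print(f"slope within TRUMP=0:     {pre:.1f}")    # 5.0, the true effect
print(f"slope within TRUMP=1:     {post:.1f}")   # 5.0, the true effect
```

The naive slope absorbs the effect of the missing dummy, roughly tripling the tool’s apparent impact; controlling for context recovers the true effect.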
It’s also possible (and desirable)
to account for alternative causal factors
when one isn’t doing regression analysis.
It’s also possible (and desirable) to account for alternative causal factors when one isn’t doing regression analysis. For example, in a civic tech impact evaluation that DoBigGood is currently working on, we intend to ask users not only for baseline data (“At what level was X before you used this tool?”) and post-use data (“What level is X now?”) but also whether they can identify other likely causes for the change.
Though asking a user to identify alternative causal factors may seem unscientific, these are the individuals with the most granular awareness of the actions they took to achieve a given impact goal. It’s also easy to weave the question into a qualitative data collection process, and it relies on an expert informant – the user.
7. Don’t Be a Slave to Quantifiability
Matt also bemoans the insufficiency of quantitative metrics in all contexts, particularly much-hyped KPIs, first introduced in section #4. A mismatch between true impact and a “stale proxy” can result not only in “KPI fatigue,” but a misperception of objectivity, which makes internal decision-makers blind to contravening evidence.
Drawing from his own experience, Matt points out the mismatch between the KPIs in Hillary Clinton’s 2016 campaign and the electoral outcome. This was not just bad luck, he argues, but bad measurement. Data on endorsements and voter outreach, which showed that the campaign was winning, were privileged, while evidence of trouble, like desperate pleas for support in Rust Belt states, was dismissed as “anecdotal.”
This all makes sense. And, yet again, I don’t see this as a hard problem to solve. In the civic tech impact evaluation I mentioned in the previous section, we privilege quantitative metrics because, as McCoy et al. note, they are precise, one-dimensional (each measures a single phenomenon), and relatively unambiguous to interpret.
Yet we at DoBigGood are not slaves to quantifiability. For example, for one tool in the evaluation, we realized that the impact we needed to measure was change in the quality of decision making. Rather than coming up with some tortured method of expressing decision making numerically, we decided to measure this impact qualitatively by asking users whether and how their decision making around a specific activity changed as a result of using the tool. If your quantitative metric is a “stale proxy,” find another metric.
8. Abandon Case Studies
Hopefully, my suggestions so far have been uncontroversial. For the previous sections of this post, I feel my responses have been on the order of “yes, and…” or “…so here’s how we fix that.” Now I may be making my first true provocation: Case studies are the laziest and least scientific form of impact evaluation and you should stop using them for that purpose.
While Matt merely asks for “truly independent case study authors and more balanced reporting,” I’d go further. I think they have no place in impact evaluation. Why? Because case studies are in-depth anecdotes and anecdotes are bad science. They’re not representative. They’re cherry-picked by their very nature. They cover very little ground in terms of the scale of a phenomenon. We can do better.
I’m not saying case studies have no place in reporting the results of civic tech projects. They are a great way to illustrate a method or practice that has been shown to be successful by more rigorous methods (i.e., “This is a case study of how we used data mining to identify likely voters” or “This is a case study of how we integrated user testing into our communications work”). They are a great way to flesh out an insight with narrative, context, and character.
But they are not a good way to evaluate the impact of a tool. In evaluating impact, seek scale. Seek multiple data points. Seek representativeness. And then use a case study format to share your findings. In all but specific clinical contexts, case studies are communication tools, not evaluation tools. Sorry.
9. Use Theory of Change to Map Complex Causality
What is the effect of civic tech on society? This is the most important impact question in civic tech, and also the most challenging.
Matt is right to point out that “causality is hard to prove in social environments.” This is because the causal connection between a tool’s effect on its users and on society requires a series of causal steps that are often unclear. And, because of the number of dependencies between individual and societal effects, the causal connection between tool and impact will usually be weak. But this connection should still be sought.
Perhaps counterintuitively, mapping the causal chain between tool and impact is important not as a means of retrospectively evaluating impact, but as a means of prospectively creating it. It is a crucial strategic planning tool.
Fortunately, project managers and technologists are not alone in mapping this causal chain. Theory of change can help. As described in the video below, theory of change is a set of expectations about how one’s actions will achieve one’s goal. It is also a work-backward method of identifying that set of expectations and a standard for diagramming their interconnections.
[embed: Theory of change video]
By the end of the process, you should be able to summarize your theory in a statement such as “The individual will use the tool in Q way, resulting in X outcome, which results in Y outcome, which results in Z social impact.”
10. Be Alert to Unintended Impacts… and Fix Them
Matt’s post ends on a somber note. “The unintended consequences of our work could end up more impactful than the work itself. Sit with that for a minute,” he writes.
He is referencing catastrophic negative outcomes, like Facebook being used to introduce terrorists to one another or greater government transparency resulting in greater cynicism, rather than greater civic engagement.
All the cases he references are real (and really unfortunate), but if fear of the unknown were to immobilize us, that would be an even greater tragedy. It’s also not necessary.
Once one has gone through an impact planning process like theory of change, one will have in mind the specific impact one wishes to create and the process by which one expects that impact to occur. One will also have a clear set of metrics and milestones designed to make the impact plan a living strategic document that adapts according to the real effect of one’s civic tech work.
These check-in points are a great opportunity to ask tough questions about unintended effects, not at the end of a project, when small missteps have resulted in great harm, but early on, when they can still be corrected. Qualitative KPIs are the most likely way to capture this type of information. You may not find unintended effects in your Google Analytics dashboard. You are quite likely to find them by talking to people.
Imagine if Facebook had had a system of attentiveness to unintended effects for their political advertising products. Russian manipulation of the 2016 election could have been nipped in the bud and neutralized. It was not that the problem was invisible: once they started looking, Facebook engineers found the evidence. It’s just that they weren’t looking. Have the courage to look.
Also, I have no reason to believe that unintended harmful effects are any more common than unintended beneficial ones. So there’s no need to look gingerly for unintended effects while harboring a sense of dread. You’re just as likely to find an unintended positive outcome that you can then claim credit for. Though examples of unintended harm are dramatic and unsettling, those seeking to do good rarely achieve the exact opposite.
I’m optimistic about impact measurement in civic tech. After reading this post, I hope that you are too.
Mary Joyce is the founder and principal of DoBigGood, an impact planning and measurement firm based in Seattle. She can be reached at mary @ dobiggood . com.
* A friend of mine was a data volunteer on this project, but I have no information on how they tracked impact.