Using Slack to Run a “Data for Good” Replication Marathon

The founder of a Hacker News-style site for data for social good projects says that there is not enough replication in the civic hacking community, and he means to change that.


A year after launching DataLook, a Hacker News-style site highlighting data projects for social good, Tobias Pfaff and his colleagues are spearheading a 10-week replication marathon of some of the site’s top reusable projects in advance of a TEDx competition they qualified for this spring. Participants are finding each other and collaborating on Slack, although if it makes more sense to take problem solving to outside sites (GitHub’s issue tracker, for example), they are encouraged to do that as well.

“I think there is not enough focus on replicating projects [in the civic tech community],” founder Tobias Pfaff tells Civicist in a Skype interview. “I think it might be less sexy to do things that other people have done before.”

However, Pfaff also points out that replicating projects can be faster and easier than starting an open data project from scratch. Replication, he says, “can be super sexy” because you can get things done—and start having an impact—quickly. He points Civicist to Jason Hibbets’ framework for civic hackers, which outlines three kinds of projects: green fields (new and untested); cloned (tested, approved, and repeated); and augmented (tested and improved upon).

One successful and much-discussed replication is the late U.S. Politwoops transparency project documenting politicians’ deleted tweets, which was based on a project first launched in the Netherlands in 2010. The service recently made headlines after Twitter pulled its API access for violating terms of service. However, other iterations of Politwoops continue to run smoothly in 30 other countries.

The first project replicated as part of DataLook’s marathon was a Twitter bot that automatically posts information about animals up for adoption at local shelters. The person behind it, Slack user justnisdead, says that future replications would only take 15-30 minutes per bot.

DataLook’s goal for the marathon is to demonstrate the impact that replication can have in just 10 weeks, and then to challenge the TEDx judges to imagine what they could accomplish if the marathon were extended to a year or more.

DataLook (originally Data for Good, until they found that name was already a registered trademark in the U.S.) was built during a startup weekend in Germany last year. It was always meant to be a home for replicable data for good projects; however, in the year since, Pfaff has found that the user base is really too small for a robust upvote/downvote-style site. There just isn’t enough traffic.

(He speculated this might be because many of the major players in the civic tech scene—Code for America, for example—are hosting many of these conversations in private or semi-private/branded spaces, and that others are spread out on various platforms like Reddit and DataTau.)

And yet Pfaff and his DataLook colleagues know many of the projects on the site are worth replicating. “A month ago,” Pfaff says, “we went through our complete database and discussed which [projects] are really cool and which are reusable…[which solve] generic problems that appear in every city around the world and at the same time the code is open source.” These are the projects they pulled out for a shortlist and are actively encouraging data scientists to replicate during the marathon. The shortlist of projects includes Councilmatic; FixMyStreet; a food inspection forecasting app; Link-SF, a resource for homeless and low-income city residents; and more.

DataLook is encouraging interested parties to join an open Slack channel, find the projects that most interest them, and connect with like-minded people. There are currently twenty or so members of the general DataLook channel.

Pfaff makes clear that the end of the marathon is not meant to be the end of replicating projects, but that the purpose of the marathon is “to see what is possible within a given timeframe.”

“And then we can see what happens next,” he adds.