Image: Open Humans sketch. All images courtesy Open Humans

Cite as: DOI


Citation format: The Chicago Manual of Style, 17th Edition

Greshake Tzovaras, Bastian. ‘Community Science, Fueled by Your Personal Data’, 2019.

Science, research, and society are missing out on the wealth of personal data generated by fitness and health monitoring and by genetic testing, because regulatory frameworks do little to support user-oriented, privacy-driven data policies, also known as data sovereignty. Open Humans offers a working platform and data storage inside a standardized ecosystem that puts users in control of their data and protects their privacy.

Can we be empowered to understand ourselves using our own data? In a world that is increasingly digitized, there are diverse types of information about us: Spotify knows what music we listen to, we share our thoughts and feelings on Twitter, and our phones and computers may be gathering rich information about our location, device usage, and physical activity. Beyond these “incidental” data, many people are doing more active tracking, e.g., with smartwatches or other wearables (Lamkin 2018), or by doing genetic testing to learn about disease risks and ancestry (Regalado 2017).

Most of these data sets remain stored in commercial data silos that are too often outside of our own control. While companies can use these data to learn about us, profit from those insights, and resell them, most of us never get to explore the wealth of actively and passively collected data about us in comparable ways. Exploring our own data could be a great way to acquire more data literacy, which is becoming ever more important in this age of data misuse scandals (Sherr 2018). Just as importantly, accessing and controlling our own personal data empowers us: it gives us the choice of which data to share with others and under which conditions.

Sharing our personal data and exploring it together holds huge potential for citizen and community science. So many people are already collecting these data, and they could be used for many different experiments across a wide array of social and natural sciences. But there are a number of obstacles to actually doing community science projects: How can we collect and aggregate datasets that are stored in all of these different commercial data silos? And how can we build a community science model that enables individuals to selectively share their data with others? To address these issues we have built Open Humans, an ecosystem for participant-centered research and personal data exploration.

Image: A diagram of Open Humans' data and project access permissions

The Open Humans ecosystem is built around the idea of projects, which allow people both to import data from a variety of sources and to share those data. An individual project can import personal data into a member’s Open Humans account, request access to files already stored in that account, or both. Because members decide which projects to join, they control which data are imported into their Open Humans account and who gets access to these data. Projects on Open Humans can be created and run by anyone: individuals, community science organizations, and academic researchers.
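The permission model described above can be illustrated with a minimal Python sketch. This is not the Open Humans implementation or its API: the `Project` and `Member` classes, the `import_file` and `readable_files` functions, and all field names are hypothetical and exist only to show how joining a project could act as the single switch for both importing and sharing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Project:
    """Hypothetical project record; anyone could create one."""
    name: str
    imports_source: Optional[str] = None  # data source this project imports, if any
    requested_sources: Set[str] = field(default_factory=set)  # sources it asks to read


@dataclass
class Member:
    """Hypothetical member account: files grouped by data source."""
    files: Dict[str, List[str]] = field(default_factory=dict)
    joined: Set[str] = field(default_factory=set)

    def join(self, project: Project) -> None:
        # Joining is the member's decision; in this sketch it is the only
        # thing that grants a project any permission.
        self.joined.add(project.name)


def import_file(project: Project, member: Member, filename: str) -> None:
    """Add a file to the member's account on behalf of an importing project."""
    if project.imports_source is None or project.name not in member.joined:
        raise PermissionError(f"{project.name} may not import into this account")
    member.files.setdefault(project.imports_source, []).append(filename)


def readable_files(project: Project, member: Member) -> List[str]:
    """Return the files a project may read: none unless the member has joined."""
    if project.name not in member.joined:
        return []
    return [
        name
        for source in sorted(project.requested_sources)
        for name in member.files.get(source, [])
    ]
```

In this sketch, a research project that requests a member's genome data sees nothing until the member joins it, and an importer cannot write into an account whose owner has not joined the importing project.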

This flexibility allows for innovative research and data exploration projects from a variety of angles. Data import mechanisms for the numerous potential data sources can be written not only by the Open Humans platform maintainers but also by individual members. Thanks to this, the system already supports a wide variety of data importers. In addition to importers for, e.g., personal genetic data or activity tracking data from wearable devices, further importers have been created by community members. For example, a community member created an importer for health and activity records from Apple’s HealthKit. Similarly, the type 1 diabetes community wrote its own open-source tools to import data from their continuous glucose monitors into Open Humans.

The open framework of Open Humans furthermore allows its members to create projects that enrich data members have already imported. For example, the Open Impute project (Arvai n.d.) allows members to enrich the genetic datasets they have already imported into Open Humans by using statistical methods to fill gaps in the existing data sets. The enriched data can then be used as the input for other projects that make use of the Open Humans ecosystem, like Genevieve, which generates reports on your genetic variants (Ball n.d.). This interplay of projects highlights one of the big benefits of storing data inside a standardized ecosystem: because projects use the same core routines for accessing and depositing data, reusing the data in new projects becomes substantially easier, as there is no need to ask people to upload their data again or to grant access through a complicated process. Instead, sharing and requesting data can be as easy as a click of a button.
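The interplay between an enrichment project and a downstream project can be sketched as follows. Everything here is an assumption made for illustration: the account is a plain in-memory dictionary, `read_file` and `deposit_file` stand in for whatever shared routines the platform provides, and the caller-supplied `fill` function is a placeholder, not the statistical imputation that Open Impute performs.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for one member's account: file name -> genotype calls.
# A missing call is represented as None.
Account = Dict[str, List[Optional[str]]]


def read_file(account: Account, name: str) -> List[Optional[str]]:
    """Shared read routine used by every project in this sketch."""
    return account[name]


def deposit_file(account: Account, name: str, calls: List[Optional[str]]) -> None:
    """Shared write routine used by every project in this sketch."""
    account[name] = calls


def enrich(account: Account, source: str, target: str,
           fill: Callable[[int], str]) -> None:
    """Enrichment project: read a file, fill its gaps, deposit a new file."""
    calls = read_file(account, source)
    filled = [fill(i) if call is None else call for i, call in enumerate(calls)]
    # The original file is left untouched; the result is stored alongside it.
    deposit_file(account, target, filled)


def report(account: Account, name: str) -> Dict[str, int]:
    """Downstream project: count how often each genotype call occurs."""
    counts: Dict[str, int] = {}
    for call in read_file(account, name):
        key = "missing" if call is None else call
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Because both hypothetical projects go through the same two routines, the reporting step can consume the enriched file without the member having to upload anything a second time.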

The same centralized personal data storage can also encourage the creation of new research projects, as projects can ask members for permission to access their personal data for research purposes. These projects can range from ones driven by academic researchers all the way to ones managed entirely by a community of participants. Examples of academic research currently being facilitated through Open Humans include the Microbiome (Shaer, Price Ball, and Nov n.d.) and Genome Explorations (Shaer and Nov n.d.), run as a collaboration between NYU and Wellesley College. In the realm of community science, the type 1 diabetes community is at the forefront of performing community-driven research. Besides building their own data commons based on Open Humans, they are also running an experiment on the impact that diet has on their glucose levels to learn more about how to manage their health.

Image: Diagram showing Open Humans’ model for sharing Personal Data Notebooks controlled by individuals managing their own personal data.

In addition to projects with which to share data, Open Humans also offers ways for people to explore and analyze their personal data. An interactive data analysis notebook integrated into Open Humans gives each member a personal virtual machine with direct access to all the personal data they have stored in Open Humans. This provides secure and private access to the data, along with an environment for generating visualizations and analyses. Members can share these notebooks publicly, so that others can re-use the analyses on their own personal data and help improve the existing methods. In this way, members can collaboratively build a library of data analysis tools.

If you want to learn more about Open Humans, you can check out our preprint on the ecosystem at large (Tzovaras et al. 2019). And if you want to take your own personal data and start exploring and maybe even sharing it, sign up at Open Humans and start playing around with your data.


Lamkin, Paul. ‘Smart Wearables Market To Double By 2022: $27 Billion Industry Forecast’. Forbes, 2018.

Regalado, Antonio. ‘2017 Was the Year Consumer DNA Testing Blew Up’. MIT Technology Review, 2017.

Sherr, Ian. ‘Facebook’s Cambridge Analytica Scandal: From Trump to Data Mining – CNET’, 2018.

Arvai, Kevin. ‘Imputer’. Accessed 9 May 2019.

Ball, Mad. ‘Genevieve Genome Report – Open Humans’. Accessed 9 May 2019.

Shaer, Orit, Mad Price Ball, and Oded Nov. ‘UbiQomix Microbiome Exploration – Open Humans’. Accessed 9 May 2019.

Shaer, Orit, and Oded Nov. ‘GenomiX Genome Exploration – Open Humans’. Accessed 9 May 2019.

Tzovaras, Bastian Greshake, Misha Angrist, Kevin Arvai, Mairi Dulaney, Vero Estrada-Galiñanes, Beau Gunderson, Tim Head, et al. ‘Open Humans: A Platform for Participant-Centered Research and Personal Data Exploration’. BioRxiv, 2 May 2019, 469189.