Cancer Research Data Commons "Nodes"
A Node is a CRDC repository containing related data that has been harmonized and stored in a format that is accessible and ready for analysis by the research community, brought together with infrastructure for security, interoperability, and elastic compute capability.
The vision of the NCI Cancer Research Data Commons is a network comprising multiple nodes, with researchers, tool developers, clinicians, and patients contributing and accessing tools and data.
Each CRDC node will have a data-specific submission and curation process, determined by domain experts, that harmonizes the data and applies the standard metadata necessary for sharing and analysis. Each node will be centered on a scientific domain or program, and community needs will determine which analytic and visualization tools are available.
Q: What CRDC nodes are available now?
A: Currently, the Genomic Data Commons (GDC) is the only node that is available. The GDC was developed over the past several years to be a unified repository for cancer genomic data. Now that the CRDC project has been initiated, the GDC will become the node for genomic data in the Commons.
Q: What CRDC nodes are under development?
A: Work has begun on nodes for imaging and proteomics data.
Q: What will be the data sources for the CRDC Nodes?
A: Data available through the CRDC will come from many sources and will continue to grow over time. Data will be incorporated into the CRDC from NCI programs such as the Human Tumor Atlas Network; from third-party programs, such as AACR GENIE, Foundation Medicine, and the Multiple Myeloma Research Foundation; from NCI labs and grant-funded programs; and from collaborative programs such as the Applied Proteomics Organizational Learning and Outcomes (APOLLO) network. The GDC has been populated with data from The Cancer Genome Atlas (TCGA), the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program, and a growing number of other sources. The initial data planned for the Proteomic Data Commons are from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) program, APOLLO, and the International Cancer Proteogenome Consortium.
Q: Who is creating the Data Commons nodes?
A: In the near term, CRDC nodes will focus on NCI programs and will be deployed and managed by NCI. However, the long-term vision is that the CRDC will interoperate with other Data Commons.
Q: How do I learn more about the status of the development of these new CRDC nodes?
A: Information about the new CRDC nodes will be published on this website, as well as publicized through the NCI social media channels, such as LinkedIn and Twitter. Additionally, NCI is planning workshops and other outreach activities to gather input from the community to help guide the direction and priorities of the CRDC.
Q: Is there a repository I can use to store data that currently has no node available?
A: The NCI is establishing a special data node to broaden sharing of cancer genomic data. A large number of NCI-funded programs are generating genomic data types that are broader than those currently accepted by the Genomic Data Commons.
The GDC has been unable to accept certain data for one of three reasons: the data do not fit the current data type criteria for GDC submission; the data do not meet the minimum metadata standards for GDC submission; or the data meet both criteria, but a backlog in the GDC submission and harmonization pipeline would significantly delay submission and release. This node will provide storage and sharing for cancer genomic data that have not yet been harmonized by the GDC process or otherwise do not meet the criteria for submission to the GDC.
Q: Will the data in the CRDC be publicly accessible?
A: Yes, but some of the data will be under controlled access and will require approval, depending on the Data Use Agreements in place and on whether the node contains individual-level genotype and phenotype data that have been de-identified. Data Use Agreements may be specific to dbGaP or to other established NIH access processes and policies. Each node will contain information about the data it hosts, such as descriptions and metadata, as well as instructions on how to gain access to the data.
Q: How can I gain access to controlled-access genomic data?
A: Investigators with appropriate NIH or eRA Commons credentials should visit https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi to request access to controlled-access tier datasets through the dbGaP approval process or to learn more about the process. If you do not have an eRA Commons account, please visit the eRA Commons website and complete the registration form. If you have any questions regarding the application process, please contact the dbGaP Help Desk (support [at] nci-gdc.datacommons.io).