FAQs for Big Data to Knowledge (BD2K) Center RFA-HG-13-009
September 30, 2013
Scope of the RFA and the Centers
What is the definition of "Big Data" for this RFA?
We are intentionally not defining "Big Data" as a specific amount of data. It is defined as data sets that are larger or more complex than can be dealt with using current standard methods. Your application should describe how solving the data science question you want to address will enable biomedical researchers to use and analyze Big Data. Some such solutions will also be useful for small data sets, but this RFA is focused on addressing the challenges posed by Big Data.
Should we target the interests of a specific Institute or Center (IC) when responding to the RFA?
This RFA is trans-NIH, and all NIH ICs have needs relating to Big Data. Some approaches may address the particular interests of specific ICs, but applications will be reviewed and selected based in part on their generalizability.
Is large-scale data production supported under this RFA?
No. Applicants may propose some data production or acquisition to support the development of computational or informatics approaches, if necessary and well-justified. However, these activities may not take up a large proportion of the budget, and are not intended to provide major data sets or central comprehensive database resources for a field.
Are central databases, knowledge environments, or research commons supported under this RFA?
This RFA aims to support research to address data science challenges in managing and analyzing Big Data. This RFA will not support the establishment of production-grade databases, knowledge environments, or research commons to allow researchers simply to access or analyze large data sets. However, models of these may be developed as examples that provide lessons for how to develop and manage such systems. The application should explain what data science problems are being addressed and make the case that these model systems will help the development of larger and more comprehensive systems that will allow researchers or clinicians to extract useful knowledge from the data.
Is the RFA about developing the data science for enabling technologies?
The RFA is primarily about research on data science in the context of biomedical research; this should be mostly computational and informatics research, not research specifically addressing biomedical problems. In the course of such research, enabling technologies are expected to be developed. These will include new data science results as well as scientific methods, approaches, software, tools, and related resources that will enable tools and databases to be developed to allow a broad range of researchers to extract useful knowledge from the data. The RFA will not support simply doing large-scale analyses of data sets with existing methods or providing analyses simply as a service to the community (rather than as a way to develop the methods further at scale).
Should the application include Driving Biological Projects?
This RFA aims to support research in the science of Big Data that is driven by biological, biomedical, behavioral, social, environmental, or clinical science questions; information science and informatics questions; or computational questions. As such, the application should develop methods, approaches, software, tools, and related resources to address the data science questions in the context of biomedical research. To do this, you may need to produce data, you may be able to use someone else's data, or you may not need any new data at all. We do not require Driving Biological Projects or data production efforts in the application. Small biological projects may be included to provide data sets if suitable data sets are not easily available. These data would be used to test the approaches and software or validate the results. The approaches should be generalizable beyond the specific datasets used.
What is the preferred type of deliverables from the Center - products, services or both?
The Centers should produce enabling approaches, methods, software, tools, and resources that are shown to substantially improve our ability to manage and analyze biomedical data. These may also help derive new biomedical knowledge about the specific research problem addressed, and may be adopted or further developed for research problems in other biomedical areas. NIH does not plan to support these Centers indefinitely, so a Center should develop methods that move the field forward, not provide a service. For example, a Center could develop new approaches to data compression, but should not use the funding to perform compression as a service for the research community. The applicant should explain how any software repositories would be supported after the BD2K Center funding ends.
We plan to propose to use a very large enterprise data warehouse that has extensive data of many types. We will develop methods and tools that enable the aggregation and integration of these data as well as novel analytic approaches. The applications and tools we build will be designed to be easily generalized to other diseases and institutions. Have we captured the intent of this RFA?
Yes; this would be responsive to this RFA, although this is not the only way that an application could be responsive.
Are the Centers supposed to actually do research, or just set up infrastructure for research funded elsewhere?
You should actually do the research and address an important problem. This should be Big Data science research in the biomedical context, and development of methods that will allow researchers to analyze these Big Data sets and use them in various ways. The research may be on some aspect of developing better infrastructure, so setting up some infrastructure as a way to test the systems may be appropriate.
Are Centers intended to work on general Big Data problems or on current problems faced by researchers?
Centers may focus on general problems of data management and analysis. They may also address specific problems, such as combining imaging modalities. Some appropriate data science questions lie within particular scientific domains, although reviewers will still evaluate scientific merit and generalizability.
Is the development of new tools or software required?
The RFA aims to address roadblocks in using Big Data by developing the computational and informatics methods that are needed. The goal is not simply to validate Big Data tools or to translate them to clinical applications, although those may be part of a Center's activities. This is data science research in the context of biomedical questions, and thus it will likely result in the development of new tools and software.
Would development of products that enable individualized medicine and other related topics be in the scope of this RFA?
With this RFA, we are trying to address problems and bottlenecks associated with managing and analyzing Big Data sets. You should not be developing a product for a particular setting, but rather methods for dealing with these types of data. For example, how can electronic health records from many providers, in various formats, be combined in a useful way? More generally, how can we deal with heterogeneous data of various types?
Can we develop tools at the terabyte scale that could handle data at the petabyte scale?
Yes. Describe clearly in your application how the tools you develop on a smaller scale will be scalable to future Big Data sets.
How much research effort can be high-risk?
We are not looking for something incremental, but you should explain what the risks are and how you plan to manage them, and justify the potential high gains. A Center may propose a few research projects, some with higher risk than others.
What is the balance between research and development?
Both are important. The project should be driven by Big Data problems in the context of biomedical research. Some Centers will develop early-stage solutions; others will have more mature projects, but even if you are scaling up something that already exists, you should still be doing research.
What should the balance be between fundamental advances in data science versus showing biomedical applications?
Both are important. A Center needs to address an important data science challenge in managing or analyzing biomedical data, but should also demonstrate that the results will be important for enabling biomedical research.
How should we show in the application that the approaches are generalizable?
You will need to decide how to demonstrate to the reviewers that your approaches are generalizable. Preliminary results showing generalizability would be useful, but you can also provide evidence in other ways.
Do approaches also need to be generalizable beyond data types, in addition to generalizing beyond data sets?
With this RFA, we are trying to enable a wide range of biomedical research. Some approaches may apply across many data types, to specific classes of data types, or to particular data types. If an approach is useful only for a rarely used data type, it may be less important than one that is useful across many data types or for a widely used data type. You need to make the case for why your approach is important for moving biomedical research forward.
Should a Center focus on one disease or data type, or multiple ones?
Each Center should be an integrated unit, not a collection of disjoint projects. It may be fine if you have a good model system and make it clear in your application that your approaches could generalize to other diseases or data types. Or you may want to develop the approach with a few diseases or data types to demonstrate generalizability.
Must a Center be limited to one domain, or can it cross multiple domains?
Multiple domains are fine and may help with generalizability.
May I propose to develop methods for managing and analyzing data from large-scale simulations or modeling?
Yes, Big Data include data from observations, experiments, and large-scale simulations or modeling of biomedical phenomena.
May I propose methods for managing and analyzing data that simulate the functionality of a Big Data type that will become available in the near future?
Yes, new data types are constantly arising and data science research needs to keep up with them.
Can we use clinical data as a type of Big Data in our application?
NIH defines "biomedical research" broadly. Clinical data sets can be large and complex, so they may be included in your application.
Is there interest in approaches from other fields, such as computer science, statistics, finance, energy, and physics? Do the PIs need to be biologists?
One of the goals of this RFA is to bring in expertise from other fields to apply to biomedical questions. The PIs do not need to be biologists.
Are research areas on animals other than humans of interest?
Yes. Much biomedical research focuses on model organisms.
Is the integration of medical, environmental, genetic, and other data an appropriate topic?
Yes, data integration is an appropriate topic.
Will the BD2K program provide any software, tools, data, or hardware to the Centers?
No. The application should describe how all the resources needed to accomplish the research aims will be obtained or developed, and should include the necessary funds in the budget.
May public health policy be a component of the Center?
Policy may affect technical considerations in some areas, which may be appropriate for research, but the Center program is not designed to support a large component of research in this area. Policy issues are important, and likely will be addressed by other programs at NIH, but should not be a focus of a Center.
How should we deal with training?
Include activities that support dissemination and training others to use the tools and methods that your Center develops. Also include training for students, postdocs, or mid-career researchers in the Big Data science related to the work of your Center. Both the use and adaptation of existing training programs, as well as the development of innovative approaches to training in the skills necessary to do research in the area of Big Data science or in the use of Big Data, are encouraged. The RFA does not specify a set proportion of the budget for these activities.
The Proposed Center and Consortium
What organizational structure should I use for the Center?
Suggest an organizational structure that allows you to make good progress in addressing a data science problem for biomedical research. NIH is not recommending any particular structure. This RFA is not designed to support independent and uncoordinated work by PIs in a department, but is focused on addressing specific problems in biomedical data science.
Does a Center need to be in one physical place, or can it be a virtual Center in collaboration with other institutions?
The Center may be a collaboration among multiple institutions, but it is important to explain how decisions will be made, how meetings will take place, how information will be managed, etc.
Are multiple platforms supported across Centers, or are all Centers on the same Big Data platform?
Each Center may use whatever platforms it needs. It is not expected that all Consortium members will use the same platforms.
Can the Center bill on a "time and material" basis? Or based on deliverables?
This is a cooperative agreement, not a contract. The awardee will receive funds in the same way as for any NIH grant. The application should request a budget sufficient to accomplish the proposed research, and the awardee will be expected to carry out the proposed research and accomplish the proposed aims with those funds.
How should I deal with collaboration within the Consortium, since the awards have not been made yet?
As the RFA states, you should explain the approaches you would use to meet the program's objective of developing synergy through collaborations among the Centers once the awards are made and the Center Consortium has been formed. You could propose example projects with other Centers that make scientific sense based on your aims and activities. Appropriate travel funds should be allocated for these Consortium activities. Past collaborative activities would also be appropriate examples.
Will there be one Consortium or multiple Consortia of related Centers?
The purpose of the Consortium is to address issues that are above the level of any one Center. Centers with related projects may interact more strongly with each other, but there will be one Consortium.
Would a Center's activities that take place internally without collaboration with other Centers but contribute to the overall goals of the BD2K Initiative be considered a part of BD2K Consortium activity?
No. A Center's activities will have to contribute to the overall goals of the BD2K Initiative in order to be funded at all. There is a requirement for collaboration with other BD2K Centers because NIH is interested in fostering cooperation among the BD2K Centers, so that the whole Center program is greater than the sum of the parts. An application missing the component of collaboration within the BD2K Center Consortium will be considered non-responsive to the RFA.
How will the interdependencies among the Centers be managed? Will there be a central program management office?
The award mechanism is a U54 cooperative agreement, so NIH staff will be involved in the coordination among Centers. There will also be a Steering Committee composed of the PIs of the Center grants and NIH program staff, which will be responsible for developing policies for the Consortium on an ongoing basis, post-award. The NIH program staff will be organized as a Project Team. The roles and responsibilities of the Principal Investigators, NIH staff, and the Consortium are described in the RFA.
Is including international groups or industry in Center activities encouraged?
Yes. We encourage collaborations, including with international groups and with industry. Any such activities as part of collaborative efforts with other Centers in the Consortium should be included in the BD2K Consortium Activities component.
Who May Apply?
Who may be a PI?
Institutions determine who may be a PI for their applications. The PI should have experience running centers and a good track record. Review of the applications will include consideration of "centerness" and collaborative activities, so PI experience in these areas will be important.
May researchers from non-US institutions apply to this RFA?
PIs (including all PIs on multi-PI applications) must be at US institutions. However, applications may include researchers or subcontracts at foreign institutions. As with all applications, the foreign components will need to be well-justified and will receive careful scrutiny.
Can researchers working at national labs or at DOE Federally-Funded Research and Development Centers (FFRDCs) be PIs?
Staff at some national labs or FFRDCs may be PIs. They should check with their own agency and with the NIH grants policy office at firstname.lastname@example.org
Can only current centers apply to this RFA?
No. The RFA is not restricted to already-existing centers.
Is a supercomputing environment required?
The Center Application
What should I include in the letter of intent (LOI)?
The RFA asks for only the following information, by October 20: the application title, contact information for the PIs, names of other key personnel, participating institutions, and the number and title of this RFA. Including a brief description of the application's aims in the letter of intent would allow program staff to assemble a suitable review group and to advise you about responsiveness before you write the final application. The letter may be sent to NIH before October 20.
The instructions in the RFA say that each component of the application (Overall, Data Science Research, Training, Administration, and BD2K Center Consortium Activities) should include its own Face Page, Description, Table of Contents, Budgets, Biosketches, and Resources. This would result in redundancies. Can we provide the information just once?
The NIH Office of Extramural Research (OER) requires that each component of the application be largely self-contained, because OER wants the format to match that of the upcoming electronic forms. For each component, include the biosketches and resources for the people and resources that are part of that component, even if they are also part of other components; thus, this information may need to be repeated in the application. The table of contents for the Overall component should be the table of contents for the entire application, and each other component should have its own table of contents. The Overall component should include the biosketches of everyone in the application. The protections for human subjects and animals should be provided only in the Overall component, and the resource sharing plans only in the Data Science Research component. Budget pages do not count toward the page limits of components. The application as a whole needs to describe an integrated Center.
Must we use the sections of the application listed in the RFA, or may we modify them?
Follow the instructions in the RFA.
How should the PI's effort be distributed across all five components?
The RFA expects substantial participation by the PI (hence the requirement for at least 3.6 months of effort per year), who is primarily responsible for managing the Center and for accomplishing the Center's goals. However, as noted in the RFA, NIH is not specifying any particular organizational structure for the Center, and that includes the distribution of the PI's effort. It is up to the PI (and any co-PIs) to propose a Center structure, management plan, and distribution of effort of all participants that will accomplish the goals of the proposed Center.
May a PI ask for less than 3.6 months effort per year if there is another PI?
No. The Center will require a large amount of effort, and additional PIs are certainly appropriate. We ask for at least 2.4 months of effort from each additional PI, but this does not mean that the lead PI may have less than 3.6 months of effort.
Could a researcher propose doing similar work in multiple applications?
Yes, although this should be disclosed in the applications, and any duplicate work would not be funded.
Where should we provide letters of support?
Letters of support should be provided for each component, as specified in PHS 398. Note that anybody who provides a letter of support is considered to be in conflict with the application and cannot review the application. Do not solicit letters of support from many people in the field, especially not from NIH Institute Advisory Council members. Letters of support are needed only from people who have agreed to provide materials or services.
May a CD accompany the application?
The RFA calls for paper applications. You may use a CD for the appendices.
Software and IP
How will intellectual property and data rights be governed?
Broad dissemination and use of data and software are important goals of this program. Applicants are expected to propose methods of achieving this goal, consistent with NIH policies and with the IP rules of their own institutions. The RFA states that software should be developed so that it can be handed off to a different project, can be used beyond a particular Center, is freely available to researchers, and permits commercialization.
The guiding principles for the software sharing plans are: (1) software should be freely available to biomedical researchers and educators in the non-profit sector, such as educational institutions, research institutions, and government laboratories; (2) terms should also permit the dissemination and commercialization of enhanced or customized versions of the software, or incorporation of the software or pieces of it into other software packages; (3) software should be transferable such that another individual or team can continue development in the event that the original investigators are unwilling or unable to do so, to preserve utility to the community; (4) terms of software availability should include the ability of researchers outside the Center and its collaborating projects to modify the source code and to share modifications with other colleagues as well as with the Center; and (5) given the long-term goals of this initiative to create software and tools for data science research that will serve as resources to biomedical researchers across the nation, applicants are asked to propose a plan to manage and disseminate the improvements or customizations of their tools and resources by others. This proposal may include a plan to incorporate the enhancements into the "official" core software, may involve the creation of an infrastructure for plug-ins, or may describe some other solution. Most people interpret these principles as calling for open-source software. Note that the methods of software development and dissemination, the investigator's choice of software license, and source control will all be subject to review.
Many researchers may feel constrained by the IP policies of their institutions. What about analytic tools or algorithms that could have great utility but are patented? What is the extent of the openness for the source code or implementation details that is expected? Could exceptions to sharing code be made in certain situations?
The RFA expects a plan for software development that permits the software to be made available freely to researchers and educators in the non-profit sector, and which permits dissemination and commercialization of specific versions of the software (see RFA for specific language). Investigators' institutions may seek appropriate intellectual property protection (e.g., patent, copyright) for their works, but IP should not be exercised in a manner that would impede research and interfere with the wide dissemination of research results and software. NIH does not recommend specific licenses, but these should be chosen so that the dissemination goals of the RFA are met.
Must software be available through a gateway instead of through individual software packages?
As long as the programmatic goal of broad dissemination is achieved, a variety of approaches can be appropriate.
What funding amount may I apply for?
An applicant may request up to $2 million in direct costs per year, excluding indirect costs (F&A) on subcontracts, for up to 4 years. Including indirect costs, this amounts to about $3 million to $3.6 million per year.
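As a rough illustration only, the quoted totals can be related to an F&A rate. This sketch makes a simplifying assumption: a single F&A rate r is applied to the full direct-cost amount D, and subcontract F&A is ignored. In practice, the base and the rate depend on each institution's negotiated agreement. Under that assumption, with T the total and amounts in millions of dollars, the quoted range corresponds to implied rates of roughly 50% to 80%:

```latex
% Simplified model (assumption): one F&A rate r applied to all direct costs D
T = D\,(1 + r) \quad\Longrightarrow\quad r = \frac{T}{D} - 1
% With D = 2 (million USD) and the quoted totals T = 3.0 and T = 3.6:
r_{\text{low}} = \frac{3.0}{2} - 1 = 0.50, \qquad
r_{\text{high}} = \frac{3.6}{2} - 1 = 0.80
```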
Will NIH fund a fixed number of $2 million (direct cost) Centers, or could smaller proposed Centers also be funded?
Will a smaller application be disadvantaged or advantaged relative to a larger application because of its smaller scope but the potential to fund more Centers?
It is fine to propose a Center with a budget of less than $2 million, provided the application has all five components and the scope asked for in the RFA. However, the proposed work should constitute an integrated set of Center activities rather than just a large R01.
Is there a budget limit for data storage?
No. There is no budget limit for any particular item. However, this RFA is not intended to support large-scale databases.
Will there be future BD2K RFAs?
Other BD2K RFAs will be released in the next year, such as for development of analysis methods. We do not know yet whether there will be another PI-initiated Center RFA; that depends on the availability of funds.
May we submit SBIRs and STTRs in this area?
This RFA does not support SBIRs or STTRs. However, future BD2K RFAs may be appropriate, and you may also apply to regular Institute or Center SBIR/STTR programs in this area as well as the NIH SBIR/STTR parent program announcements PA-13-234 and PA-13-235. An SBIR/STTR PI may contact potential Center applicants and tell them that they do related work, although NIH does not provide the names of potential applicants.
Peer Review and Award Decisions
How will peer review be conducted?
The NIH Center for Scientific Review (CSR) will organize the reviews. The reviewers will give each application an overall score.
May we propose potential reviewers?
No. Your cover letter may describe the types of expertise that reviewers should have. Do not provide specific names, since anyone named will be excluded from the review panel.
Will the review group include computer scientists, computer engineers, and people from other disciplines?
Yes, since the RFA aims to include approaches from many disciplines.
Which will be given more weight: new methods development, or improvement on existing methods in other fields?
Both will be considered important. Provide scientific justification for your proposed activities.
Will applications be reviewed on the clinical significance of problems they address, or purely on methods and generalizability?
Once again, this is not an either/or answer. If you propose to address a clinically relevant Big Data research problem, then the reviewers will evaluate its clinical significance. Generalizability of the methods will be evaluated for all applications.
How will a Center that is very focused, in which the team consists only of biostatisticians, be viewed by the review panel?
The most important thing about building the team for your Center is that the members need to be able to address the question that you are studying. Make the case in your application that this is a critical problem and you are proposing a good way to approach it.
Will awards be based only on the top scores, or will Institutes and Centers be able to reach more based on their priorities?
We will not pay strictly by score order, as we will put together a balanced set. BD2K has funds from all ICs but is managed separately. ICs will be able to fund Centers beyond those chosen by BD2K.
The "Scientific Objectives" section describes four areas. Will a Center be chosen for each area?
We will strive for programmatic balance when we choose the Centers, so we expect to cover all four areas. We do not know what the applicants will propose, so you need to choose a project based on your skills and experience.
Posted: October 17, 2013