Clowder awarded $5 Million from NSF
The National Science Foundation (NSF) has awarded $5 million to bring together the Clowder community. Clowder, an open source data management tool based on active curation, was developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, in conjunction with the Coordinated Science Lab (CSL) and the Department of Civil and Environmental Engineering (CEE).
Clowder was designed to address the preservation, sharing, navigation, and reuse of the large and diverse data collections that are now essential to scientific discovery. These data navigation needs also matter in the growing number of research areas where data and tools must span multiple domains. "Clowder was built from common needs across different research communities," said McHenry. Supporting these needs effectively requires new methods that reduce the effort researchers spend finding and using data, support community-accepted data practices, and bring together the breadth of standards, tools, and resources a community relies on.
But there is also a need to make this data more accessible and usable. "Oftentimes, data can be difficult to curate and/or share, so now, we're able to capture data that would be lost otherwise," said McHenry. Metadata is key to making data usable, but collecting it by hand can be a tedious process. Clowder automates part of that process by using machine learning-based tools to analyze data and extract metadata, making curation more accessible.
Additionally, as a science gateway, Clowder makes it easier for users to access advanced HPC and cloud resources and analysis tools. With Clowder's auto-curation feature, researchers can upload data through a Dropbox-like interface and trigger complex analysis tools that run in the background.
Dr. Praveen Kumar will lead an effort to work with nine Critical Zone Observatories (CZOs) across the United States to help organize their data and demonstrate the applicability of the system. At these observatories, researchers study the critical zone: the region of the environment extending from the top of the plant canopy down to the bedrock beneath. "We want to use this system to do scientific investigation using this cross-observatory data, and the purpose is to make sure that the systems put in place are designed to support valuable science investigation rather than being arbitrarily stacked together," Kumar said. "The scientific investigation may create requirements for the organization of the system, the architecture of the system."
Clowder has had, and will continue to have, a major impact on materials and semiconductor research because it is an integral part of the 4CeeD system. Materials scientists and semiconductor fabrication researchers can use 4CeeD to capture, curate, coordinate, correlate, and distribute data from scientific instruments, such as microscopes, to private cloud infrastructure. This cloud-based infrastructure operates in a trusted, real-time manner and uses a modified version of the Clowder data management system to manage instrument data.
4CeeD is funded by the National Science Foundation and led by Dr. Klara Nahrstedt, Director of the Coordinated Science Laboratory and the Ralph and Catherine Fisher Professor of Computer Science. "Discovering new materials can take decades, in part because of the time it takes to conduct research and the loss of knowledge that occurs when vital information is tossed out or is inaccessible," Nahrstedt said. "4CeeD enables researchers to capture, curate, analyze, and correlate instrument data during experiments in real time, search for experimental data with specific instrument parameters, and receive insights into their own work, a task that would not be possible without the power of the Clowder system."
Clowder is essential to enabling 4CeeD in two ways. First, the data management system, maintained by Engineering IT Shared Services, provides 4CeeD with reliable cloud computing and data services, which current and future scientists need to advance their work. Second, Clowder will help 4CeeD reach a larger user base, particularly researchers in the Materials Research Lab and the Micro and Nano Technology Lab, while offering advanced data management techniques to students, the next generation of scientists working with complex data sets.
What does it mean to build a community around Clowder?
Clowder aims to keep working toward sustainability so that it can become a true open source project: decentralized and robust. Until this NSF award, Clowder had never been funded as a project in its own right; instead, it grew by meeting a common need to share and analyze data effectively across numerous projects. The grant is a step toward software sustainability and an investment in making Clowder sustainable across organizations.
“All of us are looking forward to building this roadmap for future software needs from the research community by bringing together these partners,” said McHenry.
"Over the past six years, Clowder has benefited from contributions from developers and stakeholders across many projects of different size and scope. This new effort will allow the Clowder community to continue to grow beyond individual projects to develop a flexible framework for data management across many scientific disciplines," said Marini, software architect for Clowder.
The project will enhance Clowder's core systems for the benefit of a larger group of users. It will increase interoperability with community-accepted resources and tools, harden the core software, and distribute core software development, while continuing to expand usage. It will also establish governance mechanisms and a business model so that the software remains available, supportable, and usable over the long term. The effort engages a number of stakeholders from diverse but converging scientific domains that already use the Clowder framework, drawing on their data to address broad interoperability and cross-domain data sharing. Overall, the effort will transition the grassroots Clowder user community and Clowder's other stakeholders (such as current and potential developers) into a larger, organized community with a sustainable software resource supporting convergent research data needs.
ABOUT THE NATIONAL CENTER FOR SUPERCOMPUTING APPLICATIONS
The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign provides supercomputing and advanced digital resources for the nation's science enterprise. At NCSA, University of Illinois faculty, staff, students, and collaborators from around the globe use advanced digital resources to address research grand challenges for the benefit of science and society. NCSA has been advancing one third of the Fortune 50® for more than 30 years by bringing industry, researchers, and students together to solve grand challenges at rapid speed and scale.
ABOUT THE COORDINATED SCIENCE LABORATORY
The Coordinated Science Lab (CSL) is a major scientific research laboratory in the University of Illinois at Urbana-Champaign's prestigious College of Engineering. With deep roots in information technology, CSL has invented and deployed many landmark innovations, such as the electric vacuum gyroscope, the first computer-assisted instructional program, and the plasma TV. Today, its research areas include autonomy, cybersecurity and resiliency, energy systems, health care data analytics, AI, robotics, smart cities, and more.
ABOUT THE DEPARTMENT OF CIVIL AND ENVIRONMENTAL ENGINEERING
Established in 1867, the Department of Civil and Environmental Engineering (CEE) at the University of Illinois at Urbana-Champaign (Illinois) has been a leader in research and education for more than 150 years. With top faculty and a broad range of facilities, the department consistently ranks among the most elite programs in the U.S. and worldwide for both undergraduate and graduate studies. CEE at Illinois continues its long tradition of excellence by advancing research and preparing students to lead in a world where increasing population, demand, and urbanization create challenges that need innovative civil and environmental engineering solutions.