The ability to gain insight from data is the literacy of the 21st century. Obtaining that insight, however, is the result of a complex workflow of actions ranging from acquiring the data and moving it around for processing to analyzing, presenting, and ultimately archiving it for further reference. Implementing this workflow correctly and efficiently requires a combination of skills in data systems, where constantly evolving hardware architectures (accelerators and storage technologies), networking (Smart NICs and programmable networks), and the improving price-performance ratio of edge devices create unprecedented opportunities but also significant challenges – and the need for data scientists who can solve them. Training such data scientists is no easy task: it requires both developing deep familiarity with data-related concepts and hands-on training in a broad range of innovative technologies, spanning data acquisition, computer networking, storage systems, machine learning and data analytics, and visualization.
With a $1M grant from the National Science Foundation (NSF), the FOUNT project, led by the University of Chicago with partners at New York University (NYU), Northern Illinois University (NIU), Stony Brook University, University of California San Diego (UCSD), and University of Illinois Chicago (UIC), will develop a broad range of digital educational courselets – bundles of digital materials that can be integrated into existing curricula or used to develop new ones – that emphasize scaffolded, hands-on training using the Chameleon and FABRIC testbeds. The project aims to create a repository of training materials to establish a critical mass of educational content in data systems that can be improved by others.
The courselets are based on the concept of literate programming and enable students to explore concepts and methods taught in the classroom via hands-on exercises – such exercises can then be modified and extended in various ways to deepen the students’ understanding and allow them to explore new ideas.
Part of the project’s aim is to establish a content hub, integrated with open clouds and testbeds, where students can find relevant courselets and deploy them “with one click”, making exploration-based learning readily available. Integrating courselets with testbeds in a way that enables immediate execution is a new, powerful idea that the project brings to teaching.
“Repositories of training materials have been built before,” says Keahey, the project PI, “but their shortcoming is that when they teach innovative concepts it can be difficult to find that very new platform that they have to be taught on.” In those cases, the reuse of existing materials can in practice be limited by platform availability, or at least require significant investment in porting the educational content to an available platform. In contrast, a repository of training materials that comes with a testbed – and can be extended to other testbeds – presents a platform that is not only easier to use but also stands a better chance of addressing the needs of students and educators who don’t have ready access to expensive hardware.
Ultimately, the project aims to translate its experience of creating, teaching, supporting, and improving courselets into a set of best practices and to build a vibrant community of students, teachers, and contributors across all types of academic institutions, thereby creating a model of digital sharing that will improve the availability of digital educational content to all members of our society.