FOUNT Courselets

Courselet Title: Data Acquisition: Generic Remote Data

Author: Yash Kurkure
Contact Email: ykurku2@uic.edu

Description:

In this courselet, students learn to acquire data from the World Wide Web. They use Python tools to extract data from generic websites and to collect it from multiple sources in an automated and respectful manner, progressing from data acquisition to a simple Python application for analysis.
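
As a rough illustration of what "respectful" collection can mean in practice, the sketch below checks a site's robots.txt rules before fetching a page and reads any requested crawl delay. It is not taken from the courselet itself. The robots.txt content, the `courselet-bot` user agent, the example.org URLs, and the helper names are all hypothetical, and the rules are inlined so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str,
               user_agent: str = "courselet-bot") -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser: RobotFileParser,
                 user_agent: str = "courselet-bot",
                 default: float = 1.0) -> float:
    """Return the Crawl-delay for user_agent, or a default if none is set."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.org/index.html",
                "https://example.org/private/data.html"):
        print(url, "allowed" if is_allowed(rules, url) else "blocked")
    print("seconds to wait between requests:", polite_delay(rules))
```

Against a live site, the same parser can load the real rules with `set_url()` followed by `read()`. A scheduled scraper would then sleep for the returned delay between requests.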

Core Learning Outcomes:
1. Cron Jobs
2. Web Scraping
3. HTML, BeautifulSoup, Matplotlib

Intended Learners:
This courselet is intended for individual study. Prior knowledge of Python programming, the Bash terminal, and SSH/SCP is required.

Resource Requirements:
– 1 Bare Metal Instance or VM – any Linux distribution
– 1 Floating IP

The content is organized into 4 sections:
1. Cron Jobs
2. Web Scraping
3. Scheduled Web Scraping
4. Advanced Web Scraping
All four sections can be run on a single Chameleon instance.

Link to Artifact: https://www.chameleoncloud.org/experiment/share/382e4bcc-3420-4d69-ac1b-4a0d64fdfd5b
