Courselet Title: Data Acquisition: Generic Remote Data
Author: Yash Kurkure
Contact Email: ykurku2@uic.edu
In this courselet, students learn to acquire data from the World Wide Web. They use Python tools to extract data from generic websites and to collect it from multiple sources in an automated and respectful manner, progressing from data acquisition to a simple Python application for analysis.
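As a rough illustration of what "respectful" extraction can look like, the sketch below checks a site's robots.txt rules before deciding which links on a page may be fetched. It is not taken from the courselet itself: it uses only the Python standard library (html.parser stands in for BeautifulSoup), and the names extract_links, is_allowed, SAMPLE_ROBOTS, SAMPLE_HTML, the "courselet-bot" user agent, and the example.com URLs are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as text (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical inputs, made up for this sketch.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
SAMPLE_HTML = (
    '<html><body>'
    '<a href="/data/a.csv">A</a> '
    '<a href="/private/b.csv">B</a>'
    '</body></html>'
)

if __name__ == "__main__":
    for link in extract_links(SAMPLE_HTML):
        url = "https://example.com" + link
        verdict = "fetch" if is_allowed(SAMPLE_ROBOTS, "courselet-bot", url) else "skip"
        print(verdict, url)
```

In a real scraper the robots.txt text and the page HTML would be downloaded from the target site, and requests would typically be spaced out with a delay; both steps are left out here so the sketch runs offline.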
Core Learning Outcomes:
1. Cron Jobs
2. Web Scraping
3. HTML, BeautifulSoup, Matplotlib
Intended Learners:
This courselet is intended for individual study. Prior knowledge of Python programming, the Bash terminal, and SSH/SCP is required.
Resource Requirements:
– 1 Bare Metal Instance or VM – any Linux distribution
– 1 Floating IP
The content is organized into 4 sections:
1. Cron Jobs
2. Web Scraping
3. Scheduled Web Scraping
4. Advanced Web Scraping
All four sections can be run on a single Chameleon instance.
Link to Artifact: https://www.chameleoncloud.org/experiment/share/382e4bcc-3420-4d69-ac1b-4a0d64fdfd5b