Purdue CS440: Large-scale Data Analytics
Course Description
"Big data" has been a buzzword for a long time. Many disruptive techniques have been developed to address various aspects of big data. This course will cover the key concepts, design principles, and systems to analyze large-scale data in order to extract novel and transformative insights. Tentative topics include database fundamentals, big data storage (e.g., HDFS), big data computing frameworks (e.g., Hadoop and Spark), data warehouses, data lakes, graph analytics (e.g., Spark Graph), data streaming (e.g., Spark Streaming), large-scale machine learning (e.g., Spark MLlib), vector databases, and cloud-native data analytics.
Instructor
Teaching Assistants
Logistics
Labs and PSOs
Labs and PSOs will start in the third week.
Online communications
Textbooks (Optional)
Note that textbooks are optional and the lecture slides are self-contained.
Grading
Academic Integrity and More
Schedule
| Lecture | Topic |
| Lec 1 (01/12) | Course Introduction |
| Lec 2 (01/14) | Relational DB |
| Lec 3 (01/19) | No class due to MLK Day |
| Lec 4 (01/21) | No class due to conference trip |
| Lec 5 (01/26) | SQL |
| Lec 6 (01/28) | SQL 2 |
| Lec 7 (02/02) | Database Storage |
| Lec 8 (02/04) | Index |
| Lec 9 (02/09) | Query Processing |
| Lec 10 (02/11) | Query Processing 2 |
| Lec 11 (02/16) | Transaction |
| Lec 12 (02/18) | Concurrency Control |
| Lec 13 (02/23) | Crash Recovery |
| Lec 14 (02/25) | Crash Recovery 2 |
| Lec 15 (03/02) | Distributed Databases |
| Lec 16 (03/04) | Midterm Exam (In-class) |
| Lec 17 (03/09) | Hadoop |
| Lec 18 (03/11) | SQL-on-Hadoop |
| Lec 19 (03/16) | No class due to Spring break |
| Lec 20 (03/18) | No class due to Spring break |
| Lec 21 (03/23) | Big Data Storage |
| Lec 22 (03/25) | Big Data Storage 2 |
| Lec 23 (03/30) | Spark Core |
| Lec 24 (04/01) | Spark SQL |
| Lec 25 (04/06) | Spark ML |
| Lec 26 (04/08) | Spark Streaming |
| Lec 27 (04/13) | Spark Graph |
| Lec 28 (04/15) | Vector Data Analytics |
| Lec 29 (04/20) | Vector Data Analytics 2 |
| Lec 30 (04/22) | Cloud-Native Data Analytics |
| Lec 31 (04/27) | Cloud-Native Data Analytics 2 |
| Lec 32 (04/29) | Review |