Course Description

Big data has become an essential part of our digital world in the last decade. Governments, e-commerce websites, and even short video platforms now rely on big data technologies to gain business insight and design their strategies. Unlike the traditional data analytics industry, big data infrastructure is mostly built on cheap commodity PCs and open source software. This trend has lowered both the deployment and operational costs of big data platforms. On the other hand, it raises new challenges for data engineers and scientists who build their own systems on such infrastructure.

This course aims to bridge the gap between big data practice and the data infrastructure skills of undergraduate students. It provides an overview of big data infrastructure, enabling students to build their own systems based on data characteristics and processing demands. It involves students in the whole process of building a big data system, i.e., design, planning and implementation. It also covers additional tools and techniques crucial to the success of big data, including data visualization and monitoring.
 
Students are expected to learn skills in 1) software development infrastructure in mainstream data-driven IT products; 2) data crawling and extraction frameworks for web-based data collection; 3) data storage architecture for highly variant data in big data systems; 4) massive data processing frameworks; 5) real-time and streaming data processing frameworks; 6) data visualization tools for big data analytics; 7) orchestration of open source systems with a high-level data interface.
 

Learning Objectives

Upon completion of the course, students will learn:


System development infrastructure
✓         Source code management with Git and GitHub 
✓         Task management with ClickUp
✓         Documentation with Confluence, a free document system
Data crawling and extraction framework
✓         Extract information from web pages with Scrapy
Data storage system
✓         Choose the right storage architecture based on data characteristics
✓         Use data stores and document stores for various types of data
Massive data processing frameworks
✓         Install and deploy Hadoop and Spark
✓         Program big data processing logic with Hadoop and Spark
Data interface between modules
✓         Process JSON files
✓         Adopt GraphQL as the data interface
Other open source big data tools
✓         Visualise data results with D3.js
✓         Monitor online data with Prometheus

School of Computing and Information Systems
School Term
AY2022/23 TERM 2
Course Code
IS 459
