Citi Bike raw data -> HDFS + MinIO -> Spark clean/normalize -> MySQL -> Kafka realtime -> Hadoop MapReduce -> MySQL report tables -> Streamlit GUI + Superset ...
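The clean/normalize stage in the pipeline above can be illustrated with a small sketch. This is plain Python standing in for the Spark job, not the project's actual code. The column names (`started_at`, `ended_at`, `start_station_name`), the timestamp format, and the function name `normalize_trip` are all assumptions chosen for illustration.

```python
from datetime import datetime

# Assumed timestamp layout; the real raw files may differ.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"

def normalize_trip(raw):
    """Return a cleaned trip dict, or None if the record is unusable."""
    try:
        started = datetime.strptime(raw["started_at"].strip(), TS_FORMAT)
        ended = datetime.strptime(raw["ended_at"].strip(), TS_FORMAT)
    except (KeyError, ValueError, AttributeError):
        # Missing column, unparseable timestamp, or a None value.
        return None
    station = (raw.get("start_station_name") or "").strip()
    if not station or ended <= started:
        # Drop rows with no station or a non-positive duration.
        return None
    return {
        # Collapse repeated whitespace inside the station name.
        "start_station_name": " ".join(station.split()),
        "started_at": started.isoformat(sep=" "),
        "duration_seconds": int((ended - started).total_seconds()),
    }

# Hypothetical records, made up for this example.
good = {
    "started_at": "2024-01-01 10:00:00",
    "ended_at": "2024-01-01 10:12:30",
    "start_station_name": "  W 21 St  &  6 Ave ",
}
bad = {"started_at": "not a date", "ended_at": "2024-01-01 10:12:30"}

cleaned = normalize_trip(good)
rejected = normalize_trip(bad)
print(cleaned)
print(rejected)
```

In a Spark job the same rules would more likely be written as column expressions and filters; the per-record function here only shows the intent of the step.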
SQL:
1. Open MySQL Workbench.
2. Create a database: CREATE DATABASE churndb;
3. Import the raw CSV using the Table Data Import Wizard.
4. Run churnsql.sql top to bottom.

Python:
1. Install dependencies: pip install pandas ...
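Before importing the raw CSV with the wizard, it can help to check the file's header and look for incomplete rows. The following is a minimal sketch using only the standard library. The function name `summarize_csv` and the sample columns (`customer_id`, `tenure`, `churn`) are hypothetical and are not taken from the project's data.

```python
import csv
import io

def summarize_csv(handle):
    """Return (header, row_count, flagged_rows) for an open CSV stream.

    A row is flagged if it has a different number of fields than the
    header or contains a blank field.
    """
    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:
        raise ValueError("CSV is empty")
    row_count = 0
    flagged_rows = 0
    for row in reader:
        row_count += 1
        if len(row) != len(header) or any(cell.strip() == "" for cell in row):
            flagged_rows += 1
    return header, row_count, flagged_rows

# Hypothetical sample data; replace with open("your_file.csv", newline="").
sample = io.StringIO("customer_id,tenure,churn\n1,12,No\n2,,Yes\n3,5,No\n")
header, total, flagged = summarize_csv(sample)
print(header, total, flagged)
```

If the flagged count is nonzero, it is worth deciding how blank values should be handled before the import, since the wizard and the SQL script may treat them differently.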