Big Data

This section groups the big data content by capability first and by technology second.

  • Fundamentals covers architecture, ingestion, transformation, modeling, serving, and tool selection.
  • Storage and formats covers HDFS, data lake architecture, Delta Lake, Avro, and Parquet.
  • Processing engines covers Spark, Flink, and Beam.
  • Streaming and messaging covers Kafka and ZooKeeper.
  • Lakehouse covers Apache Iceberg.
  • Query and serving covers Presto and Redis.
  • Governance covers data governance concepts and tools.
  • Legacy keeps older Hadoop-era notes separate from the main learning path.