Version: 1.0

Athena

Serverless service to query data stored in S3 using SQL
Queries the files in place with standard SQL (built on top of Presto)
It's schema-on-read: the table schema is applied when the data is queried, not when it is loaded
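To make schema-on-read concrete: defining a table in Athena only registers metadata (column names/types plus an S3 location) over files that already exist. The sketch below builds a Hive-style `CREATE EXTERNAL TABLE` statement as a string. The function name, the database/table/column names and the bucket used in the test are hypothetical, and the default `STORED AS PARQUET` is just an assumption for illustration.

```python
def create_table_ddl(database, table, columns, s3_location,
                     stored_as="PARQUET", partitions=None):
    """Build a CREATE EXTERNAL TABLE statement (hypothetical helper).

    Running it copies no data: it only records the schema and the S3
    location, and the schema is applied when files are read.
    """
    # One backtick-quoted column definition per line
    cols = ",\n".join(f"  `{name}` {ctype}" for name, ctype in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n{cols}\n)"
    if partitions:
        # Partition columns are declared separately from regular columns
        parts = ", ".join(f"`{name}` {ctype}" for name, ctype in partitions)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += f"\nSTORED AS {stored_as}\nLOCATION '{s3_location}'"
    return ddl
```

Dropping and recreating such a table leaves the underlying S3 objects untouched, which is the practical consequence of schema-on-read.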
Supported formats:
- CSV
- JSON
- Parquet (columnar, splittable)
- ORC (columnar, splittable)
- Avro (row-based, splittable)
- XML
Pricing: fixed amount per TB of data scanned
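Because the bill depends only on bytes scanned, a per-query cost estimate is simple arithmetic. In this sketch the price per TB is a parameter (check the current price for your Region), and the rounding up to the next MB and the 10 MB per-query minimum are assumptions. Treating 1 TB as 1024**4 bytes is also an assumption.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4  # assumption: binary units

def query_cost(bytes_scanned, price_per_tb, min_mb=10):
    """Estimate the cost of one Athena query.

    Assumptions: scanned bytes are rounded up to the next MB and
    at least `min_mb` MB is billed per query.
    """
    billed_mb = max(math.ceil(bytes_scanned / MB), min_mb)
    return billed_mb * MB / TB * price_per_tb
```

For example, at a hypothetical 5.0 per TB, a query scanning 100 GB (100 * 1024**3 bytes) costs about 0.49, while even a tiny query is billed the minimum.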
Supports structured, semi-structured and unstructured data
Use cases: BI, analytics, reporting, querying logs, ad-hoc queries
Athena uses the Glue Data Catalog: tables defined in Glue (e.g. by a crawler) show up automatically in Athena and can be queried
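Querying one of those catalog tables from code goes through the Athena API: submit the SQL, then poll until the query finishes, with results written to an S3 output location. Below is a minimal sketch that expects a boto3 Athena client (`boto3.client("athena")`) passed in. It uses only `start_query_execution` and `get_query_execution`; the function name and the polling interval are hypothetical choices.

```python
import time

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")

def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query and block until it reaches a terminal state.

    `client` is assumed to be a boto3 Athena client.
    Returns (query_execution_id, final_state).
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},  # Glue Data Catalog database
        ResultConfiguration={"OutputLocation": output_location},  # S3 prefix for results
    )
    query_id = response["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
```

With real credentials this would be called as `run_athena_query(boto3.client("athena"), "SELECT ...", "my_db", "s3://my-bucket/athena-results/")`, where the database and bucket names are placeholders.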

Performance

Use columnar formats for cost savings (less data scanned)
- Apache Parquet or ORC is recommended
- Use Glue to convert data to Parquet or ORC
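The saving from columnar formats comes from reading only the columns a query references, whereas row formats (CSV, JSON) must read whole rows. The rough model below shows this. It is a simplification that ignores compression, file metadata and predicate pushdown, and the column sizes in the test are made up.

```python
def bytes_scanned(column_sizes, selected_columns, columnar):
    """Rough estimate of bytes read for a query.

    column_sizes maps column name -> total bytes stored for that column.
    Row formats read every column; columnar formats (Parquet, ORC)
    read only the selected columns.
    """
    if columnar:
        return sum(column_sizes[c] for c in selected_columns)
    return sum(column_sizes.values())
```

With a wide `payload` column that most queries skip, a columnar layout cuts the scanned bytes (and therefore the cost) roughly in proportion to the columns left out.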
Compress data for smaller retrievals (bzip2, gzip, lz4, snappy, zlib, zstd, ...)
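Smaller objects mean fewer bytes read from S3. This sketch measures the effect with gzip only, because it is in the Python standard library; the other codecs need third-party packages or are applied by the file writer. The sample rows in the test are made up and deliberately repetitive, much like typical log lines.

```python
import gzip

def compression_ratio(data: bytes) -> float:
    """Return compressed size / original size using gzip."""
    return len(gzip.compress(data)) / len(data)
```

Real data compresses less than this artificial sample, but log-style text usually shrinks substantially.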
Partition datasets in S3 (e.g. year=/month=/day= prefixes) so that queries filtering on the partition (virtual) columns scan less data
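Partition columns are "virtual" because their values come from the S3 path rather than from the file contents. Athena does the pruning itself when a query filters on them (e.g. `WHERE year = '2024'`). The sketch below only illustrates the idea under an assumed Hive-style `name=value` layout, and its function names and keys are hypothetical.

```python
def partition_values(key):
    """Parse Hive-style `name=value` path segments from an S3 key."""
    values = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            name, value = segment.split("=", 1)
            values[name] = value
    return values

def prune(keys, **filters):
    """Keep only objects whose partition values match every filter."""
    return [k for k in keys
            if all(partition_values(k).get(n) == v for n, v in filters.items())]
```

Only the matching prefixes are read, so the other objects never count towards data scanned.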
Use larger files (> 128 MB) to minimize per-file overhead
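When a dataset consists of many small files, one option is to merge them into larger ones. The hypothetical planner below groups file sizes greedily, in input order, into batches of at least 128 MB. It only decides the grouping; the actual rewrite (for example with a Glue job) is out of scope, and the greedy strategy is an assumption rather than an Athena feature.

```python
MB = 1024 ** 2
TARGET_BYTES = 128 * MB  # size guideline from the notes

def plan_compaction(file_sizes, target=TARGET_BYTES):
    """Group file sizes into batches that each reach `target` bytes.

    Greedy, in input order: a batch is closed as soon as its total
    reaches the target; leftover files form a final smaller batch.
    """
    batches, current, current_size = [], [], 0
    for size in file_sizes:
        current.append(size)
        current_size += size
        if current_size >= target:
            batches.append(current)
            current, current_size = [], 0
    if current:
        batches.append(current)
    return batches
```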