# Overview
A [[Data Stream]] framework built on [[Apache Spark]].
# Key Considerations
## Creating Data Streams
### Via Structured Streams
A structured stream treats data arriving from the input source as rows appended to an unbounded table. It provides [[Fault Tolerance]] by using checkpointing and a [[Write-ahead Log (WAL)]] to track stream progress. It also offers an [[Exactly Once Delivery]] guarantee, provided the source can be replayed and the sink is idempotent.
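The recovery idea can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions, not how Spark lays out its checkpoint directory: the names `run_stream`, `load_checkpoint`, and `save_checkpoint` are hypothetical, the "source" is a replayable list of micro-batches, and the checkpoint is a single JSON file holding the next offset and a running total.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return (next_offset, running_total); a fresh stream starts at (0, 0)."""
    if not os.path.exists(path):
        return 0, 0
    with open(path) as f:
        saved = json.load(f)
    return saved["next_offset"], saved["running_total"]


def save_checkpoint(path, next_offset, running_total):
    """Write to a temp file and rename it, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_offset": next_offset, "running_total": running_total}, f)
    os.replace(tmp, path)


def run_stream(batches, checkpoint_path, max_batches=None):
    """Sum rows from a replayable source, resuming from the last checkpoint.

    `max_batches` simulates the process stopping after that many micro-batches.
    """
    next_offset, total = load_checkpoint(checkpoint_path)
    processed = 0
    while next_offset < len(batches):
        if max_batches is not None and processed >= max_batches:
            break
        total += sum(batches[next_offset])  # rows "appended to the unbounded table"
        next_offset += 1
        save_checkpoint(checkpoint_path, next_offset, total)
        processed += 1
    return total


# Toy data: three micro-batches from a hypothetical replayable source.
batches = [[1, 2], [3], [4, 5, 6]]
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

first_run = run_stream(batches, checkpoint, max_batches=2)  # stops after two batches
second_run = run_stream(batches, checkpoint)                # resumes at batch index 2
```

Because the second run starts from the saved offset, no batch is counted twice or skipped, which is the property the checkpoint and log are meant to protect.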
A stream is declared on the input source:
![[2024-10-17_{{filename}}-4.png]]
Then, the stream can be persisted to storage:
![[2024-10-17_{{filename}}-5.png]]
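- Trigger - determines when the data is processed (for example, at a fixed interval)
- OutputMode - append (write only new rows) vs. complete (rewrite the entire result table on every trigger); an update mode (write only the rows that changed) also exists
- Checkpoint - stores stream progress and state so the query can recover after a failure; each stream needs its own checkpoint location (cannot be shared between streams)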
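As a rough sketch of how these three settings fit together in PySpark, the function below reads from Spark's built-in `rate` test source and writes to Parquet. The function name `start_rate_to_parquet`, both path parameters, and the chosen values (`rowsPerSecond`, the 10 second interval) are illustrative assumptions, not taken from this note.

```python
def start_rate_to_parquet(spark, output_path, checkpoint_path):
    """Start a streaming query that copies the `rate` test source into Parquet files.

    `spark` is an existing SparkSession; both paths are placeholders chosen by the caller.
    """
    # Declare the stream on the input source. The `rate` source emits
    # (timestamp, value) rows and is intended for testing.
    stream_df = (
        spark.readStream
        .format("rate")
        .option("rowsPerSecond", 5)
        .load()
    )

    # Persist the stream to storage.
    query = (
        stream_df.writeStream
        .format("parquet")
        .outputMode("append")                           # OutputMode: write only new rows
        .option("checkpointLocation", checkpoint_path)  # Checkpoint: one location per stream
        .trigger(processingTime="10 seconds")           # Trigger: micro-batch every 10 seconds
        .start(output_path)
    )
    return query
```

With PySpark installed, this could be called as `start_rate_to_parquet(SparkSession.builder.getOrCreate(), "/tmp/out", "/tmp/ckpt")`, followed by `query.awaitTermination()` to keep the stream running. Append mode is used here because the file sink only supports append.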
# Pros
# Cons
# Use Cases
# Related Topics