Apache Spark Streaming - Ryan Lynch's Hub

# Overview

A [[Data Stream]] framework built on [[Apache Spark]].

# Key Considerations

## Creating Data Streams

### Via Structured Streams

A structured stream treats data arriving from the input source as rows appended to an unbounded table. It emphasizes [[Fault Tolerance]] by using checkpointing and a [[Write-ahead Log (WAL)]] to track stream progress. It also offers an [[Exactly Once Delivery]] guarantee, provided the source is replayable and the sink is idempotent.

A stream is declared on the input source:

![[2024-10-17_{{filename}}-4.png]]

Then, the stream can be persisted to storage:

![[2024-10-17_{{filename}}-5.png]]

- Trigger - determines when the data is processed
- OutputMode - append (write only new rows) vs. complete (rewrite the entire result table)
- Checkpoint - stores stream state (a checkpoint location cannot be shared between streams)

# Pros

# Cons

# Use Cases

# Related Topics