python
28 lines · 5 steps
Streaming TSV records with Python generators
A lazy generator pipeline reads optionally gzipped TSV files line by line and counts the records whose level field is ERROR, without loading the whole file into memory.
Explained by highlit
import gzip
from pathlib import Path
from typing import Iterator


def read_lines(path: str | Path, *, encoding: str = "utf-8") -> Iterator[str]:
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


def iter_records(path: str | Path) -> Iterator[dict[str, str]]:
    lines = read_lines(path)
    header = next(lines).split("\t")
    for line in lines:
        if not line:
            continue
        yield dict(zip(header, line.split("\t")))


def count_errors(path: str | Path) -> int:
    return sum(
        1
        for record in iter_records(path)
        if record.get("level") == "ERROR"
    )
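One edge case worth noting: if the file is completely empty, `next(lines)` raises `StopIteration` inside the generator, which Python 3.7 and later surface as a `RuntimeError`. A minimal sketch of a guarded variant follows. The name `iter_records_from_lines` is hypothetical, and it takes an iterable of lines rather than a path so it can be tried without touching the filesystem.

```python
from typing import Iterable, Iterator


def iter_records_from_lines(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Hypothetical variant of iter_records that tolerates empty input."""
    lines = iter(lines)
    # next() with a default returns None instead of raising StopIteration.
    first = next(lines, None)
    if first is None:
        return  # empty input: yield no records
    header = first.split("\t")
    for line in lines:
        if not line:
            continue  # skip blank lines, as the original does
        yield dict(zip(header, line.split("\t")))


print(list(iter_records_from_lines([])))
# []
print(list(iter_records_from_lines(["level\tmsg", "ERROR\tdisk full", "", "INFO\tok"])))
# [{'level': 'ERROR', 'msg': 'disk full'}, {'level': 'INFO', 'msg': 'ok'}]
```

Whether an empty file should count as zero records or as an error is a design choice; this sketch assumes the former.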
Three takeaways
1. Generators let you process arbitrarily large files in roughly constant memory by yielding one item at a time.
2. Selecting the opener by file suffix transparently handles both plain and gzipped inputs through one code path.
3. Layering small generators into a pipeline keeps each stage focused and composable.
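These takeaways can be tried end to end. The sketch below writes the same small TSV to a plain file and a gzipped file in a temporary directory, reads both through a suffix-selected opener, and layers a filtering stage on top. The `parse` and `where` stages, the `count_errors_in` helper, and the sample data are hypothetical and introduced for illustration only; `read_lines` mirrors the function in the snippet.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Iterable, Iterator

SAMPLE = "level\tmsg\nERROR\tdisk full\nINFO\tok\nERROR\ttimeout\n"  # made-up data


def read_lines(path: Path) -> Iterator[str]:
    # Same suffix check as the snippet: gzip.open for .gz, built-in open otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


def parse(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    # First line is the header; every later non-blank line becomes a dict.
    lines = iter(lines)
    header = next(lines, "").split("\t")
    for line in lines:
        if line:
            yield dict(zip(header, line.split("\t")))


def where(records: Iterable[dict[str, str]], **equals: str) -> Iterator[dict[str, str]]:
    """Hypothetical extra stage: keep records whose fields equal the given values."""
    for record in records:
        if all(record.get(key) == value for key, value in equals.items()):
            yield record


def count_errors_in(path: Path) -> int:
    # Each stage pulls one item at a time from the stage below it.
    return sum(1 for _ in where(parse(read_lines(path)), level="ERROR"))


with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "app.tsv"
    zipped = Path(tmp) / "app.tsv.gz"
    plain.write_text(SAMPLE, encoding="utf-8")
    with gzip.open(zipped, mode="wt", encoding="utf-8") as handle:
        handle.write(SAMPLE)
    counts = (count_errors_in(plain), count_errors_in(zipped))

print(counts)  # (2, 2)
```

Because every stage accepts any iterable, a stage such as `where` can be reused on records from another source without changing the reader.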