python 28 lines · 5 steps

Streaming TSV records with Python generators

A lazy generator pipeline reads optionally-gzipped TSV files line by line and counts matching records without loading the whole file.

Explained by highlit

Intermediate generators lazy-evaluation file-io data-parsing gzip

1import gzip
2from pathlib import Path
3from typing import Iterator
4 
5 
6def read_lines(path: str | Path, *, encoding: str = "utf-8") -> Iterator[str]:
7    path = Path(path)
8    opener = gzip.open if path.suffix == ".gz" else open
9    with opener(path, mode="rt", encoding=encoding) as handle:
10        for line in handle:
11            yield line.rstrip("\n")
12 
13 
14def iter_records(path: str | Path) -> Iterator[dict[str, str]]:
15    lines = read_lines(path)
16    header = next(lines).split("\t")
17    for line in lines:
18        if not line:
19            continue
20        yield dict(zip(header, line.split("\t")))
21 
22 
23def count_errors(path: str | Path) -> int:
24    return sum(
25        1
26        for record in iter_records(path)
27        if record.get("level") == "ERROR"
28    )

01 / 01

STEP 01

Walkthrough

Space play ←→ step click any line

Three takeaways

1Generators let you process arbitrarily large files with constant memory by yielding one item at a time.
2Selecting the opener by file suffix transparently handles both plain and gzipped inputs through one code path.
3Layering small generators into a pipeline keeps each stage focused and composable.

Related explainers

python

import argparse
import sys
from pathlib import Path

Building a subcommand CLI with argparse

cli argparse subcommands

Intermediate 6 steps

python

from collections.abc import Mapping
from typing import Any, Iterator

Flattening nested config into dotted keys

recursion generators tree-traversal

Intermediate 7 steps

python

import csv
import io
from datetime import datetime

Streaming a CSV export in Flask

streaming generators csv

Intermediate 9 steps

python

import time
from collections import defaultdict
from threading import Lock

Sliding-window login rate limiting in Flask

rate-limiting sliding-window thread-safety

Intermediate 7 steps

python

from django.conf import settings
from django.contrib.auth import get_user_model
from django.core.mail import EmailMultiAlternatives
from django.db.models.signals import post_save

Sending a welcome email with Django signals

signals email user-activation

Intermediate 8 steps

python

import csv
import io
from datetime import date

Streaming a CSV export in FastAPI

streaming async-generators csv

Advanced 8 steps

Share this explainer

Here's the card — post it anywhere.

Streaming TSV records with Python generators — share card

Made with highlit — turn any snippet into a walkthrough like this in about a minute.

Explain your code

Streaming TSV records with Python generators

Walkthrough

Related explainers

Building a subcommand CLI with argparse

Flattening nested config into dotted keys

Streaming a CSV export in Flask

Sliding-window login rate limiting in Flask

Sending a welcome email with Django signals

Streaming a CSV export in FastAPI

Share this explainer

Embed this explainer