
Batching an iterable for bulk indexing

A lazy chunking generator feeds fixed-size batches into a bulk index call and raises as soon as a batch reports per-item failures.

from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls at most `size` items per pass; an empty list ends the loop.
    while batch := list(islice(iterator, size)):
        yield batch


def bulk_index_documents(documents: Iterable[dict], client, batch_size: int = 500) -> int:
    indexed = 0
    for batch in chunked(documents, batch_size):
        # One action per document; the exact action shape depends on the client's bulk API.
        actions = [
            {"index": {"_id": doc["id"]}, "_source": doc}
            for doc in batch
        ]
        response = client.bulk(operations=actions)
        # The call can succeed as a whole while individual items fail.
        if response.get("errors"):
            failures = [
                item for item in response["items"]
                if item["index"]["status"] >= 400
            ]
            raise RuntimeError(f"{len(failures)} documents failed to index")
        indexed += len(batch)
    return indexed
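
To make the laziness of `chunked` concrete, here is a minimal sketch that repeats the helper so it runs on its own. The `numbered_docs` source and the `pulled` list are hypothetical names added for illustration; they stand in for a file or database cursor and record which items have been consumed so far.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[list[T]]:
    # Same helper as in the listing, repeated so this example is self-contained.
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


pulled: list[int] = []  # hypothetical log of items consumed from the source


def numbered_docs(count: int) -> Iterator[dict]:
    # Hypothetical document source; each yielded item is recorded in `pulled`.
    for i in range(count):
        pulled.append(i)
        yield {"id": i}


batches = chunked(numbered_docs(7), 3)
first = next(batches)

# Snapshot taken after one batch: only that batch's items have been consumed.
consumed_after_first = list(pulled)

# Draining the rest yields one more full batch and a short final batch.
sizes = [len(first)] + [len(batch) for batch in batches]
```

After the first `next()`, only three items have been pulled from the source, and the final batch is shorter than `size` rather than padded. Because `chunked` is a generator function, its `size` check is also deferred: the `ValueError` surfaces on the first `next()`, not at the call site.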
Three takeaways
  1. Wrapping any iterable in a generator lets you batch streams without loading everything into memory.
  2. The walrus operator turns an islice-into-list loop into a clean batch-until-empty pattern.
  3. Bulk APIs need explicit response inspection, since a request that returns HTTP 200 at the transport level can still contain per-item failures.
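
The third takeaway can be sketched as a small helper that pulls the failed items out of a bulk response. This is an illustration under assumptions, not the client's own API: `failed_items` is a hypothetical name, and the response shape (a top-level `errors` flag plus an `items` list keyed by operation type) is assumed to be Elasticsearch-style, matching what the listing reads. The sample response is hand-written data, not real server output.

```python
from typing import Any


def failed_items(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the per-item results whose status is 400 or higher.

    Assumes a top-level "errors" flag and an "items" list in which each
    entry has a single key naming its operation (for example "index").
    """
    if not response.get("errors"):
        return []
    failures = []
    for item in response.get("items", []):
        # Take the result under the item's only key, whatever the operation is.
        result = next(iter(item.values()))
        if result.get("status", 0) >= 400:
            failures.append(result)
    return failures


# Hand-written sample response for illustration only.
sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 400, "error": {"reason": "illustrative failure"}}},
        {"index": {"_id": "3", "status": 429, "error": {"reason": "illustrative failure"}}},
    ],
}

failed_ids = [result["_id"] for result in failed_items(sample_response)]
```

Returning the failed results, rather than only counting them, leaves the caller free to log the ids, retry a subset, or raise, depending on how strict the indexing job needs to be.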
