Aggregating CSV sales by category in Python
A dataclass and defaultdict turn a raw sales CSV into per-category totals ranked by revenue.
Explained by highlit
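The core idea, a `defaultdict` whose default factory is a dataclass, fits in a few lines. The sketch below is a minimal illustration and not part of the original listing. The `Totals` class and the sample rows are hypothetical, and plain tuples stand in for parsed CSV rows.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Totals:
    # Hypothetical accumulator: every field has a default, so Totals()
    # works as a zero-argument default factory for defaultdict.
    units: int = 0
    revenue: Decimal = Decimal("0")  # safe as a plain default: Decimal is immutable


# Hypothetical sample rows: (category, quantity, unit_price).
rows = [
    ("books", 2, Decimal("12.50")),
    ("games", 1, Decimal("59.99")),
    ("books", 1, Decimal("8.00")),
]

totals: defaultdict[str, Totals] = defaultdict(Totals)
for category, quantity, unit_price in rows:
    t = totals[category]  # created on first access; no "if key in dict" check
    t.units += quantity
    t.revenue += unit_price * quantity

# Rank categories by revenue, highest first.
ranked = sorted(totals.items(), key=lambda item: item[1].revenue, reverse=True)
for category, t in ranked:
    print(category, t.units, t.revenue)
```

The listing uses `field(default_factory=...)` for its `Decimal` total; either form works because the shared default is never mutated in place, only replaced by `+=`.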
```python
import csv
from collections import defaultdict
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation
from pathlib import Path


@dataclass
class CategorySummary:
    total: Decimal = field(default_factory=lambda: Decimal("0"))
    units: int = 0
    orders: int = 0

    @property
    def average_order_value(self) -> Decimal:
        # Guard against division by zero for an empty summary.
        if self.orders == 0:
            return Decimal("0")
        return (self.total / self.orders).quantize(Decimal("0.01"))


def aggregate_sales_by_category(path: str | Path) -> dict[str, CategorySummary]:
    # defaultdict creates a fresh CategorySummary the first time a category is seen.
    summaries: dict[str, CategorySummary] = defaultdict(CategorySummary)

    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Line 1 is the header, so the first data row is line 2
        # (assumes one physical line per record).
        for line, row in enumerate(reader, start=2):
            # Normalize the category; missing or blank values become "uncategorized".
            category = (row.get("category") or "").strip().lower() or "uncategorized"
            try:
                quantity = int(row["quantity"])
                unit_price = Decimal(row["unit_price"])
            except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
                # KeyError: missing column; TypeError: short row (DictReader fills None);
                # ValueError / InvalidOperation: unparseable numbers.
                raise ValueError(f"malformed sales row at line {line}") from exc

            summary = summaries[category]
            summary.total += unit_price * quantity
            summary.units += quantity
            summary.orders += 1

    # Rank categories by revenue, highest first.
    return dict(sorted(summaries.items(), key=lambda item: item[1].total, reverse=True))
```
Three takeaways
- A defaultdict of dataclasses gives you clean per-key accumulation without existence checks.
- Using Decimal instead of float keeps sums and products of prices exact, and `quantize` makes the one place rounding happens (the average order value) explicit.
- Tracking the line number while parsing lets errors point back to the offending row in the file.
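The second takeaway can be checked directly. This sketch uses hypothetical amounts: ten 10-cent items summed as floats drift below 1.0, while `Decimal` values built from strings sum exactly. Division is where rounding comes back, which is why `average_order_value` calls `quantize`. The listing relies on the context's default rounding mode (`ROUND_HALF_EVEN`, banker's rounding); the `ROUND_HALF_UP` variant is shown only for comparison.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Ten hypothetical 10-cent line items.
float_total = sum([0.10] * 10)
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0"))

# Hypothetical category: 10.25 of revenue across 2 orders, exactly 5.125.
average = Decimal("10.25") / 2
default_rounded = average.quantize(CENT)  # default context rounding: ROUND_HALF_EVEN
half_up_rounded = average.quantize(CENT, rounding=ROUND_HALF_UP)

print(float_total, decimal_total)
print(default_rounded, half_up_rounded)
```

Building `Decimal` values from strings, as the CSV parsing does, matters: `Decimal(0.1)` would carry the float's binary error into the decimal value.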