ruby
35 lines · 8 steps
Building a word-frequency analyzer in Ruby
A module that tokenizes text, counts words, and ranks them by frequency using Ruby's enumerable toolkit.
Explained by highlit
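# Array#to_set needs this on Ruby versions before 3.2; it is harmless on newer ones.
require "set"

module TextAnalysis
  module_function

  # A word is a run of Unicode letters and apostrophes (so "don't" stays whole).
  WORD_PATTERN = /[\p{Alpha}']+/

  STOP_WORDS = %w[the a an and or but of to in on at for is are was were be].to_set

  # Returns a Hash of word => count, ordered by descending count, then alphabetically.
  def word_frequencies(text, exclude_stop_words: true, limit: nil)
    words = text
      .downcase
      .scan(WORD_PATTERN)
      .map { |word| word.delete_prefix("'").delete_suffix("'") } # trim one quote from each end
      .reject(&:empty?)

    words.reject! { |word| STOP_WORDS.include?(word) } if exclude_stop_words

    counts = words.tally

    # Negating the count sorts high-to-low; the word itself breaks ties.
    ranked = counts.sort_by { |word, count| [-count, word] }
    ranked = ranked.first(limit) if limit

    ranked.to_h
  end

  # Returns a [word, count] pair, or nil when no words remain.
  def top_word(text, **options)
    word_frequencies(text, **options).max_by { |_word, count| count }
  end

  # Share of distinct tokens among all tokens, rounded to three decimals.
  # Unlike word_frequencies, this keeps stop words and surrounding apostrophes.
  def unique_ratio(text)
    words = text.downcase.scan(WORD_PATTERN)
    return 0.0 if words.empty?

    words.tally.size.fdiv(words.size).round(3)
  end
end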
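To see what each stage of the pipeline contributes, here is a minimal sketch that splits the chain into separately callable steps. The module name PipelineDemo, its method names, the shortened stop-word list, and the sample sentence are all hypothetical and exist only for illustration; the regular expression and the chained methods mirror the ones in the module above.

```ruby
require "set"

# Hypothetical demo module; names are illustrative, not part of TextAnalysis.
module PipelineDemo
  module_function

  WORD_PATTERN = /[\p{Alpha}']+/
  STOP_WORDS = %w[the and].to_set # shortened list, just for this demo

  # Stage 1: lowercase, then extract runs of letters and apostrophes.
  def raw_tokens(text)
    text.downcase.scan(WORD_PATTERN)
  end

  # Stage 2: strip one leading and one trailing apostrophe, drop empty results.
  def clean_tokens(text)
    raw_tokens(text)
      .map { |word| word.delete_prefix("'").delete_suffix("'") }
      .reject(&:empty?)
  end

  # Stage 3: remove stop words and count what is left.
  def counts(text)
    clean_tokens(text).reject { |word| STOP_WORDS.include?(word) }.tally
  end
end

sample = "The cat sat. The 'cat' ran, and the dog's cat hid."

p PipelineDemo.raw_tokens(sample)   # still contains "'cat'" with its quotes
p PipelineDemo.clean_tokens(sample) # quotes trimmed, "dog's" left intact
p PipelineDemo.counts(sample)       # "cat" is counted three times
```

Note that the quoted 'cat' only merges with the unquoted occurrences after stage 2, which is why trimming happens before tallying.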
Three takeaways
1. Chaining string and enumerable methods turns raw text into clean tokens in one expressive pipeline.
2. Sorting by a tuple like [-count, word] gives descending counts with a deterministic alphabetical tiebreak.
3. module_function lets a module expose stateless helpers that are called directly on the module, with no instantiation.
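The second and third takeaways can be sketched together. The snippet below is an illustration under stated assumptions: RankDemo, rank, and the fruit counts are hypothetical names and data, not part of the explainer's module.

```ruby
# Hypothetical demo module showing the tuple sort and module_function.
module RankDemo
  module_function

  # Arrays compare element by element, so -count decides the order first
  # (largest count first) and the word only matters when counts are equal.
  def rank(counts)
    counts.sort_by { |word, count| [-count, word] }.to_h
  end
end

# Hypothetical counts, shaped like the output of Enumerable#tally.
counts = { "pear" => 2, "apple" => 2, "fig" => 5, "kiwi" => 1 }

# Called directly on the module: no instance and no include required.
p RankDemo.rank(counts)
# => {"fig"=>5, "apple"=>2, "pear"=>2, "kiwi"=>1}
```

"apple" lands ahead of "pear" because the two share a count of 2 and the second tuple element settles the tie. Without that second element the relative order of tied words would not be guaranteed, since sort_by is not a stable sort in Ruby.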