
Building a word-frequency analyzer in Ruby

A module that tokenizes text, counts words, and ranks them by frequency using Ruby's enumerable toolkit.

Explained by highlit
require "set" # Set is only autoloaded on Ruby 3.2+; explicit require keeps older versions working

module TextAnalysis
  module_function

  # Runs of letters (any script) and apostrophes, so "don't" stays one token.
  WORD_PATTERN = /[\p{Alpha}']+/

  STOP_WORDS = %w[the a an and or but of to in on at for is are was were be].to_set

  # Returns a Hash of word => count, ordered by descending count, then alphabetically.
  def word_frequencies(text, exclude_stop_words: true, limit: nil)
    words = text
      .downcase
      .scan(WORD_PATTERN)
      .map { |word| word.delete_prefix("'").delete_suffix("'") } # trim one quote mark per side
      .reject(&:empty?)

    words.reject! { |word| STOP_WORDS.include?(word) } if exclude_stop_words

    counts = words.tally

    ranked = counts.sort_by { |word, count| [-count, word] }
    ranked = ranked.first(limit) if limit

    ranked.to_h
  end

  # Returns a [word, count] pair, or nil when no words remain.
  def top_word(text, **options)
    word_frequencies(text, **options).max_by { |_word, count| count }
  end

  # Share of distinct tokens among all tokens, rounded to three decimals.
  def unique_ratio(text)
    words = text.downcase.scan(WORD_PATTERN)
    return 0.0 if words.empty?

    words.tally.size.fdiv(words.size).round(3)
  end
end
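
To see what each stage of the tokenizing pipeline contributes, here is a minimal sketch that runs the same chain on a sample sentence and keeps the intermediate results. The names TOKEN_PATTERN and tokenize_steps and the sample text are hypothetical, introduced only for illustration; the pattern is copied from WORD_PATTERN above.

```ruby
# Same character class as WORD_PATTERN in the module (renamed to avoid clashing with it).
TOKEN_PATTERN = /[\p{Alpha}']+/

# Hypothetical helper: mirrors the first stage of word_frequencies,
# but returns the intermediate arrays so each step can be inspected.
def tokenize_steps(text)
  raw     = text.downcase.scan(TOKEN_PATTERN)
  trimmed = raw.map { |word| word.delete_prefix("'").delete_suffix("'") }
  cleaned = trimmed.reject(&:empty?)
  { raw: raw, cleaned: cleaned, counts: cleaned.tally }
end

steps = tokenize_steps("Don't stop: the 'quick' fox's den, the den ''")

p steps[:raw]
# => ["don't", "stop", "the", "'quick'", "fox's", "den", "the", "den", "''"]
p steps[:cleaned]
# => ["don't", "stop", "the", "quick", "fox's", "den", "the", "den"]
p steps[:counts]["den"]
# => 2
```

The bare pair of quotes at the end of the sample shrinks to an empty string once its prefix and suffix are removed, which is why the reject(&:empty?) step is needed. One thing to be aware of: unique_ratio scans the text without the trimming step, so it would treat 'quick' and quick as two distinct tokens.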

Three takeaways
  1. Chaining string and enumerable methods (downcase, scan, map, reject) turns raw text into clean tokens in one expressive pipeline.
  2. Sorting by a tuple like [-count, word] gives descending counts with a deterministic alphabetical tiebreak.
  3. module_function lets a module expose stateless helpers that are called directly on the module, such as TextAnalysis.word_frequencies(text), without including it in a class.
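
The second and third takeaways can be tried in isolation. The sketch below uses made-up counts and a hypothetical Shout module, neither of which appears in the original snippet; it only aims to show how the tuple key orders ties and how module_function changes the way a method is called.

```ruby
# Hypothetical helper: ranks a word => count hash the same way word_frequencies does.
def rank_counts(counts)
  # Negating the count sorts high-to-low; the word then breaks ties alphabetically.
  counts.sort_by { |word, count| [-count, word] }
end

# Made-up counts, shaped like the hash that Array#tally returns.
p rank_counts({ "pear" => 2, "apple" => 2, "fig" => 5, "kiwi" => 1 })
# => [["fig", 5], ["apple", 2], ["pear", 2], ["kiwi", 1]]

# Hypothetical module, for illustration only.
module Shout
  module_function

  def loud(text)
    "#{text.upcase}!"
  end
end

# Callable on the module itself; no include and no instance required.
p Shout.loud("hi")
# => "HI!"
```

Ruby compares the key arrays element by element, so the word is consulted only when two counts are equal. As a side effect of module_function, the instance-method copy of loud becomes private, so a class that includes Shout can still use it internally without exposing it publicly.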
