Duplicate Word Finder
Spot over-used words in essays, posts and emails — colour-rank highlighting, frequency table and lexical-diversity score. Stopwords filter included. Free and 100% browser-side.
What is the Duplicate Word Finder?
Duplicate Word Finder analyses any block of writing and surfaces the words you've repeated. It tokenises the input into runs of Unicode letters and numbers, filters out stopwords (the, and, of …) and very short tokens by default, and then ranks every duplicated word by count. The five most-repeated words get distinct colour highlights inside your text, so you can see at a glance where the repetition lands, which is handy for tightening up first drafts. A live frequency table lists every repeated word with its count and can be filtered by substring. A lexical-diversity ratio (unique ÷ total) gives a one-number quality signal. Toggles let you turn case-folding on or off, include or exclude numbers, and change the minimum repetition count and the minimum word length, so you can use it for SEO ('what keywords am I over-using?') or for prose ('what filler am I leaning on?'). The analysis runs as pure functions in your browser, and nothing is uploaded.
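The pipeline described above (tokenise, filter, count, rank) can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the `Options` field names, the `findDuplicates` function, the default for `includeNumbers` and the tiny stopword list are all assumptions made for the example.

```typescript
// Hypothetical option names; the real tool's settings may be named differently.
interface Options {
  ignoreCase: boolean;
  includeNumbers: boolean;
  filterStopwords: boolean;
  minLength: number;
  minCount: number;
}

// Defaults mirror the text where it states them (min length 3, min count 2);
// includeNumbers = true is an assumption.
const DEFAULTS: Options = {
  ignoreCase: true,
  includeNumbers: true,
  filterStopwords: true,
  minLength: 3,
  minCount: 2,
};

// Tiny illustrative stopword list, not the tool's real one.
const STOPWORDS = new Set(["the", "and", "of", "a", "to", "in", "is", "it"]);

function findDuplicates(
  text: string,
  overrides: Partial<Options> = {},
): Array<[string, number]> {
  const opts: Options = { ...DEFAULTS, ...overrides };
  // Tokens are runs of Unicode letters, optionally including numbers.
  const pattern = opts.includeNumbers ? /[\p{L}\p{N}]+/gu : /\p{L}+/gu;
  const counts = new Map<string, number>();

  for (const raw of text.match(pattern) ?? []) {
    const word = opts.ignoreCase ? raw.toLowerCase() : raw;
    // Count code points rather than UTF-16 units when checking length.
    if (Array.from(word).length < opts.minLength) continue;
    if (opts.filterStopwords && STOPWORDS.has(word.toLowerCase())) continue;
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }

  // Keep words that reach the threshold, most frequent first, ties alphabetical.
  return Array.from(counts.entries())
    .filter(([, n]) => n >= opts.minCount)
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```

For example, `findDuplicates("The cat sat. The cat ran and the dog sat on the mat.")` returns `[["cat", 2], ["sat", 2]]`: the stopwords and the two-letter "on" are dropped before counting, and the single-occurrence words fall below the minimum count.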
How to use it
- Paste your writing into the editor — an essay, blog post, email, anything.
- Tweak the filter: stopwords, case-sensitivity, min length and min count.
- Read the live highlight preview and the ranked duplicate table.
- Edit your text and swap repeats for synonyms, then copy a clean TSV report or download it.
Benefits
- Colour-rank highlighting for the top five most-repeated words so the worst offenders pop visually.
- Stopword filter (the, and, of, …) so common glue words don't drown out real repetition.
- Lexical-diversity score (unique ÷ total) as a one-number quality signal for the writing.
- Per-word frequency table with substring filter and copy-as-TSV for spreadsheets.
- Configurable min length and min count thresholds — fits both SEO and prose use cases.
- Live highlight preview updates as you type — no 'analyse' button to click.
- Unicode-aware tokenisation so accented and non-Latin text counts correctly.
- Runs entirely in your browser — your writing never leaves the device.
Frequently asked questions
What counts as a duplicate word?
Any word that appears at least N times in your text, where N is the 'min count' option (default 2). Comparison is case-folded by default, so 'Apple' and 'apple' count together; turn off 'ignore case' to keep them separate.
Why are 'the', 'and' and 'of' ignored?
They're stopwords — extremely common glue words that you can't really avoid. Filtering them out means the report focuses on the words you're actually choosing to repeat. Turn the toggle off to include them.
What is the lexical-diversity score?
It's the ratio of unique words to total words, computed after the same normalisation and expressed as a percentage. A higher score means more varied vocabulary: 70%+ is typical for tight prose, while lower numbers suggest the writing leans on the same words too often. The ratio tends to fall as a text gets longer, so compare pieces of similar length.
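As a rough illustration of the arithmetic, the score can be computed like this. The function name `lexicalDiversity` is hypothetical, and the sketch assumes that "the same normalisation" means only case-folding; the real tool may also apply its stopword and length filters before taking the ratio.

```typescript
// Sketch: unique tokens divided by total tokens, as a percentage.
// Assumes only case-folding is applied before counting (an assumption).
function lexicalDiversity(text: string, ignoreCase = true): number {
  const tokens: string[] = text.match(/[\p{L}\p{N}]+/gu) ?? [];
  if (tokens.length === 0) return 0; // avoid dividing by zero on empty input
  const normalised = tokens.map((t) => (ignoreCase ? t.toLowerCase() : t));
  const unique = new Set(normalised).size;
  return (unique * 100) / tokens.length;
}
```

With "The cat and the dog", there are five tokens and four unique ones after case-folding, so the score is 80. With case-folding off, "The" and "the" count separately and the score is 100.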
Can I increase the minimum word length?
Yes. The default of 3 already removes 1- and 2-letter words such as 'I', 'is' and 'it'. Bump 'min length' up to 4 or 5 to filter out three- and four-letter words as well.
Will it work on non-English text?
Yes, for tokenisation: the regex uses Unicode letter/number classes, so French, Spanish, German, Hindi and so on tokenise correctly. The stopword list is currently English-only, so turn off the stopword toggle for other languages.
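A short sketch of what Unicode-aware tokenisation looks like in practice. The tool's actual regex is not shown on this page, so both patterns below are assumptions: `LETTERS_NUMBERS` follows the description literally, and `WITH_MARKS` is a variant that also keeps combining marks (`\p{M}`) attached to their base letters.

```typescript
// Letters and numbers only, following the description literally (assumed pattern).
const LETTERS_NUMBERS = /[\p{L}\p{N}]+/gu;

// Variant that also keeps combining marks attached; this matters for scripts
// such as Devanagari, where vowel signs are marks rather than letters.
const WITH_MARKS = /[\p{L}\p{N}\p{M}]+/gu;

// Hypothetical helper: returns every token the pattern finds, in order.
function tokenise(text: string, pattern: RegExp = LETTERS_NUMBERS): string[] {
  return text.match(pattern) ?? [];
}

// An ASCII-only pattern, shown for contrast: it breaks words at accented letters.
function tokeniseAscii(text: string): string[] {
  return text.match(/[A-Za-z0-9]+/g) ?? [];
}
```

With precomposed accented characters, `tokenise("coûte à Zürich")` yields three whole words, whereas `tokeniseAscii` splits "coûte" into "co" and "te". One caveat worth checking: a pattern limited strictly to letters and numbers would split a Hindi word such as "नमस्ते" at its combining marks, while the `WITH_MARKS` variant keeps it as one token. The tool's real regex may already account for this.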
Why are only the top 5 words colour-highlighted?
Past five distinct colours, the highlighting starts to look like a rainbow and obscures the text. The frequency table below lists every duplicate so nothing is lost.
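One way to picture the colour-rank step is as splitting the text into plain and highlighted segments, where only the first five ranked words receive a rank. The `Segment` shape and the `highlightSegments` function are hypothetical names for this sketch, which also assumes case-insensitive matching.

```typescript
interface Segment {
  text: string;
  rank: number | null; // 0 to 4 for the top words, null for unhighlighted text
}

// Splits `text` into segments; `ranked` lists duplicate words, most frequent first.
function highlightSegments(text: string, ranked: string[], maxColours = 5): Segment[] {
  const rankOf = new Map<string, number>();
  ranked.slice(0, maxColours).forEach((word, i) => rankOf.set(word.toLowerCase(), i));

  const segments: Segment[] = [];
  let cursor = 0; // end of the last emitted segment
  for (const match of text.matchAll(/[\p{L}\p{N}]+/gu)) {
    const rank = rankOf.get(match[0].toLowerCase());
    if (rank === undefined) continue; // not a top word: stays in the plain run
    const start = match.index ?? 0;
    if (start > cursor) segments.push({ text: text.slice(cursor, start), rank: null });
    segments.push({ text: match[0], rank });
    cursor = start + match[0].length;
  }
  if (cursor < text.length) segments.push({ text: text.slice(cursor), rank: null });
  return segments;
}
```

A renderer could then map ranks 0 to 4 onto five colours and leave `null` segments unstyled. Concatenating the segment texts always reproduces the original input, so the highlighting never alters what you wrote.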
Can I export the report?
Yes — copy the duplicates as a TSV (word [tab] count) or download a .tsv file for spreadsheet use.
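The TSV layout is simple enough to sketch in a few lines. The `toTsv` name is hypothetical, and the sketch assumes one row per word with no header line, which matches the "word [tab] count" description but may differ from the tool's exact output.

```typescript
// Sketch: one "word<TAB>count" row per duplicate, rows separated by newlines.
// Tokens are runs of letters and numbers, so they never contain a tab themselves.
function toTsv(rows: Array<[string, number]>): string {
  return rows.map(([word, count]) => `${word}\t${count}`).join("\n");
}
```

Pasting the result into a spreadsheet puts words in the first column and counts in the second.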
How does this differ from a word counter?
A word counter tells you the total count and unique-word count. The duplicate finder ranks every individual repeated word so you can act on each one.
Does this suggest synonyms?
Not yet — we keep the tool focused and 100% offline. A thesaurus would either need a network dependency or a heavy bundle. Use the highlights to spot repeats and look them up in your favourite reference.
Is my text uploaded anywhere?
No. Tokenisation, counting and highlighting all happen in your browser — Toollyz has no backend that ever sees your writing.
How big can the input be?
It stays comfortable up to tens of thousands of words. Tokenising is O(n) in the length of the text and highlighting is O(unique) in the number of distinct words, and deferred values keep typing smooth while the analysis catches up.