Question 1

What counts as a duplicate word?

Accepted Answer

Any word that appears at least N times in your text, where N is the 'min count' option (default 2). Comparison is case-insensitive by default, so 'Apple' and 'apple' count together; turn off 'ignore case' to keep them separate.
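As a rough illustration of that rule, here is a minimal Python sketch. It is not the tool's actual browser code: the function name `find_duplicates`, its parameters and the token pattern are assumptions made for the example, and stopword and length filtering are left out.

```python
import re
from collections import Counter

# Hypothetical token pattern: runs of Unicode letters and digits (underscore excluded).
TOKEN_RE = re.compile(r"[^\W_]+")

def find_duplicates(text, min_count=2, ignore_case=True):
    """Return (word, count) pairs for words seen at least min_count times."""
    words = TOKEN_RE.findall(text)
    if ignore_case:
        # Case-fold so 'Apple' and 'apple' land in the same bucket.
        words = [w.casefold() for w in words]
    counts = Counter(words)
    # most_common() orders by descending count, so the result is already ranked.
    return [(w, c) for w, c in counts.most_common() if c >= min_count]

print(find_duplicates("Apple pie and apple tart"))                     # [('apple', 2)]
print(find_duplicates("Apple pie and apple tart", ignore_case=False))  # []
```

With 'ignore case' off, 'Apple' and 'apple' are counted separately, so neither reaches the default minimum of 2.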

Question 2

Why are 'the', 'and' and 'of' ignored?

Accepted Answer

They're stopwords: extremely common glue words that you can't really avoid. Filtering them out means the report focuses on the words you're actually choosing to repeat. Turn the stopword toggle off to include them.
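A minimal sketch of stopword filtering, assuming a tiny illustrative list. The tool's real stopword list is not shown in this FAQ, and `STOPWORDS`, `content_words` and `skip_stopwords` are hypothetical names.

```python
import re

# Tiny illustrative set only; the real list is assumed to be much longer.
STOPWORDS = {"the", "and", "of", "a", "to", "in"}

def content_words(text, skip_stopwords=True):
    """Tokenise, case-fold, and optionally drop stopwords."""
    words = [w.casefold() for w in re.findall(r"[^\W_]+", text)]
    if skip_stopwords:
        words = [w for w in words if w not in STOPWORDS]
    return words

print(content_words("The cat and the hat of the cat"))
# ['cat', 'hat', 'cat']
```

Passing `skip_stopwords=False` keeps all eight tokens, which mirrors turning the toggle off.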

Question 3

What is the lexical-diversity score?

Accepted Answer

It's the ratio of unique words to total words, computed after the same normalisation used for duplicate counting and expressed as a percentage. A higher score means more varied vocabulary: 70% or more is typical for tight prose, while lower numbers suggest the writing leans on the same words too often.
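The ratio can be sketched in a few lines of Python. This assumes normalisation means case folding only; the tool may also apply its stopword and length filters first. `lexical_diversity` is a hypothetical name.

```python
import re

def lexical_diversity(text):
    """Unique words divided by total words, as a percentage (0.0 for empty input)."""
    words = [w.casefold() for w in re.findall(r"[^\W_]+", text)]
    if not words:
        return 0.0
    return 100.0 * len(set(words)) / len(words)

# 6 words, 5 unique ('the' repeats): 5 / 6 is about 83.3%
print(round(lexical_diversity("the cat sat on the mat"), 1))  # 83.3
```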

Question 4

Can I increase the minimum word length?

Accepted Answer

Yes. The default of 3 already removes one- and two-letter words such as 'I', 'is' and 'it'. Raise 'min length' to 4 or 5 to filter out slightly longer short words as well.

Question 5

Will it work on non-English text?

Accepted Answer

Tokenisation works: the regex uses Unicode letter and number classes, so French, Spanish, German, Hindi and other languages tokenise correctly. The stopword list is currently English-only, so turn the stopword toggle off for other languages.
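For illustration, a Unicode-aware tokeniser can be approximated in Python as below. The tool's own regex is not shown here; Python's built-in `re` module has no `\p{L}` syntax, so this sketch uses the `[^\W_]` idiom (Unicode word characters minus the underscore) as a stand-in, and it makes no attempt to handle combining marks.

```python
import re

# Stand-in for Unicode letter/number classes: word characters except '_'.
TOKEN_RE = re.compile(r"[^\W_]+")

def tokenise(text):
    """Split text into runs of Unicode letters and digits."""
    return TOKEN_RE.findall(text)

print(tokenise("Le café était fermé"))  # ['Le', 'café', 'était', 'fermé']
print(tokenise("Schön wäre es"))        # ['Schön', 'wäre', 'es']
```

An ASCII-only pattern such as `[A-Za-z]+` would split 'café' at the accented letter, which is why the character class matters for non-English text.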

Question 6

Why are only the top 5 words colour-highlighted?

Accepted Answer

Past five distinct colours, the highlighting starts to look like a rainbow and obscures the text. The frequency table below lists every duplicate so nothing is lost.

Question 7

Can I export the report?

Accepted Answer

Yes — copy the duplicates as a TSV (word [tab] count) or download a .tsv file for spreadsheet use.
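The TSV layout is simple enough to sketch: one line per duplicate, word and count separated by a tab. Whether the real export includes a header row is not stated, so this hypothetical `to_tsv` helper assumes none.

```python
def to_tsv(duplicates):
    """Format (word, count) pairs as tab-separated lines, one pair per line."""
    return "\n".join(f"{word}\t{count}" for word, count in duplicates)

print(to_tsv([("apple", 3), ("pie", 2)]))
# apple	3
# pie	2
```

Pasting this output into a spreadsheet should place words in the first column and counts in the second.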

Question 8

How does this differ from a word counter?

Accepted Answer

A word counter tells you the total count and unique-word count. The duplicate finder ranks every individual repeated word so you can act on each one.

Question 9

Does this suggest synonyms?

Accepted Answer

Not yet — we keep the tool focused and 100% offline. A thesaurus would either need a network dependency or a heavy bundle. Use the highlights to spot repeats and look them up in your favourite reference.

Question 10

Is my text uploaded anywhere?

Accepted Answer

No. Tokenisation, counting and highlighting all happen in your browser — Toollyz has no backend that ever sees your writing.

Question 11

How big can the input be?

Accepted Answer

Input stays comfortable up to tens of thousands of words. Tokenising is O(n) in the length of the text and highlighting is O(u) in the number of unique words. Both are fast in the browser, and deferred values keep typing smooth.

Related tools

Word Counter

Duplicate Line Remover

Case Converter

Slugify