Reference: analyze

Commands that inspect text and return insight. Split into deterministic (free, offline) and LLM-backed (text-only NLP, no tools).

The LLM commands use $env.COMMA_ANALYZE_CFG (default: gemini-3.1-flash-lite with no tools) instead of the main $env.COMMA_CFG. See configuration.

Verification commands that need web search (factcheck, quotes, claims) live in their own validate module with a stronger default model and tools enabled.

Deterministic commands

`stats`

Basic counts via Nushell’s str stats, plus sentence, paragraph, and word-length stats.

stats [--verbose] [...text]

Flag	Default	What
`-v, --verbose`	off	Also include top words, shortest/longest word lengths

Returns a record with: lines, words, bytes, chars, graphemes, unicode-width, sentences, paragraphs, avg_word_length (plus top_words, shortest_word_length, longest_word_length when verbose).

Alias: ,st

`freq`

Word frequency, stopword-filtered.

freq [--lang <string>] [--top <int>] [--no-stop] [--stop <list>] [...text]

Flag	Default	What
`-l, --lang`	`en`	Stopword set: `en`, `da`, or `none`
`-n, --top`	50	Top-N results (0 = all)
`--no-stop`	off	Skip stopword filtering entirely
`--stop`	—	Custom stopword list (overrides `--lang`)

Returns a table with value and count columns, sorted descending.

Alias: ,fq

`ngrams`

Bigram/trigram frequency.

ngrams [--n <int>] [--top <int>] [--lang <string>] [--no-stop] [...text]

Flag	Default	What
`--n`	2	Window size (2 = bigrams, 3 = trigrams)
`--top`	30	Top-N results
`-l, --lang`	`en`	Stopword set
`--no-stop`	off	Skip stopword filtering

Stopword filtering runs before windowing, so no gram contains a stopword. Words on either side of a removed stopword become neighbours and can form a gram together.

Alias: ,ng
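
The filter-then-window order can be sketched in Python as below. This is an illustration of the described behaviour, not the module's implementation: the tokenizer regex and the four-word `STOPWORDS` set are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical miniature stopword set; the real `en` list is not shown in this reference.
STOPWORDS = {"the", "of", "a", "and"}

def ngrams(text, n=2, top=30, stopwords=STOPWORDS):
    """Return the `top` most frequent n-grams as (gram, count) pairs."""
    # 1. Tokenize and drop stopwords first.
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in stopwords]
    # 2. Only then slide the n-word window over what is left.
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams).most_common(top)

print(ngrams("The king of the hill"))
```

With this input only `king` and `hill` survive the filter, so the single bigram is `king hill`, even though the two words were not adjacent in the original text.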

`kwic`

Keyword-in-context concordance.

kwic <keyword> [--window <int>] [--case-sensitive] [...text]

Flag	Default	What
`keyword`	required	Word to find
`-w, --window`	5	Words of context on each side
`--case-sensitive`	off	Match case

Returns a table with pos, left, match, right columns.

Alias: ,kc
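
A minimal Python sketch of a keyword-in-context lookup that produces the same four columns. It assumes whitespace tokenization, punctuation stripped before matching, and a 0-based word index for `pos`; none of these details are confirmed by this reference.

```python
def kwic(text, keyword, window=5, case_sensitive=False):
    """Return one row per hit with pos, left, match and right fields."""
    words = text.split()
    target = keyword if case_sensitive else keyword.lower()
    rows = []
    for pos, word in enumerate(words):
        # Compare without surrounding punctuation (assumption).
        candidate = word.strip(".,;:!?\"'()")
        if not case_sensitive:
            candidate = candidate.lower()
        if candidate == target:
            rows.append({
                "pos": pos,  # 0-based word index (assumption)
                "left": " ".join(words[max(0, pos - window):pos]),
                "match": word,
                "right": " ".join(words[pos + 1:pos + 1 + window]),
            })
    return rows

for row in kwic("The quick brown fox jumps over the lazy dog", "fox", window=2):
    print(row)
```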

`lix`

Scandinavian-standard Lix readability score.

lix [...text]

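Returns a record: lix, sentences, words, long_words, avg_sentence_length, long_word_pct, interpretation. A long word has more than 6 letters. The score follows the standard Lix definition: words / sentences + 100 × long_words / words.

Interpretation bands: very easy (<25), easy (25 to <35), middle (35 to <45), hard (45 to <55), very hard (≥55).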

Alias: ,lx
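
The standard Lix formula is the average sentence length plus the percentage of long words. The Python sketch below works through it. The sentence and word regexes are assumptions, and the band boundaries are read as lower-inclusive, which the reference does not state explicitly.

```python
import re

# Upper limits for each band; anything at or above 55 is "very hard".
BANDS = [(25, "very easy"), (35, "easy"), (45, "middle"), (55, "hard")]

def lix(text):
    """Compute a Lix score and its supporting counts; None for empty text."""
    sentences = [s for s in re.split(r"[.!?]+\s+", text.strip()) if s]
    words = re.findall(r"\w+", text)
    if not words:
        return None
    long_words = [w for w in words if len(w) > 6]  # long word: more than 6 letters
    avg_sentence_length = len(words) / len(sentences)
    long_word_pct = 100 * len(long_words) / len(words)
    score = avg_sentence_length + long_word_pct
    label = next((name for limit, name in BANDS if score < limit), "very hard")
    return {
        "lix": score,
        "sentences": len(sentences),
        "words": len(words),
        "long_words": len(long_words),
        "avg_sentence_length": avg_sentence_length,
        "long_word_pct": long_word_pct,
        "interpretation": label,
    }

print(lix("The cat sat. Readability matters enormously."))
```

For the sample text there are 6 words in 2 sentences and 3 long words, so the score is 6/2 + 100 × 3/6 = 53, which lands in the hard band.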

`repeats`

Repeated multi-word phrases.

repeats [--min-length <int>] [--min-count <int>] [--top <int>] [...text]

Flag	Default	What
`-l, --min-length`	4	Minimum phrase length (words)
`-c, --min-count`	2	Minimum repetition count
`--top`	20	Top-N results

Alias: ,rp

`compare`

Distinctive words in A versus B via smoothed log-odds.

compare <other> [--top <int>] [--lang <string>] [--no-stop] [...text]

Flag	Default	What
`other`	required	Comparison text or filepath
`-n, --top`	15	Top-N distinctive per side
`-l, --lang`	`en`	Stopword set
`--no-stop`	off	Skip stopword filtering

Returns a record with distinctive_in_a and distinctive_in_b tables. The score is the base-2 log of the smoothed relative-frequency ratio of A to B: positive means more frequent in A, negative means more frequent in B.

Alias: ,cp
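
One way to compute a base-2 log of a smoothed relative-frequency ratio is shown below in Python. The add-alpha smoothing, the `alpha` parameter, and the tokenizer are assumptions for illustration; the reference does not say which smoothing scheme the command uses, and stopword filtering is left out.

```python
import math
import re
from collections import Counter

def log_ratio_scores(text_a, text_b, alpha=1.0):
    """Score each word: > 0 means relatively more frequent in A, < 0 in B."""
    a = Counter(re.findall(r"\w+", text_a.lower()))
    b = Counter(re.findall(r"\w+", text_b.lower()))
    vocab = set(a) | set(b)
    # Add-alpha smoothing keeps the ratio finite for words missing on one side.
    total_a = sum(a.values()) + alpha * len(vocab)
    total_b = sum(b.values()) + alpha * len(vocab)
    return {
        w: math.log2(((a[w] + alpha) / total_a) / ((b[w] + alpha) / total_b))
        for w in vocab
    }

scores = log_ratio_scores("apple apple pear", "pear pear plum")
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{score:+.3f}")
```

Sorting by score descending puts the words most distinctive of A first; the words most distinctive of B are at the negative end of the same ranking.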

`hapax`

Words appearing exactly once.

hapax [--lang <string>] [--no-stop] [...text]

Alias: ,hp

`ttr`

Type-token ratio (unique_words / total_words).

ttr [--lang <string>] [--no-stop] [...text]

Returns a record: types, tokens, ttr.

Alias: ,tt
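
Both hapax and ttr fall out of a single token count. The Python sketch below shows the arithmetic; it skips stopword filtering (comparable to passing --no-stop) and uses an assumed tokenizer.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercased word tokens; the exact tokenizer is an assumption."""
    return re.findall(r"\w+", text.lower())

def ttr(text):
    """Type-token ratio: unique words divided by total words."""
    tokens = tokenize(text)
    types = len(set(tokens))
    return {
        "types": types,
        "tokens": len(tokens),
        "ttr": types / len(tokens) if tokens else 0.0,
    }

def hapax(text):
    """Words that occur exactly once, in alphabetical order."""
    counts = Counter(tokenize(text))
    return sorted(word for word, count in counts.items() if count == 1)

print(ttr("the cat and the hat"))
print(hapax("the cat and the hat"))
```

The ratio shrinks as texts get longer because common words repeat, so it is most meaningful when comparing texts of similar length.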

`similar`

Jaccard similarity between two texts via k-shingles.

similar <other> [--k <int>] [...text]

Flag	Default	What
`other`	required	Comparison text or filepath
`--k`	3	Shingle size (words)

Returns a record: jaccard, shared, total.

Alias: ,sl
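
Jaccard similarity over k-shingles is the number of shingles the two texts share divided by the number of distinct shingles across both. The Python sketch below assumes that `shared` is the intersection size and `total` the union size; the reference names the fields without defining them.

```python
import re

def shingles(text, k=3):
    """Set of all runs of k consecutive words."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}

def similar(text_a, text_b, k=3):
    a, b = shingles(text_a, k), shingles(text_b, k)
    shared = len(a & b)  # shingles present in both texts (assumed meaning)
    total = len(a | b)   # distinct shingles overall (assumed meaning)
    return {
        "jaccard": shared / total if total else 0.0,
        "shared": shared,
        "total": total,
    }

print(similar("the quick brown fox jumps", "the quick brown fox sleeps"))
```

Each text here yields three 3-word shingles, two of which are common to both, giving 2 shared out of 4 distinct and a score of 0.5.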

`sentences`

Split text into sentences (regex on [.!?]+\s+).

sentences [...text]

Returns a list of strings.

Alias: ,sn

`paragraphs`

Split text into paragraphs (regex on blank lines).

paragraphs [...text]

Returns a list of strings.

Alias: ,pa
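
The two splitters can be approximated in Python as follows. The sentence regex is the one quoted for sentences; the blank-line regex for paragraphs is an assumption, as is the choice to drop empty pieces.

```python
import re

def sentences(text):
    """Split on runs of . ! or ? followed by whitespace."""
    return [s for s in re.split(r"[.!?]+\s+", text.strip()) if s]

def paragraphs(text):
    """Split on blank lines (lines holding only whitespace count as blank)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

print(sentences("Hello there. How are you? Fine!"))
print(paragraphs("One.\n\nTwo.\n \nThree."))
```

In this sketch a plain split consumes the terminating punctuation of every sentence except the last, and abbreviations such as "Dr. Smith" are split in two. Whether the command itself keeps terminators is not stated here.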

`extract`

Regex extraction of structured data.

extract [--kind <string>] [...text]

Flag	Default	What
`-k, --kind`	`url`	One of `url`, `email`, `hashtag`, `mention`

Returns a deduplicated list of matches.

Alias: ,xt
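
A rough Python equivalent of regex extraction with order-preserving deduplication. The four patterns in `PATTERNS` are simplified assumptions, not the expressions the command actually uses.

```python
import re

# Hypothetical patterns, one per --kind value.
PATTERNS = {
    "url": r"https?://\S+",
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "hashtag": r"(?<!\w)#\w+",
    "mention": r"(?<!\w)@\w+",
}

def extract(text, kind="url"):
    """Return unique matches of the chosen kind, in first-seen order."""
    matches = re.findall(PATTERNS[kind], text)
    # Trim sentence punctuation that a greedy pattern may have swallowed.
    cleaned = [m.rstrip(".,;:!?") for m in matches]
    return list(dict.fromkeys(cleaned))

sample = "See https://example.com and https://example.com, mail bob@example.org, #nu @alice"
for kind in PATTERNS:
    print(kind, extract(sample, kind))
```

The lookbehind in the mention pattern keeps the `@example` inside an email address from being reported as a mention.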

`report`

Run the deterministic analyzers, plus the LLM commands sentiment, keywords, detect and readability unless --no-llm is set, and return a single structured record.

report [--no-llm] [--lang <string>] [--top <int>] [...text]

Flag	Default	What
`--no-llm`	off	Skip LLM commands (sentiment, keywords, detect, readability)
`-l, --lang`	`en`	Stopword set
`-n, --top`	15	Top-N for freq, ngrams, keywords

Returns nested records: meta, readability, lexical, top_words, top_bigrams, top_trigrams, repeated_phrases, extracted, llm (unless --no-llm).

Alias: ,rt

LLM-backed commands

These use $env.COMMA_ANALYZE_CFG (default: gemini-3.1-flash-lite, no tools). They are pure text-in, text-out: the model gets no tools and no web search.

`detect`

Identify the dominant language.

detect [--name] [...text]

Flag	Default	What
`--name`	off	Return English language name instead of ISO 639-1 code

Alias: ,de

`sentiment`

Classify overall sentiment.

sentiment [--score] [...text]

Flag	Default	What
`--score`	off	Include float score in [-1.0, 1.0]

Returns positive, neutral or negative. With --score: <label> (<score>).

Alias: ,se

`keywords`

Top-N keywords/key phrases.

keywords [--count <int>] [...text]

Flag	Default	What
`-n, --count`	10	Number of keywords

Alias: ,kw

`entities`

Extract named entities.

entities [--kind <string>] [...text]

Flag	Default	What
`-k, --kind`	all	Filter: `person`, `place`, `org`, `date`, `money`

Output format: one per line, <text> — <type>.

Alias: ,en

`readability`

Qualitative reading level assessment.

readability [...text]

Returns three lines: level: <very easy|easy|medium|hard|very hard>, audience: <noun phrase>, notes: <one-sentence reason>.

Alias: ,rd

`classify`

Pick the best-fit label from a list.

classify [--multi] ...labels

Flag	Default	What
`--multi`	off	Allow multiple labels (comma-separated output)
`...labels`	≥2 required	The candidate labels

Input text must be piped in, because the positional arguments are taken by the labels.

Alias: ,cl

Verification commands

factcheck, quotes and claims moved to a separate validate module so analyze can default to a lighter, tool-free model. See the validate reference for their signatures.