Reference: analyze

Commands that inspect text and return insights. They fall into two groups: deterministic commands (free, offline) and LLM-backed commands (text-only NLP, no tools).

The LLM commands use $env.COMMA_ANALYZE_CFG (default: gemini-3.1-flash-lite with no tools) instead of the main COMMA_CFG. See configuration.

Verification commands that need web search (factcheck, quotes, claims) live in their own validate module with a stronger default model and tools enabled.

Deterministic commands

stats

Basic counts from Nushell’s str stats, plus sentence, paragraph and word statistics.

stats [--verbose] [...text]

| Flag | Default | What |
|---|---|---|
| -v, --verbose | off | Also include top words and shortest/longest word lengths |

Returns a record with: lines, words, bytes, chars, graphemes, unicode-width, sentences, paragraphs, avg_word_length (plus top_words, shortest_word_length, longest_word_length when verbose).

Alias: ,st

freq

Word frequency, stopword-filtered.

freq [--lang <string>] [--top <int>] [--no-stop] [--stop <list>] [...text]

| Flag | Default | What |
|---|---|---|
| -l, --lang | en | Stopword set: en, da, or none |
| -n, --top | 50 | Top-N results (0 = all) |
| --no-stop | off | Skip stopword filtering entirely |
| --stop | | Custom stopword list (overrides --lang) |

Returns a table with value and count columns, sorted by count in descending order.

Alias: ,fq
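A minimal Python sketch of what freq computes, not the module's own code. It assumes a lowercase \w+ tokenizer and uses a tiny hypothetical EN_STOP set in place of the real en list. It mirrors the documented options: a custom stop list replaces the language set, no_stop disables filtering, and top=0 keeps every word.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stand-in for the "en" stopword list.
EN_STOP = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def freq(text, top=50, no_stop=False, stop=None):
    """Word frequencies, most common first (sketch, not the module's code)."""
    words = re.findall(r"\w+", text.lower())  # assumed tokenizer
    if not no_stop:
        # A custom stop list replaces the language set, like --stop overriding --lang.
        stopset = set(stop) if stop is not None else EN_STOP
        words = [w for w in words if w not in stopset]
    ranked = Counter(words).most_common()  # sorted by count, descending
    if top > 0:  # top=0 keeps every word
        ranked = ranked[:top]
    return [{"value": w, "count": c} for w, c in ranked]


print(freq("The cat sat on the mat. The cat slept.", top=2))
# → [{'value': 'cat', 'count': 2}, {'value': 'sat', 'count': 1}]
```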

ngrams

Bigram/trigram frequency.

ngrams [--n <int>] [--top <int>] [--lang <string>] [--no-stop] [...text]

| Flag | Default | What |
|---|---|---|
| --n | 2 | Window size (2 = bigrams, 3 = trigrams) |
| --top | 30 | Top-N results |
| -l, --lang | en | Stopword set |
| --no-stop | off | Skip stopword filtering |

The stopword filter runs before windowing, so no gram includes a filtered word.

Alias: ,ng
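A minimal Python sketch of filtering before windowing. It assumes a lowercase \w+ tokenizer, a hypothetical EN_STOP set, and that stopwords are simply dropped from the word list before the window slides over it. Under that assumption, words separated only by stopwords end up adjacent, which is why "sat mat" appears in the example; the module may handle such gaps differently.

```python
import re
from collections import Counter

# Hypothetical stand-in for the "en" stopword list.
EN_STOP = {"the", "a", "an", "and", "of", "to", "in", "is", "on"}


def ngrams(text, n=2, top=30, no_stop=False):
    words = re.findall(r"\w+", text.lower())  # assumed tokenizer
    if not no_stop:
        # Assumed: stopwords are dropped before the window slides.
        words = [w for w in words if w not in EN_STOP]
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams).most_common(top)


print(ngrams("The cat sat on the mat"))
# → [('cat sat', 1), ('sat mat', 1)]
```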

kwic

Keyword-in-context concordance.

kwic <keyword> [--window <int>] [--case-sensitive] [...text]

| Flag | Default | What |
|---|---|---|
| keyword | required | Word to find |
| -w, --window | 5 | Words of context on each side |
| --case-sensitive | off | Match case |

Returns a table with pos, left, match, right columns.

Alias: ,kc
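A minimal Python sketch of a keyword-in-context table with the documented pos, left, match and right columns. It assumes whitespace tokenization, that pos is a 0-based word index, and that punctuation attached to a token is ignored when matching; none of these details are specified above.

```python
def kwic(text, keyword, window=5, case_sensitive=False):
    words = text.split()  # assumed: whitespace tokenization

    def norm(s):
        return s if case_sensitive else s.lower()

    target = norm(keyword)
    rows = []
    for pos, word in enumerate(words):
        # Ignore punctuation clinging to the token when comparing.
        if norm(word.strip(".,;:!?\"'()")) == target:
            rows.append({
                "pos": pos,  # assumed: 0-based word index
                "left": " ".join(words[max(0, pos - window):pos]),
                "match": word,
                "right": " ".join(words[pos + 1:pos + 1 + window]),
            })
    return rows


for row in kwic("The cat sat. Then the Cat slept on the mat.", "cat", window=2):
    print(row)
# → {'pos': 1, 'left': 'The', 'match': 'cat', 'right': 'sat. Then'}
# → {'pos': 5, 'left': 'Then the', 'match': 'Cat', 'right': 'slept on'}
```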

lix

Lix readability score, the standard Scandinavian readability index.

lix [...text]

Returns a record: lix, sentences, words, long_words, avg_sentence_length, long_word_pct, interpretation. A long word is one with more than six letters.

Interpretation bands: very easy (<25), easy (25–35), middle (35–45), hard (45–55), very hard (≥55).

Alias: ,lx
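The Lix formula itself is the average sentence length plus the percentage of long words, i.e. words / sentences + 100 × long_words / words. A minimal Python sketch that produces the documented record fields, assuming letter-run tokenization, sentences counted as runs of . ! ?, and band boundaries where each lower bound is inclusive; the module's own tokenizer and sentence counter may differ.

```python
import re

# Assumed band boundaries: each lower bound inclusive, upper bound exclusive.
BANDS = [(25, "very easy"), (35, "easy"), (45, "middle"), (55, "hard")]


def lix(text):
    words = re.findall(r"[^\W\d_]+", text)  # assumed tokenizer: runs of letters
    if not words:
        raise ValueError("text contains no words")
    sentences = max(1, len(re.findall(r"[.!?]+", text)))  # assumed sentence counter
    long_words = sum(1 for w in words if len(w) > 6)  # more than six letters
    avg_sentence_length = len(words) / sentences
    long_word_pct = 100 * long_words / len(words)
    score = avg_sentence_length + long_word_pct
    interpretation = next((label for bound, label in BANDS if score < bound), "very hard")
    return {
        "lix": round(score, 1),
        "sentences": sentences,
        "words": len(words),
        "long_words": long_words,
        "avg_sentence_length": round(avg_sentence_length, 1),
        "long_word_pct": round(long_word_pct, 1),
        "interpretation": interpretation,
    }


print(lix("The cat sat on the mat. Everybody appreciated wonderful weather."))
```

In that example, 10 words over 2 sentences give 5.0, and 4 long words give 40.0 percent, so the score is 45.0, which falls in the hard band.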

repeats

Repeated multi-word phrases.

repeats [--min-length <int>] [--min-count <int>] [--top <int>] [...text]

| Flag | Default | What |
|---|---|---|
| -l, --min-length | 4 | Minimum phrase length (words) |
| -c, --min-count | 2 | Minimum repetition count |
| --top | 20 | Top-N results |

Alias: ,rp
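A minimal Python sketch of repeated-phrase detection, assuming a lowercase \w+ tokenizer and that every phrase of min_length words or more that occurs at least min_count times is reported. Unlike the module may, this sketch keeps overlapping sub-phrases of longer repeats. It stops growing the phrase length once nothing repeats, since a longer phrase can only repeat if its shorter prefix does.

```python
import re
from collections import Counter


def repeats(text, min_length=4, min_count=2, top=20):
    words = re.findall(r"\w+", text.lower())  # assumed tokenizer
    found = Counter()
    for n in range(min_length, len(words) + 1):
        counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
        hits = {" ".join(g): c for g, c in counts.items() if c >= min_count}
        if not hits:
            # Any longer phrase repeating min_count times would make its
            # length-n prefix repeat too, so nothing longer can qualify.
            break
        found.update(hits)
    return found.most_common(top)


print(repeats("We will fight on the beaches and we will fight on the hills"))
# → [('we will fight on', 2), ('will fight on the', 2), ('we will fight on the', 2)]
```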

compare

Distinctive words in the input text (A) versus <other> (B), via smoothed log-odds.

compare <other> [--top <int>] [--lang <string>] [--no-stop] [...text]

| Flag | Default | What |
|---|---|---|
| other | required | Comparison text or filepath |
| -n, --top | 15 | Top-N distinctive words per side |
| -l, --lang | en | Stopword set |
| --no-stop | off | Skip stopword filtering |

Returns a record with distinctive_in_a and distinctive_in_b tables. The score is the base-2 log of the smoothed relative-frequency ratio; a positive score means the word is relatively more frequent in A.

Alias: ,cp
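A minimal Python sketch of the scoring rule: each word's relative frequency in A and in B is smoothed by adding a constant, and the score is log2 of their ratio. The add-one smoothing (alpha=1.0), the shared-vocabulary denominator, the \w+ tokenizer and the absence of stopword filtering are all assumptions; the module's exact smoothing is not documented here.

```python
import math
import re
from collections import Counter


def compare(text_a, text_b, top=15, alpha=1.0):
    a = Counter(re.findall(r"\w+", text_a.lower()))  # assumed tokenizer, no stopword filter
    b = Counter(re.findall(r"\w+", text_b.lower()))
    vocab = set(a) | set(b)
    n_a, n_b, v = sum(a.values()), sum(b.values()), len(vocab)
    scores = {}
    for w in vocab:
        # Add-alpha smoothed relative frequencies; alpha=1.0 is an assumption.
        p_a = (a[w] + alpha) / (n_a + alpha * v)
        p_b = (b[w] + alpha) / (n_b + alpha * v)
        scores[w] = math.log2(p_a / p_b)  # positive = relatively more frequent in A
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {
        "distinctive_in_a": [(w, round(s, 3)) for w, s in ranked if s > 0][:top],
        "distinctive_in_b": [(w, round(s, 3)) for w, s in reversed(ranked) if s < 0][:top],
    }


print(compare("cats cats cats dogs", "dogs dogs dogs cats"))
# → {'distinctive_in_a': [('cats', 1.0)], 'distinctive_in_b': [('dogs', -1.0)]}
```

Smoothing keeps the ratio finite for words that appear on only one side, which an unsmoothed ratio could not handle.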

hapax

Words appearing exactly once.

hapax [--lang <string>] [--no-stop] [...text]

Alias: ,hp

ttr

Type-token ratio: unique words (types) divided by total words (tokens).

ttr [--lang <string>] [--no-stop] [...text]

Returns a record: types, tokens, ttr.

Alias: ,tt
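A minimal Python sketch of the ratio, assuming a lowercase \w+ tokenizer and no stopword filtering (roughly what --no-stop would give); with the default stopword filtering, both counts and the ratio would differ.

```python
import re


def ttr(text):
    tokens = re.findall(r"\w+", text.lower())  # assumed tokenizer, no stopword filter
    types = len(set(tokens))
    return {"types": types, "tokens": len(tokens), "ttr": types / len(tokens) if tokens else 0.0}


print(ttr("The cat and the hat"))
# → {'types': 4, 'tokens': 5, 'ttr': 0.8}
```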

similar

Jaccard similarity between two texts via k-shingles.

similar <other> [--k <int>] [...text]

| Flag | Default | What |
|---|---|---|
| other | required | Comparison text or filepath |
| --k | 3 | Shingle size (words) |

Returns {jaccard, shared, total}.

Alias: ,sl
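A minimal Python sketch of Jaccard similarity over word k-shingles: each text becomes the set of its k-word windows, and the score is the size of the intersection over the size of the union. Mapping shared to the intersection size and total to the union size is an assumption, as is the \w+ tokenizer.

```python
import re


def shingles(text, k):
    words = re.findall(r"\w+", text.lower())  # assumed tokenizer
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similar(text_a, text_b, k=3):
    a, b = shingles(text_a, k), shingles(text_b, k)
    shared = len(a & b)
    total = len(a | b)  # assumed: "total" is the number of distinct shingles overall
    return {"jaccard": shared / total if total else 0.0, "shared": shared, "total": total}


print(similar("the quick brown fox jumps", "the quick brown dog jumps"))
# → {'jaccard': 0.2, 'shared': 1, 'total': 5}
```

Swapping one word changes every shingle that covers it, which is why a single substitution drops the score to 0.2 here; smaller k makes the measure more forgiving.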

sentences

Split text into sentences (regex on [.!?]+\s+).

sentences [...text]

Returns a list of strings.

Alias: ,sn
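A minimal Python sketch of splitting on the documented pattern [.!?]+\s+. Splitting this way drops the terminal punctuation of every sentence except the last; whether the module keeps it is not stated. The example also shows a limitation of the pattern itself: an abbreviation followed by a space, such as "e.g. ", also ends a "sentence".

```python
import re


def sentences(text):
    # Split wherever a run of . ! ? is followed by whitespace.
    return [s for s in re.split(r"[.!?]+\s+", text.strip()) if s]


print(sentences("Hi there. How are you?! Fine, e.g. mostly."))
# → ['Hi there', 'How are you', 'Fine, e.g', 'mostly.']
```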

paragraphs

Split text into paragraphs (regex on blank lines).

paragraphs [...text]

Returns a list of strings.

Alias: ,pa
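A minimal Python sketch of blank-line splitting. It assumes a line containing only whitespace also counts as blank, and that several blank lines in a row act as one separator.

```python
import re


def paragraphs(text):
    # A newline, optional whitespace (possibly more blank lines), then a newline.
    parts = re.split(r"\n\s*\n", text.strip())
    return [p.strip() for p in parts if p.strip()]


print(paragraphs("First para,\nstill first.\n\nSecond.\n   \n\nThird."))
# → ['First para,\nstill first.', 'Second.', 'Third.']
```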

extract

Regex extraction of structured data.

extract [--kind <string>] [...text]

| Flag | Default | What |
|---|---|---|
| -k, --kind | url | One of url, email, hashtag, mention |

Returns a deduplicated list of matches.

Alias: ,xt
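A minimal Python sketch of kind-based regex extraction with order-preserving deduplication. The four PATTERNS are hypothetical, simplified stand-ins; the module's actual regexes are not documented here and will differ on edge cases.

```python
import re

# Hypothetical, simplified patterns; the module's real regexes are not documented here.
PATTERNS = {
    "url": r"https?://[^\s<>\"']+",
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "hashtag": r"(?<!\w)#\w+",
    "mention": r"(?<!\w)@\w+",
}


def extract(text, kind="url"):
    # Trim sentence punctuation that the greedy patterns can swallow, e.g. a final ".".
    matches = [m.rstrip(".,;:!?)") for m in re.findall(PATTERNS[kind], text)]
    return list(dict.fromkeys(matches))  # deduplicate, keeping first-seen order


text = "Mail ann@example.com or see https://example.com. Thanks @bob #nu #nu"
for kind in ("url", "email", "hashtag", "mention"):
    print(kind, extract(text, kind))
```

The (?<!\w) lookbehind keeps the @ inside an email address from being reported as a mention.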

report

Run all deterministic analyzers (plus the LLM ones unless --no-llm) and return a single structured record.

report [--no-llm] [--lang <string>] [--top <int>] [...text]

| Flag | Default | What |
|---|---|---|
| --no-llm | off | Skip LLM commands (sentiment, keywords, detect, readability) |
| -l, --lang | en | Stopword set |
| -n, --top | 15 | Top-N for freq, ngrams, keywords |

Returns nested records: meta, readability, lexical, top_words, top_bigrams, top_trigrams, repeated_phrases, extracted, llm (unless --no-llm).

Alias: ,rt

LLM-backed commands

These use $env.COMMA_ANALYZE_CFG (default: gemini-3.1-flash-lite, no tools). They are pure text-in, text-out: the model gets no web access and uses no tools.

detect

Identify the dominant language.

detect [--name] [...text]

| Flag | Default | What |
|---|---|---|
| --name | off | Return the language’s English name instead of its ISO 639-1 code |

Alias: ,de

sentiment

Classify overall sentiment.

sentiment [--score] [...text]

| Flag | Default | What |
|---|---|---|
| --score | off | Include a float score in [-1.0, 1.0] |

Returns positive, neutral or negative. With --score, the output is <label> (<score>).

Alias: ,se

keywords

Top-N keywords/key phrases.

keywords [--count <int>] [...text]

| Flag | Default | What |
|---|---|---|
| -n, --count | 10 | Number of keywords |

Alias: ,kw

entities

Extract named entities.

entities [--kind <string>] [...text]

| Flag | Default | What |
|---|---|---|
| -k, --kind | all | Filter: person, place, org, date, money |

Output format: one per line, <text> — <type>.

Alias: ,en

readability

Qualitative reading level assessment.

readability [...text]

Returns three lines: level: <very easy|easy|medium|hard|very hard>, audience: <noun phrase>, notes: <one-sentence reason>.

Alias: ,rd

classify

Pick the best-fit label from a list.

classify [--multi] ...labels

| Flag | Default | What |
|---|---|---|
| --multi | off | Allow multiple labels (comma-separated output) |
| ...labels | required (≥2) | The candidate labels |

Input text must be piped in, because the positional arguments are the labels.

Alias: ,cl

Verification commands

factcheck, quotes and claims moved to a separate validate module so analyze can default to a lighter, tool-free model. See the validate reference for their signatures.