mirror of
https://github.com/jessefreitas/omni-token-economy.git
synced 2026-04-26 04:13:49 +00:00
Universal token-compaction library for LLM applications. Zero backend lock-in: works with any dict/object plus declarative rules.

Core API (TS ↔ Python parity):
- compactRecord / compact_record: removes redundancy via declarative rules
- compactRecords / compact_records: maps over a list
- compressContext / compress_context: adaptive, top-N verbatim plus a summary for the rest
- compactSecret / compact_secret: whitelist only, the value NEVER leaves (A.8.12)
- estimateTokens, detectRedundancy, compactTimestamp: helpers

Tests: 27 TS (vitest) + 27 Py (pytest). Sanitized fixtures: every test value uses an obviously fake FAKE_TEST_TOKEN_DO_NOT_USE placeholder. Cardinal rule #5 (CLAUDE.md): fixtures never contain real credentials.

ISO 27001 / OmniForge baseline compliance:
- A.8.10 (deletion of unnecessary information): primary function
- A.8.11 (masking): compact_secret is whitelist-only
- A.8.12 (leak prevention): impossible to return a secret's value
- A.8.25/28/29 (secure development, coding, testing): SDD + TDD + parity

Stack:
- TypeScript: Node 24+, ESM, vitest, zero runtime deps
- Python: 3.11+, pytest, hatchling, zero runtime deps
- CI: lint + test × (3.11, 3.12, 3.13) + gitleaks + CodeQL + benchmark

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
22 lines
731 B
TypeScript
/**
 * Heuristic token estimation.
 *
 * Rule: ~3 chars per token for mixed PT/EN/code — a well-calibrated
 * average that holds within ±15% for typical developer content.
 *
 * Not a replacement for a real tokenizer. When exact counts matter,
 * use the provider's tokenizer (tiktoken, claude-tokenizer, etc.).
 */
export function estimateTokens(text: string | null | undefined): number {
  // Empty, null, and undefined input all count as zero tokens.
  if (!text) return 0;
  return Math.ceil(text.length / 3);
}

/** UTF-8 byte size of a string, or of the JSON serialization of any other value. */
export function byteLength(value: unknown): number {
  // JSON.stringify returns undefined for undefined, functions, and symbols;
  // fall back to '' so Buffer.byteLength does not throw.
  const s = typeof value === 'string' ? value : (JSON.stringify(value) ?? '');
  return Buffer.byteLength(s, 'utf8');
}

/** Estimated token count of a value's JSON serialization. */
export function estimateObjectTokens(obj: unknown): number {
  return estimateTokens(JSON.stringify(obj));
}
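A minimal usage sketch of the heuristic above, assuming a caller wants to keep records within a rough token budget. The two estimators are copied from the file so the block runs on its own; `takeWithinBudget` and the sample records are hypothetical and not part of the library.

```typescript
// Copied from the file above so the sketch is self-contained.
function estimateTokens(text: string | null | undefined): number {
  if (!text) return 0;
  return Math.ceil(text.length / 3);
}

function estimateObjectTokens(obj: unknown): number {
  return estimateTokens(JSON.stringify(obj));
}

// Hypothetical helper (not part of the library): keeps records in order
// until adding the next one would exceed the estimated token budget.
function takeWithinBudget<T>(records: T[], budget: number): T[] {
  const kept: T[] = [];
  let used = 0;
  for (const record of records) {
    const cost = estimateObjectTokens(record);
    if (used + cost > budget) break;
    kept.push(record);
    used += cost;
  }
  return kept;
}

// Illustrative data only.
const records = [
  { id: 1, note: "alpha" },
  { id: 2, note: "beta" },
  { id: 3, note: "gamma" },
];

// Each record serializes to 22 or 23 characters, so each costs 8 estimated tokens.
const kept = takeWithinBudget(records, 16);
console.log(kept.length); // 2
```

Because the estimate is a heuristic, a budget computed this way should leave some headroom; when the exact count matters, the provider's tokenizer is the safer choice.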