
Cortex

A local-first multimodal RAG engine that transforms unstructured file systems into a queryable 'Second Brain' using edge-based AI and vector semantics.

Local-First Intelligence Engine

Cortex is a privacy-sovereign desktop application engineered to convert chaotic local file systems into a structured, semantic knowledge base. It functions as an on-device "Second Brain," capable of ingesting, understanding, and retrieving information from documents and images without a single byte leaving the machine.

By leveraging Edge AI, Cortex eliminates cloud dependence, ensuring absolute data privacy while delivering low-latency multimodal retrieval.

System Architecture

The system orchestrates a hybrid architecture, bridging a modern web-tech frontend with a high-performance Python inference engine.

  • Architecture: Hybrid Electron (Node.js) + Python (FastAPI)
  • Inference Strategy: Local LLM/VLM via Ollama
  • Indexing: Vector Embeddings (ChromaDB)
  • Privacy Level: Air-gapped / Local-only

Core Capabilities

Cortex transcends traditional keyword search by implementing a full Multimodal RAG (Retrieval-Augmented Generation) pipeline:

  • Visual Semantics (Llava vision-language model): "looks" at images to generate searchable captions, e.g., for invoices and charts
  • Vector Search (Hugging Face embeddings): maps queries to file content based on meaning, not just filenames
  • Cross-Lingual (internal translation layer): maps Thai natural-language queries to English content
  • Edge RAG (Mistral / Llama via Ollama): synthesizes answers from retrieved context purely on-device
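At its core, the vector-search capability is a nearest-neighbor lookup over embedding vectors. A minimal, self-contained sketch of that idea, using toy bag-of-words vectors as a stand-in for the Hugging Face embeddings Cortex actually uses (the captions and filenames below are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Captions produced by the vision model become searchable documents.
index = {
    "slip.png": "a screenshot of a bank transaction slip",
    "report.pdf": "quarterly sales report with revenue charts",
}

def search(query: str) -> str:
    """Return the file whose indexed text is closest in vector space."""
    q = embed(query)
    return max(index, key=lambda f: cosine(q, embed(index[f])))

print(search("bank transfer slip"))  # → slip.png
```

A production pipeline replaces `embed` with a dense multilingual sentence encoder and the dictionary with a ChromaDB collection, but the ranking step is the same similarity comparison.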

Technology Stack

  • Frontend (Electron + Next.js): "Organic Bento Glass" UI for a modern desktop experience
  • Backend (Python + FastAPI): handles file ingestion, PDF extraction, and API orchestration
  • Vector DB (ChromaDB): high-performance local vector store for embeddings
  • Inference (Ollama): manages local model quantization and execution
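The backend's ingestion role implies splitting extracted PDF text into retrieval-sized pieces before embedding. The document doesn't specify Cortex's chunking strategy; a common sliding-window approach, with illustrative (not actual) parameters, looks like this:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows so that context
    spanning a chunk boundary is still retrievable.

    Parameters are hypothetical defaults, not Cortex's real settings.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

# 1200 characters with a 400-character step yields 3 overlapping chunks.
print(len(chunk_text("x" * 1200)))  # → 3
```

The overlap trades a little index size for recall: a sentence that straddles a boundary appears whole in at least one chunk.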

Implementation

Deploying Cortex requires the Ollama runtime for model orchestration.

1. Model Provisioning

Initialize the required local models (Vision and Text):

ollama pull mistral  # The Reasoning Brain
ollama pull llava    # The Visual Cortex

2. Ignition

Start the inference backend and the client interface:

# Backend (The Brain)
cd backend
pip install -r requirements.txt
uvicorn main:app --reload
 
# Frontend (The Interface, in a separate terminal)
cd frontend
npm install
npm run dev

Operational Logic

The Ingestion Pipeline:

  • Scan: System traverses target directories for .pdf, .txt, .png, and .jpg.
  • Vision Decoding: Images are passed through Llava to generate dense descriptive captions (e.g., "A screenshot of a K-Bank transaction slip").
  • Vectorization: Text and captions are converted into 768-dimensional vectors using paraphrase-multilingual models.
  • Retrieval: User queries (e.g., "หาสลิปเงินที่โอนเมื่อวาน", "find the slip for the money transferred yesterday") are translated, vectorized, and matched against the local ChromaDB index to retrieve the exact file context.
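The scan and dispatch steps above can be sketched as a small routing loop. Here `extract_text` and `caption_image` are hypothetical stand-ins for the real PDF-extraction and Llava-captioning steps, not Cortex's actual helpers:

```python
from pathlib import Path

TEXT_EXTS = {".pdf", ".txt"}
IMAGE_EXTS = {".png", ".jpg"}

def extract_text(path: Path) -> str:
    """Stand-in for real text/PDF extraction."""
    return path.read_text(errors="ignore") if path.suffix == ".txt" else f"[pdf text of {path.name}]"

def caption_image(path: Path) -> str:
    """Stand-in for a Llava-generated caption."""
    return f"[caption of {path.name}]"

def ingest(root: str) -> dict[str, str]:
    """Walk a directory tree and route each supported file to the right
    extractor. Returns {file path: text to be embedded}; unsupported
    file types are skipped."""
    corpus = {}
    for path in Path(root).rglob("*"):
        ext = path.suffix.lower()
        if ext in TEXT_EXTS:
            corpus[str(path)] = extract_text(path)
        elif ext in IMAGE_EXTS:
            corpus[str(path)] = caption_image(path)
    return corpus
```

Everything `ingest` returns, whether extracted text or an image caption, feeds the same vectorization step, which is what makes screenshots searchable alongside documents.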

Design Philosophy

Privacy Sovereignty: In 2026, data ownership is paramount. Cortex is built on the principle that your personal data—financial records, journals, project files—should never be processed on a third-party server.

Solving the Semantic Gap: Most local search tools fail at images. Cortex bridges this gap by automatically converting visual data into semantic text, making your screenshots just as searchable as your documents.