
Cortex

A local-first multimodal RAG engine that transforms unstructured file systems into a queryable 'Second Brain' using edge-based AI and vector semantics.

Local-First Intelligence Engine

Cortex is a privacy-sovereign desktop application engineered to convert chaotic local file systems into a structured, semantic knowledge base. It functions as an on-device "Second Brain," capable of ingesting, understanding, and retrieving information from documents and images without a single byte leaving the machine.

By leveraging Edge AI, Cortex eliminates cloud dependence, ensuring absolute data privacy while delivering low-latency multimodal retrieval.

System Architecture

The system orchestrates a hybrid architecture, bridging a modern web-tech frontend with a high-performance Python inference engine.

  • Architecture: Hybrid Electron (Node.js) + Python (FastAPI)
  • Inference Strategy: Local LLM/VLM via Ollama
  • Indexing: Vector Embeddings (ChromaDB)
  • Privacy Level: Air-gapped / Local-only

Core Capabilities

Cortex transcends traditional keyword search by implementing a full Multimodal RAG (Retrieval-Augmented Generation) pipeline:

  • Visual Semantics (Llava vision-language model): "looks" at images to generate searchable captions, e.g., for invoices and charts
  • Vector Search (Hugging Face embeddings): maps queries to file content based on meaning, not just filenames
  • Cross-Lingual (internal translation layer): maps Thai natural-language queries to English content
  • Edge RAG (Mistral / Llama via Ollama): synthesizes answers from retrieved context purely on-device
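At its core, the vector-search capability is a nearest-neighbor lookup over embedding vectors. A minimal, self-contained sketch of that idea, using toy bag-of-words vectors as a stand-in for the Hugging Face embeddings Cortex actually uses (the captions and filenames below are illustrative):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for a real sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Captions produced by the vision model become searchable documents.
index = {
    "slip.png": "a screenshot of a bank transaction slip",
    "report.pdf": "quarterly sales report with revenue charts",
}

def search(query: str) -> str:
    """Return the file whose indexed text is closest in vector space."""
    q = embed(query)
    return max(index, key=lambda f: cosine(q, embed(index[f])))

print(search("bank transfer slip"))  # → slip.png
```

A production pipeline replaces `embed` with a dense multilingual sentence encoder and the dictionary with a ChromaDB collection, but the ranking step is the same similarity comparison.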

Technology Stack

  • Frontend (Electron + Next.js): "Organic Bento Glass" UI for a modern desktop experience
  • Backend (Python + FastAPI): handles file ingestion, PDF extraction, and API orchestration
  • Vector DB (ChromaDB): high-performance local vector store for embeddings
  • Inference (Ollama): manages local model quantization and execution
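The backend's ingestion role implies splitting extracted PDF text into retrieval-sized pieces before embedding. The document doesn't specify Cortex's chunking strategy; a common sliding-window approach, with illustrative (not actual) parameters, looks like this:

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows so that context
    spanning a chunk boundary is still retrievable.

    Parameters are hypothetical defaults, not Cortex's real settings.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

# 1200 characters with a 400-character step yields 3 overlapping chunks.
print(len(chunk_text("x" * 1200)))  # → 3
```

The overlap trades a little index size for recall: a sentence that straddles a boundary appears whole in at least one chunk.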

Implementation

Deploying Cortex requires the Ollama runtime for model orchestration.

1. Model Provisioning

Initialize the required local models (Vision and Text):

ollama pull mistral  # The Reasoning Brain
ollama pull llava    # The Visual Cortex

2. Ignition

Start the inference backend and the client interface:

# Backend (The Brain)
cd backend
pip install -r requirements.txt
uvicorn main:app --reload
 
# Frontend (The Interface, in a separate terminal)
cd frontend
npm install
npm run dev

Operational Logic

The Ingestion Pipeline:

  • Scan: System traverses target directories for .pdf, .txt, .png, and .jpg.
  • Vision Decoding: Images are passed through Llava to generate dense descriptive captions (e.g., "A screenshot of a K-Bank transaction slip").
  • Vectorization: Text and captions are converted into 768-dimensional vectors using paraphrase-multilingual models.
  • Retrieval: User queries (e.g., "หาสลิปเงินที่โอนเมื่อวาน", "find the slip for the money transferred yesterday") are translated, vectorized, and matched against the local ChromaDB index to retrieve the exact file context.
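The scan and dispatch steps above can be sketched as a small routing loop. Here `extract_text` and `caption_image` are hypothetical stand-ins for the real PDF-extraction and Llava-captioning steps, not Cortex's actual helpers:

```python
from pathlib import Path

TEXT_EXTS = {".pdf", ".txt"}
IMAGE_EXTS = {".png", ".jpg"}

def extract_text(path: Path) -> str:
    """Stand-in for real text/PDF extraction."""
    return path.read_text(errors="ignore") if path.suffix == ".txt" else f"[pdf text of {path.name}]"

def caption_image(path: Path) -> str:
    """Stand-in for a Llava-generated caption."""
    return f"[caption of {path.name}]"

def ingest(root: str) -> dict[str, str]:
    """Walk a directory tree and route each supported file to the right
    extractor. Returns {file path: text to be embedded}; unsupported
    file types are skipped."""
    corpus = {}
    for path in Path(root).rglob("*"):
        ext = path.suffix.lower()
        if ext in TEXT_EXTS:
            corpus[str(path)] = extract_text(path)
        elif ext in IMAGE_EXTS:
            corpus[str(path)] = caption_image(path)
    return corpus
```

Everything `ingest` returns, whether extracted text or an image caption, feeds the same vectorization step, which is what makes screenshots searchable alongside documents.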

Design Philosophy

Privacy Sovereignty: In 2026, data ownership is paramount. Cortex is built on the principle that your personal data—financial records, journals, project files—should never be processed on a third-party server.

Solving the Semantic Gap: Most local search tools fail at images. Cortex bridges this gap by automatically converting visual data into semantic text, making your screenshots just as searchable as your documents.