Tag

Python

5 articles tagged with "Python"

Minecraft Webcam

2025.01.07

Real-time face tracking that turns your webcam feed into an animated Minecraft character. The avatar's eyes blink when you blink, and its mouth opens when you talk.

Features

Real-time Face Tracking — 30 FPS with 468 facial landmarks via MediaPipe
Head Movement — Pitch (up/down), yaw (left/right), roll (tilt)
Facial Animations — Eyes blink automatically, mouth opens when talking
Custom Skins — Supports any standard Minecraft skin (64x64 format)
Animated Skin Support — Create custom facial expressions stored in unused texture space
Virtual Camera Output — Use the avatar in video calls or streaming (via OBS)
System Tray Integration — Runs minimized while active

Technical Highlights

MediaPipe Face Mesh — 468 3D facial landmarks
Minecraft Skin Parser — Exact specification compliance (64x32 and 64x64)
Perspective-correct rendering — Smooth 60 FPS UI updates
Unused texture space — Rows 32-40 store custom animations
Depth sorting — Painter’s algorithm for proper rendering order
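
The depth-sorting highlight can be illustrated with a minimal sketch of the painter's algorithm. The `Quad` type and `draw_order` helper are hypothetical names, not taken from the project, and the sketch assumes that a larger z value means farther from the camera.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class Quad:
    name: str
    corners: List[Point3]  # four (x, y, z) corners in camera space (assumed layout)

    def mean_depth(self) -> float:
        # Average z of the corners; used as a single depth key per face.
        return sum(z for _, _, z in self.corners) / len(self.corners)


def draw_order(quads: List[Quad]) -> List[Quad]:
    """Return quads sorted far-to-near so nearer faces are painted last."""
    return sorted(quads, key=lambda q: q.mean_depth(), reverse=True)


front = Quad("face_front", [(0, 0, 1.0), (1, 0, 1.0), (1, 1, 1.0), (0, 1, 1.0)])
back = Quad("face_back", [(0, 0, 2.0), (1, 0, 2.0), (1, 1, 2.0), (0, 1, 2.0)])
side = Quad("face_side", [(1, 0, 1.0), (1, 0, 2.0), (1, 1, 2.0), (1, 1, 1.0)])

order = [q.name for q in draw_order([front, side, back])]
```

Sorting by mean depth is the simplest variant; it can misorder faces that intersect or overlap cyclically, which is usually acceptable for a box-shaped avatar.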

Tech Stack

Component	Technology
Face Tracking	MediaPipe Face Mesh
Image Processing	OpenCV
GUI	Tkinter
Virtual Camera	pyvirtualcam
Platform	Windows

How It Works

MediaPipe detects 468 facial landmarks at 30 FPS.
Head rotation is calculated from the nose and eye positions.
The Minecraft skin texture is mapped to a 3D quad with perspective correction.
The avatar output is sent to a virtual camera for use in other apps.
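
As a rough illustration of the head-rotation step, the sketch below derives a roll angle and unitless yaw and pitch proxies from three 2D landmarks. The function name, the choice of landmarks, and the normalization by inter-eye distance are assumptions for illustration, not the project's actual method. Coordinates are image pixels with y increasing downward.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def estimate_head_pose(left_eye: Point, right_eye: Point, nose: Point):
    """Return (roll in degrees, yaw proxy, pitch proxy) from 2D landmarks.

    left_eye / right_eye are the eyes as they appear in the image
    (left_eye has the smaller x when the head is upright).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    eye_dist = math.hypot(dx, dy)
    if eye_dist == 0:
        raise ValueError("eye landmarks coincide")

    # Roll: angle of the line joining the eyes relative to horizontal.
    roll_deg = math.degrees(math.atan2(dy, dx))

    # Yaw / pitch proxies: nose offset from the eye midpoint,
    # scaled by eye distance so the values do not depend on face size.
    mid_x = (left_eye[0] + right_eye[0]) / 2
    mid_y = (left_eye[1] + right_eye[1]) / 2
    yaw_proxy = (nose[0] - mid_x) / eye_dist
    pitch_proxy = (nose[1] - mid_y) / eye_dist
    return roll_deg, yaw_proxy, pitch_proxy


roll, yaw, pitch = estimate_head_pose((100, 100), (200, 100), (150, 160))
```

The proxies would still need calibration against a neutral pose before being mapped to avatar rotation angles.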

computer-vision opencv mediapipe

Browser Automation

2025.01.06

An interactive elements analyzer using Playwright that crawls websites, identifies clickable elements, and generates visual reports with color-coded bounding boxes.

Features

Full Page Screenshot Capture — Complete page screenshots for visual analysis
Interactive Element Detection — Identifies:
- Input fields (text, search, email, password)
- Buttons (native and custom)
- Links (anchor tags)
- Custom elements with event listeners
Event Listener Detection — Heuristically identifies React, Vue, and inline handlers
Visual Reports — Color-coded bounding boxes:
- Green: Input fields
- Red: Buttons
- Blue: Links
- Orange: Generic interactive elements
Structured Output — JSON and text reports with detailed element metadata
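
The category-to-color mapping above can be sketched as a small classifier over element metadata. The `classify_element` function and its rules are hypothetical; the project's real detection runs inside the page via injected JavaScript, and only the four color assignments come from the list above.

```python
from typing import Dict, Optional

# Color assignments taken from the report legend above.
CATEGORY_COLORS = {
    "input": "green",
    "button": "red",
    "link": "blue",
    "generic": "orange",
}

BUTTON_INPUT_TYPES = {"button", "submit", "reset"}


def classify_element(tag: str, attrs: Dict[str, str],
                     has_listener: bool = False) -> Optional[str]:
    """Return a category name, or None if the element is not interactive."""
    tag = tag.lower()
    if tag == "input":
        input_type = attrs.get("type", "text").lower()
        if input_type in BUTTON_INPUT_TYPES:
            return "button"
        if input_type == "hidden":
            return None
        return "input"
    if tag in ("textarea", "select"):
        return "input"
    if tag == "button" or attrs.get("role") == "button":
        return "button"
    if tag == "a" and "href" in attrs:
        return "link"
    # Fallback: anything with a detected listener or inline handler.
    if has_listener or "onclick" in attrs:
        return "generic"
    return None


def box_color(tag: str, attrs: Dict[str, str],
              has_listener: bool = False) -> Optional[str]:
    category = classify_element(tag, attrs, has_listener)
    return CATEGORY_COLORS.get(category) if category else None
```

The returned color names are ones Pillow's drawing functions accept as strings, so they could be passed directly as an outline color when drawing the boxes.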

Technical Highlights

Playwright for browser automation (Chromium)
Pillow + NumPy for image processing and visualization
JavaScript injection for DOM analysis
Event listener heuristics for framework detection
Python for all scripting

Tech Stack

Component	Technology
Automation	Playwright
Language	Python 3.11
Image Processing	Pillow, NumPy
Browser	Headless Chromium

Usage

# Default (YouTube)
python main.py

# Custom URL
python main.py https://example.com

automation playwright python

Knowledge Arbitrage

2025.01.04

A graph-based filesystem that combines Neo4j’s relationship modeling with LanceDB’s vector search capabilities. Files and directories exist as interconnected nodes, while content is chunked and embedded for semantic querying.

Features

Graph-Based Storage — Files and directories stored as Neo4j nodes with CONTAINS relationships
Hierarchical Structure — Traditional filesystem operations (create, read, write, move, delete)
Semantic Search — LanceDB stores embeddings for content-based querying
Hybrid Queries — Combine graph traversal with vector similarity
Text Optimization — Cleans HTML, normalizes formatting, semantic chunking

Technical Highlights

Neo4j for graph structure and relationship queries
LanceDB for vector storage and similarity search
Cohere embed-english-v3.0 for embeddings
chonkie for semantic text chunking
FastAPI web framework

Tech Stack

Component	Technology
Graph Database	Neo4j 5.0+
Vector Database	LanceDB
Embeddings	Cohere
Text Chunking	chonkie
Web Framework	FastAPI

Architecture

Filesystem (Neo4j)          Embeddings (LanceDB)
├── /projects              → chunk_1 → [embedding]
│   └── /notes             → chunk_2 → [embedding]
└── /docs                  → chunk_3 → [embedding]

Query: "show me files about AI projects"
→ Vector search in LanceDB
→ Returns matching chunks with file paths from Neo4j
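
A toy, in-memory sketch of the hybrid query idea follows: graph traversal restricts the search to a directory subtree, then cosine similarity ranks the chunks inside it. The dictionaries stand in for Neo4j and LanceDB, and all names and the two-dimensional vectors are made up for illustration; no real database API is used.

```python
import math
from typing import Dict, List, Set, Tuple

# Stand-in for CONTAINS relationships in the graph (hypothetical data).
CONTAINS: Dict[str, List[str]] = {
    "/": ["/projects", "/docs"],
    "/projects": ["/projects/notes"],
}

# Stand-in for the vector table: (chunk id, file path, embedding).
CHUNKS: List[Tuple[str, str, List[float]]] = [
    ("chunk_1", "/projects", [0.9, 0.1]),
    ("chunk_2", "/projects/notes", [0.8, 0.3]),
    ("chunk_3", "/docs", [0.1, 0.9]),
]


def descendants(root: str) -> Set[str]:
    """Graph step: collect root and every path reachable via CONTAINS."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(CONTAINS.get(node, []))
    return seen


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query_vec: List[float], root: str, k: int = 2):
    """Vector step: rank only the chunks whose path lies under root."""
    allowed = descendants(root)
    scored = [(cosine(query_vec, vec), cid, path)
              for cid, path, vec in CHUNKS if path in allowed]
    scored.sort(reverse=True)
    return [(cid, path) for _, cid, path in scored[:k]]


results = hybrid_search([1.0, 0.0], root="/projects")
```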

neo4j graph-database lancedb

Pet Face Recognition

2025.01.03

A local-first pet registration and identification system using face recognition. No cloud AI dependencies — all processing happens locally with deterministic embeddings.

Features

Local-Only Architecture — No external cloud APIs; all processing done locally
Facial Embedding Pipeline — Extracts 256-dimensional feature vectors using classical computer vision:
- Local Binary Patterns (LBP)
- Histogram of Oriented Gradients (HOG)
- Discrete Cosine Transform (DCT)
- Color histograms (HSV)
- Multi-scale grid pooling
pgvector Similarity Search — Efficient vector storage and retrieval in PostgreSQL
Pet Registration — Upload multiple images per pet with name and location
Identification — Find matching pets from uploaded images with similarity scoring
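
To show why a classical descriptor is deterministic, here is a minimal pure-Python sketch of a basic Local Binary Patterns histogram, one of the feature types listed above. It is an illustration only: the project's pipeline combines several descriptors, and the function name, neighbor ordering, and list-of-lists image format are assumptions.

```python
from typing import List

# Neighbor offsets (dy, dx), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(gray: List[List[int]]) -> List[float]:
    """Return a normalized 256-bin LBP histogram of a grayscale image.

    gray is a non-empty list of equal-length rows of intensity values.
    Border pixels are skipped because they lack a full 3x3 neighborhood.
    """
    height, width = len(gray), len(gray[0])
    hist = [0] * 256
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = gray[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                # Set the bit when the neighbor is at least as bright.
                if gray[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    return [count / total for count in hist] if total else [0.0] * 256


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
flat_hist = lbp_histogram(flat)
peak_hist = lbp_histogram(peak)
```

There is no randomness or learned state anywhere in the computation, so the same pixels always yield the same vector.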

Technical Highlights

Deterministic embeddings — Same image always produces identical features
Multi-criteria matching — Similarity score + margin between matches + minimum images
YOLO11 for classification, segmentation, and pose estimation
Docker Compose for easy local development
FastAPI backend with React frontend
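
The multi-criteria matching highlight can be sketched as a single decision function. The three checks follow the criteria named above; the function name, tuple layout, and threshold values are hypothetical and would need tuning on real data.

```python
from typing import List, Optional, Tuple

# (pet_id, similarity score, number of registered images) -- assumed layout
Candidate = Tuple[str, float, int]


def decide_match(candidates: List[Candidate],
                 min_score: float = 0.80,
                 min_margin: float = 0.05,
                 min_images: int = 2) -> Optional[str]:
    """Return the best pet_id only if all three criteria hold, else None."""
    if not candidates:
        return None
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best_id, best_score, best_images = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0

    if best_score < min_score:
        return None  # not similar enough in absolute terms
    if best_score - runner_up_score < min_margin:
        return None  # too close to the second-best pet to be trusted
    if best_images < min_images:
        return None  # too few reference images for this pet
    return best_id


match = decide_match([("rex", 0.92, 3), ("milo", 0.70, 4)])
```

Requiring a margin over the runner-up guards against confidently returning one of two near-identical pets.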

Tech Stack

Layer	Technology
Frontend	React 19 + TypeScript + Vite
Backend	FastAPI + SQLAlchemy 2.0
Database	PostgreSQL + pgvector
Embedding	Local deterministic pipeline
Image Storage	Local disk
Testing	pytest + testcontainers

Why Local-First?

Running the pipeline locally means no API costs, no images leaving the machine, and deterministic results. The system extracts facial features using classical computer vision techniques that are reproducible and don’t depend on external services.

computer-vision face-recognition yolo

Dynamic RAG

2025.01.02

A Retrieval-Augmented Generation system with a multi-agent architecture for question answering over custom knowledge bases.

Features

Multi-Agent Architecture — Specialized agents working together:
- QA Agent — Main agent for answering questions using the knowledge base
- Personal Agent — For personal/general questions not requiring RAG
- Validation Agent — Classifies queries as on-context, personal, bad, or unknown
- Prompt Agent — Generates diverse query perspectives for comprehensive retrieval
- Reranking Agent — Reorders retrieved chunks for better relevance
Hybrid Search — Combines FAISS vector similarity with keyword matching
Streaming Responses — Token-by-token streaming of LLM outputs
Multi-language Support — Uses multilingual embeddings via Cohere
Document Processing — PDF parsing with OCR support via Mistral

Technical Highlights

LangChain + LangGraph for orchestrating the RAG pipeline
FAISS vector store for similarity search
Cohere embeddings for multilingual support
SQLite for session persistence and analytics
Configurable agents via JSON configuration files

Tech Stack

Component	Technology
Web Framework	FastAPI + Uvicorn
AI Integration	Together AI, LangChain, LangGraph
Vector Store	FAISS
Embeddings	Cohere
Database	SQLite + SQLAlchemy 2.0

Architecture

ValidationAgent → classifies query type
       ↓
PromptAgent → generates diverse query perspectives for retrieval
       ↓
QA Agent → retrieves chunks, reranks, generates answer
       ↓
StreamingResponse
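
The flow above can be sketched with plain functions standing in for the agents. Everything here is a placeholder: a real system would call an LLM for classification and query expansion, and the keyword rules, vocabulary, and function names are hypothetical. The generator mimics streaming by yielding the reply word by word.

```python
from typing import Iterator, List

KNOWLEDGE_TERMS = {"invoice", "refund"}  # hypothetical domain vocabulary
PERSONAL_TERMS = {"you", "your", "hello"}  # hypothetical small-talk cues


def validate(query: str) -> str:
    """Stand-in for the Validation Agent: returns one of the four classes."""
    words = set(query.lower().replace("?", "").split())
    if not words:
        return "bad"
    if words & KNOWLEDGE_TERMS:
        return "on-context"
    if words & PERSONAL_TERMS:
        return "personal"
    return "unknown"


def expand(query: str) -> List[str]:
    """Stand-in for the Prompt Agent: emit several phrasings of the query."""
    return [query, f"background on: {query}", f"steps for: {query}"]


def answer(query: str) -> Iterator[str]:
    """Route the query by class and stream the reply one word at a time."""
    kind = validate(query)
    if kind == "on-context":
        variants = expand(query)
        reply = f"[QA] answering with {len(variants)} query variants"
    elif kind == "personal":
        reply = "[Personal] answering without retrieval"
    else:
        reply = f"[Rejected] query classified as {kind}"
    yield from reply.split()


streamed = " ".join(answer("How do I get a refund?"))
```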

ai rag fastapi