Multimedia Library - OCR & PDF Processing

Version: 1.0.0-experimental | Status: Experimental | Platform: Cross-platform

⚠️

Experimental Feature: The multimedia library is currently in experimental status. Features and APIs may change in future releases. Some functionality requires additional setup (Tesseract for OCR).

The Multimedia Library provides tools for working with images, PDFs, videos, and audio files in your knowledge base. Extract text from images using OCR, process PDF documents, generate image thumbnails automatically, and search through your media content.

Overview

The multimedia library enables you to:

Extract text from images using OCR (Optical Character Recognition)
Process PDF documents with text extraction, metadata, and structure analysis
Organize media files with automatic thumbnail generation
Search media content by extracted text and metadata
Manage different file types including images, PDFs, videos, and audio
Analyze metadata including dimensions, file size, and creation dates

Supported Media Types

Images

Supported formats: PNG, JPG, JPEG, GIF, WebP, SVG, BMP

OCR text extraction from images containing printed text
Automatic thumbnail generation (256x256px by default)
Dimension extraction (width and height)
EXIF metadata support (coming soon)
Hash-based deduplication to identify duplicate images

PDFs

Full PDF document processing capabilities:

Text extraction from all pages
Metadata extraction (title, author, subject, creator, dates)
Structure parsing (headings, paragraphs, lists, tables)
Page-by-page content analysis
Embedded link extraction
Citation detection (references, footnotes, endnotes)
Image extraction from PDF documents (coming soon)

Videos

Supported formats: MP4, AVI, MKV, MOV, WebM

File metadata (size, creation date, modification date)
Classification and organization
Thumbnail generation (coming soon)

Audio

Supported formats: MP3, WAV, OGG, M4A, FLAC

File metadata (size, creation date, modification date)
Audio transcription (coming soon)

OCR Features

Text Extraction from Images

The OCR engine uses Tesseract to extract text from images. Accuracy depends largely on image quality (see Limitations & Known Issues).

Basic Usage

Right-click on any image in your workspace
Select “Extract Text (OCR)” from the context menu
The extracted text will be displayed and can be:
- Copied to clipboard
- Inserted into a note
- Saved for later reference

OCR Configuration

Customize OCR settings for better results:

Language Selection

Default: English (eng)
Multi-language support (requires language packs)
Check available languages: Settings → Multimedia → OCR Languages

Page Segmentation Mode (PSM)

3 - Fully automatic page segmentation (default)
6 - Assume a single uniform block of text
11 - Sparse text - find as much text as possible

OCR Engine Mode (OEM)

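3 - Default (based on what is available; normally the LSTM neural net engine)
0 - Legacy engine only
1 - LSTM neural net engine only
2 - Legacy + LSTM engines (modes 0 and 2 only work with language data that includes the legacy model)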
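The language, PSM, and OEM settings correspond to standard Tesseract command-line options (-l, --psm, --oem). As a rough sketch of how they combine, the Python below builds a Tesseract invocation from a hypothetical OcrConfig object. It is not Lokus's internal code; OcrConfig, build_tesseract_args, and run_ocr are illustrative names.

```python
import shutil
import subprocess
from dataclasses import dataclass, field
from typing import List

@dataclass
class OcrConfig:
    # Hypothetical mirror of Settings -> Multimedia; defaults match this guide.
    languages: List[str] = field(default_factory=lambda: ["eng"])
    psm: int = 3   # fully automatic page segmentation
    oem: int = 3   # default engine, based on what is available

def build_tesseract_args(image_path: str, cfg: OcrConfig) -> List[str]:
    """Build the argv for a Tesseract run that prints plain text to stdout."""
    if cfg.psm not in range(14):   # Tesseract 4/5 accept PSM 0-13
        raise ValueError(f"unsupported PSM: {cfg.psm}")
    if cfg.oem not in range(4):    # Tesseract accepts OEM 0-3
        raise ValueError(f"unsupported OEM: {cfg.oem}")
    return [
        "tesseract", image_path, "stdout",   # output base 'stdout' writes to stdout
        "-l", "+".join(cfg.languages),       # several languages are joined with '+'
        "--oem", str(cfg.oem),
        "--psm", str(cfg.psm),
    ]

def run_ocr(image_path: str, cfg: OcrConfig) -> str:
    """Run Tesseract and return the recognized text (raises if it fails)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    result = subprocess.run(build_tesseract_args(image_path, cfg),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, English plus French with PSM 6 produces: tesseract scan.png stdout -l eng+fra --oem 3 --psm 6.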

Confidence Threshold

Minimum confidence level: 0.6 (60%)
Raise it to discard more low-confidence text; lower it to keep more text at the risk of more recognition errors
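The threshold is expressed as a fraction (0.6), while Tesseract reports per-word confidence on a 0-100 scale in its TSV output (tesseract image.png stdout tsv). Assuming the setting is applied per word, a filter could look like the sketch below; filter_words_by_confidence is a hypothetical helper, not a Lokus API.

```python
import csv
import io
from typing import List

def filter_words_by_confidence(tsv_text: str, threshold: float = 0.6) -> List[str]:
    """Keep words whose Tesseract confidence (0-100) is at least threshold * 100."""
    # QUOTE_NONE: recognized text may contain quote characters that are not CSV quoting.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    kept = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Page, block, and line rows carry conf == -1 and no text, so they are skipped.
        if text and conf >= threshold * 100:
            kept.append(text)
    return kept
```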

💡

Tip: For best OCR results, use high-resolution images with clear, well-lit text. Avoid images with heavy compression artifacts or handwritten text (unless using specialized handwriting models).

Batch Processing

Process multiple images at once:

Select multiple images in the file browser
Right-click and choose “Batch Extract Text (OCR)”
View extraction progress and results
Export all extracted text to a single note or separate files
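A batch run is essentially the single-image OCR step mapped over many files, with failures collected rather than aborting the whole batch. The sketch below assumes a caller-supplied ocr_fn (for example, a function that runs Tesseract on one image) and Markdown-formatted notes; batch_extract and combine_into_note are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

def batch_extract(paths: List[str], ocr_fn: Callable[[str], str],
                  max_workers: int = 4,
                  on_progress: Callable[[int, int], None] = lambda done, total: None,
                  ) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run ocr_fn over many files in parallel; return (texts, errors) keyed by path."""
    texts: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    # Threads suit this workload because the heavy lifting happens in a Tesseract subprocess.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ocr_fn, p): p for p in paths}
        for done, fut in enumerate(as_completed(futures), start=1):
            path = futures[fut]
            try:
                texts[path] = fut.result()
            except Exception as exc:  # one unreadable file should not abort the batch
                errors[path] = str(exc)
            on_progress(done, len(paths))
    return texts, errors

def combine_into_note(texts: Dict[str, str], order: List[str]) -> str:
    """Join results into one note body, one section per source file, in input order."""
    return "\n\n".join(f"## {p}\n\n{texts[p]}" for p in order if p in texts)
```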

PDF Processing

Content Extraction

Extract comprehensive information from PDF documents:

Text Content

Full document text extraction
Page-by-page content separation
Preserves text layout and formatting

Document Structure

Headings and section hierarchy
Paragraphs with indentation detection
Bullet and numbered lists
Tables with row/column structure
Citations and references

Metadata

Title, author, subject, keywords
Creator and producer applications
Creation and modification dates
Page count and file size

Using PDF Features

Open PDF in Lokus: Drag and drop or use File → Open
Extract Text: Right-click → Extract PDF Text
View Metadata: Right-click → View PDF Metadata
Process Full Content: Right-click → Process PDF Document

The extracted content can be:

Saved as a new note
Referenced in existing notes
Searched using the global search
Linked via wiki links

📚

Use Case: Import research papers and extract text for note-taking. Link extracted content to your notes using wiki links to maintain connections between sources and your insights.

Media Gallery & Organization

Automatic Thumbnail Generation

Thumbnails are automatically generated for images:

Size: 256x256 pixels (configurable)
Format: JPEG for optimal size
Storage: .lokus/thumbnails/ directory
Caching: Reuses existing thumbnails based on file hash
Performance: Generated on-demand, not during workspace scan
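The sketch below illustrates hash-keyed, on-demand thumbnail caching. The file naming scheme (<hash>.jpg), the fit-inside-the-box resizing that never upscales, and the render callback are all assumptions; the actual JPEG encoding would be done by an image library.

```python
import os
from typing import Callable, Tuple

THUMB_SIZE = 256  # default bounding box described above

def thumbnail_path(workspace: str, file_hash: str) -> str:
    # Hypothetical naming scheme: one JPEG per content hash.
    return os.path.join(workspace, ".lokus", "thumbnails", f"{file_hash}.jpg")

def fit_within(width: int, height: int, box: int = THUMB_SIZE) -> Tuple[int, int]:
    """Scale (width, height) to fit a box x box square, keeping the aspect ratio."""
    scale = min(box / width, box / height, 1.0)  # never upscale small images
    return max(1, round(width * scale)), max(1, round(height * scale))

def get_or_create_thumbnail(workspace: str, file_hash: str,
                            render: Callable[[str], None]) -> str:
    """Render only when no cached thumbnail exists for this hash."""
    path = thumbnail_path(workspace, file_hash)
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        render(path)  # hypothetical callback that writes the JPEG to path
    return path
```

Because the key is the content hash, an edited image gets a new hash and therefore a new thumbnail, while unchanged images reuse the cached file. For a 1024 x 768 image, fit_within returns (256, 192).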

Workspace Media Scan

Scan your entire workspace for media files:

Open Command Palette (Cmd/Ctrl + P)
Type “Scan Media Files”
View all discovered images, PDFs, videos, and audio files
Results sorted by modification date (newest first)

The scan process:

Recursively searches all directories
Skips hidden directories and .lokus folder
Identifies files by extension and MIME type
Extracts basic metadata
Creates searchable index
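A minimal scan following these rules can be written with os.walk. It assumes extension matching against the formats listed under Supported Media Types, and it uses mimetypes.guess_type, which also works from the file extension, so it does not sniff file contents the way a stricter MIME check might. The names are illustrative.

```python
import mimetypes
import os
from dataclasses import dataclass
from typing import List, Optional

# Extensions taken from "Supported Media Types" above.
MEDIA_TYPES = {
    "image": {".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg", ".bmp"},
    "pdf": {".pdf"},
    "video": {".mp4", ".avi", ".mkv", ".mov", ".webm"},
    "audio": {".mp3", ".wav", ".ogg", ".m4a", ".flac"},
}

@dataclass
class MediaFile:
    path: str
    media_type: str
    mime: Optional[str]
    size: int
    modified: float

def classify(path: str) -> Optional[str]:
    ext = os.path.splitext(path)[1].lower()
    for media_type, exts in MEDIA_TYPES.items():
        if ext in exts:
            return media_type
    return None

def scan_workspace(root: str) -> List[MediaFile]:
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories in place (this also skips .lokus) so os.walk never enters them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            media_type = classify(path)
            if media_type is None:
                continue
            st = os.stat(path)
            mime, _ = mimetypes.guess_type(path)
            found.append(MediaFile(path, media_type, mime, st.st_size, st.st_mtime))
    found.sort(key=lambda f: f.modified, reverse=True)  # newest first
    return found
```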

File Hash & Deduplication

Each media file gets a SHA-256 hash for:

Duplicate detection - identify identical files
Cache management - reuse thumbnails and extracted content
Performance optimization - skip re-processing unchanged files

Hash calculation is cached based on file modification time for efficiency.
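The hash cache can be sketched as an in-memory dictionary keyed by path and invalidated when the modification time changes. This version also compares file size as an extra check, which is an assumption beyond what is described above; file_sha256 and find_duplicates are illustrative names.

```python
import hashlib
import os
from collections import defaultdict
from typing import Dict, List, Tuple

# path -> (mtime_ns, size, hex digest); lives only for the current session
_hash_cache: Dict[str, Tuple[int, int, str]] = {}

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    st = os.stat(path)
    cached = _hash_cache.get(path)
    if cached and cached[0] == st.st_mtime_ns and cached[1] == st.st_size:
        return cached[2]  # unchanged since the last hash: skip re-reading the file
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large videos or PDFs are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    digest = h.hexdigest()
    _hash_cache[path] = (st.st_mtime_ns, st.st_size, digest)
    return digest

def find_duplicates(paths: List[str]) -> List[List[str]]:
    """Group paths with identical content; only groups of two or more are returned."""
    by_hash: Dict[str, List[str]] = defaultdict(list)
    for p in paths:
        by_hash[file_sha256(p)].append(p)
    return [group for group in by_hash.values() if len(group) > 1]
```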

Search & Discovery

Search in Media Content

Search through extracted text from images and PDFs:

Use the global search (Cmd/Ctrl + F)
Enable “Search in Media Content” filter
Results show matching text with file reference
Click to open the source file
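Conceptually, searching media content means matching a query against text already extracted by OCR or PDF processing. The sketch assumes a simple in-memory index mapping file paths to extracted text and performs a case-insensitive substring match with a short surrounding snippet; search_media_text is a hypothetical helper, and a production search index would be more sophisticated.

```python
import re
from typing import Dict, List, Tuple

def search_media_text(index: Dict[str, str], query: str,
                      context: int = 20) -> List[Tuple[str, str]]:
    """Return (file, snippet) pairs for files whose extracted text contains query."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    hits = []
    for path, text in index.items():
        m = pattern.search(text)
        if m is None:
            continue
        # Positions come from the original text, so the snippet keeps its casing.
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        hits.append((path, text[start:end]))
    return hits
```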

Filter by Media Type

Filter search results by specific media types:

Images only
PDFs only
Videos only
Audio only
All media files

Metadata Search

Search by file metadata:

File name
File size range
Creation date range
Modification date
Dimensions (for images)
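The media-type and metadata filters can be combined into one predicate chain. MediaRecord and filter_media are hypothetical; their fields mirror the metadata this page says is extracted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

@dataclass
class MediaRecord:
    name: str
    media_type: str                # "image", "pdf", "video" or "audio"
    size: int                      # bytes
    created: datetime
    modified: datetime
    width: Optional[int] = None    # images only
    height: Optional[int] = None

def filter_media(records: Iterable[MediaRecord], *, media_type: Optional[str] = None,
                 name_contains: Optional[str] = None,
                 size_range: Tuple = (0, None), created_range: Tuple = (None, None),
                 modified_after: Optional[datetime] = None,
                 min_dims: Optional[Tuple[int, int]] = None) -> List[MediaRecord]:
    def in_range(value, lo, hi):
        # None means "unbounded" on that side.
        return (lo is None or value >= lo) and (hi is None or value <= hi)

    results = []
    for r in records:
        if media_type is not None and r.media_type != media_type:
            continue
        if name_contains and name_contains.lower() not in r.name.lower():
            continue
        if not in_range(r.size, *size_range) or not in_range(r.created, *created_range):
            continue
        if modified_after is not None and r.modified < modified_after:
            continue
        if min_dims is not None:
            # Dimension filters only match files with known width and height.
            if r.width is None or r.height is None:
                continue
            if r.width < min_dims[0] or r.height < min_dims[1]:
                continue
        results.append(r)
    return results
```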

Setup & Installation

Prerequisites

For full multimedia library functionality, install:

Tesseract OCR (Required for OCR features)

macOS (using Homebrew):

brew install tesseract

macOS (using MacPorts):

sudo port install tesseract

Ubuntu/Debian:

sudo apt-get update
sudo apt-get install tesseract-ocr

Windows:

Download installer from Tesseract GitHub
Run installer and follow prompts
Add Tesseract to PATH during installation

Verify Installation:

tesseract --version

Additional Language Packs (Optional)

Install additional languages for OCR:

macOS:

brew install tesseract-lang

Ubuntu/Debian:

sudo apt-get install tesseract-ocr-[lang]
# Examples:
# tesseract-ocr-fra (French)
# tesseract-ocr-deu (German)
# tesseract-ocr-spa (Spanish)

Windows: Additional language data can be selected as optional components when running the Tesseract installer.

Checking Feature Availability

Verify OCR is available:

Open Settings → Multimedia
Check “OCR Status” indicator
If unavailable, follow installation instructions above
Restart Lokus after installing Tesseract

View available OCR languages:

Settings → Multimedia → OCR Languages
Lists all installed language packs
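The same list can be obtained outside Lokus with tesseract --list-langs. The sketch below parses that output. It assumes the usual header line followed by one language code per line, drops osd (the orientation and script detection model, not a language) by default, and falls back to stderr because some older Tesseract versions print the list there. Both function names are illustrative.

```python
import shutil
import subprocess
from typing import List

def parse_list_langs(output: str, include_osd: bool = False) -> List[str]:
    langs = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("List of available languages"):
            continue  # skip blanks and the header line
        if line == "osd" and not include_osd:
            continue  # orientation/script detection data, not an OCR language
        langs.append(line)
    return langs

def installed_ocr_languages() -> List[str]:
    """Return installed language codes, or [] if Tesseract is not on PATH."""
    if shutil.which("tesseract") is None:
        return []
    proc = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True)
    return parse_list_langs(proc.stdout or proc.stderr)
```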

Platform-Specific Notes

macOS

Tesseract Locations:

Homebrew (Intel): /usr/local/bin/tesseract
Homebrew (Apple Silicon): /opt/homebrew/bin/tesseract
MacPorts: /opt/local/bin/tesseract

Permissions:

Grant Lokus “Full Disk Access” in System Settings → Privacy & Security (System Preferences → Security & Privacy on macOS 12 and earlier) for workspace scanning

Windows

Tesseract Locations:

Default: C:\Program Files\Tesseract-OCR\tesseract.exe
Alternative: C:\Program Files (x86)\Tesseract-OCR\tesseract.exe

PATH Configuration: Ensure the Tesseract installation directory is in your system PATH variable.

Linux

Tesseract Locations:

Default: /usr/bin/tesseract
Alternative: /usr/local/bin/tesseract

Dependencies: Most distributions include required image processing libraries. If you encounter errors, install:

sudo apt-get install libleptonica-dev
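If Tesseract is installed but not detected, a common cause is that it is not on the PATH the app sees; on macOS, apps launched from Finder or the Dock typically do not inherit your shell's PATH. The sketch below checks PATH first and then the platform locations listed above. It only illustrates the lookup; how Lokus itself locates Tesseract is not documented here, and find_tesseract is an illustrative name.

```python
import os
import shutil
import sys
from typing import Optional

# Fallback locations listed in "Platform-Specific Notes" above.
KNOWN_LOCATIONS = {
    "darwin": ["/opt/homebrew/bin/tesseract", "/usr/local/bin/tesseract",
               "/opt/local/bin/tesseract"],
    "win32": [r"C:\Program Files\Tesseract-OCR\tesseract.exe",
              r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe"],
    "linux": ["/usr/bin/tesseract", "/usr/local/bin/tesseract"],
}

def find_tesseract(platform: str = sys.platform) -> Optional[str]:
    """Return a usable tesseract executable path, or None if none is found."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in KNOWN_LOCATIONS.get(platform, []):
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```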

Practical Examples

Example 1: Research Paper Processing

Import and process academic papers:

Drag PDF into workspace
Right-click → Process PDF Document
Create a new note for your research
Link to extracted content: [[paper_content]]
Add your notes and insights
Search across all paper content later

Example 2: Screenshot Organization

Organize and extract text from screenshots:

Save screenshots to workspace
Browse them using the automatically generated thumbnails
Use OCR to extract visible text
Create notes referencing screenshot content
Search by extracted text later

Example 3: Receipt and Invoice Management

Process receipts and invoices:

Scan or photograph receipts
Import images to workspace
Use OCR to extract text (dates, amounts, vendors)
Create expense tracking notes
Link receipts to expense entries
Search by vendor name or date

Example 4: Book and Article Annotation

Extract text from book pages or articles:

Photograph or scan pages
Extract text with OCR
Create notes with extracted quotes
Add your annotations and thoughts
Link quotes to topics in your knowledge base
Build a searchable quote library

Performance Optimization

Thumbnail Caching

Thumbnails are cached to improve performance:

Generated once per image
Stored in .lokus/thumbnails/
Reused based on file hash
Automatically regenerated if source changes

Hash Caching

File hashes are cached in memory:

Recalculated only when file modification time changes
Significantly faster than rehashing on every operation
Cache persists during Lokus session

Batch Processing

For multiple files:

Use batch operations instead of processing individually
Files are processed in parallel when possible
Progress indicators show status

⚡

Performance Tip: Processing large PDFs can take time, so run big batch operations overnight or during breaks. OCR typically takes 1-2 seconds per image, depending on image complexity.

Limitations & Known Issues

Current Limitations

OCR Limitations:

Accuracy varies with image quality
Handwriting recognition is limited (requires specialized models)
Complex layouts may not preserve structure perfectly
Only English is available by default (other languages require installing language packs)

PDF Processing Limitations:

Image extraction from PDFs is not yet implemented
Complex table structures may not parse correctly
Scanned PDFs require OCR, which is not yet run automatically
Some PDF security features may block processing

Performance Considerations:

Large batch operations can be CPU-intensive
Video and audio transcription not yet available
Very large PDF files (100+ MB) may process slowly

Known Issues

Issue: OCR may fail or return poor results on low-resolution images (below 300 DPI)
- Workaround: Use higher resolution images or upscale before processing
Issue: Some PDF metadata may not extract correctly
- Workaround: Check PDF properties in external viewer for comparison
Issue: Thumbnail generation may time out on very large images (> 50 MB)
- Workaround: Resize images before importing or increase timeout in settings
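When an image's resolution is known, the upscale needed to reach 300 DPI is a simple ratio. The helper below computes it; it assumes the DPI value comes from the image metadata or the scanner settings, and upscale_for_ocr is an illustrative name. Upscaling cannot restore detail that was never captured, so rescanning at 300 DPI or more is usually the better fix.

```python
import math
from typing import Tuple

TARGET_DPI = 300

def upscale_for_ocr(width: int, height: int, dpi: float,
                    target_dpi: int = TARGET_DPI) -> Tuple[int, int, float]:
    """Return (new_width, new_height, factor) needed to bring an image to target_dpi."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    factor = max(1.0, target_dpi / dpi)  # never downscale images that already meet the target
    return math.ceil(width * factor), math.ceil(height * factor), factor
```

For example, a letter-size page scanned at 100 DPI (850 x 1100 pixels) needs a factor of 3, giving 2550 x 3300 pixels.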

Experimental Status

This feature is marked as experimental because:

APIs may change in future versions
Additional dependencies (Tesseract) required
Performance optimization ongoing
Feature set still expanding

⚠️

Backup Recommendation: While the multimedia library is safe to use, we recommend backing up your workspace regularly, especially when processing large batches of files.

Troubleshooting

OCR Not Working

Problem: OCR extraction fails or feature unavailable

Solutions:

Verify Tesseract installation: tesseract --version
Check Tesseract is in system PATH
Restart Lokus after installing Tesseract
Check Settings → Multimedia → OCR Status
Try a simple test image first

PDF Extraction Issues

Problem: PDF text extraction returns empty or garbled text

Solutions:

Verify the PDF is not password-protected
Check whether the PDF is a scanned document: its pages are images, so text extraction returns nothing and OCR is needed instead
Open the PDF in an external viewer to verify its content

Slow Performance

Problem: Media processing is very slow

Solutions:

Reduce batch size (process fewer files at once)
Close other applications to free resources
Check available disk space on the drive containing .lokus/thumbnails/
Clear thumbnail cache if it grows too large
Use lower resolution images when possible

Thumbnails Not Appearing

Problem: Image thumbnails don’t generate

Solutions:

Check .lokus/thumbnails/ directory exists and is writable
Verify image format is supported
Try regenerating the thumbnail manually
Check that the image file is not corrupted
Ensure sufficient disk space is available

Future Enhancements

Planned features for future releases:

Video thumbnail generation from first frame or specific timestamp
Audio transcription using speech-to-text
EXIF data extraction for detailed image metadata
Advanced PDF features: forms, annotations, embedded files
AI-powered image analysis: object detection, scene recognition
Automatic OCR for scanned PDFs
Media organization views: timeline, map, tags
Advanced search: similarity search, reverse image search

Related Features

Rich Text Editor - Embed media in your notes
Search & Discovery - Search through media content
File Management - Organize media files
Workspace - Configure multimedia settings

Questions or Issues?

If you encounter problems with the multimedia library:

Check the Troubleshooting section above
Verify your setup against Setup & Installation
Report issues on GitHub Issues
Join our community for support

Last Updated: January 23, 2025

Experimental Feature - Feedback Welcome!
