storycove/SOLR_LIBRARY_MIGRATION.md
2025-09-23 14:42:38 +02:00

Solr Library Separation Migration Guide

This guide explains how to migrate existing StoryCove deployments to support proper library separation in Solr search.

What Changed

The Solr service has been enhanced to support multi-tenant library separation by:

  • Adding a libraryId field to all Solr documents
  • Filtering all search queries by the current library context
  • Ensuring complete data isolation between libraries
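As a rough illustration of the second point, a library-scoped search sent to Solr might look like the sketch below. The `fq` (filter query) parameter is standard Solr, and the core name `storycove_stories` and the `libraryId` field come from this guide. The host `solr:8983`, the helper names, and the sample library ID are assumptions, and the application's real query construction may differ.

```shell
# Hypothetical helper: build a Solr filter query that restricts results
# to a single library. The value is quoted so an ID containing spaces or
# hyphens is treated as one term.
library_filter() {
  printf 'libraryId:"%s"' "$1"
}

# Hypothetical helper: search one library's stories.
# $1 = library ID, $2 = query in Solr syntax. The host is an assumption.
search_library() {
  curl -s -G "http://solr:8983/solr/storycove_stories/select" \
    --data-urlencode "q=$2" \
    --data-urlencode "fq=$(library_filter "$1")" \
    --data-urlencode "fl=id,title,libraryId"
}
```

For example, `search_library "library-a" "title:dragon"` would only return stories whose `libraryId` is `library-a`, assuming that ID exists in the index.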

Migration Options

Option 1: Volume Reset

Best for: Development, staging, and Docker-based deployments where losing the existing search index is acceptable.

# Stop the application
docker-compose down

# Remove only the Solr data volume (preserves database and images)
docker volume rm storycove_solr_data

# Restart - Solr will recreate cores with new schema
docker-compose up -d

# Wait for services to start, then trigger reindex via admin panel

Pros: Clean, simple, guaranteed to work

Cons: Requires downtime, loses existing search index
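The "wait for services to start" step can be scripted rather than guessed at. The sketch below polls Solr's ping handler until it responds. It assumes the handler is reachable at `/admin/ping` on the core and that Solr's port is published to the machine running the script. The function name and its parameters are hypothetical.

```shell
# Hypothetical helper: poll Solr's ping handler until it answers or we give up.
# $1 = base URL of a core (assumed), $2 = max attempts, $3 = seconds between attempts.
wait_for_solr() {
  attempt=1
  while [ "$attempt" -le "$2" ]; do
    # -f makes curl return non-zero on HTTP errors such as 404 or 503
    if curl -sf -o /dev/null "$1/admin/ping"; then
      echo "Solr is up after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$3"
  done
  echo "Solr did not respond after $2 attempts" >&2
  return 1
}
```

A call such as `wait_for_solr "http://localhost:8983/solr/storycove_stories" 30 2` would wait up to about a minute before the reindex is triggered from the admin panel.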

Option 2: Schema API Migration (Production Safe)

Best for: Production environments where you need to preserve uptime.

Method A: Automatic (Recommended)

# Single endpoint that adds field and migrates data
curl -X POST "http://your-app-host/api/admin/search/solr/migrate-library-schema" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Method B: Manual Steps

# Step 1: Add libraryId field via app API
curl -X POST "http://your-app-host/api/admin/search/solr/add-library-field" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

# Step 2: Run migration
curl -X POST "http://your-app-host/api/admin/search/solr/migrate-library-schema" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Method C: Direct Solr API (if app API fails)

# Add libraryId field to stories core
curl -X POST "http://your-solr-host:8983/solr/storycove_stories/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-field": {
      "name": "libraryId",
      "type": "string",
      "indexed": true,
      "stored": true,
      "required": false
    }
  }'

# Add libraryId field to authors core
curl -X POST "http://your-solr-host:8983/solr/storycove_authors/schema" \
  -H "Content-Type: application/json" \
  -d '{
    "add-field": {
      "name": "libraryId",
      "type": "string",
      "indexed": true,
      "stored": true,
      "required": false
    }
  }'

# Then run the migration
curl -X POST "http://your-app-host/api/admin/search/solr/migrate-library-schema" \
  -H "Authorization: Bearer YOUR_JWT_TOKEN"

Pros: No downtime, preserves service availability, automatic field addition

Cons: Requires API access
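When using Method C, it can help to check whether a core already has the field before adding it, since repeating `add-field` for an existing field is expected to return an error. The sketch below assumes Solr's Schema API answers `GET /schema/fields/libraryId` with 200 when the field exists and 404 when it does not. The helper names and the base URL argument are hypothetical.

```shell
# Hypothetical helper: print the HTTP status of a schema field lookup.
# $1 = Solr base URL, $2 = core name.
field_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1/solr/$2/schema/fields/libraryId"
}

# Hypothetical helper: add libraryId to one core only when it is missing.
# $1 = Solr base URL, $2 = core name.
ensure_library_field() {
  status=$(field_status "$1" "$2")
  if [ "$status" = "200" ]; then
    echo "$2: libraryId already present"
    return 0
  fi
  curl -sf -X POST "$1/solr/$2/schema" \
    -H "Content-Type: application/json" \
    -d '{"add-field":{"name":"libraryId","type":"string","indexed":true,"stored":true,"required":false}}' \
    >/dev/null && echo "$2: libraryId added"
}

# Hypothetical wrapper: apply the check to both cores named in this guide.
ensure_all_cores() {
  for core in storycove_stories storycove_authors; do
    ensure_library_field "$1" "$core" || return 1
  done
}
```

Running `ensure_all_cores "http://your-solr-host:8983"` before the migration endpoint makes the manual path safe to repeat.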

Option 3: Admin Endpoint

Best for: Production environments with proper admin access.

  1. Deploy the code changes to your environment
  2. Access the admin panel of your application
  3. Navigate to search settings
  4. Use the "Migrate Library Schema" button or API endpoint:
    POST /api/admin/search/solr/migrate-library-schema
    

Pros: User-friendly, handles all complexity internally

Cons: Requires admin access to application

Step-by-Step Migration Process

For Docker Deployments

  1. Back up your data (optional but recommended):

    # Back up the database (-T disables the pseudo-TTY so the redirected dump stays clean)
    docker-compose exec -T postgres pg_dump -U storycove storycove > backup.sql
    
  2. Pull the latest code with library separation fixes

  3. Choose migration approach:

    • Quick & Clean: Use Option 1 (volume reset)
    • Production: Use Option 2 or 3
  4. Verify migration:

    • Log in with different library passwords
    • Perform searches to confirm isolation
    • Check that new content gets indexed with library IDs

For Kubernetes/Production Deployments

  1. Update your deployment with the new container images

  2. Add the libraryId field to Solr schema using Option 2

  3. Use the migration endpoint (Option 3):

    kubectl exec -it deployment/storycove-backend -- \
      curl -X POST http://localhost:8080/api/admin/search/solr/migrate-library-schema \
        -H "Authorization: Bearer YOUR_JWT_TOKEN"
    
  4. Monitor logs for successful migration

Verification Steps

After migration, verify that library separation is working:

  1. Test with multiple libraries:

    • Log in with Library A password
    • Add/search content
    • Log in with Library B password
    • Confirm Library A content is not visible
  2. Check Solr directly (if accessible):

    # Should show documents with libraryId field
    curl "http://solr:8983/solr/storycove_stories/select?q=*:*&fl=id,title,libraryId&rows=5"
    
  3. Monitor application logs for any library separation errors

Troubleshooting

"unknown field 'libraryId'" Error

Problem: ERROR: [doc=xxx] unknown field 'libraryId'

Cause: The Solr schema doesn't have the libraryId field yet.

Solutions:

  1. Use the automated migration (adds field automatically):

    curl -X POST "http://your-app-host/api/admin/search/solr/migrate-library-schema" \
      -H "Authorization: Bearer YOUR_JWT_TOKEN"
    
  2. Add field manually first:

    # Add field via app API
    curl -X POST "http://your-app-host/api/admin/search/solr/add-library-field" \
      -H "Authorization: Bearer YOUR_JWT_TOKEN"
    
    # Then run migration
    curl -X POST "http://your-app-host/api/admin/search/solr/migrate-library-schema" \
      -H "Authorization: Bearer YOUR_JWT_TOKEN"
    
  3. Direct Solr API (if app API fails):

    # Add to both cores
    curl -X POST "http://solr:8983/solr/storycove_stories/schema" \
      -H "Content-Type: application/json" \
      -d '{"add-field":{"name":"libraryId","type":"string","indexed":true,"stored":true}}'
    
    curl -X POST "http://solr:8983/solr/storycove_authors/schema" \
      -H "Content-Type: application/json" \
      -d '{"add-field":{"name":"libraryId","type":"string","indexed":true,"stored":true}}'
    
  4. For development: Use Option 1 (volume reset) for clean restart

Migration Endpoint Returns Error

Common causes:

  • Solr is not available (check connectivity)
  • No active library context (ensure user is authenticated)
  • Insufficient permissions (check JWT token/authentication)
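To narrow down which of these causes applies, it can help to capture the HTTP status of the migration call. The sketch below is only a starting point: the mapping from status codes to causes is an assumption based on general HTTP conventions, not on the application's documented behaviour, and the helper names and the temporary file path are hypothetical.

```shell
# Hypothetical helper: map an HTTP status (as reported by curl's
# %{http_code}) to the likely causes listed above.
classify_status() {
  case "$1" in
    000)     echo "no HTTP response: check host, port, and network" ;;
    2??)     echo "request succeeded" ;;
    401|403) echo "authentication or permission problem: check the JWT token" ;;
    5??)     echo "server-side failure: check application logs and Solr connectivity" ;;
    *)       echo "unexpected status $1: inspect the response body" ;;
  esac
}

# Hypothetical helper: call the migration endpoint and explain the result.
# $1 = application base URL, $2 = JWT token.
run_migration() {
  status=$(curl -s -o /tmp/migrate-response.txt -w '%{http_code}' -X POST \
    "$1/api/admin/search/solr/migrate-library-schema" \
    -H "Authorization: Bearer $2")
  echo "HTTP $status: $(classify_status "$status")"
}
```

After `run_migration "http://your-app-host" "YOUR_JWT_TOKEN"`, the response body saved to `/tmp/migrate-response.txt` usually carries the application's own error message.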

Search Results Still Mixed

This indicates an incomplete migration. To resolve it:

  • Clear all Solr data and reindex completely
  • Verify that all documents have libraryId field
  • Check that search queries include library filters
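One way to verify that every document carries the field is to count the documents that lack it. The sketch below uses the standard Solr query `*:* -libraryId:[* TO *]`, which matches documents with no value for `libraryId`, and reads `numFound` from the JSON response. The helper names and the base URL argument are assumptions, and the parsing is deliberately simple to avoid depending on `jq`.

```shell
# Hypothetical helper: pull the numFound value out of a Solr JSON response
# read from stdin.
extract_num_found() {
  grep -o '"numFound":[ ]*[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Hypothetical helper: count documents in one core that have no libraryId.
# $1 = Solr base URL, $2 = core name.
count_missing_library_id() {
  curl -s -G "$1/solr/$2/select" \
    --data-urlencode 'q=*:* -libraryId:[* TO *]' \
    --data-urlencode 'rows=0' \
    --data-urlencode 'wt=json' | extract_num_found
}
```

A result of `0` from `count_missing_library_id "http://solr:8983" storycove_stories` (and likewise for `storycove_authors`) suggests the index itself is complete, which points the investigation toward the query filters instead.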

Environment-Specific Notes

Development

  • Use Option 1 (volume reset) for simplicity
  • Data loss is acceptable in dev environments

Staging

  • Use Option 2 or 3 to test production migration procedures
  • Verify migration process before applying to production

Production

  • Always back up data first
  • Use Option 2 (Schema API) or Option 3 (Admin endpoint)
  • Plan for brief performance impact during reindexing
  • Monitor system resources during bulk reindexing

Performance Considerations

  • Reindexing time: Depends on data size; roughly 1000 docs/second is typical, but throughput varies with hardware and document size
  • Memory usage: May increase during bulk indexing
  • Search performance: Minimal impact from library filtering
  • Storage: Slight increase due to libraryId field
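For planning a maintenance window, the reindexing figure above can be turned into a rough estimate. The helper below is a hypothetical sketch that simply divides the document count by an assumed throughput and rounds up. The real rate should be measured on your own hardware.

```shell
# Hypothetical helper: estimate reindex duration in whole seconds.
# $1 = number of documents, $2 = assumed throughput in docs/second.
estimate_reindex_seconds() {
  # Integer ceiling division: (docs + rate - 1) / rate
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

For example, `estimate_reindex_seconds 250000 1000` prints `250`, a little over four minutes at the assumed rate.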

Rollback Plan

If issues occur:

  1. Immediate: Restart Solr. Note that Option 1 deletes the old index, so after a volume reset the previous state can only be rebuilt by reindexing
  2. Schema revert: Remove libraryId field via Schema API
  3. Code rollback: Deploy previous version without library separation
  4. Data restore: Restore from backup if necessary
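For the schema revert step, Solr's Schema API provides a `delete-field` command that mirrors the `add-field` command used during migration. The sketch below shows how it might be applied to one core. The helper names and the base URL argument are hypothetical, and the application code should be rolled back first so it stops sending `libraryId` to a core that no longer defines it.

```shell
# Hypothetical helper: build the Schema API payload that removes a field.
delete_field_payload() {
  printf '{"delete-field":{"name":"%s"}}' "$1"
}

# Hypothetical helper: remove libraryId from one core.
# $1 = Solr base URL, $2 = core name.
remove_library_field() {
  curl -sf -X POST "$1/solr/$2/schema" \
    -H "Content-Type: application/json" \
    -d "$(delete_field_payload libraryId)"
}
```

Calling `remove_library_field "http://your-solr-host:8983" storycove_stories` and then the same for `storycove_authors` would revert both cores. Documents indexed with the field may need a reindex afterwards.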

This migration enables proper multi-tenant isolation while maintaining search performance and functionality.