# StoryCove Data Model Documentation

## Overview

StoryCove uses PostgreSQL as its primary database with UUID-based primary keys throughout. The data model is designed to support a personal library of short stories with rich metadata, author information, and flexible organization through tags and series.

## Entity Relationship Diagram

```
┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│   Authors   │────│   Stories    │────│   Series    │
│             │    │              │    │             │
│ - id (PK)   │    │ - id (PK)    │    │ - id (PK)   │
│ - name      │    │ - title      │    │ - name      │
│ - notes     │    │ - content*   │    │ - desc      │
│ - rating    │    │ - rating     │    │             │
│ - avatar    │    │ - volume     │    │             │
└─────────────┘    │ - cover      │    └─────────────┘
       │           │ - word_count │
       │           │ - source_url │
       │           │ - timestamps │
       │           └──────────────┘
       │                  │
       │                  │
┌─────────────┐           │           ┌─────────────┐
│ Author_URLs │           │           │    Tags     │
│             │           │           │             │
│ - author_id │           │           │ - id (PK)   │
│ - url       │           │           │ - name      │
└─────────────┘           │           └─────────────┘
                          │                  │
                          │                  │
                    ┌─────────────┐         │
                    │ Story_Tags  │─────────┘
                    │             │
                    │ - story_id  │
                    │ - tag_id    │
                    └─────────────┘
```

## Detailed Entity Specifications

### Stories Table

**Table Name**: `stories`

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| title | VARCHAR(255) | NOT NULL | Story title |
| summary | TEXT | NULL | Optional story summary |
| description | VARCHAR(1000) | NULL | Optional description |
| content_html | TEXT | NULL | HTML content of the story |
| content_plain | TEXT | NULL | Plain text version (auto-generated) |
| source_url | VARCHAR(255) | NULL | Source URL where story was found |
| cover_path | VARCHAR(255) | NULL | Path to cover image file |
| word_count | INTEGER | NOT NULL, DEFAULT 0 | Word count (auto-calculated) |
| rating | INTEGER | NULL, CHECK (rating >= 1 AND rating <= 5) | Story rating |
| volume | INTEGER | NULL | Volume number if part of series |
| author_id | UUID | FOREIGN KEY | Reference to authors table |
| series_id | UUID | FOREIGN KEY, NULL | Reference to series table |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Last update timestamp |

**Indexes:**
- Primary key on `id`
- Foreign key index on `author_id`
- Foreign key index on `series_id`
- Index on `created_at` for recent stories queries
- Index on `rating` for top-rated queries
- Unique index on `source_url`; rows with a NULL `source_url` are exempt, so any number of stories may omit it

**Business Rules:**
- Word count is automatically calculated from `content_plain` or `content_html`
- Plain text content is automatically extracted from HTML content using Jsoup
- Volume is only meaningful when series_id is set
- Rating must be between 1 and 5 if provided
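
The word-count rule above could be sketched roughly as follows. This is a minimal illustration, not StoryCove's actual code: it assumes the input is already plain text (the Jsoup extraction step is not shown), that a word is any run of non-whitespace characters, and the class name `WordCounter` is hypothetical.

```java
// Hypothetical helper: counts words in already-extracted plain text.
final class WordCounter {
    private WordCounter() {}

    /** Returns the number of whitespace-separated words; null or blank text counts as 0. */
    static int countWords(String plainText) {
        if (plainText == null || plainText.isBlank()) {
            return 0; // same value as the column default
        }
        // strip() drops leading/trailing whitespace so split() yields no empty first token
        return plainText.strip().split("\\s+").length;
    }
}
```

Returning 0 for missing content keeps the value compatible with the `NOT NULL, DEFAULT 0` definition of `word_count`.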

### Authors Table

**Table Name**: `authors`

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(255) | NOT NULL, UNIQUE | Author name |
| notes | TEXT | NULL | Notes about the author |
| author_rating | INTEGER | NULL, CHECK (author_rating >= 1 AND author_rating <= 5) | Author rating |
| avatar_image_path | VARCHAR(255) | NULL | Path to avatar image |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Last update timestamp |

**Indexes:**
- Primary key on `id`
- Unique index on `name`
- Index on `author_rating` for top-rated queries

**Business Rules:**
- Author names must be unique across the system
- Rating must be between 1 and 5 if provided
- Author statistics (story count, average rating) are calculated dynamically

### Author URLs Table

**Table Name**: `author_urls`

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| author_id | UUID | FOREIGN KEY, NOT NULL | Reference to authors table |
| url | VARCHAR(255) | NOT NULL | URL associated with author |

**Indexes:**
- Foreign key index on `author_id`
- Composite index on `(author_id, url)` for uniqueness

**Business Rules:**
- One author can have multiple URLs
- URLs are stored as simple strings without validation
- Duplicate URLs for the same author are rejected by application logic, with the composite index on `(author_id, url)` as the database-level backstop
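
One way the application-level duplicate check could look is sketched below. It assumes an exact string comparison (no URL normalization, in line with the "no validation" rule) and an insertion-ordered set; the class `AuthorUrls` and its methods are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical holder for one author's URLs.
final class AuthorUrls {
    // LinkedHashSet rejects duplicates and keeps insertion order
    private final Set<String> urls = new LinkedHashSet<>();

    /** Adds the URL; returns false if it is null or already stored for this author. */
    boolean add(String url) {
        if (url == null) {
            return false; // the url column is NOT NULL
        }
        return urls.add(url); // exact match only, no format validation
    }

    /** Returns an immutable snapshot in insertion order. */
    List<String> asList() {
        return List.copyOf(urls);
    }
}
```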

### Series Table

**Table Name**: `series`

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(255) | NOT NULL, UNIQUE | Series name |
| description | VARCHAR(1000) | NULL | Series description |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |

**Indexes:**
- Primary key on `id`
- Unique index on `name`

**Business Rules:**
- Series names must be unique
- Stories in a series are ordered by volume number
- Series without stories are allowed (placeholder series)
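
The volume ordering can be illustrated with a small comparator. This is a sketch under assumptions: `volume` is nullable in the schema, and placing stories without a volume last is a choice made here, not something the source specifies. `SeriesEntry` and `SeriesOrdering` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical projection of a story inside a series.
record SeriesEntry(String title, Integer volume) {}

final class SeriesOrdering {
    private SeriesOrdering() {}

    // Ascending by volume; entries with a null volume sort last (an assumption)
    static final Comparator<SeriesEntry> BY_VOLUME =
            Comparator.comparing(SeriesEntry::volume,
                    Comparator.nullsLast(Comparator.<Integer>naturalOrder()));

    /** Returns a sorted copy and leaves the input list untouched. */
    static List<SeriesEntry> sorted(List<SeriesEntry> stories) {
        List<SeriesEntry> copy = new ArrayList<>(stories);
        copy.sort(BY_VOLUME);
        return copy;
    }
}
```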

### Tags Table

**Table Name**: `tags`

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(100) | NOT NULL, UNIQUE | Tag name |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |

**Indexes:**
- Primary key on `id`
- Unique index on `name`, which also serves autocomplete queries

**Business Rules:**
- Tag names must be unique and are stored in lowercase
- Tags are created automatically when referenced by stories
- Tag usage statistics are calculated dynamically
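
The lowercase and auto-creation rules can be sketched with an in-memory stand-in for the `tags` table. Trimming surrounding whitespace is an assumption added here, and `TagRegistry` is a hypothetical name; a real implementation would query and insert through the persistence layer.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for the tags table.
final class TagRegistry {
    private final Map<String, UUID> idsByName = new HashMap<>();

    /** Lowercases with a fixed locale so results do not depend on the server's default locale. */
    static String normalize(String rawName) {
        return rawName.strip().toLowerCase(Locale.ROOT);
    }

    /** Returns the existing tag id, or creates one the first time a name is referenced. */
    UUID findOrCreate(String rawName) {
        return idsByName.computeIfAbsent(normalize(rawName), name -> UUID.randomUUID());
    }

    int size() {
        return idsByName.size();
    }
}
```

Normalizing before the lookup means `Sci-Fi` and `sci-fi` resolve to the same row, so the unique constraint on `name` is never hit by case variants.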

### Story Tags Junction Table

**Table Name**: `story_tags`

| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| story_id | UUID | FOREIGN KEY, NOT NULL | Reference to stories table |
| tag_id | UUID | FOREIGN KEY, NOT NULL | Reference to tags table |

**Constraints:**
- Primary key on `(story_id, tag_id)`
- Foreign key to `stories(id)` with CASCADE DELETE
- Foreign key to `tags(id)` with CASCADE DELETE

**Indexes:**
- Composite primary key index
- Index on `tag_id` for reverse lookups

## Data Types and Conventions

### UUID Strategy
- All primary keys use UUID (Universally Unique Identifier)
- Generated using `GenerationType.UUID` in Hibernate
- Provides natural uniqueness across distributed systems
- 36-character string representation (e.g., `123e4567-e89b-12d3-a456-426614174000`)
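
For reference, the string form described above can be reproduced with the JDK alone. The sketch assumes random (version 4) UUIDs, which is what `java.util.UUID.randomUUID()` produces; whether Hibernate is configured the same way is not stated in this document. `StoryIds` is a hypothetical helper.

```java
import java.util.UUID;

// Hypothetical helper around java.util.UUID.
final class StoryIds {
    private StoryIds() {}

    /** Generates a random (version 4) UUID. */
    static UUID newId() {
        return UUID.randomUUID();
    }

    /** Parses the hyphenated string form back into a UUID. */
    static UUID parse(String text) {
        return UUID.fromString(text);
    }
}
```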

### Timestamp Management
- Stories, Authors, Series, and Tags all have a `created_at` timestamp; `author_urls` and `story_tags` carry no timestamps
- Stories and Authors have `updated_at` timestamp (automatically updated)
- Series and Tags only have `created_at` (they're rarely modified)
- All timestamps use `LocalDateTime` in Java, stored as `TIMESTAMP` in PostgreSQL
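
The create/update behaviour could be sketched in plain Java as below. How StoryCove actually maintains `updated_at` (JPA lifecycle callbacks, Hibernate annotations, or a database trigger) is not stated here, so the two methods only mimic the role such hooks would play. `AuditTimestamps` is a hypothetical class, and the `Clock` parameter exists to make the behaviour testable.

```java
import java.time.Clock;
import java.time.LocalDateTime;

// Hypothetical holder for the two audit columns.
final class AuditTimestamps {
    private LocalDateTime createdAt;
    private LocalDateTime updatedAt;

    /** Run once before the first insert: both columns start with the same value. */
    void onCreate(Clock clock) {
        LocalDateTime now = LocalDateTime.now(clock);
        createdAt = now;
        updatedAt = now;
    }

    /** Run before every update: only updated_at moves, created_at is never touched. */
    void onUpdate(Clock clock) {
        updatedAt = LocalDateTime.now(clock);
    }

    LocalDateTime createdAt() { return createdAt; }

    LocalDateTime updatedAt() { return updatedAt; }
}
```

Because `LocalDateTime` carries no zone information, the stored value depends on the zone of the clock that produced it; keeping that zone consistent across servers is left to the application.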

### Text Fields
- **VARCHAR(n)**: For length-constrained text fields (names, paths, URLs, descriptions)
- **TEXT**: For unlimited text content (story content, summaries, author notes)
- **HTML Content**: Sanitized on input, stored without further transformation, and sanitized again on output
- **Plain Text**: Automatically extracted from HTML using Jsoup

### Validation Rules
- **Required Fields**: Entity names/titles are always required
- **Length Limits**: Titles and names limited to 255 characters (tag names to 100), descriptions to 1000
- **Rating Range**: All ratings constrained to the 1-5 range; NULL means no rating
- **URL Format**: No format validation at database level
- **Uniqueness**: Author, series, and tag names are unique within their entity type; story titles are not
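
An application-level check mirroring these rules might look like the sketch below. The limits come from the column definitions in the tables above; rejecting blank names is an assumption (the schema only forbids NULL), and `FieldRules` is a hypothetical class.

```java
// Hypothetical validation helpers mirroring the column constraints.
final class FieldRules {
    static final int MAX_NAME_LENGTH = 255;         // titles, author and series names
    static final int MAX_TAG_NAME_LENGTH = 100;     // tags.name
    static final int MAX_DESCRIPTION_LENGTH = 1000; // description columns

    private FieldRules() {}

    /** A rating is optional; when present it must lie in the range 1..5. */
    static boolean isValidRating(Integer rating) {
        return rating == null || (rating >= 1 && rating <= 5);
    }

    /** Required text field: non-null, non-blank (assumption), within the character limit. */
    static boolean isValidName(String name, int maxLength) {
        if (name == null || name.isBlank()) {
            return false;
        }
        // Count code points, since PostgreSQL measures VARCHAR(n) in characters
        return name.codePointCount(0, name.length()) <= maxLength;
    }
}
```

Checking these limits before the insert lets the application report a readable error instead of surfacing a database constraint violation.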

## Relationships and Cascading

### One-to-Many Relationships
- **Author → Stories**: Lazy loaded, cascade ALL operations
- **Series → Stories**: Lazy loaded, ordered by volume, cascade ALL
- **Author → Author URLs**: Eager loaded via `@ElementCollection`

### Many-to-Many Relationships
- **Stories ↔ Tags**: Via `story_tags` junction table
- Managed bidirectionally with helper methods
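- Deleting a story or a tag removes its `story_tags` rows (CASCADE DELETE on both foreign keys); the entity on the other side of the link is left in place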
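
The bidirectional helper methods mentioned above typically follow the pattern sketched here. These are plain classes without JPA annotations so the example stays self-contained; the field and method names are assumptions, not the actual StoryCove entities.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, annotation-free stand-ins for the two entities.
final class Tag {
    final String name;
    final Set<Story> stories = new HashSet<>();

    Tag(String name) {
        this.name = name;
    }
}

final class Story {
    final String title;
    final Set<Tag> tags = new HashSet<>();

    Story(String title) {
        this.title = title;
    }

    /** Links both sides so the in-memory object graph stays consistent. */
    void addTag(Tag tag) {
        tags.add(tag);
        tag.stories.add(this);
    }

    /** Unlinks both sides; neither the story nor the tag itself is deleted. */
    void removeTag(Tag tag) {
        tags.remove(tag);
        tag.stories.remove(this);
    }
}
```

Updating both collections in one method avoids the common mistake of changing only one side and leaving the other stale until the next reload.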

### Foreign Key Constraints
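- Every foreign key column is backed by a database-level foreign key constraint
- Deleting a story or a tag cascades to its `story_tags` rows; because Author → Stories and Series → Stories use cascade ALL, deleting an author or a series through JPA also deletes the associated stories
- Foreign key constraints prevent rows that reference a missing author, series, story, or tag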

## Performance Considerations

### Indexing Strategy
- Primary keys automatically indexed
- Foreign keys have dedicated indexes
- Frequently queried fields (rating, created_at) are indexed
- Unique constraints automatically create indexes

### Query Optimization
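- Lazy loading avoids fetching associations that a query does not need; on its own it does not prevent N+1 queries, which arise when a lazy association is accessed once per row and are avoided by fetching that association in the same query (for example with a fetch join)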
- Pagination used for large result sets
- Specialized queries for common access patterns
- Typesense search engine for full-text search (separate from PostgreSQL)
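
As an illustration of the pagination point, the sketch below maps a zero-based page number and page size to an offset, using an in-memory list in place of a database query with `LIMIT`/`OFFSET`. `PageSpec` and `Pager` are hypothetical names, and zero-based page numbering is an assumption.

```java
import java.util.List;

// Hypothetical page request: zero-based page index plus page size.
record PageSpec(int page, int size) {
    PageSpec {
        if (page < 0) throw new IllegalArgumentException("page must be >= 0");
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
    }

    /** Number of rows to skip; computed as long to avoid int overflow. */
    long offset() {
        return (long) page * size;
    }
}

final class Pager {
    private Pager() {}

    /** Returns the requested slice, or an empty list when the page lies past the end. */
    static <T> List<T> slice(List<T> all, PageSpec spec) {
        int from = (int) Math.min(spec.offset(), all.size());
        int to = (int) Math.min((long) from + spec.size(), all.size());
        return List.copyOf(all.subList(from, to));
    }
}
```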

### Data Volume Estimates
- **Stories**: Expected 1K-10K records per user
- **Authors**: Expected 100-1K records per user
- **Tags**: Expected 50-500 records per user
- **Series**: Expected 10-100 records per user
- **Join Tables**: Scale with story count and tagging usage

## Backup and Migration Considerations

### Schema Evolution
- Uses Hibernate `ddl-auto: update` for development
- Production should use controlled migration tools (Flyway/Liquibase)
- UUID keys allow safe data migration between environments

### Data Integrity
- Foreign key constraints ensure referential integrity
- Check constraints validate data ranges
- Application-level validation provides user-friendly error messages
- Unique constraints prevent duplicate data

### Backup Strategy
- Full PostgreSQL dumps for complete backup
- Image files stored separately in filesystem
- Consider incremental backups for large installations
- Test restore procedures regularly

This data model provides a solid foundation for personal story library management with room for future enhancements while maintaining data integrity and performance.