storycove/docs/DATA_MODEL.md
2025-07-24 08:51:45 +02:00

# StoryCove Data Model Documentation
## Overview
StoryCove uses PostgreSQL as its primary database, and every entity table has a UUID primary key (the `author_urls` and `story_tags` tables reference those keys rather than defining their own). The data model supports a personal library of short stories with rich metadata, author information, and flexible organization through tags and series.
## Entity Relationship Diagram
```
┌─────────────┐      ┌──────────────┐      ┌─────────────┐
│   Authors   │──────│   Stories    │──────│   Series    │
│             │      │              │      │             │
│ - id (PK)   │      │ - id (PK)    │      │ - id (PK)   │
│ - name      │      │ - title      │      │ - name      │
│ - notes     │      │ - content*   │      │ - desc      │
│ - rating    │      │ - rating     │      │             │
│ - avatar    │      │ - volume     │      │             │
└─────────────┘      │ - cover      │      └─────────────┘
       │             │ - word_count │
       │             │ - source_url │
       │             │ - timestamps │
       │             └──────────────┘
       │                    │
       │                    │
┌─────────────┐             │              ┌─────────────┐
│ Author_URLs │             │              │    Tags     │
│             │             │              │             │
│ - author_id │             │              │ - id (PK)   │
│ - url       │             │              │ - name      │
└─────────────┘             │              └─────────────┘
                            │                     │
                            │                     │
                     ┌─────────────┐              │
                     │ Story_Tags  │──────────────┘
                     │             │
                     │ - story_id  │
                     │ - tag_id    │
                     └─────────────┘
```
## Detailed Entity Specifications
### Stories Table
**Table Name**: `stories`
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| title | VARCHAR(255) | NOT NULL | Story title |
| summary | TEXT | NULL | Optional story summary |
| description | VARCHAR(1000) | NULL | Optional description |
| content_html | TEXT | NULL | HTML content of the story |
| content_plain | TEXT | NULL | Plain text version (auto-generated) |
| source_url | VARCHAR(255) | NULL | Source URL where story was found |
| cover_path | VARCHAR(255) | NULL | Path to cover image file |
| word_count | INTEGER | NOT NULL, DEFAULT 0 | Word count (auto-calculated) |
| rating | INTEGER | NULL, CHECK (rating >= 1 AND rating <= 5) | Story rating |
| volume | INTEGER | NULL | Volume number if part of series |
| author_id | UUID | FOREIGN KEY | Reference to authors table |
| series_id | UUID | FOREIGN KEY, NULL | Reference to series table |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Last update timestamp |

**Indexes:**
- Primary key on `id`
- Foreign key index on `author_id`
- Foreign key index on `series_id`
- Index on `created_at` for recent stories queries
- Index on `rating` for top-rated queries
- Unique constraint on `source_url` where not null
**Business Rules:**
- Word count is automatically calculated from `content_plain` or `content_html`
- Plain text content is automatically extracted from HTML content using Jsoup
- Volume is only meaningful when series_id is set
- Rating must be between 1 and 5 (inclusive) if provided
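The Stories rules above can be sketched in plain Java. This is a dependency-free illustration, not the project's implementation: `StoryRules` and its methods are hypothetical, words are counted by splitting `content_plain` on whitespace (the real counting rule may differ), HTML-to-text extraction (done with Jsoup in the project) is assumed to have already happened, and dropping `volume` when no series is set is only one possible reading of "only meaningful".

```java
import java.util.UUID;

/** Hypothetical helpers mirroring the Stories business rules (not the project's actual code). */
final class StoryRules {

    private StoryRules() {}

    /** Counts whitespace-separated words in already-extracted plain text; null or blank text counts as 0. */
    static int countWords(String contentPlain) {
        if (contentPlain == null || contentPlain.isBlank()) {
            return 0; // same as the column default
        }
        return contentPlain.trim().split("\\s+").length;
    }

    /** A rating is optional; when present it must be 1 to 5, like the CHECK constraint. */
    static boolean isValidRating(Integer rating) {
        return rating == null || (rating >= 1 && rating <= 5);
    }

    /** Assumption: a volume without a series carries no meaning, so it is dropped. */
    static Integer effectiveVolume(UUID seriesId, Integer volume) {
        return seriesId == null ? null : volume;
    }
}
```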
### Authors Table
**Table Name**: `authors`
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(255) | NOT NULL, UNIQUE | Author name |
| notes | TEXT | NULL | Notes about the author |
| author_rating | INTEGER | NULL, CHECK (author_rating >= 1 AND author_rating <= 5) | Author rating |
| avatar_image_path | VARCHAR(255) | NULL | Path to avatar image |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Last update timestamp |

**Indexes:**
- Primary key on `id`
- Unique index on `name`
- Index on `author_rating` for top-rated queries
**Business Rules:**
- Author names must be unique across the system
- Rating must be between 1 and 5 (inclusive) if provided
- Author statistics (story count, average rating) are calculated at query time rather than stored
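Because author statistics are derived rather than stored, they can be computed from an author's stories whenever they are needed. A minimal sketch, assuming a hypothetical `StoryRating` projection and assuming unrated stories are excluded from the average (which is also how SQL `AVG` treats NULL values):

```java
import java.util.List;
import java.util.Objects;
import java.util.OptionalDouble;

/** Hypothetical projection with only the story fields the statistics need. */
record StoryRating(String title, Integer rating) {}

/** Derives author statistics on demand instead of storing them in the authors table. */
final class AuthorStats {

    private AuthorStats() {}

    static int storyCount(List<StoryRating> stories) {
        return stories.size();
    }

    /** Averages rated stories only; empty when none of the author's stories has a rating. */
    static OptionalDouble averageRating(List<StoryRating> stories) {
        return stories.stream()
                .map(StoryRating::rating)
                .filter(Objects::nonNull)
                .mapToInt(Integer::intValue)
                .average();
    }
}
```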
### Author URLs Table
**Table Name**: `author_urls`
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| author_id | UUID | FOREIGN KEY, NOT NULL | Reference to authors table |
| url | VARCHAR(255) | NOT NULL | URL associated with author |

**Indexes:**
- Foreign key index on `author_id`
- Composite index on `(author_id, url)` for uniqueness
**Business Rules:**
- One author can have multiple URLs
- URLs are stored as simple strings without validation
- Duplicate URLs for the same author are prevented by application logic
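A minimal sketch of the application-level duplicate check, assuming a hypothetical `AuthorUrls` wrapper around one author's URL list. It compares exact strings (consistent with storing URLs without validation or normalization), so `https://example.com/a` and `https://example.com/a/` count as different URLs; rejecting blank strings is an extra assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory view of the author_urls rows for a single author_id. */
final class AuthorUrls {

    private final List<String> urls = new ArrayList<>();

    /** Adds the URL unless the exact same string is already present; returns false when rejected. */
    boolean addUrl(String url) {
        if (url == null || url.isBlank() || urls.contains(url)) {
            return false;
        }
        urls.add(url); // no format validation, matching the documented behavior
        return true;
    }

    List<String> urls() {
        return Collections.unmodifiableList(urls);
    }
}
```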
### Series Table
**Table Name**: `series`
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(255) | NOT NULL, UNIQUE | Series name |
| description | VARCHAR(1000) | NULL | Series description |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |

**Indexes:**
- Primary key on `id`
- Unique index on `name`
**Business Rules:**
- Series names must be unique
- Stories in a series are ordered by volume number
- Series without stories are allowed (placeholder series)
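Ordering a series by volume can be sketched with a `Comparator`. The `SeriesEntry` record is hypothetical, and placing stories without a volume last (with the title as a tie-breaker) is an assumption, since the rules above do not say how missing volumes are ordered.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical minimal view of a story inside a series. */
record SeriesEntry(String title, Integer volume) {}

final class SeriesOrdering {

    private SeriesOrdering() {}

    /** Ascending volume; stories without a volume go last (assumption), ties broken by title. */
    static final Comparator<SeriesEntry> BY_VOLUME =
            Comparator.<SeriesEntry, Integer>comparing(
                            SeriesEntry::volume, Comparator.nullsLast(Comparator.naturalOrder()))
                    .thenComparing(SeriesEntry::title);

    static List<SeriesEntry> ordered(List<SeriesEntry> entries) {
        return entries.stream().sorted(BY_VOLUME).toList();
    }
}
```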
### Tags Table
**Table Name**: `tags`
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(100) | NOT NULL, UNIQUE | Tag name |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |

**Indexes:**
- Primary key on `id`
- Unique index on `name`
- Index on `name` for autocomplete queries
**Business Rules:**
- Tag names must be unique and are stored in lowercase
- Tags are created automatically when referenced by stories
- Tag usage statistics are calculated dynamically
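The lowercase and auto-create rules amount to a normalize-then-find-or-create lookup. In this sketch `TagRegistry` is a hypothetical in-memory stand-in for the `tags` table; trimming surrounding whitespace is an assumption, while `Locale.ROOT` keeps lowercasing independent of the server's default locale (for example, avoiding the Turkish dotless-i rule).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory stand-in for the tags table, keyed by normalized name. */
final class TagRegistry {

    private static final int MAX_NAME_LENGTH = 100; // tags.name is VARCHAR(100)

    private final Map<String, UUID> idsByName = new LinkedHashMap<>();

    /** Trims (assumption) and lowercases with a locale-independent rule. */
    static String normalize(String rawName) {
        return rawName.trim().toLowerCase(Locale.ROOT);
    }

    /** Returns the existing tag id, or creates the tag the first time the name is referenced. */
    UUID findOrCreate(String rawName) {
        String name = normalize(rawName);
        int length = name.codePointCount(0, name.length());
        if (length == 0 || length > MAX_NAME_LENGTH) {
            throw new IllegalArgumentException("Tag name must be 1 to 100 characters: " + rawName);
        }
        return idsByName.computeIfAbsent(name, ignored -> UUID.randomUUID());
    }

    int size() {
        return idsByName.size();
    }
}
```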
### Story Tags Junction Table
**Table Name**: `story_tags`
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| story_id | UUID | FOREIGN KEY, NOT NULL | Reference to stories table |
| tag_id | UUID | FOREIGN KEY, NOT NULL | Reference to tags table |

**Constraints:**
- Primary key on `(story_id, tag_id)`
- Foreign key to `stories(id)` with `ON DELETE CASCADE`
- Foreign key to `tags(id)` with `ON DELETE CASCADE`
**Indexes:**
- Composite primary key index
- Index on `tag_id` for reverse lookups
## Data Types and Conventions
### UUID Strategy
- All primary keys use UUID (Universally Unique Identifier)
- Generated using `GenerationType.UUID` in Hibernate
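- IDs can be generated independently (for example, in different environments) with negligible risk of collision
- Stored in PostgreSQL's native 16-byte `uuid` type; the canonical text form is 36 characters (e.g., `123e4567-e89b-12d3-a456-426614174000`)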
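A short sketch of the identifier format using `java.util.UUID`. Hibernate's default strategy for `GenerationType.UUID` produces random (version 4) UUIDs; the hypothetical `UuidKeys` helper calls `UUID.randomUUID()` directly and checks the 36-character length before parsing, because `UUID.fromString` also accepts some abbreviated forms such as `1-1-1-1-1`.

```java
import java.util.UUID;

/** Hypothetical helper showing the primary-key format; not part of the project. */
final class UuidKeys {

    private UuidKeys() {}

    /** A random (version 4) UUID, the same shape as the generated primary keys. */
    static UUID newId() {
        return UUID.randomUUID();
    }

    /** Parses the canonical 36-character form, rejecting abbreviated input that fromString would accept. */
    static UUID parse(String text) {
        if (text.length() != 36) {
            throw new IllegalArgumentException("Expected 36 characters: " + text);
        }
        return UUID.fromString(text);
    }
}
```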
### Timestamp Management
- All entities have `created_at` timestamp
- Stories and Authors have `updated_at` timestamp (automatically updated)
- Series and Tags only have `created_at` (they're rarely modified)
- All timestamps use `LocalDateTime` in Java, stored as `TIMESTAMP` in PostgreSQL
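The timestamp rules can be sketched as lifecycle callbacks. This plain-Java version is hypothetical: in a JPA entity the two methods would typically carry `@PrePersist` and `@PreUpdate` (or the fields would use Hibernate's `@CreationTimestamp` and `@UpdateTimestamp`), and the `Clock` parameter exists only so the behavior can be tested deterministically. `AuthorRow` stands in for an entity that has both timestamps.

```java
import java.time.Clock;
import java.time.LocalDateTime;

/** Hypothetical base class showing the created_at/updated_at rules without JPA. */
abstract class TimestampedEntity {

    private LocalDateTime createdAt;
    private LocalDateTime updatedAt;

    /** Runs once before the first insert: both timestamps start equal. */
    void onCreate(Clock clock) {
        LocalDateTime now = LocalDateTime.now(clock);
        createdAt = now;
        updatedAt = now;
    }

    /** Runs before every update: only updated_at moves; created_at never changes. */
    void onUpdate(Clock clock) {
        updatedAt = LocalDateTime.now(clock);
    }

    LocalDateTime createdAt() { return createdAt; }

    LocalDateTime updatedAt() { return updatedAt; }
}

/** Hypothetical entity with both timestamps, like stories and authors. */
final class AuthorRow extends TimestampedEntity {}
```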
### Text Fields
- **VARCHAR(n)**: For constrained text fields (names, paths, URLs)
- **TEXT**: For unbounded text content (story content, summaries, notes)
- **HTML Content**: Stored as HTML; sanitized on input (before storage) and again on output
- **Plain Text**: Automatically extracted from HTML using Jsoup
### Validation Rules
- **Required Fields**: Entity names/titles are always required
- **Length Limits**: Names and titles are limited to 255 characters (tag names to 100), descriptions to 1000
- **Rating Range**: All ratings, when present, must be between 1 and 5
- **URL Format**: No format validation at database level
- **Uniqueness**: Names are unique within their entity type
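These rules can be mirrored in application code so that users see a readable message before PostgreSQL rejects a row. `FieldRules` is a hypothetical helper; treating blank names as missing is an assumption, and lengths are measured in Unicode code points because PostgreSQL's `VARCHAR(n)` limit counts characters rather than UTF-16 units.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical application-level validator mirroring the documented limits. */
final class FieldRules {

    static final int NAME_MAX = 255;        // titles and names: VARCHAR(255)
    static final int TAG_NAME_MAX = 100;    // tags.name: VARCHAR(100)
    static final int DESCRIPTION_MAX = 1000;

    private FieldRules() {}

    /** Collects every human-readable error instead of stopping at the first one. */
    static List<String> validate(String name, int nameMax, String description, Integer rating) {
        List<String> errors = new ArrayList<>();
        if (name == null || name.isBlank()) {
            errors.add("name is required");
        } else if (length(name) > nameMax) {
            errors.add("name exceeds " + nameMax + " characters");
        }
        if (description != null && length(description) > DESCRIPTION_MAX) {
            errors.add("description exceeds " + DESCRIPTION_MAX + " characters");
        }
        if (rating != null && (rating < 1 || rating > 5)) {
            errors.add("rating must be between 1 and 5");
        }
        return errors;
    }

    /** Counts code points, which matches how VARCHAR(n) counts characters more closely than String.length(). */
    private static int length(String s) {
        return s.codePointCount(0, s.length());
    }
}
```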
## Relationships and Cascading
### One-to-Many Relationships
- **Author → Stories**: Lazy loaded, cascade ALL operations
- **Series → Stories**: Lazy loaded, ordered by volume, cascade ALL
- **Author → Author URLs**: Eager loaded via `@ElementCollection` (eager fetching must be set explicitly, since JPA defaults element collections to lazy)
### Many-to-Many Relationships
- **Stories ↔ Tags**: Via `story_tags` junction table
- Managed bidirectionally with helper methods
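- Deleting a story or a tag removes only the matching `story_tags` rows (`ON DELETE CASCADE`); the entity on the other side is kept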
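Bidirectional helper methods keep both sides of the many-to-many link in sync in memory. In this sketch `StoryNode` and `TagNode` are hypothetical plain-Java stand-ins for the entities (no JPA annotations); using a `Set` means adding the same pair twice has no effect, mirroring the composite primary key `(story_id, tag_id)`, and removing a link leaves both the story and the tag in place.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical story stand-in holding only its side of the story_tags link. */
final class StoryNode {
    final UUID id = UUID.randomUUID();
    final Set<TagNode> tags = new HashSet<>();

    /** Links both sides; a repeated add is ignored, like a duplicate (story_id, tag_id) key. */
    void addTag(TagNode tag) {
        if (tags.add(tag)) {
            tag.stories.add(this);
        }
    }

    /** Unlinks both sides; neither the story nor the tag is deleted. */
    void removeTag(TagNode tag) {
        if (tags.remove(tag)) {
            tag.stories.remove(this);
        }
    }
}

/** Hypothetical tag stand-in holding the reverse side of the link. */
final class TagNode {
    final UUID id = UUID.randomUUID();
    final String name;
    final Set<StoryNode> stories = new HashSet<>();

    TagNode(String name) {
        this.name = name;
    }
}
```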
### Foreign Key Constraints
- All foreign keys are backed by database constraints, so every reference points to an existing row
- Deletes cascade along the relationships listed above
- No orphaned records are allowed
## Performance Considerations
### Indexing Strategy
- Primary keys automatically indexed
- Foreign keys have dedicated indexes
- Frequently queried fields (rating, created_at) are indexed
- Unique constraints automatically create indexes
### Query Optimization
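- Lazy loading avoids fetching associations a query does not need (it does not by itself prevent N+1 queries; iterating over lazy collections can trigger them)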
- Pagination used for large result sets
- Specialized queries for common access patterns
- Typesense search engine for full-text search (separate from PostgreSQL)
### Data Volume Estimates
- **Stories**: Expected 1K-10K records per user
- **Authors**: Expected 100-1K records per user
- **Tags**: Expected 50-500 records per user
- **Series**: Expected 10-100 records per user
- **Join Tables**: Scale with story count and tagging usage
## Backup and Migration Considerations
### Schema Evolution
- Uses Hibernate `ddl-auto: update` for development
- Production should use controlled migration tools (Flyway/Liquibase)
- UUID keys let data be copied between environments without primary-key collisions
### Data Integrity
- Foreign key constraints ensure referential integrity
- Check constraints validate data ranges
- Application-level validation provides user-friendly error messages
- Unique constraints prevent duplicate data
### Backup Strategy
- Full PostgreSQL dumps for complete backup
- Image files stored separately in filesystem
- Consider incremental backups for large installations
- Test restore procedures regularly
This data model provides a solid foundation for personal story library management with room for future enhancements while maintaining data integrity and performance.