storycove/docs/DATA_MODEL.md
2025-07-24 08:51:45 +02:00


StoryCove Data Model Documentation

Overview

StoryCove uses PostgreSQL as its primary database with UUID-based primary keys throughout. The data model is designed to support a personal library of short stories with rich metadata, author information, and flexible organization through tags and series.

Entity Relationship Diagram

┌─────────────┐    ┌──────────────┐    ┌─────────────┐
│   Authors   │────│   Stories    │────│   Series    │
│             │    │              │    │             │
│ - id (PK)   │    │ - id (PK)    │    │ - id (PK)   │
│ - name      │    │ - title      │    │ - name      │
│ - notes     │    │ - content*   │    │ - desc      │
│ - rating    │    │ - rating     │    │             │
│ - avatar    │    │ - volume     │    │             │
└─────────────┘    │ - cover      │    └─────────────┘
       │           │ - word_count │
       │           │ - source_url │
       │           │ - timestamps │
       │           └──────────────┘
       │                  │
       │                  │
┌─────────────┐           │           ┌─────────────┐
│ Author_URLs │           │           │    Tags     │
│             │           │           │             │
│ - author_id │           │           │ - id (PK)   │
│ - url       │           │           │ - name      │
└─────────────┘           │           └─────────────┘
                          │                  │
                          │                  │
                    ┌─────────────┐          │
                    │ Story_Tags  │──────────┘
                    │             │
                    │ - story_id  │
                    │ - tag_id    │
                    └─────────────┘

Detailed Entity Specifications

Stories Table

Table Name: stories

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| title | VARCHAR(255) | NOT NULL | Story title |
| summary | TEXT | NULL | Optional story summary |
| description | VARCHAR(1000) | NULL | Optional description |
| content_html | TEXT | NULL | HTML content of the story |
| content_plain | TEXT | NULL | Plain text version (auto-generated) |
| source_url | VARCHAR(255) | NULL | Source URL where story was found |
| cover_path | VARCHAR(255) | NULL | Path to cover image file |
| word_count | INTEGER | NOT NULL, DEFAULT 0 | Word count (auto-calculated) |
| rating | INTEGER | NULL, CHECK (rating >= 1 AND rating <= 5) | Story rating |
| volume | INTEGER | NULL | Volume number if part of series |
| author_id | UUID | FOREIGN KEY | Reference to authors table |
| series_id | UUID | FOREIGN KEY, NULL | Reference to series table |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Last update timestamp |

Indexes:

  • Primary key on id
  • Foreign key index on author_id
  • Foreign key index on series_id
  • Index on created_at for recent stories queries
  • Index on rating for top-rated queries
  • Unique constraint on source_url where not null

Business Rules:

  • Word count is automatically calculated from content_plain or content_html
  • Plain text content is automatically extracted from HTML content using Jsoup
  • Volume is only meaningful when series_id is set
  • Rating, if provided, must be between 1 and 5
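As a minimal sketch of the word-count rule, assuming a word is any run of non-whitespace characters in content_plain, the count could be derived as follows. The class and method names (`WordCounter`, `countWords`) are hypothetical, and this stdlib-only sketch leaves out the Jsoup HTML-to-text step the project performs.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not StoryCove's actual code): derives word_count from content_plain,
 * assuming words are runs of non-whitespace characters.
 */
final class WordCounter {
    // One or more ASCII whitespace characters (spaces, tabs, newlines) separate words.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private WordCounter() {}

    /** Returns 0 for null or blank text, matching the column's DEFAULT 0. */
    static int countWords(String plainText) {
        if (plainText == null || plainText.isBlank()) {
            return 0;
        }
        // strip() removes leading/trailing whitespace so split() yields no empty first token.
        return WHITESPACE.split(plainText.strip()).length;
    }
}
```

For example, `WordCounter.countWords("The quick  brown\nfox")` returns 4. With Jsoup on the classpath, `Jsoup.parse(html).text()` returns a document's combined text and could feed this method; whether StoryCove uses exactly that call is not stated here.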

Authors Table

Table Name: authors

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(255) | NOT NULL, UNIQUE | Author name |
| notes | TEXT | NULL | Notes about the author |
| author_rating | INTEGER | NULL, CHECK (author_rating >= 1 AND author_rating <= 5) | Author rating |
| avatar_image_path | VARCHAR(255) | NULL | Path to avatar image |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |
| updated_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Last update timestamp |

Indexes:

  • Primary key on id
  • Unique index on name
  • Index on author_rating for top-rated queries

Business Rules:

  • Author names must be unique across the system
  • Rating, if provided, must be between 1 and 5
  • Author statistics (story count, average rating) are calculated dynamically
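Because author statistics are calculated dynamically rather than stored, they can be derived from the author's stories on demand. The following in-memory sketch uses a hypothetical `AuthorStats` class and assumes the average rating is taken over the author's rated stories only, skipping null ratings.

```java
import java.util.List;
import java.util.Objects;
import java.util.OptionalDouble;

/** Hypothetical sketch: author statistics computed on demand instead of stored in the authors table. */
final class AuthorStats {
    private AuthorStats() {}

    /** Story count is the number of stories linked to the author (one rating entry per story). */
    static int storyCount(List<Integer> storyRatings) {
        return storyRatings.size();
    }

    /** Average over rated stories only (assumption: null ratings are skipped); empty if none are rated. */
    static OptionalDouble averageRating(List<Integer> storyRatings) {
        return storyRatings.stream()
                .filter(Objects::nonNull)
                .mapToInt(Integer::intValue)
                .average();
    }
}
```

In the database the same figures could come from `COUNT(*)` and `AVG(rating)` over the author's stories; PostgreSQL's `AVG` ignores NULLs, which matches the assumption made here.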

Author URLs Table

Table Name: author_urls

| Column | Type | Constraints | Description |
|---|---|---|---|
| author_id | UUID | FOREIGN KEY, NOT NULL | Reference to authors table |
| url | VARCHAR(255) | NOT NULL | URL associated with author |

Indexes:

  • Foreign key index on author_id
  • Composite index on (author_id, url) for uniqueness

Business Rules:

  • One author can have multiple URLs
  • URLs are stored as simple strings without validation
  • Duplicate URLs for the same author are prevented by application logic
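Since duplicate URLs per author are prevented in application logic, and URLs are not format-validated, a minimal check only needs exact-string de-duplication. This sketch assumes a hypothetical `AuthorUrlList` holder; a `LinkedHashSet` rejects repeats while preserving insertion order.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the application-level duplicate check for one author's URLs. */
final class AuthorUrlList {
    // LinkedHashSet rejects duplicates while keeping the order URLs were added.
    private final Set<String> urls = new LinkedHashSet<>();

    /** Returns true if added, false if the author already has this exact URL or url is null. */
    boolean addUrl(String url) {
        if (url == null) {
            return false; // author_urls.url is NOT NULL
        }
        return urls.add(url); // exact string match only; no format validation, as documented
    }

    List<String> urls() {
        return List.copyOf(urls); // immutable snapshot in insertion order
    }
}
```

Note that `https://example.com/a` and `https://example.com/a/` count as different URLs here, consistent with storing URLs as simple strings.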

Series Table

Table Name: series

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(255) | NOT NULL, UNIQUE | Series name |
| description | VARCHAR(1000) | NULL | Series description |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |

Indexes:

  • Primary key on id
  • Unique index on name

Business Rules:

  • Series names must be unique
  • Stories in a series are ordered by volume number
  • Series without stories are allowed (placeholder series)
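Ordering the stories of a series by volume number can be expressed as a comparator. In this sketch the `SeriesStory` record and `SeriesOrdering` class are hypothetical, and placing stories without a volume at the end, with ties broken by title, is an assumption the document does not specify.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: ordering the stories of one series by volume number. */
final class SeriesOrdering {
    /** Minimal stand-in for a story row: only the fields the ordering needs. */
    record SeriesStory(String title, Integer volume) {}

    // Volume ascending; null volumes sort last (assumption); equal volumes fall back to title.
    static final Comparator<SeriesStory> BY_VOLUME =
            Comparator.comparing(SeriesStory::volume, Comparator.nullsLast(Comparator.<Integer>naturalOrder()))
                      .thenComparing(SeriesStory::title);

    private SeriesOrdering() {}

    static List<SeriesStory> ordered(List<SeriesStory> stories) {
        return stories.stream().sorted(BY_VOLUME).toList();
    }
}
```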

Tags Table

Table Name: tags

| Column | Type | Constraints | Description |
|---|---|---|---|
| id | UUID | PRIMARY KEY, NOT NULL | Unique identifier |
| name | VARCHAR(100) | NOT NULL, UNIQUE | Tag name |
| created_at | TIMESTAMP | NOT NULL, DEFAULT CURRENT_TIMESTAMP | Creation timestamp |

Indexes:

  • Primary key on id
  • Unique index on name
  • Index on name for autocomplete queries

Business Rules:

  • Tag names must be unique and are stored in lowercase
  • Tags are created automatically when referenced by stories
  • Tag usage statistics are calculated dynamically
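The lowercase and auto-creation rules amount to a normalize-then-find-or-create step. This in-memory sketch uses a hypothetical `TagRegistry`; trimming surrounding whitespace is an assumption beyond the documented lowercasing, and `Locale.ROOT` keeps lowercasing independent of the server's default locale (for example, avoiding Turkish dotless-i conversion).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory sketch of the tag rules: lowercase names, created on first reference. */
final class TagRegistry {
    private final Map<String, UUID> idsByName = new HashMap<>();

    /** Trims (assumption) and lowercases a tag name using locale-independent rules. */
    static String normalize(String rawName) {
        return rawName.strip().toLowerCase(Locale.ROOT);
    }

    /** Returns the existing tag id for the normalized name, creating the tag if it is new. */
    UUID findOrCreate(String rawName) {
        return idsByName.computeIfAbsent(normalize(rawName), name -> UUID.randomUUID());
    }

    int size() {
        return idsByName.size();
    }
}
```

Against the real table, two concurrent requests could both miss the lookup; the unique index on name would still reject the second insert, so a retry or PostgreSQL's `INSERT ... ON CONFLICT` would be needed.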

Story Tags Junction Table

Table Name: story_tags

| Column | Type | Constraints | Description |
|---|---|---|---|
| story_id | UUID | FOREIGN KEY, NOT NULL | Reference to stories table |
| tag_id | UUID | FOREIGN KEY, NOT NULL | Reference to tags table |

Constraints:

  • Primary key on (story_id, tag_id)
  • Foreign key to stories(id) with CASCADE DELETE
  • Foreign key to tags(id) with CASCADE DELETE

Indexes:

  • Composite primary key index
  • Index on tag_id for reverse lookups
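The composite primary key and the two cascading foreign keys can be modeled in memory to show their effect. The `StoryTagLinks` class and its methods are hypothetical; in PostgreSQL the cascades are carried out by the `ON DELETE CASCADE` foreign keys, not by application code.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.stream.Collectors;

/** Hypothetical in-memory model of story_tags, illustrating its constraints only. */
final class StoryTagLinks {
    /** Composite key: each (story_id, tag_id) pair may appear at most once, like the primary key. */
    record Link(UUID storyId, UUID tagId) {}

    private final Set<Link> links = new HashSet<>();

    /** Returns false when the pair already exists (a primary-key violation in the real table). */
    boolean link(UUID storyId, UUID tagId) {
        return links.add(new Link(storyId, tagId));
    }

    /** Mimics ON DELETE CASCADE from stories: the story's junction rows disappear with it. */
    void onStoryDeleted(UUID storyId) {
        links.removeIf(l -> l.storyId().equals(storyId));
    }

    /** Mimics ON DELETE CASCADE from tags. */
    void onTagDeleted(UUID tagId) {
        links.removeIf(l -> l.tagId().equals(tagId));
    }

    /** Reverse lookup (tag -> stories), the access path the index on tag_id serves. */
    Set<UUID> storiesWithTag(UUID tagId) {
        return links.stream()
                .filter(l -> l.tagId().equals(tagId))
                .map(Link::storyId)
                .collect(Collectors.toSet());
    }
}
```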

Data Types and Conventions

UUID Strategy

  • All primary keys use UUID (Universally Unique Identifier)
  • Generated using GenerationType.UUID in Hibernate
  • Provides natural uniqueness across distributed systems
  • Stored in PostgreSQL's native 16-byte uuid type, with a 36-character text representation (e.g., 123e4567-e89b-12d3-a456-426614174000)
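The identifier format can be checked with the JDK's `java.util.UUID`. This sketch covers the textual form only and uses a hypothetical `UuidFormat` class; it does not reproduce Hibernate's `GenerationType.UUID` generator, and the version-4 property it relies on belongs to `UUID.randomUUID()`.

```java
import java.util.UUID;

/** Hypothetical sketch of the UUID text format using java.util.UUID (not Hibernate's generator). */
final class UuidFormat {
    private UuidFormat() {}

    /** Canonical form: 32 lowercase hex digits in 8-4-4-4-12 groups, 36 characters with hyphens. */
    static String newId() {
        return UUID.randomUUID().toString(); // random (version 4) UUID
    }

    /** Parses the canonical form back; throws IllegalArgumentException on malformed input. */
    static UUID parse(String text) {
        return UUID.fromString(text);
    }
}
```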

Timestamp Management

  • Stories, authors, series, and tags all have a created_at timestamp (the author_urls and story_tags tables have none)
  • Stories and authors also have an updated_at timestamp (automatically updated)
  • Series and Tags only have created_at (they're rarely modified)
  • All timestamps use LocalDateTime in Java, stored as TIMESTAMP in PostgreSQL
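A plain-Java sketch of the created_at/updated_at behavior for stories and authors follows, using a hypothetical `TimestampedEntity` class with an injectable time source so the behavior is testable. In a JPA entity this logic would usually sit in lifecycle callbacks such as `@PrePersist`/`@PreUpdate` or in Hibernate's `@CreationTimestamp`/`@UpdateTimestamp`; which mechanism StoryCove uses is not stated here.

```java
import java.time.LocalDateTime;
import java.util.function.Supplier;

/** Hypothetical sketch of created_at/updated_at handling (not StoryCove's entity code). */
class TimestampedEntity {
    private final Supplier<LocalDateTime> clock;
    private final LocalDateTime createdAt;
    private LocalDateTime updatedAt;

    TimestampedEntity() {
        this(LocalDateTime::now); // wall-clock time by default
    }

    TimestampedEntity(Supplier<LocalDateTime> clock) {
        this.clock = clock;
        this.createdAt = clock.get();
        this.updatedAt = createdAt; // both columns start at the creation instant
    }

    /** Call on every modification; createdAt never changes after construction. */
    void touch() {
        updatedAt = clock.get();
    }

    LocalDateTime createdAt() { return createdAt; }

    LocalDateTime updatedAt() { return updatedAt; }
}
```

The column default `CURRENT_TIMESTAMP` only applies on INSERT; PostgreSQL does not refresh updated_at on UPDATE unless a trigger does so, which is why the refresh has to come from the application (or a trigger).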

Text Fields

  • VARCHAR(n): For constrained text fields (names, paths, URLs)
  • TEXT: For unbounded text content (story content, summaries, notes); descriptions use VARCHAR(1000)
  • HTML Content: Sanitized on input, stored as HTML, and sanitized again on output
  • Plain Text: Automatically extracted from HTML using Jsoup

Validation Rules

  • Required Fields: Entity names/titles are always required
  • Length Limits: Titles and names are limited to 255 characters (tag names to 100), descriptions to 1000
  • Rating Range: All ratings are constrained to the range 1 to 5
  • URL Format: No format validation at database level
  • Uniqueness: Names are unique within their entity type
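The validation rules can be sketched as a small validator that returns user-friendly messages. The `FieldRules` class, its method names, and the message wording are hypothetical; lengths are counted in Unicode code points because PostgreSQL's `VARCHAR(n)` limit counts characters, while Java's `String.length()` counts UTF-16 code units.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical validator mirroring the documented limits (not StoryCove's actual validation code). */
final class FieldRules {
    static final int NAME_MAX = 255;         // titles, author and series names: VARCHAR(255)
    static final int TAG_NAME_MAX = 100;     // tag names: VARCHAR(100)
    static final int DESCRIPTION_MAX = 1000; // story and series descriptions: VARCHAR(1000)

    private FieldRules() {}

    /** Required name/title: must be non-blank and within the column length. */
    static List<String> checkRequired(String field, String value, int maxLength) {
        List<String> errors = new ArrayList<>();
        if (value == null || value.isBlank()) {
            errors.add(field + " is required");
        } else {
            checkLength(field, value, maxLength, errors);
        }
        return errors;
    }

    /** Optional text (e.g. a description): null is fine, otherwise the length limit applies. */
    static List<String> checkOptional(String field, String value, int maxLength) {
        List<String> errors = new ArrayList<>();
        if (value != null) {
            checkLength(field, value, maxLength, errors);
        }
        return errors;
    }

    /** Ratings are optional but, when present, must satisfy CHECK (rating >= 1 AND rating <= 5). */
    static List<String> checkRating(Integer rating) {
        List<String> errors = new ArrayList<>();
        if (rating != null && (rating < 1 || rating > 5)) {
            errors.add("rating must be between 1 and 5");
        }
        return errors;
    }

    // Count code points, not UTF-16 units, to match VARCHAR(n)'s character-based limit.
    private static void checkLength(String field, String value, int maxLength, List<String> errors) {
        if (value.codePointCount(0, value.length()) > maxLength) {
            errors.add(field + " must be at most " + maxLength + " characters");
        }
    }
}
```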

Relationships and Cascading

One-to-Many Relationships

  • Author → Stories: Lazy loaded, cascade ALL operations
  • Series → Stories: Lazy loaded, ordered by volume, cascade ALL
  • Author → Author URLs: Eager loaded via @ElementCollection

Many-to-Many Relationships

  • Stories ↔ Tags: Via story_tags junction table
  • Managed bidirectionally with helper methods
  • Deleting a story or a tag cascades to its story_tags rows
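Bidirectional helper methods keep both sides of the Stories-Tags association consistent in memory. This plain-Java sketch uses hypothetical nested `Story` and `Tag` classes inside `StoryTagModel` and omits the JPA annotations (such as `@ManyToMany` and `@JoinTable`) that a real mapping would typically carry.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical plain-Java sketch of bidirectional helper methods for Stories <-> Tags. */
final class StoryTagModel {
    static final class Tag {
        final String name;
        final Set<Story> stories = new HashSet<>(); // inverse side

        Tag(String name) { this.name = name; }
    }

    static final class Story {
        final String title;
        final Set<Tag> tags = new HashSet<>(); // owning side (maps to story_tags)

        Story(String title) { this.title = title; }

        /** Updates both collections so the in-memory graph matches the junction rows. */
        void addTag(Tag tag) {
            tags.add(tag);
            tag.stories.add(this);
        }

        /** Removes the link from both sides; neither the story nor the tag is deleted. */
        void removeTag(Tag tag) {
            tags.remove(tag);
            tag.stories.remove(this);
        }
    }

    private StoryTagModel() {}
}
```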

Foreign Key Constraints

  • All foreign keys have proper referential integrity
  • DELETE operations cascade appropriately
  • No orphaned records are allowed

Performance Considerations

Indexing Strategy

  • Primary keys automatically indexed
  • Foreign keys have dedicated indexes
  • Frequently queried fields (rating, created_at) are indexed
  • Unique constraints automatically create indexes

Query Optimization

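  • Lazy loading avoids fetching associations a query does not use; it does not by itself prevent N+1 queries, which need fetch joins or batch fetching when associations are accessed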
  • Pagination used for large result sets
  • Specialized queries for common access patterns
  • Typesense search engine for full-text search (separate from PostgreSQL)
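Pagination of large result sets typically translates a page request into `LIMIT`/`OFFSET` values. This sketch assumes zero-based page numbers and uses a hypothetical `PageWindow` class; it is not the project's API.

```java
/** Hypothetical helper translating a zero-based page request into LIMIT/OFFSET values. */
final class PageWindow {
    final int limit;
    final long offset;

    PageWindow(int page, int size) {
        if (page < 0) throw new IllegalArgumentException("page must be >= 0");
        if (size < 1) throw new IllegalArgumentException("size must be >= 1");
        this.limit = size;
        this.offset = (long) page * size; // long avoids int overflow for large page numbers
    }

    /** Pages needed for totalRows rows (ceiling division); 0 rows gives 0 pages. */
    static long totalPages(long totalRows, int size) {
        return (totalRows + size - 1) / size;
    }
}
```

`OFFSET` still reads and discards the skipped rows, so for very deep pages keyset pagination (for example on `created_at, id`) scales better.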

Data Volume Estimates

  • Stories: Expected 1K-10K records per user
  • Authors: Expected 100-1K records per user
  • Tags: Expected 50-500 records per user
  • Series: Expected 10-100 records per user
  • Join Tables: Scale with story count and tagging usage

Backup and Migration Considerations

Schema Evolution

  • Uses Hibernate ddl-auto: update for development
  • Production should use controlled migration tools (Flyway/Liquibase)
  • UUID keys avoid primary-key collisions when moving data between environments

Data Integrity

  • Foreign key constraints ensure referential integrity
  • Check constraints validate data ranges
  • Application-level validation provides user-friendly error messages
  • Unique constraints prevent duplicate data

Backup Strategy

  • Full PostgreSQL dumps for complete backup
  • Image files stored separately in filesystem
  • Consider incremental backups for large installations
  • Test restore procedures regularly

This data model provides a solid foundation for personal story library management with room for future enhancements while maintaining data integrity and performance.