shardegger/storycove

Stefan Hardegger 379c8c170f Various improvements & Epub support

2025-08-08 14:09:14 +02:00

EPUB Import/Export Specification

🎉 Phase 1 Implementation Complete

Status: Phase 1 fully implemented and operational as of August 2025

Key Achievements:

✅ Complete EPUB import functionality with validation and error handling
✅ Single story EPUB export with XML validation fixes
✅ Reading position preservation using EPUB CFI standards
✅ Full frontend UI integration with navigation and authentication
✅ Moved export button to Story Detail View for better UX
✅ Added EPUB import to main Add Story menu dropdown

Overview

This specification defines the requirements and implementation details for importing and exporting EPUB files in StoryCove. The feature enables users to import stories from EPUB files and export their stories/collections as EPUB files with preserved reading positions.

Scope

In Scope

EPUB Import: Parse DRM-free EPUB files and import as stories
EPUB Export: Export individual stories and collections as EPUB files
Reading Position Preservation: Store and restore reading positions using EPUB standards
Metadata Handling: Extract and preserve story metadata (title, author, cover, etc.)
Content Processing: HTML content sanitization and formatting

Out of Scope (Phase 1)

DRM-protected EPUB files (future consideration)
Real-time reading position sync between devices
Advanced EPUB features (audio, video, interactive content)
EPUB validation beyond basic structure

Technical Architecture

Backend Implementation

Language: Java (Spring Boot)
Primary Library: EPUBLib (Maven artifact com.positiondev.epublib:epublib-core:3.1; Java package nl.siegmann.epublib)
Processing: Server-side generation and parsing
File Handling: Multipart file upload for import, streaming download for export

Dependencies

<dependency>
    <groupId>com.positiondev.epublib</groupId>
    <artifactId>epublib-core</artifactId>
    <version>3.1</version>
</dependency>

Phase 1 Implementation Notes

EPUBImportService: Implemented with full validation, metadata extraction, and reading position handling
EPUBExportService: Implemented with XML validation fixes for EPUB reader compatibility
ReadingPosition Entity: Created with EPUB CFI support and database indexing
Authentication: All endpoints secured with JWT authentication and proper frontend integration
UI Integration: Export moved to Story Detail View, Import added to main navigation menu
XML Compliance: Fixed XHTML validation issues by properly formatting self-closing tags (<br> → <br />)
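
The self-closing tag fix noted above can be illustrated with a small sketch. This is not the project's actual code: the class name `XhtmlVoidTags` and the regex approach are assumptions for illustration, and a real pipeline would more likely rely on the HTML parser's XML output mode. It only handles lower-case `br`, `hr`, and `img` tags whose attribute values contain no `>` character.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rewrites HTML void elements into XHTML self-closing form.
final class XhtmlVoidTags {
    // Group 1 = tag name, group 2 = attributes (if any), followed by an optional "/".
    private static final Pattern VOID_TAG =
        Pattern.compile("<(br|hr|img)\\b([^<>]*?)\\s*/?>");

    static String selfClose(String html) {
        // "<br>", "<br/>" and "<br />" all normalize to "<br />".
        return VOID_TAG.matcher(html).replaceAll("<$1$2 />");
    }
}
```

Already self-closed tags pass through unchanged, so the helper can be applied repeatedly without altering the result.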

EPUB Import Specification

Supported Formats

EPUB 2.0 and EPUB 3.x formats
DRM-Free files only
Maximum file size: 50MB
Supported content: Text-based stories with HTML content

Import Process Flow

File Upload: User uploads EPUB file via web interface
Validation: Check file format, size, and basic EPUB structure
Parsing: Extract metadata, content, and resources using EPUBLib
Content Processing: Sanitize HTML content using existing Jsoup pipeline
Story Creation: Create Story entity with extracted data
Preview: Show extracted story details for user confirmation
Finalization: Save story to database with imported metadata
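
The validation step in the flow above might look roughly like the following. It is a minimal sketch using only the JDK, not the code in EPUBImportService: the class name `EpubStructureCheck` and the choice to return an optional error message are assumptions. It checks the 50MB limit from this spec, that the first ZIP entry is `mimetype` with the value `application/epub+zip`, and that `META-INF/container.xml` is present.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: basic structural checks before handing the file to a parser.
final class EpubStructureCheck {
    static final long MAX_BYTES = 50L * 1024 * 1024; // 50MB limit from this spec

    /** Returns an error message, or Optional.empty() when the upload looks like an EPUB. */
    static Optional<String> validate(byte[] upload) {
        if (upload.length > MAX_BYTES) {
            return Optional.of("File size exceeds 50MB limit");
        }
        boolean sawMimetype = false;
        boolean sawContainer = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(upload))) {
            ZipEntry entry = zip.getNextEntry();
            // The EPUB container format requires "mimetype" to be the first entry.
            if (entry != null && entry.getName().equals("mimetype")) {
                String type = new String(zip.readNBytes(64), StandardCharsets.US_ASCII).trim();
                sawMimetype = type.equals("application/epub+zip");
            }
            while (entry != null) {
                if (entry.getName().equals("META-INF/container.xml")) {
                    sawContainer = true;
                }
                entry = zip.getNextEntry();
            }
        } catch (IOException e) {
            return Optional.of("EPUB file appears to be corrupted");
        }
        return (sawMimetype && sawContainer)
            ? Optional.empty()
            : Optional.of("Invalid EPUB file format");
    }
}
```

The error strings reuse the messages listed under Import Errors so that a caller can surface them directly.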

Metadata Mapping

// EPUB Metadata → StoryCove Story Entity
// getAuthors() and getDescriptions() return lists that may be empty, so check before calling get(0)
epub.getMetadata().getFirstTitle() → story.title
epub.getMetadata().getAuthors().get(0) → story.authorName
epub.getMetadata().getDescriptions().get(0) → story.summary
epub.getCoverImage() → story.coverPath
epub.getMetadata().getSubjects() → story.tags

Content Extraction

Multi-chapter EPUBs: Combine all content files into single HTML
Chapter separation: Insert <hr> or <h2> tags between chapters
HTML sanitization: Apply existing sanitization rules
Image handling: Extract and store cover images, inline images optional
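
As a rough illustration of the chapter combination rule above, the sketch below joins already-sanitized chapter bodies with an `<hr />` separator. The class name `ChapterCombiner` and the decision to skip blank chapters are assumptions, not behavior confirmed by this spec.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: merges per-chapter HTML fragments into one story body.
final class ChapterCombiner {
    static String combine(List<String> chapterBodies) {
        return chapterBodies.stream()
            .map(String::strip)                       // trim surrounding whitespace
            .filter(body -> !body.isEmpty())          // drop empty content files
            .collect(Collectors.joining("\n<hr />\n")); // separator between chapters
    }
}
```

Using the self-closing `<hr />` form keeps the combined content valid if it is later written back out as XHTML during export.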

API Endpoints

POST /api/stories/import-epub

@PostMapping("/import-epub")
public ResponseEntity<?> importEPUB(@RequestParam("file") MultipartFile file) {
    // Implementation in EPUBImportService
}

Request: Multipart file upload

Response:

{
  "message": "EPUB imported successfully",
  "storyId": "uuid",
  "extractedData": {
    "title": "Story Title",
    "author": "Author Name",
    "summary": "Story description",
    "chapterCount": 12,
    "wordCount": 45000,
    "hasCovers": true
  }
}
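
The spec does not say how `wordCount` is derived. One plausible approach, shown here only as an assumption, is to strip tags from the sanitized HTML and count whitespace-separated tokens. The class name `WordCounter` is hypothetical.

```java
// Hypothetical helper: approximate word count for an HTML fragment.
final class WordCounter {
    static int countWords(String html) {
        String text = html
            .replaceAll("<[^>]*>", " ")   // replace tags with spaces so words do not merge
            .replace("&nbsp;", " ")       // treat non-breaking space entities as spaces
            .strip();
        return text.isEmpty() ? 0 : text.split("\\s+").length;
    }
}
```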

EPUB Export Specification

Export Types

Single Story Export: Convert one story to EPUB
Collection Export: Multiple stories as single EPUB with chapters

EPUB Structure Generation

story.epub
├── mimetype
├── META-INF/
│   └── container.xml
└── OEBPS/
    ├── content.opf          # Package metadata
    ├── toc.ncx              # Navigation
    ├── stylesheet.css       # Styling
    ├── cover.html           # Cover page
    ├── chapter001.xhtml     # Story content
    ├── images/
    │   └── cover.jpg        # Cover image
    └── fonts/ (optional)
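
Two details of this layout are fixed by the EPUB container format: `mimetype` must be the first entry in the archive and must be stored uncompressed, and `META-INF/container.xml` must point at the package file. EPUBLib's writer is expected to take care of both; the JDK-only sketch below (class name `EpubContainerWriter` is hypothetical) just makes the requirement concrete.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: writes only the fixed container parts of an EPUB.
final class EpubContainerWriter {
    private static final String CONTAINER_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<container version=\"1.0\" xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\">\n"
        + "  <rootfiles>\n"
        + "    <rootfile full-path=\"OEBPS/content.opf\""
        + " media-type=\"application/oebps-package+xml\"/>\n"
        + "  </rootfiles>\n"
        + "</container>\n";

    static byte[] writeSkeleton() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            byte[] mimetype = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);
            ZipEntry first = new ZipEntry("mimetype");
            first.setMethod(ZipEntry.STORED);      // uncompressed, as the format requires
            first.setSize(mimetype.length);        // STORED entries need size and CRC up front
            first.setCompressedSize(mimetype.length);
            CRC32 crc = new CRC32();
            crc.update(mimetype);
            first.setCrc(crc.getValue());
            zip.putNextEntry(first);
            zip.write(mimetype);
            zip.closeEntry();

            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zip.write(CONTAINER_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

The remaining `OEBPS/` entries would be added after these two, using normal compression.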

Reading Position Implementation

EPUB 3 CFI (Canonical Fragment Identifier)

<!-- In content.opf metadata -->
<meta property="epub-cfi" content="/6/4[chap01]!/4[body01]/10[para05]/3:142"/>
<meta property="reading-percentage" content="0.65"/>
<meta property="last-read-timestamp" content="2023-12-07T10:30:00Z"/>

StoryCove Custom Metadata (Fallback)

<meta name="storycove:reading-chapter" content="3"/>
<meta name="storycove:reading-paragraph" content="15"/>
<meta name="storycove:reading-offset" content="142"/>
<meta name="storycove:reading-percentage" content="0.65"/>

CFI Generation Logic

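public String generateCFI(ReadingPosition position) {
    // CFI numbers child elements with even steps (2, 4, 6, ...), hence index * 2.
    // The +4 offset maps index 0 to step 4, the second child element.
    return String.format("/6/%d[chap%02d]!/4[body01]/%d[para%02d]/3:%d",
        (position.getChapterIndex() * 2) + 4,
        position.getChapterIndex(),
        (position.getParagraphIndex() * 2) + 4,
        position.getParagraphIndex(),
        position.getCharacterOffset());
}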
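
Importing a reading position needs the inverse of this formula. The sketch below is one way it could work, under assumptions: the class name `CfiCodec` is hypothetical, it accepts only CFIs in the exact shape produced above, and it recovers indices from the step numbers while ignoring the bracketed id labels.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: formats and parses CFIs in the shape used by this spec.
final class CfiCodec {
    private static final Pattern CFI = Pattern.compile(
        "/6/(\\d+)\\[chap\\d+\\]!/4\\[body01\\]/(\\d+)\\[para\\d+\\]/3:(\\d+)");

    static String format(int chapterIndex, int paragraphIndex, int characterOffset) {
        return String.format(Locale.ROOT, "/6/%d[chap%02d]!/4[body01]/%d[para%02d]/3:%d",
            chapterIndex * 2 + 4, chapterIndex,
            paragraphIndex * 2 + 4, paragraphIndex,
            characterOffset);
    }

    /** Returns {chapterIndex, paragraphIndex, characterOffset}. */
    static int[] parse(String cfi) {
        Matcher m = CFI.matcher(cfi);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized CFI: " + cfi);
        }
        return new int[] {
            (Integer.parseInt(m.group(1)) - 4) / 2,  // invert step = index * 2 + 4
            (Integer.parseInt(m.group(2)) - 4) / 2,
            Integer.parseInt(m.group(3))
        };
    }
}
```

CFIs written by other reading systems will generally not match this pattern, which is where the custom `storycove:` metadata fallback becomes useful.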

API Endpoints

GET /api/stories/{id}/export-epub

@GetMapping("/{id}/export-epub")
public ResponseEntity<StreamingResponseBody> exportStory(@PathVariable UUID id) {
    // Implementation in EPUBExportService
}

Response: EPUB file download with headers:

Content-Type: application/epub+zip
Content-Disposition: attachment; filename="story-title.epub"
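
Story titles can contain quotes, slashes, or non-ASCII characters that are unsafe in a `filename` parameter. The spec does not define how the filename is derived, so the following is only a sketch of one option: reduce the title to a lower-case ASCII slug. The class name `ExportFilename` and the `story` fallback are assumptions.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper: derives a header-safe download filename from a story title.
final class ExportFilename {
    static String fromTitle(String title) {
        String ascii = Normalizer.normalize(title, Normalizer.Form.NFKD)
            .replaceAll("\\p{M}+", "");          // drop combining accents (é becomes e)
        String slug = ascii.toLowerCase(Locale.ROOT)
            .replaceAll("[^a-z0-9]+", "-")       // collapse everything else into hyphens
            .replaceAll("^-+|-+$", "");          // trim leading and trailing hyphens
        return (slug.isEmpty() ? "story" : slug) + ".epub";
    }

    static String contentDisposition(String title) {
        return "attachment; filename=\"" + fromTitle(title) + "\"";
    }
}
```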

GET /api/collections/{id}/export-epub

@GetMapping("/{id}/export-epub")
public ResponseEntity<StreamingResponseBody> exportCollection(@PathVariable UUID id) {
    // Implementation in EPUBExportService
}

Response: Multi-story EPUB with table of contents

Data Models

ReadingPosition Entity

@Entity
@Table(name = "reading_positions")
public class ReadingPosition {
    @Id
    private UUID id;
    
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "story_id")
    private Story story;
    
    @Column(name = "chapter_index")
    private Integer chapterIndex = 0;
    
    @Column(name = "paragraph_index") 
    private Integer paragraphIndex = 0;
    
    @Column(name = "character_offset")
    private Integer characterOffset = 0;
    
    @Column(name = "progress_percentage")
    private Double progressPercentage = 0.0;
    
    @Column(name = "epub_cfi")
    private String canonicalFragmentIdentifier;
    
    @Column(name = "last_read_at")
    private LocalDateTime lastReadAt;
    
    @Column(name = "device_identifier")
    private String deviceIdentifier;
    
    // Constructors, getters, setters
}

EPUB Import Request DTO

public class EPUBImportRequest {
    private String filename;
    private Long fileSize;
    private Boolean preserveChapterStructure = true;
    private Boolean extractCover = true;
    private String targetCollectionId; // Optional: add to specific collection
}

EPUB Export Options DTO

public class EPUBExportOptions {
    private Boolean includeReadingPosition = true;
    private Boolean includeCoverImage = true;
    private Boolean includeMetadata = true;
    private String cssStylesheet; // Optional custom CSS
    private EPUBVersion version = EPUBVersion.EPUB3;
}

Service Layer Architecture

EPUBImportService

@Service
public class EPUBImportService {
    
    // Core import method
    public Story importEPUBFile(MultipartFile file, EPUBImportRequest request);
    
    // Helper methods
    private void validateEPUBFile(MultipartFile file);
    private Book parseEPUBStructure(InputStream inputStream);
    private Story extractStoryData(Book epub);
    private String combineChapterContent(Book epub);
    private void extractAndSaveCover(Book epub, Story story);
    private List<String> extractTags(Book epub);
    private ReadingPosition extractReadingPosition(Book epub);
}

EPUBExportService

@Service 
public class EPUBExportService {
    
    // Core export methods
    public byte[] exportSingleStory(UUID storyId, EPUBExportOptions options);
    public byte[] exportCollection(UUID collectionId, EPUBExportOptions options);
    
    // Helper methods
    private Book createEPUBStructure(Story story, ReadingPosition position);
    private Book createCollectionEPUB(Collection collection, List<ReadingPosition> positions);
    private void addReadingPositionMetadata(Book book, ReadingPosition position);
    private String generateCFI(ReadingPosition position);
    private Resource createChapterResource(Story story);
    private Resource createStylesheetResource();
    private void addCoverImage(Book book, Story story);
}

Frontend Integration

Import UI Flow

Upload Interface: File input with EPUB validation
Progress Indicator: Show parsing progress
Preview Screen: Display extracted metadata for confirmation
Confirmation: Allow editing of title, author, summary before saving
Success: Redirect to created story

Export UI Flow

Export Button: Available on story detail and collection pages
Options Modal: Allow selection of export options
Progress Indicator: Show EPUB generation progress
Download: Automatic file download on completion

Frontend API Calls

// Import EPUB
const importEPUB = async (file: File) => {
  const formData = new FormData();
  formData.append('file', file);

  const response = await fetch('/api/stories/import-epub', {
    method: 'POST',
    body: formData,
  });
  if (!response.ok) {
    throw new Error(`EPUB import failed with status ${response.status}`);
  }

  return await response.json();
};

// Export Story
const exportStoryEPUB = async (storyId: string, storyTitle: string) => {
  const response = await fetch(`/api/stories/${storyId}/export-epub`, {
    method: 'GET',
  });
  if (!response.ok) {
    throw new Error(`EPUB export failed with status ${response.status}`);
  }

  const blob = await response.blob();
  const url = window.URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = `${storyTitle}.epub`;
  a.click();
  window.URL.revokeObjectURL(url); // release the blob URL once the download has started
};

Error Handling

Import Errors

Invalid EPUB format: "Invalid EPUB file format"
File too large: "File size exceeds 50MB limit"
DRM protected: "DRM-protected EPUBs not supported"
Corrupted file: "EPUB file appears to be corrupted"
No content: "EPUB contains no readable content"

Export Errors

Story not found: "Story not found or access denied"
Missing content: "Story has no content to export"
Generation failure: "Failed to generate EPUB file"

Security Considerations

File Upload Security

File type validation: Verify EPUB MIME type and structure
Size limits: Enforce maximum file size limits
Content sanitization: Apply existing HTML sanitization
Virus scanning: Consider integration with antivirus scanning

Content Security

HTML sanitization: Apply existing Jsoup rules to imported content
Image validation: Validate extracted cover images
Metadata escaping: Escape special characters in metadata
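
For metadata escaping, the five XML special characters must be replaced before a value is written into an attribute such as the `content` of a `<meta>` element. The sketch below shows the idea with a hypothetical `XmlEscaper` class. It is illustrative only, since an XML writer or EPUBLib itself would normally handle escaping.

```java
// Hypothetical helper: escapes text for use inside XML attribute values.
final class XmlEscaper {
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    // Builds a name/content meta element like the storycove: fallback entries.
    static String metaTag(String name, String content) {
        return "<meta name=\"" + escape(name) + "\" content=\"" + escape(content) + "\"/>";
    }
}
```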

Testing Strategy

Unit Tests

EPUB parsing and validation logic
CFI generation and parsing
Metadata extraction accuracy
Content sanitization

Integration Tests

End-to-end import/export workflow
Reading position preservation
Multi-story collection export
Error handling scenarios

Test Data

Sample EPUB files for various scenarios
EPUBs with and without reading positions
Multi-chapter EPUBs
EPUBs with covers and metadata

Performance Considerations

Import Performance

Streaming processing: Process large EPUBs without loading entirely into memory
Async processing: Consider async import for large files
Progress tracking: Provide progress feedback for large imports

Export Performance

Caching: Cache generated EPUBs for repeated exports
Streaming: Stream EPUB generation for large collections
Resource optimization: Optimize image and content sizes

Future Enhancements (Out of Scope)

Phase 2 Considerations

DRM support: Research legal and technical feasibility
Reading position sync: Real-time sync across devices
Advanced EPUB features: Enhanced typography, annotations
Bulk operations: Import/export multiple EPUBs
EPUB validation: Full EPUB compliance checking

Integration Possibilities

Cloud storage: Export directly to Google Drive, Dropbox
E-reader sync: Direct sync with Kindle, Kobo devices
Reading analytics: Track reading patterns and statistics

Implementation Phases

Phase 1: Core Functionality ✅ COMPLETED

Basic EPUB import (DRM-free)
Single story export
Reading position storage and retrieval
Frontend UI integration

Phase 2: Enhanced Features

Collection export
Advanced metadata handling
Performance optimizations
Comprehensive error handling

Phase 3: Advanced Features

DRM exploration (legal research required)
Reading position sync
Advanced EPUB features
Analytics and reporting

Acceptance Criteria

Import Success Criteria ✅ COMPLETED

Successfully parse EPUB 2.0 and 3.x files
Extract title, author, summary, and content accurately
Preserve formatting and basic HTML structure
Handle cover images correctly
Import reading positions when present
Provide clear error messages for invalid files

Export Success Criteria ✅ PHASE 1 COMPLETED

Generate valid EPUB files compatible with major readers
Include accurate metadata and content
Embed reading positions using CFI standard
Support single story export
Support collection export (Phase 2)
Generate proper table of contents for collections (Phase 2)
Include cover images when available

This specification serves as the implementation guide for the EPUB import/export feature. All implementation decisions should reference this document for consistency and completeness.
