Various improvements & Epub support

2025-08-08 14:09:14 +02:00
parent 090b858a54
commit 379c8c170f
37 changed files with 4069 additions and 298 deletions
--- a/EPUB_IMPORT_EXPORT_SPECIFICATION.md
+++ b/EPUB_IMPORT_EXPORT_SPECIFICATION.md
@@ -0,0 +1,459 @@
+# EPUB Import/Export Specification
+
+## 🎉 Phase 1 Implementation Complete
+
+**Status**: Phase 1 fully implemented and operational as of August 2025
+
+**Key Achievements**:
+- ✅ Complete EPUB import functionality with validation and error handling
+- ✅ Single story EPUB export with XML validation fixes  
+- ✅ Reading position preservation using EPUB CFI standards
+- ✅ Full frontend UI integration with navigation and authentication
+- ✅ Moved export button to Story Detail View for better UX
+- ✅ Added EPUB import to main Add Story menu dropdown
+
+## Overview
+
+This specification defines the requirements and implementation details for importing and exporting EPUB files in StoryCove. The feature enables users to import stories from EPUB files and export their stories/collections as EPUB files with preserved reading positions.
+
+## Scope
+
+### In Scope
+- **EPUB Import**: Parse DRM-free EPUB files and import as stories
+- **EPUB Export**: Export individual stories (Phase 1) and collections (planned for Phase 2) as EPUB files
+- **Reading Position Preservation**: Store and restore reading positions using EPUB standards
+- **Metadata Handling**: Extract and preserve story metadata (title, author, cover, etc.)
+- **Content Processing**: HTML content sanitization and formatting
+
+### Out of Scope (Phase 1)
+- DRM-protected EPUB files (future consideration)
+- Real-time reading position sync between devices
+- Advanced EPUB features (audio, video, interactive content)
+- EPUB validation beyond basic structure
+
+## Technical Architecture
+
+### Backend Implementation
+- **Language**: Java (Spring Boot)
+- **Primary Library**: EPUBLib (`epublib-core` 3.1; Java packages under `nl.siegmann.epublib`, Maven coordinates listed under Dependencies)
+- **Processing**: Server-side generation and parsing
+- **File Handling**: Multipart file upload for import, streaming download for export
+
+### Dependencies
+```xml
+<dependency>
+    <groupId>com.positiondev.epublib</groupId>
+    <artifactId>epublib-core</artifactId>
+    <version>3.1</version>
+</dependency>
+```
+
+### Phase 1 Implementation Notes
+- **EPUBImportService**: Implemented with file validation (format, size, basic EPUB structure), metadata extraction, and reading position handling
+- **EPUBExportService**: Implemented with XML validation fixes for EPUB reader compatibility  
+- **ReadingPosition Entity**: Created with EPUB CFI support and database indexing
+- **Authentication**: All endpoints secured with JWT authentication and proper frontend integration
+- **UI Integration**: Export moved to Story Detail View, Import added to main navigation menu
+- **XML Compliance**: Fixed XHTML validation issues by properly formatting self-closing tags (`<br>` → `<br />`)
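+
+A minimal sketch of the self-closing tag fix described in the last note, assuming lowercase tags and no `>` characters inside attribute values. The class and method names (`XhtmlVoidTags`, `selfCloseVoidTags`) are hypothetical and not taken from the StoryCove codebase.
+
+```java
+import java.util.regex.Pattern;
+
+public class XhtmlVoidTags {
+    // Void elements to self-close; this list is an assumption and can be extended.
+    private static final Pattern VOID_TAG =
+        Pattern.compile("<(br|hr|img)\\b([^>]*?)\\s*/?>");
+
+    /** Rewrites <br>, <hr> and <img ...> as XHTML self-closing tags. */
+    public static String selfCloseVoidTags(String html) {
+        return VOID_TAG.matcher(html).replaceAll("<$1$2 />");
+    }
+
+    public static void main(String[] args) {
+        // prints: a<br />b<img src="c.png" />
+        System.out.println(selfCloseVoidTags("a<br>b<img src=\"c.png\">"));
+    }
+}
+```
+
+The rewrite is idempotent, so content that already uses `<br />` passes through unchanged. If the export path already parses content with Jsoup, switching its output syntax to XML may achieve the same result without a regex.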
+
+## EPUB Import Specification
+
+### Supported Formats
+- **EPUB 2.0** and **EPUB 3.x** formats
+- **DRM-Free** files only
+- **Maximum file size**: 50MB
+- **Supported content**: Text-based stories with HTML content
+
+### Import Process Flow
+1. **File Upload**: User uploads EPUB file via web interface
+2. **Validation**: Check file format, size, and basic EPUB structure
+3. **Parsing**: Extract metadata, content, and resources using EPUBLib
+4. **Content Processing**: Sanitize HTML content using existing Jsoup pipeline
+5. **Story Creation**: Create Story entity with extracted data
+6. **Preview**: Show extracted story details for user confirmation
+7. **Finalization**: Save story to database with imported metadata
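+
+The validation step (step 2) could be sketched as follows, using only `java.util.zip`. This is an illustration under stated assumptions, not the actual `EPUBImportService` code: the class `EpubStructureCheck` and its methods are hypothetical, and only the size limit, the `mimetype` entry, and the presence of `META-INF/container.xml` are checked.
+
+```java
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.UncheckedIOException;
+import java.nio.charset.StandardCharsets;
+import java.util.zip.ZipEntry;
+import java.util.zip.ZipInputStream;
+import java.util.zip.ZipOutputStream;
+
+public class EpubStructureCheck {
+    static final long MAX_BYTES = 50L * 1024 * 1024; // 50MB limit from this spec
+
+    /** Returns null when the bytes look like an EPUB, otherwise an error message. */
+    public static String validate(byte[] data) {
+        if (data.length > MAX_BYTES) return "File size exceeds 50MB limit";
+        boolean mimetypeOk = false, hasContainer = false;
+        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
+            ZipEntry entry;
+            for (int i = 0; (entry = zip.getNextEntry()) != null; i++) {
+                if (i == 0) {
+                    // An EPUB container stores "mimetype" as its first entry.
+                    String type = new String(zip.readAllBytes(), StandardCharsets.US_ASCII).trim();
+                    mimetypeOk = entry.getName().equals("mimetype")
+                        && type.equals("application/epub+zip");
+                } else if (entry.getName().equals("META-INF/container.xml")) {
+                    hasContainer = true;
+                }
+            }
+        } catch (IOException e) {
+            return "EPUB file appears to be corrupted";
+        }
+        return (mimetypeOk && hasContainer) ? null : "Invalid EPUB file format";
+    }
+
+    /** Builds a tiny in-memory archive for the demo; not a complete EPUB. */
+    public static byte[] sample(String mimetype) {
+        ByteArrayOutputStream bos = new ByteArrayOutputStream();
+        try (ZipOutputStream zip = new ZipOutputStream(bos)) {
+            zip.putNextEntry(new ZipEntry("mimetype"));
+            zip.write(mimetype.getBytes(StandardCharsets.US_ASCII));
+            zip.closeEntry();
+            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
+            zip.write("<container/>".getBytes(StandardCharsets.US_ASCII));
+            zip.closeEntry();
+        } catch (IOException e) {
+            throw new UncheckedIOException(e);
+        }
+        return bos.toByteArray();
+    }
+
+    public static void main(String[] args) {
+        System.out.println(validate(sample("application/epub+zip"))); // null
+        System.out.println(validate(sample("text/plain")));           // Invalid EPUB file format
+    }
+}
+```
+
+The returned messages reuse the wording from the Import Errors list so that the sketch stays consistent with the rest of this document.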
+
+### Metadata Mapping
+```java
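+// EPUB Metadata → StoryCove Story Entity
+// Pseudocode mapping, not compilable Java. getAuthors() and getDescriptions()
+// may return empty lists, so guard the get(0) calls.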
+epub.getMetadata().getFirstTitle() → story.title
+epub.getMetadata().getAuthors().get(0) → story.authorName  
+epub.getMetadata().getDescriptions().get(0) → story.summary
+epub.getCoverImage() → story.coverPath
+epub.getMetadata().getSubjects() → story.tags
+```
+
+### Content Extraction
+- **Multi-chapter EPUBs**: Combine all content files into single HTML
+- **Chapter separation**: Insert `<hr>` or `<h2>` tags between chapters
+- **HTML sanitization**: Apply existing sanitization rules
+- **Image handling**: Extract and store cover images, inline images optional
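+
+As an illustration of the chapter combination rule, the sketch below joins chapter bodies into one HTML string. It assumes both separators are used together (an `<hr />` between chapters and an `<h2>` title per chapter) and that titles and bodies are already sanitized. `ChapterCombiner` is a hypothetical name.
+
+```java
+import java.util.LinkedHashMap;
+import java.util.Map;
+
+public class ChapterCombiner {
+    /** Joins chapters (title -> sanitized HTML body) in iteration order. */
+    public static String combine(Map<String, String> chapters) {
+        StringBuilder out = new StringBuilder();
+        for (Map.Entry<String, String> chapter : chapters.entrySet()) {
+            if (out.length() > 0) {
+                out.append("<hr />\n"); // separator between chapters only
+            }
+            out.append("<h2>").append(chapter.getKey()).append("</h2>\n");
+            out.append(chapter.getValue()).append("\n");
+        }
+        return out.toString();
+    }
+
+    public static void main(String[] args) {
+        Map<String, String> chapters = new LinkedHashMap<>();
+        chapters.put("One", "<p>a</p>");
+        chapters.put("Two", "<p>b</p>");
+        System.out.print(combine(chapters));
+    }
+}
+```
+
+A `LinkedHashMap` keeps the chapters in insertion order, which should follow the EPUB spine order.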
+
+### API Endpoints
+
+#### POST /api/stories/import-epub
+```java
+@PostMapping("/import-epub")
+public ResponseEntity<?> importEPUB(@RequestParam("file") MultipartFile file) {
+    // Implementation in EPUBImportService
+}
+```
+
+**Request**: Multipart file upload
+**Response**: 
+```json
+{
+  "message": "EPUB imported successfully",
+  "storyId": "uuid",
+  "extractedData": {
+    "title": "Story Title",
+    "author": "Author Name",
+    "summary": "Story description",
+    "chapterCount": 12,
+    "wordCount": 45000,
+    "hasCovers": true
+  }
+}
+```
+
+## EPUB Export Specification
+
+### Export Types
+1. **Single Story Export**: Convert one story to EPUB
+2. **Collection Export**: Multiple stories as single EPUB with chapters (planned for Phase 2)
+
+### EPUB Structure Generation
+```
+story.epub
+├── mimetype
+├── META-INF/
+│   └── container.xml
+└── OEBPS/
+    ├── content.opf          # Package metadata
+    ├── toc.ncx              # Navigation
+    ├── stylesheet.css       # Styling
+    ├── cover.html           # Cover page
+    ├── chapter001.xhtml     # Story content
+    ├── images/
+    │   └── cover.jpg        # Cover image
+    └── fonts/ (optional)
+```
+
+### Reading Position Implementation
+
+#### EPUB 3 CFI (Canonical Fragment Identifier)
+```xml
+<!-- In content.opf metadata (custom StoryCove properties; not defined by the EPUB 3 metadata vocabulary) -->
+<meta property="epub-cfi" content="/6/4[chap01]!/4[body01]/10[para05]/3:142"/>
+<meta property="reading-percentage" content="0.65"/>
+<meta property="last-read-timestamp" content="2023-12-07T10:30:00Z"/>
+```
+
+#### StoryCove Custom Metadata (Fallback)
+```xml
+<meta name="storycove:reading-chapter" content="3"/>
+<meta name="storycove:reading-paragraph" content="15"/>
+<meta name="storycove:reading-offset" content="142"/>
+<meta name="storycove:reading-percentage" content="0.65"/>
+```
+
+#### CFI Generation Logic
+```java
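+// Builds a CFI in the shape shown above. Element steps in a CFI are even
+// numbers, hence "index * 2"; the "+ 4" offsets assume that chapter documents
+// start at the second spine item and paragraphs at the second body child.
+// The bracketed IDs are assertions and should match the ids in the generated files.
+// Example: chapterIndex=0, paragraphIndex=3, characterOffset=142
+//          gives /6/4[chap00]!/4[body01]/10[para03]/3:142
+public String generateCFI(ReadingPosition position) {
+    return String.format("/6/%d[chap%02d]!/4[body01]/%d[para%02d]/3:%d",
+        (position.getChapterIndex() * 2) + 4, 
+        position.getChapterIndex(),
+        (position.getParagraphIndex() * 2) + 4,
+        position.getParagraphIndex(),
+        position.getCharacterOffset());
+}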
+```
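+
+For import, the inverse operation could look like the sketch below. It only understands the shape produced by `generateCFI` above and is not a general CFI parser. `CfiParser` and its return convention are hypothetical.
+
+```java
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+public class CfiParser {
+    // Spine step, fixed body step, paragraph step, character offset.
+    // Bracketed ID assertions are optional and ignored.
+    private static final Pattern CFI = Pattern.compile(
+        "/6/(\\d{1,9})(?:\\[[^\\]]*\\])?!/4(?:\\[[^\\]]*\\])?"
+        + "/(\\d{1,9})(?:\\[[^\\]]*\\])?/3:(\\d{1,9})");
+
+    /** Returns {chapterIndex, paragraphIndex, characterOffset}, or null if unparseable. */
+    public static int[] parse(String cfi) {
+        Matcher m = CFI.matcher(cfi);
+        if (!m.matches()) return null;
+        int chapterStep = Integer.parseInt(m.group(1));
+        int paragraphStep = Integer.parseInt(m.group(2));
+        // Invert step = (index * 2) + 4; odd or too-small steps cannot come from generateCFI.
+        if (chapterStep < 4 || paragraphStep < 4
+                || chapterStep % 2 != 0 || paragraphStep % 2 != 0) {
+            return null;
+        }
+        return new int[] {
+            (chapterStep - 4) / 2,
+            (paragraphStep - 4) / 2,
+            Integer.parseInt(m.group(3))
+        };
+    }
+
+    public static void main(String[] args) {
+        int[] p = parse("/6/4[chap00]!/4[body01]/10[para03]/3:142");
+        System.out.println(p[0] + " " + p[1] + " " + p[2]); // 0 3 142
+    }
+}
+```
+
+Returning `null` for unrecognized input lets the caller fall back to the custom `storycove:*` metadata described earlier.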
+
+### API Endpoints
+
+#### GET /api/stories/{id}/export-epub
+```java
+@GetMapping("/{id}/export-epub")
+public ResponseEntity<StreamingResponseBody> exportStory(@PathVariable UUID id) {
+    // Implementation in EPUBExportService
+}
+```
+
+**Response**: EPUB file download with headers:
+```
+Content-Type: application/epub+zip
+Content-Disposition: attachment; filename="story-title.epub"
+```
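+
+One possible way to derive the download filename from a story title is sketched below. The slug rules (lowercase ASCII letters and digits joined by hyphens, `story` as a fallback) are assumptions for illustration, and `EpubFilename` is a hypothetical helper.
+
+```java
+import java.util.Locale;
+
+public class EpubFilename {
+    /** Turns a story title into a header-safe file name such as "story-title.epub". */
+    public static String fromTitle(String title) {
+        String slug = title == null ? "" : title.trim().toLowerCase(Locale.ROOT)
+            .replaceAll("[^a-z0-9]+", "-")  // collapse anything else into one hyphen
+            .replaceAll("^-+|-+$", "");     // drop leading and trailing hyphens
+        if (slug.isEmpty()) {
+            slug = "story"; // fallback for empty or fully non-ASCII titles
+        }
+        return slug + ".epub";
+    }
+
+    /** Builds the Content-Disposition header value for the export response. */
+    public static String contentDisposition(String title) {
+        return "attachment; filename=\"" + fromTitle(title) + "\"";
+    }
+
+    public static void main(String[] args) {
+        System.out.println(contentDisposition("Story Title"));
+        // attachment; filename="story-title.epub"
+    }
+}
+```
+
+Restricting the name to a small ASCII set avoids quoting problems in the header, at the cost of losing non-ASCII titles.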
+
+#### GET /api/collections/{id}/export-epub  
+```java
+@GetMapping("/{id}/export-epub")
+public ResponseEntity<StreamingResponseBody> exportCollection(@PathVariable UUID id) {
+    // Implementation in EPUBExportService
+}
+```
+
+**Response**: Multi-story EPUB with table of contents
+
+## Data Models
+
+### ReadingPosition Entity
+```java
+@Entity
+@Table(name = "reading_positions")
+public class ReadingPosition {
+    @Id
+    private UUID id;
+    
+    @ManyToOne(fetch = FetchType.LAZY)
+    @JoinColumn(name = "story_id")
+    private Story story;
+    
+    @Column(name = "chapter_index")
+    private Integer chapterIndex = 0;
+    
+    @Column(name = "paragraph_index") 
+    private Integer paragraphIndex = 0;
+    
+    @Column(name = "character_offset")
+    private Integer characterOffset = 0;
+    
+    @Column(name = "progress_percentage")
+    private Double progressPercentage = 0.0;
+    
+    @Column(name = "epub_cfi")
+    private String canonicalFragmentIdentifier;
+    
+    @Column(name = "last_read_at")
+    private LocalDateTime lastReadAt;
+    
+    @Column(name = "device_identifier")
+    private String deviceIdentifier;
+    
+    // Constructors, getters, setters
+}
+```
+
+### EPUB Import Request DTO
+```java
+public class EPUBImportRequest {
+    private String filename;
+    private Long fileSize;
+    private Boolean preserveChapterStructure = true;
+    private Boolean extractCover = true;
+    private String targetCollectionId; // Optional: add to specific collection
+}
+```
+
+### EPUB Export Options DTO
+```java
+public class EPUBExportOptions {
+    private Boolean includeReadingPosition = true;
+    private Boolean includeCoverImage = true;
+    private Boolean includeMetadata = true;
+    private String cssStylesheet; // Optional custom CSS
+    private EPUBVersion version = EPUBVersion.EPUB3;
+}
+```
+
+## Service Layer Architecture
+
+### EPUBImportService
+```java
+@Service
+public class EPUBImportService {
+    
+    // Core import method (signatures only; method bodies are omitted in this spec)
+    public Story importEPUBFile(MultipartFile file, EPUBImportRequest request);
+    
+    // Helper methods
+    private void validateEPUBFile(MultipartFile file);
+    private Book parseEPUBStructure(InputStream inputStream);
+    private Story extractStoryData(Book epub);
+    private String combineChapterContent(Book epub);
+    private void extractAndSaveCover(Book epub, Story story);
+    private List<String> extractTags(Book epub);
+    private ReadingPosition extractReadingPosition(Book epub);
+}
+```
+
+### EPUBExportService
+```java
+@Service 
+public class EPUBExportService {
+    
+    // Core export methods (signatures only; method bodies are omitted in this spec)
+    public byte[] exportSingleStory(UUID storyId, EPUBExportOptions options);
+    public byte[] exportCollection(UUID collectionId, EPUBExportOptions options);
+    
+    // Helper methods
+    private Book createEPUBStructure(Story story, ReadingPosition position);
+    private Book createCollectionEPUB(Collection collection, List<ReadingPosition> positions);
+    private void addReadingPositionMetadata(Book book, ReadingPosition position);
+    private String generateCFI(ReadingPosition position);
+    private Resource createChapterResource(Story story);
+    private Resource createStylesheetResource();
+    private void addCoverImage(Book book, Story story);
+}
+```
+
+## Frontend Integration
+
+### Import UI Flow
+1. **Upload Interface**: File input with EPUB validation
+2. **Progress Indicator**: Show parsing progress
+3. **Preview Screen**: Display extracted metadata for confirmation
+4. **Confirmation**: Allow editing of title, author, summary before saving
+5. **Success**: Redirect to created story
+
+### Export UI Flow
+1. **Export Button**: Available on story detail and collection pages
+2. **Options Modal**: Allow selection of export options
+3. **Progress Indicator**: Show EPUB generation progress  
+4. **Download**: Automatic file download on completion
+
+### Frontend API Calls
+```typescript
+// Import EPUB
+const importEPUB = async (file: File) => {
+  const formData = new FormData();
+  formData.append('file', file);
+  
+  const response = await fetch('/api/stories/import-epub', {
+    method: 'POST',
+    body: formData,
+  });
+  if (!response.ok) {
+    throw new Error(`EPUB import failed with status ${response.status}`);
+  }
+  
+  return await response.json();
+};
+
+// Export Story
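+const exportStoryEPUB = async (storyId: string, storyTitle: string) => {
+  const response = await fetch(`/api/stories/${storyId}/export-epub`, {
+    method: 'GET',
+  });
+  if (!response.ok) {
+    throw new Error(`EPUB export failed with status ${response.status}`);
+  }
+  
+  const blob = await response.blob();
+  const url = window.URL.createObjectURL(blob);
+  const a = document.createElement('a');
+  a.href = url;
+  a.download = `${storyTitle}.epub`;
+  a.click();
+  window.URL.revokeObjectURL(url); // release the blob once the download has started
+};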
+```
+
+## Error Handling
+
+### Import Errors
+- **Invalid EPUB format**: "Invalid EPUB file format"
+- **File too large**: "File size exceeds 50MB limit"
+- **DRM protected**: "DRM-protected EPUBs not supported"
+- **Corrupted file**: "EPUB file appears to be corrupted"
+- **No content**: "EPUB contains no readable content"
+
+### Export Errors  
+- **Story not found**: "Story not found or access denied"
+- **Missing content**: "Story has no content to export"
+- **Generation failure**: "Failed to generate EPUB file"
+
+## Security Considerations
+
+### File Upload Security
+- **File type validation**: Verify EPUB MIME type and structure
+- **Size limits**: Enforce maximum file size limits
+- **Content sanitization**: Apply existing HTML sanitization
+- **Virus scanning**: Consider integration with antivirus scanning
+
+### Content Security
+- **HTML sanitization**: Apply existing Jsoup rules to imported content
+- **Image validation**: Validate extracted cover images
+- **Metadata escaping**: Escape special characters in metadata
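+
+Metadata escaping could be as simple as the sketch below, which replaces the five XML special characters. `XmlEscape` is a hypothetical helper. If metadata is written through a serializer that already escapes text, applying this as well would double-escape the values.
+
+```java
+public class XmlEscape {
+    /** Escapes the five characters that are special in XML text and attributes. */
+    public static String escape(String s) {
+        StringBuilder sb = new StringBuilder(s.length());
+        for (int i = 0; i < s.length(); i++) {
+            char c = s.charAt(i);
+            switch (c) {
+                case '&': sb.append("&amp;"); break;
+                case '<': sb.append("&lt;"); break;
+                case '>': sb.append("&gt;"); break;
+                case '"': sb.append("&quot;"); break;
+                case '\'': sb.append("&apos;"); break;
+                default: sb.append(c);
+            }
+        }
+        return sb.toString();
+    }
+
+    public static void main(String[] args) {
+        System.out.println(escape("Tom & <Jerry>")); // Tom &amp; &lt;Jerry&gt;
+    }
+}
+```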
+
+## Testing Strategy
+
+### Unit Tests
+- EPUB parsing and validation logic
+- CFI generation and parsing
+- Metadata extraction accuracy
+- Content sanitization
+
+### Integration Tests
+- End-to-end import/export workflow
+- Reading position preservation
+- Multi-story collection export
+- Error handling scenarios
+
+### Test Data
+- Sample EPUB files for various scenarios
+- EPUBs with and without reading positions
+- Multi-chapter EPUBs
+- EPUBs with covers and metadata
+
+## Performance Considerations
+
+### Import Performance
+- **Streaming processing**: Process large EPUBs without loading entirely into memory
+- **Async processing**: Consider async import for large files
+- **Progress tracking**: Provide progress feedback for large imports
+
+### Export Performance  
+- **Caching**: Cache generated EPUBs for repeated exports
+- **Streaming**: Stream EPUB generation for large collections
+- **Resource optimization**: Optimize image and content sizes
+
+## Future Enhancements (Out of Scope)
+
+### Phase 2 Considerations
+- **DRM support**: Research legal and technical feasibility
+- **Reading position sync**: Real-time sync across devices
+- **Advanced EPUB features**: Enhanced typography, annotations
+- **Bulk operations**: Import/export multiple EPUBs
+- **EPUB validation**: Full EPUB compliance checking
+
+### Integration Possibilities
+- **Cloud storage**: Export directly to Google Drive, Dropbox
+- **E-reader sync**: Direct sync with Kindle, Kobo devices
+- **Reading analytics**: Track reading patterns and statistics
+
+## Implementation Phases
+
+### Phase 1: Core Functionality ✅ **COMPLETED**
+- [x] Basic EPUB import (DRM-free)
+- [x] Single story export
+- [x] Reading position storage and retrieval
+- [x] Frontend UI integration
+
+### Phase 2: Enhanced Features  
+- [ ] Collection export
+- [ ] Advanced metadata handling
+- [ ] Performance optimizations
+- [ ] Comprehensive error handling
+
+### Phase 3: Advanced Features
+- [ ] DRM exploration (legal research required)
+- [ ] Reading position sync
+- [ ] Advanced EPUB features
+- [ ] Analytics and reporting
+
+## Acceptance Criteria
+
+### Import Success Criteria ✅ **COMPLETED**
+- [x] Successfully parse EPUB 2.0 and 3.x files
+- [x] Extract title, author, summary, and content accurately
+- [x] Preserve formatting and basic HTML structure
+- [x] Handle cover images correctly
+- [x] Import reading positions when present
+- [x] Provide clear error messages for invalid files
+
+### Export Success Criteria ✅ **PHASE 1 COMPLETED**
+- [x] Generate valid EPUB files compatible with major readers
+- [x] Include accurate metadata and content
+- [x] Embed reading positions using CFI standard
+- [x] Support single story export
+- [ ] Support collection export *(Phase 2)*
+- [ ] Generate proper table of contents for collections *(Phase 2)*
+- [x] Include cover images when available
+
+---
+
+*This specification serves as the implementation guide for the EPUB import/export feature. All implementation decisions should reference this document for consistency and completeness.*