EPUB Import/Export Specification
🎉 Phase 1 Implementation Complete
Status: Phase 1 fully implemented and operational as of August 2025
Key Achievements:
- ✅ Complete EPUB import functionality with validation and error handling
- ✅ Single story EPUB export with XML validation fixes
- ✅ Reading position preservation using EPUB CFI standards
- ✅ Full frontend UI integration with navigation and authentication
- ✅ Moved export button to Story Detail View for better UX
- ✅ Added EPUB import to main Add Story menu dropdown
Overview
This specification defines the requirements and implementation details for importing and exporting EPUB files in StoryCove. The feature enables users to import stories from EPUB files and export their stories/collections as EPUB files with preserved reading positions.
Scope
In Scope
- EPUB Import: Parse DRM-free EPUB files and import as stories
- EPUB Export: Export individual stories and collections as EPUB files
- Reading Position Preservation: Store and restore reading positions using EPUB standards
- Metadata Handling: Extract and preserve story metadata (title, author, cover, etc.)
- Content Processing: HTML content sanitization and formatting
Out of Scope (Phase 1)
- DRM-protected EPUB files (future consideration)
- Real-time reading position sync between devices
- Advanced EPUB features (audio, video, interactive content)
- EPUB validation beyond basic structure
Technical Architecture
Backend Implementation
- Language: Java (Spring Boot)
- Primary Library: EPUBLib (com.positiondev.epublib:epublib-core:3.1, a republished build of nl.siegmann.epublib, matching the dependency below)
- Processing: Server-side generation and parsing
- File Handling: Multipart file upload for import, streaming download for export
Dependencies
<dependency>
<groupId>com.positiondev.epublib</groupId>
<artifactId>epublib-core</artifactId>
<version>3.1</version>
</dependency>
Phase 1 Implementation Notes
- EPUBImportService: Implemented with full validation, metadata extraction, and reading position handling
- EPUBExportService: Implemented with XML validation fixes for EPUB reader compatibility
- ReadingPosition Entity: Created with EPUB CFI support and database indexing
- Authentication: All endpoints secured with JWT authentication and proper frontend integration
- UI Integration: Export moved to Story Detail View, Import added to main navigation menu
- XML Compliance: Fixed XHTML validation issues by properly formatting self-closing tags (<br> → <br />)
EPUB Import Specification
Supported Formats
- EPUB 2.0 and EPUB 3.x formats
- DRM-Free files only
- Maximum file size: 50MB
- Supported content: Text-based stories with HTML content
Import Process Flow
- File Upload: User uploads EPUB file via web interface
- Validation: Check file format, size, and basic EPUB structure
- Parsing: Extract metadata, content, and resources using EPUBLib
- Content Processing: Sanitize HTML content using existing Jsoup pipeline
- Story Creation: Create Story entity with extracted data
- Preview: Show extracted story details for user confirmation
- Finalization: Save story to database with imported metadata
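The Validation step can run before the upload ever reaches EPUBLib. The sketch below is one possible pre-check, not the shipped EPUBImportService code: the class name EpubUploadValidator is hypothetical, the upload is assumed to be available as a byte array, and the DRM test is a heuristic (an Adobe rights.xml file, or an encryption.xml that encrypts more than fonts). The error strings reuse the messages listed under Import Errors.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical pre-check run before the upload is handed to EPUBLib.
final class EpubUploadValidator {
    static final long MAX_BYTES = 50L * 1024 * 1024; // 50MB limit from "Supported Formats"

    /** Returns null if the archive looks importable, otherwise a user-facing error message. */
    static String validate(byte[] data) {
        if (data.length > MAX_BYTES) {
            return "File size exceeds 50MB limit";
        }
        boolean mimetypeOk = false;
        boolean hasContainer = false;
        boolean hasAdobeRights = false;
        String encryptionXml = null;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            // Lenient: OCF requires "mimetype" to be the first entry; only its presence is checked here.
            while ((entry = zip.getNextEntry()) != null) {
                switch (entry.getName()) {
                    case "mimetype" -> mimetypeOk = "application/epub+zip".equals(
                            new String(zip.readAllBytes(), StandardCharsets.US_ASCII).trim());
                    case "META-INF/container.xml" -> hasContainer = true;
                    case "META-INF/rights.xml" -> hasAdobeRights = true;
                    case "META-INF/encryption.xml" -> encryptionXml =
                            new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    default -> { }
                }
            }
        } catch (IOException e) {
            return "EPUB file appears to be corrupted";
        }
        if (!mimetypeOk || !hasContainer) {
            return "Invalid EPUB file format";
        }
        if (hasAdobeRights || (encryptionXml != null && encryptsMoreThanFonts(encryptionXml))) {
            return "DRM-protected EPUBs not supported";
        }
        return null;
    }

    // encryption.xml is also used for font obfuscation (IDPF and Adobe algorithms), which is not DRM,
    // so only flag files where some Algorithm attribute names something other than those two.
    private static boolean encryptsMoreThanFonts(String xml) {
        int algorithms = count(xml, "Algorithm=");
        int fontObfuscation = count(xml, "http://www.idpf.org/2008/embedding")
                + count(xml, "http://ns.adobe.com/pdf/enc#RC");
        return algorithms > fontObfuscation;
    }

    private static int count(String text, String needle) {
        int n = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
            n++;
        }
        return n;
    }
}
```

Non-ZIP input is reported as "Invalid EPUB file format" because ZipInputStream simply finds no entries; a ZIP with damaged entries surfaces as an IOException and maps to the "corrupted" message.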
Metadata Mapping
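// EPUB Metadata → StoryCove Story Entity
// (list getters may return empty lists: guard every get(0) and fall back to a default)
epub.getMetadata().getFirstTitle() → story.title
epub.getMetadata().getAuthors().get(0) → story.authorName (Author object: join first and last name)
epub.getMetadata().getDescriptions().get(0) → story.summary
epub.getCoverImage() → story.coverPath (cover Resource is saved to storage; its path is stored)
epub.getMetadata().getSubjects() → story.tags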
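Several of these getters return lists that can be empty, so a bare get(0) throws for EPUBs without an author or description. A minimal null-safe sketch, assuming plain String lists stand in for the EPUBLib values (the Author object reduced to its first and last name) and that tags are trimmed, lower-cased and de-duplicated, which is a hypothetical normalization choice. EpubMetadataMapper is an illustrative name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helpers: plain String lists stand in for the EPUBLib Metadata getters.
final class EpubMetadataMapper {
    /** First non-blank value, trimmed, or the fallback when the list is null or has none. */
    static String firstNonBlank(List<String> values, String fallback) {
        if (values != null) {
            for (String value : values) {
                if (value != null && !value.isBlank()) {
                    return value.trim();
                }
            }
        }
        return fallback;
    }

    /** "First Last" from an author's name parts, skipping missing parts; null if both are missing. */
    static String authorName(String firstname, String lastname) {
        String first = firstname == null ? "" : firstname.trim();
        String last = lastname == null ? "" : lastname.trim();
        String joined = (first + " " + last).trim();
        return joined.isEmpty() ? null : joined;
    }

    /** dc:subject values → tags: trimmed, lower-cased, de-duplicated, original order kept. */
    static Set<String> tags(List<String> subjects) {
        Set<String> tags = new LinkedHashSet<>();
        if (subjects != null) {
            for (String subject : subjects) {
                if (subject != null && !subject.isBlank()) {
                    tags.add(subject.trim().toLowerCase(Locale.ROOT));
                }
            }
        }
        return tags;
    }
}
```

Used as, for example, firstNonBlank(descriptions, null) for the summary, so a missing description leaves the field empty instead of failing the whole import.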
Content Extraction
- Multi-chapter EPUBs: Combine all content files into single HTML
- Chapter separation: Insert <hr> or <h2> tags between chapters
- HTML sanitization: Apply existing sanitization rules
- Image handling: Extract and store cover images, inline images optional
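A sketch of the combining step, assuming the chapter bodies have already been extracted and passed through the Jsoup sanitizer, and that chapter titles (for example from the table of contents) arrive as a parallel list. ChapterCombiner is a hypothetical name. It emits <hr /> rather than <hr> so the combined HTML stays valid XHTML if the story is later exported again.

```java
import java.util.List;

// Hypothetical sketch of the combine step; bodies are assumed to be sanitized already.
final class ChapterCombiner {
    /** Joins chapter bodies, prefixing an <h2> title when known and separating chapters with <hr />. */
    static String combine(List<String> titles, List<String> bodies) {
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < bodies.size(); i++) {
            if (i > 0) {
                html.append("<hr />\n");
            }
            String title = i < titles.size() ? titles.get(i) : null;
            if (title != null && !title.isBlank()) {
                html.append("<h2>").append(escapeText(title.trim())).append("</h2>\n");
            }
            html.append(bodies.get(i).trim()).append('\n');
        }
        return html.toString();
    }

    // Titles come from metadata, not from sanitized HTML, so escape them as text.
    private static String escapeText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```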
API Endpoints
POST /api/stories/import-epub
@PostMapping("/import-epub")
public ResponseEntity<?> importEPUB(@RequestParam("file") MultipartFile file) {
// Implementation in EPUBImportService
}
Request: Multipart file upload (form field "file")
Response:
{
"message": "EPUB imported successfully",
"storyId": "uuid",
"extractedData": {
"title": "Story Title",
"author": "Author Name",
"summary": "Story description",
"chapterCount": 12,
"wordCount": 45000,
"hasCovers": true
}
}
EPUB Export Specification
Export Types
- Single Story Export: Convert one story to EPUB
- Collection Export: Multiple stories as single EPUB with chapters
EPUB Structure Generation
story.epub
├── mimetype
├── META-INF/
│ └── container.xml
└── OEBPS/
├── content.opf # Package metadata
├── toc.ncx # Navigation
├── stylesheet.css # Styling
├── cover.html # Cover page
├── chapter001.xhtml # Story content
├── images/
│ └── cover.jpg # Cover image
└── fonts/ (optional)
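The layout above follows the OCF container rules: the mimetype file comes first and is stored uncompressed, and META-INF/container.xml points at the package document. The EPUB library's writer normally takes care of this; the sketch below only makes the two rules explicit with java.util.zip. EpubSkeletonWriter is a hypothetical name, and the output is a skeleton rather than a complete EPUB (content.opf and the content files are not written).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration of the OCF container rules; not a complete EPUB writer.
final class EpubSkeletonWriter {
    static final String CONTAINER_XML = """
            <?xml version="1.0" encoding="UTF-8"?>
            <container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
              <rootfiles>
                <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
              </rootfiles>
            </container>
            """;

    static byte[] writeSkeleton() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // "mimetype" must be the first entry and uncompressed so readers can sniff it.
            // STORED entries need their size and CRC set before writing.
            byte[] mimetype = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();
            crc.update(mimetype);
            ZipEntry entry = new ZipEntry("mimetype");
            entry.setMethod(ZipEntry.STORED);
            entry.setSize(mimetype.length);
            entry.setCompressedSize(mimetype.length);
            entry.setCrc(crc.getValue());
            zip.putNextEntry(entry);
            zip.write(mimetype);
            zip.closeEntry();

            // Every other entry may use the default DEFLATED method.
            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zip.write(CONTAINER_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return out.toByteArray();
    }
}
```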
Reading Position Implementation
EPUB 3 CFI (Canonical Fragment Identifier)
<!-- In content.opf metadata -->
<meta property="epub-cfi" content="/6/4[chap01]!/4[body01]/10[para05]/3:142"/>
<meta property="reading-percentage" content="0.65"/>
<meta property="last-read-timestamp" content="2023-12-07T10:30:00Z"/>
StoryCove Custom Metadata (Fallback)
<meta name="storycove:reading-chapter" content="3"/>
<meta name="storycove:reading-paragraph" content="15"/>
<meta name="storycove:reading-offset" content="142"/>
<meta name="storycove:reading-percentage" content="0.65"/>
CFI Generation Logic
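public String generateCFI(ReadingPosition position) {
    // EPUB CFI numbers element children with even steps (1st child = 2, 2nd = 4, ...),
    // so "index * 2 + 4" targets the (index + 2)-th child: it skips one leading element
    // (for chapters, the first spine item, e.g. the cover page).
    // "/6" is the package's third child element (normally <spine>); "/4[body01]" is <body>,
    // the second child of <html>. "/3:%d" is a character offset in the text after the
    // paragraph's first child element; text in a paragraph with no child elements is "/1".
    return String.format("/6/%d[chap%02d]!/4[body01]/%d[para%02d]/3:%d",
        (position.getChapterIndex() * 2) + 4,
        position.getChapterIndex(),
        (position.getParagraphIndex() * 2) + 4,
        position.getParagraphIndex(),
        position.getCharacterOffset());
}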
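The Testing Strategy calls for CFI parsing as well as generation. Below is a hypothetical inverse of generateCFI, assuming the stored string always has exactly the shape generateCFI produces; it is not a general EPUB CFI parser (no epubcfi(...) wrapper, ranges or other step patterns). It undoes the index * 2 + 4 arithmetic and treats the bracketed ids as informational only, so the sample CFI in the content.opf metadata, /6/4[chap01]!/4[body01]/10[para05]/3:142, decodes to chapter 0, paragraph 3, offset 142. StoryCoveCfi and Position are illustrative names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical inverse of generateCFI; understands only the exact shape generateCFI emits.
final class StoryCoveCfi {
    record Position(int chapterIndex, int paragraphIndex, int characterOffset) { }

    private static final Pattern SHAPE = Pattern.compile(
            "/6/(\\d+)\\[chap\\d+\\]!/4\\[body01\\]/(\\d+)\\[para\\d+\\]/3:(\\d+)");

    // Same format string as generateCFI, pinned to Locale.ROOT so the digits are always ASCII.
    static String generate(Position p) {
        return String.format(Locale.ROOT, "/6/%d[chap%02d]!/4[body01]/%d[para%02d]/3:%d",
                p.chapterIndex() * 2 + 4, p.chapterIndex(),
                p.paragraphIndex() * 2 + 4, p.paragraphIndex(),
                p.characterOffset());
    }

    static Position parse(String cfi) {
        Matcher m = SHAPE.matcher(cfi);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported CFI: " + cfi);
        }
        int chapterStep = Integer.parseInt(m.group(1));
        int paragraphStep = Integer.parseInt(m.group(2));
        // Undo "index * 2 + 4": valid steps are even and at least 4 (index 0).
        if (chapterStep < 4 || chapterStep % 2 != 0 || paragraphStep < 4 || paragraphStep % 2 != 0) {
            throw new IllegalArgumentException("Step out of range: " + cfi);
        }
        return new Position((chapterStep - 4) / 2, (paragraphStep - 4) / 2,
                Integer.parseInt(m.group(3)));
    }
}
```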
API Endpoints
GET /api/stories/{id}/export-epub
@GetMapping("/{id}/export-epub")
public ResponseEntity<StreamingResponseBody> exportStory(@PathVariable UUID id) {
// Implementation in EPUBExportService
}
Response: EPUB file download with headers:
Content-Type: application/epub+zip
Content-Disposition: attachment; filename="story-title.epub"
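The filename "story-title.epub" implies the title is turned into a header- and filesystem-safe slug. One possible approach, sketched with the JDK only: accents are stripped via Unicode NFKD normalization, every run of characters other than a-z and 0-9 collapses to a hyphen, and an empty result falls back to "story.epub". EpubFilenames, the 80-character cap and the fallback name are assumptions. Titles in non-Latin scripts reduce to the fallback here; Spring's ContentDisposition builder can additionally emit an RFC 6266 filename* parameter to keep them.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical slug helper for the export filename.
final class EpubFilenames {
    private static final int MAX_BASE_LENGTH = 80; // hypothetical cap

    static String fromTitle(String title) {
        String base = title == null ? "" : Normalizer.normalize(title, Normalizer.Form.NFKD)
                .replaceAll("\\p{M}", "")            // drop combining marks: "é" → "e"
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")       // collapse any other run of characters to "-"
                .replaceAll("^-+|-+$", "");          // trim leading and trailing hyphens
        if (base.length() > MAX_BASE_LENGTH) {
            base = base.substring(0, MAX_BASE_LENGTH).replaceAll("-+$", "");
        }
        return (base.isEmpty() ? "story" : base) + ".epub";
    }

    static String contentDisposition(String title) {
        return "attachment; filename=\"" + fromTitle(title) + "\"";
    }
}
```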
GET /api/collections/{id}/export-epub
@GetMapping("/{id}/export-epub")
public ResponseEntity<StreamingResponseBody> exportCollection(@PathVariable UUID id) {
// Implementation in EPUBExportService
}
Response: Multi-story EPUB with table of contents
Data Models
ReadingPosition Entity
@Entity
@Table(name = "reading_positions")
public class ReadingPosition {
@Id
private UUID id;
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "story_id")
private Story story;
@Column(name = "chapter_index")
private Integer chapterIndex = 0;
@Column(name = "paragraph_index")
private Integer paragraphIndex = 0;
@Column(name = "character_offset")
private Integer characterOffset = 0;
@Column(name = "progress_percentage")
private Double progressPercentage = 0.0;
@Column(name = "epub_cfi")
private String canonicalFragmentIdentifier;
@Column(name = "last_read_at")
private LocalDateTime lastReadAt;
@Column(name = "device_identifier")
private String deviceIdentifier;
// Constructors, getters, setters
}
EPUB Import Request DTO
public class EPUBImportRequest {
private String filename;
private Long fileSize;
private Boolean preserveChapterStructure = true;
private Boolean extractCover = true;
private String targetCollectionId; // Optional: add to specific collection
}
EPUB Export Options DTO
public class EPUBExportOptions {
private Boolean includeReadingPosition = true;
private Boolean includeCoverImage = true;
private Boolean includeMetadata = true;
private String cssStylesheet; // Optional custom CSS
private EPUBVersion version = EPUBVersion.EPUB3;
}
Service Layer Architecture
EPUBImportService
@Service
public class EPUBImportService {
// Core import method
public Story importEPUBFile(MultipartFile file, EPUBImportRequest request);
// Helper methods
private void validateEPUBFile(MultipartFile file);
private Book parseEPUBStructure(InputStream inputStream);
private Story extractStoryData(Book epub);
private String combineChapterContent(Book epub);
private void extractAndSaveCover(Book epub, Story story);
private List<String> extractTags(Book epub);
private ReadingPosition extractReadingPosition(Book epub);
}
EPUBExportService
@Service
public class EPUBExportService {
// Core export methods
public byte[] exportSingleStory(UUID storyId, EPUBExportOptions options);
public byte[] exportCollection(UUID collectionId, EPUBExportOptions options);
// Helper methods
private Book createEPUBStructure(Story story, ReadingPosition position);
private Book createCollectionEPUB(Collection collection, List<ReadingPosition> positions);
private void addReadingPositionMetadata(Book book, ReadingPosition position);
private String generateCFI(ReadingPosition position);
private Resource createChapterResource(Story story);
private Resource createStylesheetResource();
private void addCoverImage(Book book, Story story);
}
Frontend Integration
Import UI Flow
- Upload Interface: File input with EPUB validation
- Progress Indicator: Show parsing progress
- Preview Screen: Display extracted metadata for confirmation
- Confirmation: Allow editing of title, author, summary before saving
- Success: Redirect to created story
Export UI Flow
- Export Button: Available on story detail and collection pages
- Options Modal: Allow selection of export options
- Progress Indicator: Show EPUB generation progress
- Download: Automatic file download on completion
Frontend API Calls
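// Note: JWT auth is omitted for brevity; attach the token the same way the app's other API calls do.
// Import EPUB
const importEPUB = async (file: File) => {
  const formData = new FormData();
  formData.append('file', file); // field name must match @RequestParam("file")
  const response = await fetch('/api/stories/import-epub', {
    method: 'POST',
    body: formData, // the browser sets the multipart Content-Type and boundary
  });
  if (!response.ok) {
    throw new Error(`EPUB import failed (${response.status})`);
  }
  return await response.json();
};
// Export Story
const exportStoryEPUB = async (storyId: string, storyTitle: string) => {
  const response = await fetch(`/api/stories/${storyId}/export-epub`, {
    method: 'GET',
  });
  if (!response.ok) {
    throw new Error(`EPUB export failed (${response.status})`);
  }
  const blob = await response.blob();
  const url = window.URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = `${storyTitle}.epub`;
  a.click();
  setTimeout(() => window.URL.revokeObjectURL(url), 1000); // free the blob URL once the download has started
};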
Error Handling
Import Errors
- Invalid EPUB format: "Invalid EPUB file format"
- File too large: "File size exceeds 50MB limit"
- DRM protected: "DRM-protected EPUBs not supported"
- Corrupted file: "EPUB file appears to be corrupted"
- No content: "EPUB contains no readable content"
Export Errors
- Story not found: "Story not found or access denied"
- Missing content: "Story has no content to export"
- Generation failure: "Failed to generate EPUB file"
Security Considerations
File Upload Security
- File type validation: Verify EPUB MIME type and structure
- Size limits: Enforce maximum file size limits
- Content sanitization: Apply existing HTML sanitization
- Virus scanning: Consider integration with antivirus scanning
Content Security
- HTML sanitization: Apply existing Jsoup rules to imported content
- Image validation: Validate extracted cover images
- Metadata escaping: Escape special characters in metadata
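Metadata escaping matters wherever titles, author names or summaries are concatenated into OPF or XHTML by hand (a chapter heading, for example); XML serializer APIs escape for you. A minimal sketch, with OpfText as a hypothetical name, that escapes the five XML special characters and drops the C0 control characters XML 1.0 does not allow:

```java
// Hypothetical text escaper for hand-built OPF/XHTML fragments.
final class OpfText {
    /** Escapes text for XML element content or a quoted attribute value. */
    static String escape(String raw) {
        if (raw == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(raw.length());
        for (int cp : raw.codePoints().toArray()) {
            switch (cp) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                case '\'' -> sb.append("&apos;");
                default -> {
                    // XML 1.0 forbids control characters below U+0020 except tab, LF and CR.
                    if (cp >= 0x20 || cp == '\t' || cp == '\n' || cp == '\r') {
                        sb.appendCodePoint(cp);
                    }
                }
            }
        }
        return sb.toString();
    }
}
```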
Testing Strategy
Unit Tests
- EPUB parsing and validation logic
- CFI generation and parsing
- Metadata extraction accuracy
- Content sanitization
Integration Tests
- End-to-end import/export workflow
- Reading position preservation
- Multi-story collection export
- Error handling scenarios
Test Data
- Sample EPUB files for various scenarios
- EPUBs with and without reading positions
- Multi-chapter EPUBs
- EPUBs with covers and metadata
Performance Considerations
Import Performance
- Streaming processing: Process large EPUBs without loading entirely into memory
- Async processing: Consider async import for large files
- Progress tracking: Provide progress feedback for large imports
Export Performance
- Caching: Cache generated EPUBs for repeated exports
- Streaming: Stream EPUB generation for large collections
- Resource optimization: Optimize image and content sizes
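For the Caching bullet, one possible design keys the cache on the story id, its last-modified time and a fingerprint of the export options, so an edit or a different option set simply misses the cache instead of needing explicit invalidation. This is a hypothetical in-memory sketch (EpubExportCache, the Key fields and the crude size bound are assumptions); a production version would more likely use a bounded LRU cache or persistent storage.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical in-memory cache for generated EPUB bytes.
final class EpubExportCache {
    /** Any change to the story (updatedAt) or to the options produces a different key. */
    record Key(UUID storyId, Instant updatedAt, String optionsFingerprint) { }

    private final Map<Key, byte[]> entries = new ConcurrentHashMap<>();
    private final int maxEntries;

    EpubExportCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    byte[] getOrGenerate(Key key, Supplier<byte[]> generator) {
        byte[] cached = entries.get(key);
        if (cached != null) {
            return cached;
        }
        // Two concurrent misses may both generate; the later put wins, which is harmless here.
        byte[] fresh = Objects.requireNonNull(generator.get());
        if (entries.size() >= maxEntries) {
            entries.clear(); // crude bound; a real cache would evict least-recently-used entries
        }
        entries.put(key, fresh);
        return fresh;
    }
}
```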
Future Enhancements (Out of Scope)
Phase 2 Considerations
- DRM support: Research legal and technical feasibility
- Reading position sync: Real-time sync across devices
- Advanced EPUB features: Enhanced typography, annotations
- Bulk operations: Import/export multiple EPUBs
- EPUB validation: Full EPUB compliance checking
Integration Possibilities
- Cloud storage: Export directly to Google Drive, Dropbox
- E-reader sync: Direct sync with Kindle, Kobo devices
- Reading analytics: Track reading patterns and statistics
Implementation Phases
Phase 1: Core Functionality ✅ COMPLETED
- Basic EPUB import (DRM-free)
- Single story export
- Reading position storage and retrieval
- Frontend UI integration
Phase 2: Enhanced Features
- Collection export
- Advanced metadata handling
- Performance optimizations
- Comprehensive error handling
Phase 3: Advanced Features
- DRM exploration (legal research required)
- Reading position sync
- Advanced EPUB features
- Analytics and reporting
Acceptance Criteria
Import Success Criteria ✅ COMPLETED
- Successfully parse EPUB 2.0 and 3.x files
- Extract title, author, summary, and content accurately
- Preserve formatting and basic HTML structure
- Handle cover images correctly
- Import reading positions when present
- Provide clear error messages for invalid files
Export Success Criteria ✅ PHASE 1 COMPLETED
- Generate valid EPUB files compatible with major readers
- Include accurate metadata and content
- Embed reading positions using CFI standard
- Support single story export
- Support collection export (Phase 2)
- Generate proper table of contents for collections (Phase 2)
- Include cover images when available
This specification serves as the implementation guide for the EPUB import/export feature. All implementation decisions should reference this document for consistency and completeness.