# EPUB Import/Export Specification

## 🎉 Phase 1 Implementation Complete

**Status**: Phase 1 fully implemented and operational as of August 2025

**Key Achievements**:
- ✅ Complete EPUB import functionality with validation and error handling
- ✅ Single story EPUB export with XML validation fixes
- ✅ Reading position preservation using EPUB CFI standards
- ✅ Full frontend UI integration with navigation and authentication
- ✅ Moved export button to Story Detail View for better UX
- ✅ Added EPUB import to main Add Story menu dropdown

## Overview

This specification defines the requirements and implementation details for importing and exporting EPUB files in StoryCove. The feature lets users import stories from EPUB files and export their stories and collections as EPUB files with reading positions preserved.

## Scope

### In Scope
- **EPUB Import**: Parse DRM-free EPUB files and import as stories
- **EPUB Export**: Export individual stories and collections as EPUB files (single story export ships in Phase 1; collection export is scheduled for Phase 2)
- **Reading Position Preservation**: Store and restore reading positions using EPUB standards
- **Metadata Handling**: Extract and preserve story metadata (title, author, cover, etc.)
- **Content Processing**: HTML content sanitization and formatting

### Out of Scope (Phase 1)
- DRM-protected EPUB files (future consideration)
- Real-time reading position sync between devices
- Advanced EPUB features (audio, video, interactive content)
- EPUB validation beyond basic structure

## Technical Architecture

### Backend Implementation
- **Language**: Java (Spring Boot)
- **Primary Library**: EPUBLib (com.positiondev.epublib:epublib-core:3.1)
- **Processing**: Server-side generation and parsing
- **File Handling**: Multipart file upload for import, streaming download for export

### Dependencies
```xml
<dependency>
    <groupId>com.positiondev.epublib</groupId>
    <artifactId>epublib-core</artifactId>
    <version>3.1</version>
</dependency>
```

### Phase 1 Implementation Notes
- **EPUBImportService**: Implemented with full validation, metadata extraction, and reading position handling
- **EPUBExportService**: Implemented with XML validation fixes for EPUB reader compatibility
- **ReadingPosition Entity**: Created with EPUB CFI support and database indexing
- **Authentication**: All endpoints secured with JWT authentication and proper frontend integration
- **UI Integration**: Export moved to Story Detail View, Import added to main navigation menu
- **XML Compliance**: Fixed XHTML validation issues by properly formatting self-closing tags (`<br>` → `<br />`)

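The `<br>` → `<br />` fix noted above can be illustrated with a small regex-based sketch. It is an illustration only, not the code in `EPUBExportService`: the class name `XhtmlVoidTags` and the restriction to `br`, `hr`, and `img` are assumptions, and a regex will mishandle attribute values that contain `>`.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the specified services.
final class XhtmlVoidTags {
    // Group 1: tag name. Group 2: attributes (possibly empty), without trailing whitespace.
    private static final Pattern VOID_TAG = Pattern.compile(
        "(?i)<(br|hr|img)((?:\\s+[^\\s<>/][^<>]*?)?)\\s*/?>");

    /** Rewrites void tags such as <br> or <img ...> into XHTML form (<br />). */
    static String selfClose(String html) {
        return VOID_TAG.matcher(html).replaceAll("<$1$2 />");
    }
}
```

The rewrite is idempotent, so content that already uses `<br />` passes through unchanged. For arbitrary HTML, a parser-based serializer is generally more robust than a regex.
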
## EPUB Import Specification

### Supported Formats
- **EPUB 2.0** and **EPUB 3.x** formats
- **DRM-Free** files only
- **Maximum file size**: 50 MB
- **Supported content**: Text-based stories with HTML content

### Import Process Flow
1. **File Upload**: User uploads EPUB file via web interface
2. **Validation**: Check file format, size, and basic EPUB structure
3. **Parsing**: Extract metadata, content, and resources using EPUBLib
4. **Content Processing**: Sanitize HTML content using existing Jsoup pipeline
5. **Story Creation**: Create Story entity with extracted data
6. **Preview**: Show extracted story details for user confirmation
7. **Finalization**: Save story to database with imported metadata

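The validation step in the flow above could look roughly like the following sketch, which uses only the JDK. It checks the size limit and the two container facts every EPUB shares: the first ZIP entry is named `mimetype` and contains `application/epub+zip`, and `META-INF/container.xml` is present. The class `EpubStructureCheck` and the choice of which error message maps to which failure are assumptions, not the actual `validateEPUBFile` implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the specified services.
final class EpubStructureCheck {
    static final long MAX_BYTES = 50L * 1024 * 1024; // 50 MB limit from this spec

    /** Returns null when the basic structure looks valid, otherwise an error message. */
    static String validate(byte[] data) {
        if (data.length > MAX_BYTES) {
            return "File size exceeds 50MB limit";
        }
        boolean sawEntry = false;
        boolean hasContainer = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!sawEntry) {
                    // The first entry must be the mimetype marker.
                    sawEntry = true;
                    String body = new String(zip.readAllBytes(), StandardCharsets.US_ASCII).trim();
                    if (!"mimetype".equals(entry.getName()) || !"application/epub+zip".equals(body)) {
                        return "Invalid EPUB file format";
                    }
                } else if ("META-INF/container.xml".equals(entry.getName())) {
                    hasContainer = true;
                }
            }
        } catch (IOException e) {
            return "EPUB file appears to be corrupted";
        }
        if (!sawEntry) {
            return "Invalid EPUB file format"; // not a ZIP archive, or an empty one
        }
        return hasContainer ? null : "EPUB file appears to be corrupted";
    }
}
```

Running this check before handing the file to EPUBLib keeps obviously invalid uploads away from the parser. Detecting DRM is not covered here.
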
### Metadata Mapping
```java
// EPUB Metadata → StoryCove Story Entity
epub.getMetadata().getFirstTitle()          → story.title
epub.getMetadata().getAuthors().get(0)      → story.authorName
epub.getMetadata().getDescriptions().get(0) → story.summary
epub.getCoverImage()                        → story.coverPath
epub.getMetadata().getSubjects()            → story.tags
```

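The mapping above calls `get(0)` on the author and description lists, which throws if an EPUB omits that metadata. A minimal sketch of a null-safe accessor follows; `MetadataDefaults` and `firstOrDefault` are hypothetical names, and the spec does not say how the real import service handles missing fields.

```java
import java.util.List;

// Hypothetical helper, not part of the specified services.
final class MetadataDefaults {
    /** Returns the first element, or the fallback when the list is null or empty. */
    static <T> T firstOrDefault(List<T> values, T fallback) {
        return (values == null || values.isEmpty()) ? fallback : values.get(0);
    }
}
```

With such a helper, the summary mapping might read `firstOrDefault(epub.getMetadata().getDescriptions(), "")`, leaving the choice of fallback value to the caller.
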
### Content Extraction
- **Multi-chapter EPUBs**: Combine all content files into single HTML
- **Chapter separation**: Insert `<hr>` or `<h2>` tags between chapters
- **HTML sanitization**: Apply existing sanitization rules
- **Image handling**: Extract and store cover images, inline images optional

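As a rough sketch of the chapter-combining rule above, the following joins already-sanitized chapter bodies with an `<hr />` separator and skips empty chapters. `ChapterCombiner` is a hypothetical name, and whether `combineChapterContent` uses `<hr>`, `<h2>`, or both is not specified here.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of the specified services.
final class ChapterCombiner {
    /** Joins non-blank chapter bodies into one HTML string, separated by <hr />. */
    static String combine(List<String> chapterHtml) {
        StringJoiner joined = new StringJoiner("\n<hr />\n");
        for (String html : chapterHtml) {
            if (html != null && !html.isBlank()) {
                joined.add(html.strip());
            }
        }
        return joined.toString();
    }
}
```
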
### API Endpoints

#### POST /api/stories/import-epub
```java
@PostMapping("/import-epub")
public ResponseEntity<?> importEPUB(@RequestParam("file") MultipartFile file) {
    // Implementation in EPUBImportService
}
```

**Request**: Multipart file upload
**Response**:
```json
{
    "message": "EPUB imported successfully",
    "storyId": "uuid",
    "extractedData": {
        "title": "Story Title",
        "author": "Author Name",
        "summary": "Story description",
        "chapterCount": 12,
        "wordCount": 45000,
        "hasCovers": true
    }
}
```

## EPUB Export Specification

### Export Types
1. **Single Story Export**: Convert one story to EPUB
2. **Collection Export**: Multiple stories as single EPUB with chapters

### EPUB Structure Generation
```
story.epub
├── mimetype
├── META-INF/
│   └── container.xml
└── OEBPS/
    ├── content.opf          # Package metadata
    ├── toc.ncx              # Navigation
    ├── stylesheet.css       # Styling
    ├── cover.html           # Cover page
    ├── chapter001.xhtml     # Story content
    ├── images/
    │   └── cover.jpg        # Cover image
    └── fonts/ (optional)
```

### Reading Position Implementation

#### EPUB 3 CFI (Canonical Fragment Identifier)
```xml
<!-- In content.opf metadata -->
<meta property="epub-cfi" content="/6/4[chap01]!/4[body01]/10[para05]/3:142"/>
<meta property="reading-percentage" content="0.65"/>
<meta property="last-read-timestamp" content="2023-12-07T10:30:00Z"/>
```

#### StoryCove Custom Metadata (Fallback)
```xml
<meta name="storycove:reading-chapter" content="3"/>
<meta name="storycove:reading-paragraph" content="15"/>
<meta name="storycove:reading-offset" content="142"/>
<meta name="storycove:reading-percentage" content="0.65"/>
```

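#### CFI Generation Logic
```java
public String generateCFI(ReadingPosition position) {
    // EPUB CFI addresses child elements with even step numbers, hence the "* 2".
    return String.format("/6/%d[chap%02d]!/4[body01]/%d[para%02d]/3:%d",
        (position.getChapterIndex() * 2) + 4,   // spine step for the chapter
        position.getChapterIndex(),             // chapter ID assertion, e.g. [chap03]
        (position.getParagraphIndex() * 2) + 4, // element step for the paragraph
        position.getParagraphIndex(),           // paragraph ID assertion, e.g. [para15]
        position.getCharacterOffset());         // character offset within the text node
}
```
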
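On import, the generated CFI has to be turned back into indices. The sketch below inverts the format string used by `generateCFI` and includes a copy of the formula so the round trip can be checked in isolation. `CfiRoundTrip` and its `int[]` return shape are assumptions; it only understands CFIs in this exact layout, not arbitrary CFIs written by other reading systems.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the specified services.
final class CfiRoundTrip {
    private static final Pattern CFI = Pattern.compile(
        "^/6/(\\d{1,9})\\[chap\\d+\\]!/4\\[body01\\]/(\\d{1,9})\\[para\\d+\\]/3:(\\d{1,9})$");

    /** Same formula as generateCFI, taking plain indices instead of an entity. */
    static String generate(int chapterIndex, int paragraphIndex, int characterOffset) {
        return String.format(Locale.ROOT, "/6/%d[chap%02d]!/4[body01]/%d[para%02d]/3:%d",
            (chapterIndex * 2) + 4, chapterIndex,
            (paragraphIndex * 2) + 4, paragraphIndex,
            characterOffset);
    }

    /** Returns {chapterIndex, paragraphIndex, characterOffset}, or null if the CFI does not match. */
    static int[] parse(String cfi) {
        Matcher m = CFI.matcher(cfi);
        if (!m.matches()) {
            return null;
        }
        int chapterStep = Integer.parseInt(m.group(1));
        int paragraphStep = Integer.parseInt(m.group(2));
        // Steps produced by generate() are even and at least 4.
        if (chapterStep < 4 || paragraphStep < 4 || chapterStep % 2 != 0 || paragraphStep % 2 != 0) {
            return null;
        }
        return new int[] {(chapterStep - 4) / 2, (paragraphStep - 4) / 2, Integer.parseInt(m.group(3))};
    }
}
```

Returning `null` for unrecognized input lets the caller fall back to the `storycove:*` metadata when the CFI cannot be interpreted.
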
### API Endpoints

#### GET /api/stories/{id}/export-epub
```java
@GetMapping("/{id}/export-epub")
public ResponseEntity<StreamingResponseBody> exportStory(@PathVariable UUID id) {
    // Implementation in EPUBExportService
}
```

**Response**: EPUB file download with headers:
```
Content-Type: application/epub+zip
Content-Disposition: attachment; filename="story-title.epub"
```

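The `filename` in the `Content-Disposition` header above has to be derived from a user-supplied story title. One possible approach, sketched below, reduces the title to a lowercase ASCII slug. `ExportFilenames`, the slug rules, and the `story` fallback are all assumptions; the actual naming rule used by the export endpoint is not stated in this spec.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of the specified services.
final class ExportFilenames {
    /** Turns a story title into a header-safe file name such as "story-title.epub". */
    static String epubFilename(String title) {
        String slug = Normalizer.normalize(title == null ? "" : title, Normalizer.Form.NFKD)
            .replaceAll("[^\\p{ASCII}]", "")   // drop combining marks and other non-ASCII
            .toLowerCase(Locale.ROOT)
            .replaceAll("[^a-z0-9]+", "-")     // collapse everything else into hyphens
            .replaceAll("^-+|-+$", "");        // trim leading and trailing hyphens
        return (slug.isEmpty() ? "story" : slug) + ".epub";
    }

    /** Builds the Content-Disposition header value for the download. */
    static String contentDisposition(String title) {
        return "attachment; filename=\"" + epubFilename(title) + "\"";
    }
}
```

Restricting the name to `[a-z0-9-]` avoids quoting and encoding issues in the header, at the cost of dropping non-Latin titles to the fallback name.
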
#### GET /api/collections/{id}/export-epub
```java
@GetMapping("/{id}/export-epub")
public ResponseEntity<StreamingResponseBody> exportCollection(@PathVariable UUID id) {
    // Implementation in EPUBExportService
}
```

**Response**: Multi-story EPUB with table of contents

## Data Models

### ReadingPosition Entity
```java
@Entity
@Table(name = "reading_positions")
public class ReadingPosition {
    @Id
    private UUID id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "story_id")
    private Story story;

    @Column(name = "chapter_index")
    private Integer chapterIndex = 0;

    @Column(name = "paragraph_index")
    private Integer paragraphIndex = 0;

    @Column(name = "character_offset")
    private Integer characterOffset = 0;

    @Column(name = "progress_percentage")
    private Double progressPercentage = 0.0;

    @Column(name = "epub_cfi")
    private String canonicalFragmentIdentifier;

    @Column(name = "last_read_at")
    private LocalDateTime lastReadAt;

    @Column(name = "device_identifier")
    private String deviceIdentifier;

    // Constructors, getters, setters
}
```

### EPUB Import Request DTO
```java
public class EPUBImportRequest {
    private String filename;
    private Long fileSize;
    private Boolean preserveChapterStructure = true;
    private Boolean extractCover = true;
    private String targetCollectionId; // Optional: add to specific collection
}
```

### EPUB Export Options DTO
```java
public class EPUBExportOptions {
    private Boolean includeReadingPosition = true;
    private Boolean includeCoverImage = true;
    private Boolean includeMetadata = true;
    private String cssStylesheet; // Optional custom CSS
    private EPUBVersion version = EPUBVersion.EPUB3;
}
```

## Service Layer Architecture

### EPUBImportService
```java
@Service
public class EPUBImportService {

    // Core import method
    public Story importEPUBFile(MultipartFile file, EPUBImportRequest request);

    // Helper methods
    private void validateEPUBFile(MultipartFile file);
    private Book parseEPUBStructure(InputStream inputStream);
    private Story extractStoryData(Book epub);
    private String combineChapterContent(Book epub);
    private void extractAndSaveCover(Book epub, Story story);
    private List<String> extractTags(Book epub);
    private ReadingPosition extractReadingPosition(Book epub);
}
```

### EPUBExportService
```java
@Service
public class EPUBExportService {

    // Core export methods
    public byte[] exportSingleStory(UUID storyId, EPUBExportOptions options);
    public byte[] exportCollection(UUID collectionId, EPUBExportOptions options);

    // Helper methods
    private Book createEPUBStructure(Story story, ReadingPosition position);
    private Book createCollectionEPUB(Collection collection, List<ReadingPosition> positions);
    private void addReadingPositionMetadata(Book book, ReadingPosition position);
    private String generateCFI(ReadingPosition position);
    private Resource createChapterResource(Story story);
    private Resource createStylesheetResource();
    private void addCoverImage(Book book, Story story);
}
```

## Frontend Integration

### Import UI Flow
1. **Upload Interface**: File input with EPUB validation
2. **Progress Indicator**: Show parsing progress
3. **Preview Screen**: Display extracted metadata for confirmation
4. **Confirmation**: Allow editing of title, author, summary before saving
5. **Success**: Redirect to created story

### Export UI Flow
1. **Export Button**: Available on story detail and collection pages
2. **Options Modal**: Allow selection of export options
3. **Progress Indicator**: Show EPUB generation progress
4. **Download**: Automatic file download on completion

### Frontend API Calls
```typescript
// Import EPUB
const importEPUB = async (file: File) => {
  const formData = new FormData();
  formData.append('file', file);

  const response = await fetch('/api/stories/import-epub', {
    method: 'POST',
    body: formData,
  });

  // Surface HTTP errors instead of parsing an error body as a success payload
  if (!response.ok) {
    throw new Error(`EPUB import failed with status ${response.status}`);
  }
  return await response.json();
};

// Export Story
const exportStoryEPUB = async (storyId: string, storyTitle: string) => {
  const response = await fetch(`/api/stories/${storyId}/export-epub`, {
    method: 'GET',
  });
  if (!response.ok) {
    throw new Error(`EPUB export failed with status ${response.status}`);
  }

  const blob = await response.blob();
  const url = window.URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = `${storyTitle}.epub`;
  a.click();
  window.URL.revokeObjectURL(url); // Release the blob URL once the download has been triggered
};
```

## Error Handling

### Import Errors
- **Invalid EPUB format**: "Invalid EPUB file format"
- **File too large**: "File size exceeds 50MB limit"
- **DRM protected**: "DRM-protected EPUBs not supported"
- **Corrupted file**: "EPUB file appears to be corrupted"
- **No content**: "EPUB contains no readable content"

### Export Errors
- **Story not found**: "Story not found or access denied"
- **Missing content**: "Story has no content to export"
- **Generation failure**: "Failed to generate EPUB file"

## Security Considerations

### File Upload Security
- **File type validation**: Verify EPUB MIME type and structure
- **Size limits**: Enforce maximum file size limits
- **Content sanitization**: Apply existing HTML sanitization
- **Virus scanning**: Consider integration with antivirus scanning

### Content Security
- **HTML sanitization**: Apply existing Jsoup rules to imported content
- **Image validation**: Validate extracted cover images
- **Metadata escaping**: Escape special characters in metadata

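For the metadata escaping item above, a minimal sketch of escaping the five XML special characters before a value is written into `content.opf` might look like this. `XmlEscaper` is a hypothetical name; if metadata is written through a library's metadata API, that library may already escape values, in which case escaping again would double-encode them.

```java
// Hypothetical helper, not part of the specified services.
final class XmlEscaper {
    /** Escapes &, <, >, " and ' so the value is safe in XML text and attributes. */
    static String escapeXml(String value) {
        StringBuilder out = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```
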
## Testing Strategy

### Unit Tests
- EPUB parsing and validation logic
- CFI generation and parsing
- Metadata extraction accuracy
- Content sanitization

### Integration Tests
- End-to-end import/export workflow
- Reading position preservation
- Multi-story collection export
- Error handling scenarios

### Test Data
- Sample EPUB files for various scenarios
- EPUBs with and without reading positions
- Multi-chapter EPUBs
- EPUBs with covers and metadata

## Performance Considerations

### Import Performance
- **Streaming processing**: Process large EPUBs without loading entirely into memory
- **Async processing**: Consider async import for large files
- **Progress tracking**: Provide progress feedback for large imports

### Export Performance
- **Caching**: Cache generated EPUBs for repeated exports
- **Streaming**: Stream EPUB generation for large collections
- **Resource optimization**: Optimize image and content sizes

## Future Enhancements (Out of Scope)

### Phase 2 Considerations
- **DRM support**: Research legal and technical feasibility
- **Reading position sync**: Real-time sync across devices
- **Advanced EPUB features**: Enhanced typography, annotations
- **Bulk operations**: Import/export multiple EPUBs
- **EPUB validation**: Full EPUB compliance checking

### Integration Possibilities
- **Cloud storage**: Export directly to Google Drive, Dropbox
- **E-reader sync**: Direct sync with Kindle, Kobo devices
- **Reading analytics**: Track reading patterns and statistics

## Implementation Phases

### Phase 1: Core Functionality ✅ **COMPLETED**
- [x] Basic EPUB import (DRM-free)
- [x] Single story export
- [x] Reading position storage and retrieval
- [x] Frontend UI integration

### Phase 2: Enhanced Features
- [ ] Collection export
- [ ] Advanced metadata handling
- [ ] Performance optimizations
- [ ] Comprehensive error handling

### Phase 3: Advanced Features
- [ ] DRM exploration (legal research required)
- [ ] Reading position sync
- [ ] Advanced EPUB features
- [ ] Analytics and reporting

## Acceptance Criteria

### Import Success Criteria ✅ **COMPLETED**
- [x] Successfully parse EPUB 2.0 and 3.x files
- [x] Extract title, author, summary, and content accurately
- [x] Preserve formatting and basic HTML structure
- [x] Handle cover images correctly
- [x] Import reading positions when present
- [x] Provide clear error messages for invalid files

### Export Success Criteria ✅ **PHASE 1 COMPLETED**
- [x] Generate valid EPUB files compatible with major readers
- [x] Include accurate metadata and content
- [x] Embed reading positions using CFI standard
- [x] Support single story export
- [ ] Support collection export *(Phase 2)*
- [ ] Generate proper table of contents for collections *(Phase 2)*
- [x] Include cover images when available

---

*This specification serves as the implementation guide for the EPUB import/export feature. All implementation decisions should reference this document for consistency and completeness.*