# EPUB Import/Export Specification

## 🎉 Phase 1 Implementation Complete

**Status**: Phase 1 fully implemented and operational as of August 2025

**Key Achievements**:
- ✅ Complete EPUB import functionality with validation and error handling
- ✅ Single-story EPUB export with XML validation fixes
- ✅ Reading position preservation using the EPUB CFI standard
- ✅ Full frontend UI integration with navigation and authentication
- ✅ Moved the export button to the Story Detail View for better UX
- ✅ Added EPUB import to the main Add Story dropdown menu

## Overview

This specification defines the requirements and implementation details for importing and exporting EPUB files in StoryCove. The feature lets users import stories from EPUB files and export their stories (and, in Phase 2, collections) as EPUB files that preserve reading positions.

## Scope

### In Scope
- **EPUB Import**: Parse DRM-free EPUB files and import them as stories
- **EPUB Export**: Export individual stories and collections as EPUB files (collection export is planned for Phase 2)
- **Reading Position Preservation**: Store and restore reading positions using EPUB standards
- **Metadata Handling**: Extract and preserve story metadata (title, author, cover, etc.)
- **Content Processing**: HTML content sanitization and formatting

### Out of Scope (Phase 1)
- DRM-protected EPUB files (future consideration)
- Real-time reading position sync between devices
- Advanced EPUB features (audio, video, interactive content)
- EPUB validation beyond basic structure

## Technical Architecture

### Backend Implementation
- **Language**: Java (Spring Boot)
- **Primary Library**: EPUBLib (`com.positiondev.epublib:epublib-core:3.1`, matching the dependency below; its Java classes live under `nl.siegmann.epublib`)
- **Processing**: Server-side generation and parsing
- **File Handling**: Multipart file upload for import, streaming download for export

### Dependencies
```xml
<dependency>
    <groupId>com.positiondev.epublib</groupId>
    <artifactId>epublib-core</artifactId>
    <version>3.1</version>
</dependency>
```

### Phase 1 Implementation Notes
- **EPUBImportService**: Implemented with full validation, metadata extraction, and reading position handling
- **EPUBExportService**: Implemented with XML validation fixes for EPUB reader compatibility
- **ReadingPosition Entity**: Created with EPUB CFI support and database indexing
- **Authentication**: All endpoints are secured with JWT authentication, with matching frontend integration
- **UI Integration**: Export moved to the Story Detail View; Import added to the main navigation menu
- **XML Compliance**: Fixed XHTML validation issues by serializing void elements as self-closing tags (`<br>` → `<br />`)

## EPUB Import Specification

### Supported Formats
- **EPUB 2.0** and **EPUB 3.x** formats
- **DRM-free** files only
- **Maximum file size**: 50MB
- **Supported content**: Text-based stories with HTML content

### Import Process Flow
1. **File Upload**: User uploads an EPUB file via the web interface
2. **Validation**: Check file format, size, and basic EPUB structure
3. **Parsing**: Extract metadata, content, and resources using EPUBLib
4. **Content Processing**: Sanitize HTML content using the existing Jsoup pipeline
5. **Story Creation**: Create a Story entity from the extracted data
6. **Preview**: Show extracted story details for user confirmation
7. **Finalization**: Save the story to the database with imported metadata
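
Step 2 (Validation) can run as a cheap pre-check before EPUBLib parses anything. The following is a minimal sketch under stated assumptions, not the existing `validateEPUBFile`. It works on the upload's bytes (for example from `MultipartFile.getBytes()`), uses only `java.util.zip`, and the class name `EpubPreflight` is hypothetical. It enforces the OCF rules that the first ZIP entry is `mimetype` containing `application/epub+zip` and that `META-INF/container.xml` exists. It also treats the presence of `META-INF/encryption.xml` as a DRM signal. That last check is a simplification, because the same file is used for font obfuscation. The error strings are the ones listed under Error Handling.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical pre-parse check: returns null for a plausible DRM-free EPUB, else an error message. */
final class EpubPreflight {
    static final long MAX_BYTES = 50L * 1024 * 1024; // 50MB limit from this spec

    static String check(byte[] data) {
        if (data.length > MAX_BYTES) {
            return "File size exceeds 50MB limit";
        }
        boolean sawFirstEntry = false;
        boolean hasContainer = false;
        boolean hasEncryption = false;
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(data))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!sawFirstEntry) {
                    // OCF rule: first entry is "mimetype" holding exactly "application/epub+zip".
                    // readNBytes bounds the read so a bogus first entry cannot exhaust memory.
                    String mime = new String(zip.readNBytes(64), StandardCharsets.US_ASCII);
                    if (!"mimetype".equals(name) || !"application/epub+zip".equals(mime)) {
                        return "Invalid EPUB file format";
                    }
                    sawFirstEntry = true;
                }
                if ("META-INF/container.xml".equals(name)) {
                    hasContainer = true;
                }
                if ("META-INF/encryption.xml".equals(name)) {
                    hasEncryption = true;
                }
            }
        } catch (IOException e) {
            return "EPUB file appears to be corrupted";
        }
        // Non-ZIP input yields no entries at all, so it also lands here.
        if (!sawFirstEntry || !hasContainer) {
            return "Invalid EPUB file format";
        }
        // Simplification: encryption.xml also covers font obfuscation, so a real check
        // should inspect which resources it lists before rejecting the file.
        if (hasEncryption) {
            return "DRM-protected EPUBs not supported";
        }
        return null;
    }
}
```

Returning a message instead of throwing keeps the sketch easy to test. A service would more likely throw and let the controller translate the failure.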

### Metadata Mapping
```java
// EPUB metadata (EPUBLib accessor)          → StoryCove Story entity field
// Lists may be empty: check size before calling get(0)
epub.getMetadata().getFirstTitle()          → story.title
epub.getMetadata().getAuthors().get(0)      → story.authorName  // Author object: join first and last name
epub.getMetadata().getDescriptions().get(0) → story.summary
epub.getCoverImage()                        → story.coverPath   // Resource: save the image bytes, store the path
epub.getMetadata().getSubjects()            → story.tags
```
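
The same fields live in the package document (`content.opf`) as Dublin Core elements. Reading them directly is useful when debugging an import or writing tests without EPUBLib. The following is a minimal sketch using the JDK's DOM parser. It assumes the OPF XML has already been read out of the archive. `OpfMetadata` and its record are hypothetical names. Only the first `dc:creator` is used, which mirrors the `getAuthors().get(0)` mapping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical reader for the Dublin Core fields StoryCove maps onto a Story. */
final class OpfMetadata {
    static final String DC = "http://purl.org/dc/elements/1.1/";

    record Extracted(String title, String author, String summary, List<String> tags) {}

    static Extracted read(String opfXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS
            // Uploaded EPUBs are untrusted: refuse DOCTYPEs to block XXE attacks
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(opfXml.getBytes(StandardCharsets.UTF_8)));
            return new Extracted(
                    first(doc, "title"), first(doc, "creator"),
                    first(doc, "description"), all(doc, "subject"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable OPF package document", e);
        }
    }

    /** Text of the first dc:<name> element, or null when absent. */
    private static String first(Document doc, String dcName) {
        NodeList nodes = doc.getElementsByTagNameNS(DC, dcName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Text of every dc:<name> element, in document order. */
    private static List<String> all(Document doc, String dcName) {
        NodeList nodes = doc.getElementsByTagNameNS(DC, dcName);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }
}
```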

### Content Extraction
- **Multi-chapter EPUBs**: Combine all content files into a single HTML document
- **Chapter separation**: Insert `<hr>` or `<h2>` tags between chapters
- **HTML sanitization**: Apply existing sanitization rules
- **Image handling**: Extract and store cover images; inline image extraction is optional
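
Chapter combination can be sketched as follows. The sketch assumes each chapter's `<body>` markup and an optional title have already been pulled out of its XHTML file, for example via EPUBLib. `ChapterCombiner` is hypothetical. It uses both separators from the list above: an `<hr />` between chapters and an `<h2>` for titled chapters. Sanitization still runs afterwards through the existing Jsoup pipeline.

```java
import java.util.List;

/** Hypothetical helper that merges chapter bodies into one HTML string. */
final class ChapterCombiner {
    record Chapter(String title, String bodyHtml) {}

    static String combine(List<Chapter> chapters) {
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < chapters.size(); i++) {
            Chapter chapter = chapters.get(i);
            if (i > 0) {
                html.append("<hr />\n"); // separator only between chapters, never before the first
            }
            if (chapter.title() != null && !chapter.title().isBlank()) {
                html.append("<h2>").append(escape(chapter.title())).append("</h2>\n");
            }
            html.append(chapter.bodyHtml()).append('\n');
        }
        return html.toString();
    }

    /** Titles are plain text, so escape them before embedding them in markup. */
    private static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Emitting `<hr />` rather than `<hr>` keeps the combined markup consistent with the XHTML self-closing-tag fix noted in Phase 1.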

### API Endpoints

#### POST /api/stories/import-epub
```java
@PostMapping("/import-epub")
public ResponseEntity<?> importEPUB(@RequestParam("file") MultipartFile file) {
    // Implementation in EPUBImportService
}
```

**Request**: Multipart file upload (form field `file`)

**Response**:
```json
{
  "message": "EPUB imported successfully",
  "storyId": "uuid",
  "extractedData": {
    "title": "Story Title",
    "author": "Author Name",
    "summary": "Story description",
    "chapterCount": 12,
    "wordCount": 45000,
    "hasCovers": true
  }
}
```
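
For the `wordCount` field, a dependency-free approximation strips tags and counts whitespace-separated tokens. This is a hypothetical helper, not necessarily how StoryCove computes the figure. Because Jsoup is already in the pipeline, `Jsoup.parse(html).text()` is a more robust way to get the visible text, since it also decodes entities.

```java
import java.util.regex.Pattern;

/** Hypothetical approximate word counter for imported HTML. */
final class WordCounter {
    private static final Pattern TAGS = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static int countWords(String html) {
        // Replace tags with spaces so "<p>a</p><p>b</p>" counts as two words, not "ab"
        String text = TAGS.matcher(html).replaceAll(" ")
                .replace("&nbsp;", " ")
                .strip();
        if (text.isEmpty()) {
            return 0;
        }
        return WHITESPACE.split(text).length;
    }
}
```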

## EPUB Export Specification

### Export Types
1. **Single Story Export**: Convert one story to EPUB
2. **Collection Export**: Multiple stories as a single EPUB with chapters *(Phase 2)*

### EPUB Structure Generation
```
story.epub
├── mimetype
├── META-INF/
│   └── container.xml
└── OEBPS/
    ├── content.opf         # Package metadata
    ├── toc.ncx             # Navigation
    ├── stylesheet.css      # Styling
    ├── cover.html          # Cover page
    ├── chapter001.xhtml    # Story content
    ├── images/
    │   └── cover.jpg       # Cover image
    └── fonts/ (optional)
```
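
The one layout rule readers are strict about is the OCF container. `mimetype` must be the first entry and must be stored uncompressed, and `META-INF/container.xml` must point at the package document. EPUBLib's `EpubWriter` takes care of this when it writes a `Book`. The following sketch only illustrates the rule with `java.util.zip`. `OcfWriter` and its map-of-paths input are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical writer showing the OCF container rules for an EPUB archive. */
final class OcfWriter {
    static final String CONTAINER_XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<container version=\"1.0\" xmlns=\"urn:oasis:names:tc:opendocument:xmlns:container\">\n"
            + "  <rootfiles>\n"
            + "    <rootfile full-path=\"OEBPS/content.opf\" media-type=\"application/oebps-package+xml\"/>\n"
            + "  </rootfiles>\n"
            + "</container>\n";

    /** files maps an archive path (e.g. "OEBPS/content.opf") to its bytes. */
    static byte[] write(Map<String, byte[]> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            // 1. "mimetype" first and STORED: size, compressed size, and CRC must be set up front
            byte[] mime = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();
            crc.update(mime);
            ZipEntry mimeEntry = new ZipEntry("mimetype");
            mimeEntry.setMethod(ZipEntry.STORED);
            mimeEntry.setSize(mime.length);
            mimeEntry.setCompressedSize(mime.length);
            mimeEntry.setCrc(crc.getValue());
            zip.putNextEntry(mimeEntry);
            zip.write(mime);
            zip.closeEntry();

            // 2. container.xml tells reading systems where the package document lives
            zip.putNextEntry(new ZipEntry("META-INF/container.xml"));
            zip.write(CONTAINER_XML.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();

            // 3. Everything else may use the default (deflated) method
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // complete only after the ZipOutputStream is closed
    }
}
```

For `STORED` entries, `ZipOutputStream.putNextEntry` rejects an entry whose size or CRC is missing. That requirement is why the three setters precede it.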

### Reading Position Implementation

#### EPUB 3 CFI (Canonical Fragment Identifier)
```xml
<!-- In content.opf metadata. These properties are StoryCove-specific (not part of the
     EPUB 3 meta vocabulary), so other reading systems will not use them. -->
<meta property="epub-cfi" content="/6/4[chap01]!/4[body01]/10[para05]/3:142"/>
<meta property="reading-percentage" content="0.65"/>
<meta property="last-read-timestamp" content="2023-12-07T10:30:00Z"/>
```

#### StoryCove Custom Metadata (Fallback)
```xml
<meta name="storycove:reading-chapter" content="3"/>
<meta name="storycove:reading-paragraph" content="15"/>
<meta name="storycove:reading-offset" content="142"/>
<meta name="storycove:reading-percentage" content="0.65"/>
```
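
#### CFI Generation Logic
```java
public String generateCFI(ReadingPosition position) {
    // /6 = the spine in the package document; "!" steps into the referenced chapter file.
    // Spine step = index * 2 + 4: chapters start at the second itemref (presumably after the cover).
    // /4[body01] = <body>; paragraph step = index * 2 + 4: paragraphs start at body's second child.
    // Odd steps address text: /1 is the paragraph's first text chunk, followed by ":" + offset.
    return String.format("/6/%d[chap%02d]!/4[body01]/%d[para%02d]/1:%d",
            (position.getChapterIndex() * 2) + 4,
            position.getChapterIndex(),
            (position.getParagraphIndex() * 2) + 4,
            position.getParagraphIndex(),
            position.getCharacterOffset());
}
```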
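
Restoring a position is the inverse of the generator. The decoder below is a hypothetical sketch that only understands CFIs shaped like the ones `generateCFI` emits; it is not a general CFI parser. It ignores the bracketed ID assertions, accepts any text step before the offset, and strips an optional `epubcfi(...)` wrapper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical decoder for StoryCove-generated CFIs. */
final class CfiParser {
    record Position(int chapterIndex, int paragraphIndex, int characterOffset) {}

    // /6/<spine step>[id]!/4[body01]/<paragraph step>[id]/<text step>:<offset>
    private static final Pattern STORYCOVE_CFI = Pattern.compile(
            "/6/(\\d+)\\[[^\\]]*\\]!/4\\[body01\\]/(\\d+)\\[[^\\]]*\\]/\\d+:(\\d+)");

    static Position parse(String cfi) {
        // Accept both the bare path and the epubcfi(...) form used in URL fragments
        String path = cfi.startsWith("epubcfi(") && cfi.endsWith(")")
                ? cfi.substring("epubcfi(".length(), cfi.length() - 1)
                : cfi;
        Matcher m = STORYCOVE_CFI.matcher(path);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a StoryCove-generated CFI: " + cfi);
        }
        // Invert the generator's "index * 2 + 4" step arithmetic
        int chapter = (Integer.parseInt(m.group(1)) - 4) / 2;
        int paragraph = (Integer.parseInt(m.group(2)) - 4) / 2;
        return new Position(chapter, paragraph, Integer.parseInt(m.group(3)));
    }
}
```

A round-trip unit test (generate, then parse) is a cheap way to cover the "CFI generation and parsing" item under Testing Strategy.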

### API Endpoints

#### GET /api/stories/{id}/export-epub
```java
@GetMapping("/{id}/export-epub")
public ResponseEntity<StreamingResponseBody> exportStory(@PathVariable UUID id) {
    // Implementation in EPUBExportService
}
```

**Response**: EPUB file download with headers:
```
Content-Type: application/epub+zip
Content-Disposition: attachment; filename="story-title.epub"
```
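
The header example suggests the filename is derived from the story title. One way to do that, assuming a lowercase ASCII slug is acceptable, is shown below using a hypothetical `EpubFilenames` helper. It decomposes accented characters, drops the combining marks, collapses everything else to hyphens, and falls back to `story.epub` when nothing usable remains. Keeping the name ASCII avoids header-encoding problems. Titles in non-Latin scripts would instead need the RFC 6266 `filename*` parameter.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical helper turning a story title into a safe download filename. */
final class EpubFilenames {
    static String forTitle(String title) {
        String ascii = Normalizer.normalize(title == null ? "" : title, Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");          // drop combining accents: "é" becomes "e"
        String slug = ascii.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "-")       // any other run of characters becomes one hyphen
                .replaceAll("^-+|-+$", "");          // trim leading and trailing hyphens
        return (slug.isEmpty() ? "story" : slug) + ".epub";
    }

    static String contentDisposition(String title) {
        // The slug is plain ASCII, so it can be quoted directly in the header value
        return "attachment; filename=\"" + forTitle(title) + "\"";
    }
}
```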

#### GET /api/collections/{id}/export-epub
```java
@GetMapping("/{id}/export-epub")
public ResponseEntity<StreamingResponseBody> exportCollection(@PathVariable UUID id) {
    // Implementation in EPUBExportService
}
```

**Response**: Multi-story EPUB with a table of contents *(Phase 2)*

## Data Models

### ReadingPosition Entity
```java
@Entity
@Table(name = "reading_positions")
public class ReadingPosition {
    @Id
    private UUID id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "story_id")
    private Story story;

    @Column(name = "chapter_index")
    private Integer chapterIndex = 0;

    @Column(name = "paragraph_index")
    private Integer paragraphIndex = 0;

    @Column(name = "character_offset")
    private Integer characterOffset = 0;

    @Column(name = "progress_percentage")
    private Double progressPercentage = 0.0;

    @Column(name = "epub_cfi")
    private String canonicalFragmentIdentifier;

    @Column(name = "last_read_at")
    private LocalDateTime lastReadAt;

    @Column(name = "device_identifier")
    private String deviceIdentifier;

    // Constructors, getters, setters
}
```

### EPUB Import Request DTO
```java
public class EPUBImportRequest {
    private String filename;
    private Long fileSize;
    private Boolean preserveChapterStructure = true;
    private Boolean extractCover = true;
    private String targetCollectionId; // Optional: add the imported story to this collection
}
```

### EPUB Export Options DTO
```java
public class EPUBExportOptions {
    private Boolean includeReadingPosition = true;
    private Boolean includeCoverImage = true;
    private Boolean includeMetadata = true;
    private String cssStylesheet; // Optional custom CSS
    private EPUBVersion version = EPUBVersion.EPUB3;
}
```

## Service Layer Architecture

### EPUBImportService
```java
@Service
public class EPUBImportService {
    // Outline only: method bodies are omitted

    // Core import method
    public Story importEPUBFile(MultipartFile file, EPUBImportRequest request);

    // Helper methods
    private void validateEPUBFile(MultipartFile file);
    private Book parseEPUBStructure(InputStream inputStream);
    private Story extractStoryData(Book epub);
    private String combineChapterContent(Book epub);
    private void extractAndSaveCover(Book epub, Story story);
    private List<String> extractTags(Book epub);
    private ReadingPosition extractReadingPosition(Book epub);
}
```

### EPUBExportService
```java
@Service
public class EPUBExportService {
    // Outline only: method bodies are omitted

    // Core export methods
    public byte[] exportSingleStory(UUID storyId, EPUBExportOptions options);
    public byte[] exportCollection(UUID collectionId, EPUBExportOptions options);

    // Helper methods
    private Book createEPUBStructure(Story story, ReadingPosition position);
    private Book createCollectionEPUB(Collection collection, List<ReadingPosition> positions);
    private void addReadingPositionMetadata(Book book, ReadingPosition position);
    private String generateCFI(ReadingPosition position);
    private Resource createChapterResource(Story story);
    private Resource createStylesheetResource();
    private void addCoverImage(Book book, Story story);
}
```

## Frontend Integration

### Import UI Flow
1. **Upload Interface**: File input with EPUB validation
2. **Progress Indicator**: Show parsing progress
3. **Preview Screen**: Display extracted metadata for confirmation
4. **Confirmation**: Allow editing of title, author, and summary before saving
5. **Success**: Redirect to the created story

### Export UI Flow
1. **Export Button**: Available on the story detail page (and on collection pages in Phase 2)
2. **Options Modal**: Allow selection of export options
3. **Progress Indicator**: Show EPUB generation progress
4. **Download**: Automatic file download on completion

### Frontend API Calls
```typescript
// All endpoints require JWT authentication; auth header handling is omitted here.

// Import EPUB
const importEPUB = async (file: File) => {
  const formData = new FormData();
  formData.append('file', file);

  // The browser sets the multipart Content-Type (with boundary) for FormData bodies
  const response = await fetch('/api/stories/import-epub', {
    method: 'POST',
    body: formData,
  });

  if (!response.ok) {
    throw new Error(`EPUB import failed with status ${response.status}`);
  }
  return await response.json();
};

// Export Story
const exportStoryEPUB = async (storyId: string, storyTitle: string) => {
  const response = await fetch(`/api/stories/${storyId}/export-epub`, {
    method: 'GET',
  });

  if (!response.ok) {
    throw new Error(`EPUB export failed with status ${response.status}`);
  }

  const blob = await response.blob();
  const url = window.URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = `${storyTitle}.epub`;
  a.click();
};
```

## Error Handling

### Import Errors
- **Invalid EPUB format**: "Invalid EPUB file format"
- **File too large**: "File size exceeds 50MB limit"
- **DRM protected**: "DRM-protected EPUBs not supported"
- **Corrupted file**: "EPUB file appears to be corrupted"
- **No content**: "EPUB contains no readable content"

### Export Errors
- **Story not found**: "Story not found or access denied"
- **Missing content**: "Story has no content to export"
- **Generation failure**: "Failed to generate EPUB file"
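
One way to keep these messages consistent between the services and the controllers is a single enum that also carries an HTTP status. This is a sketch only. The enum, the exception class, and especially the status codes are assumptions; this specification does not define them.

```java
/** Hypothetical catalogue of the EPUB error messages listed in this section. */
enum EpubError {
    // Import errors
    INVALID_FORMAT(400, "Invalid EPUB file format"),
    FILE_TOO_LARGE(413, "File size exceeds 50MB limit"),
    DRM_PROTECTED(422, "DRM-protected EPUBs not supported"),
    CORRUPTED(422, "EPUB file appears to be corrupted"),
    NO_CONTENT(422, "EPUB contains no readable content"),
    // Export errors
    STORY_NOT_FOUND(404, "Story not found or access denied"),
    MISSING_CONTENT(422, "Story has no content to export"),
    GENERATION_FAILED(500, "Failed to generate EPUB file");

    final int httpStatus;     // assumed status code for the controller response
    final String userMessage; // message text exactly as specified above

    EpubError(int httpStatus, String userMessage) {
        this.httpStatus = httpStatus;
        this.userMessage = userMessage;
    }
}

/** Unchecked exception a service could throw; a @ControllerAdvice would map it to a response. */
final class EpubException extends RuntimeException {
    final EpubError error;

    EpubException(EpubError error) {
        super(error.userMessage);
        this.error = error;
    }
}
```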

## Security Considerations

### File Upload Security
- **File type validation**: Verify the EPUB MIME type and structure
- **Size limits**: Enforce the 50MB upload limit; because EPUBs are ZIP archives, also consider capping the total uncompressed size
- **Content sanitization**: Apply existing HTML sanitization
- **Virus scanning**: Consider integrating antivirus scanning

### Content Security
- **HTML sanitization**: Apply existing Jsoup rules to imported content
- **Image validation**: Validate extracted cover images
- **Metadata escaping**: Escape special characters in metadata
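
Metadata such as titles and author names ends up inside `content.opf` and the XHTML files. Characters like `&` and `<` must therefore be escaped, or the export produces the same kind of invalid XML that the `<br />` fix addressed. Below is a minimal, hypothetical helper. It matters mainly wherever StoryCove assembles markup as strings rather than through an XML serializer.

```java
/** Hypothetical escaper for text placed in XML element content or double-quoted attributes. */
final class XmlEscape {
    static String escape(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&' -> out.append("&amp;");
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&apos;");
                default -> {
                    // XML 1.0 forbids most control characters; drop them rather than emit invalid XML
                    if (c >= 0x20 || c == '\t' || c == '\n' || c == '\r') {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }
}
```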

## Testing Strategy

### Unit Tests
- EPUB parsing and validation logic
- CFI generation and parsing
- Metadata extraction accuracy
- Content sanitization

### Integration Tests
- End-to-end import/export workflow
- Reading position preservation
- Multi-story collection export
- Error handling scenarios

### Test Data
- Sample EPUB files for various scenarios
- EPUBs with and without reading positions
- Multi-chapter EPUBs
- EPUBs with covers and metadata

## Performance Considerations

### Import Performance
- **Streaming processing**: Process large EPUBs without loading them entirely into memory
- **Async processing**: Consider asynchronous import for large files
- **Progress tracking**: Provide progress feedback for large imports

### Export Performance
- **Caching**: Cache generated EPUBs for repeated exports
- **Streaming**: Stream EPUB generation for large collections
- **Resource optimization**: Optimize image and content sizes
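
For the caching item, one option is a small in-memory LRU cache keyed by story ID plus a version stamp. Changing the key whenever a story is edited avoids explicit invalidation. The sketch assumes such a version value exists (for example a last-modified timestamp) and that holding a few EPUBs on the heap is acceptable. `EpubCache` is hypothetical; a production setup might prefer Spring's cache abstraction or Caffeine.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical bounded cache of generated EPUB bytes. */
final class EpubCache {
    private final int maxEntries;
    private final Map<String, byte[]> entries;

    EpubCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true makes iteration order least-recently-used first
        this.entries = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > EpubCache.this.maxEntries; // evict once over capacity
            }
        };
    }

    /** Key example (assumed format): storyId + ":" + lastModified, so edits produce a new key. */
    synchronized byte[] getOrGenerate(String key, Supplier<byte[]> generator) {
        return entries.computeIfAbsent(key, k -> generator.get());
    }

    synchronized int size() {
        return entries.size();
    }
}
```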

## Future Enhancements (Out of Scope)

### Phase 2 Considerations
- **DRM support**: Research legal and technical feasibility
- **Reading position sync**: Real-time sync across devices
- **Advanced EPUB features**: Enhanced typography, annotations
- **Bulk operations**: Import/export multiple EPUBs
- **EPUB validation**: Full EPUB compliance checking

### Integration Possibilities
- **Cloud storage**: Export directly to Google Drive or Dropbox
- **E-reader sync**: Direct sync with Kindle and Kobo devices
- **Reading analytics**: Track reading patterns and statistics

## Implementation Phases

### Phase 1: Core Functionality ✅ **COMPLETED**
- [x] Basic EPUB import (DRM-free)
- [x] Single story export
- [x] Reading position storage and retrieval
- [x] Frontend UI integration

### Phase 2: Enhanced Features
- [ ] Collection export
- [ ] Advanced metadata handling
- [ ] Performance optimizations
- [ ] Comprehensive error handling

### Phase 3: Advanced Features
- [ ] DRM exploration (legal research required)
- [ ] Reading position sync
- [ ] Advanced EPUB features
- [ ] Analytics and reporting

## Acceptance Criteria

### Import Success Criteria ✅ **COMPLETED**
- [x] Successfully parse EPUB 2.0 and 3.x files
- [x] Extract title, author, summary, and content accurately
- [x] Preserve formatting and basic HTML structure
- [x] Handle cover images correctly
- [x] Import reading positions when present
- [x] Provide clear error messages for invalid files

### Export Success Criteria ✅ **PHASE 1 COMPLETED**
- [x] Generate valid EPUB files compatible with major readers
- [x] Include accurate metadata and content
- [x] Embed reading positions using CFI standard
- [x] Support single story export
- [ ] Support collection export *(Phase 2)*
- [ ] Generate proper table of contents for collections *(Phase 2)*
- [x] Include cover images when available

---

*This specification serves as the implementation guide for the EPUB import/export feature. All implementation decisions should reference this document for consistency and completeness.*