MemoHanzi - Implementation Specification
Version: 1.0
Status: Ready for Implementation
Target: Claude Code
Application Name: MemoHanzi (记汉字 - "Remember Hanzi")
Quick Start Summary
What: MemoHanzi is a self-hosted web app for learning Chinese characters (hanzi) using spaced repetition (SM-2 algorithm)
Tech Stack:
- Next.js 16 (TypeScript, App Router, Server Actions)
- PostgreSQL 18 + Prisma ORM
- NextAuth.js v5 for authentication
- Docker Compose deployment with Nginx reverse proxy
- Tailwind CSS, React Hook Form, Zod validation, Recharts
MVP Timeline: 10-12 weeks
1. Core Features (MVP)
User Features
- ✅ Registration/Login with email & password
- ✅ Create and manage personal hanzi collections
- ✅ Browse and use global HSK-level collections
- ✅ Learning sessions with 4-choice pinyin quiz
- ✅ SM-2 spaced repetition algorithm
- ✅ Progress tracking & statistics dashboard
- ✅ Search hanzi database (by character, pinyin, meaning)
- ✅ User preferences (language, display options, learning settings)
Admin Features
- ✅ Import hanzi data (JSON/CSV from HSK vocabulary source)
- ✅ Manage global collections
- ✅ User management (roles, activation)
2. System Architecture
Deployment Stack
[Nginx Reverse Proxy:80/443]
↓ HTTPS/Rate Limiting/Caching
[Next.js App:3000]
↓ Prisma ORM
[PostgreSQL:5432]
Project Structure
memohanzi/
├── src/
│ ├── app/ # Next.js App Router
│ │ ├── (auth)/ # Login, register
│ │ ├── (app)/ # Dashboard, learn, collections, hanzi, progress, settings
│ │ └── (admin)/ # Admin pages
│ ├── actions/ # Server Actions (auth, collections, hanzi, learning, etc.)
│ ├── components/ # React components
│ ├── lib/ # Utils (SM-2 algorithm, parsers, validation)
│ └── types/ # TypeScript types
├── prisma/
│ └── schema.prisma # Database schema
├── docker/
│ ├── Dockerfile
│ └── nginx.conf
└── docker-compose.yml
3. Database Schema (Prisma)
Core Models
Language - Stores supported translation languages
- Fields: code (ISO 639-1), name, nativeName, isActive
Hanzi - Base hanzi information
- Fields: simplified (unique), radical, frequency
- Relations: forms, hskLevels, partsOfSpeech, userProgress, collectionItems
HanziForm - Traditional variants
- Fields: hanziId, traditional, isDefault
- Relations: transcriptions, meanings, classifiers
HanziTranscription - Multiple transcription types
- Fields: formId, type (pinyin/numeric/wadegiles/etc), value
HanziMeaning - Multi-language meanings
- Fields: formId, languageId, meaning, orderIndex
HanziHSKLevel - HSK level tags
- Fields: hanziId, level (e.g., "new-1", "old-3")
HanziPOS - Parts of speech
- Fields: hanziId, pos (n/v/adj/etc)
HanziClassifier - Measure words
- Fields: formId, classifier
User & Auth Models
User
- Fields: email, password (hashed), name, role (USER/ADMIN/MODERATOR), isActive
- Relations: collections, hanziProgress, preferences, sessions
UserPreference
- Fields: preferredLanguageId, characterDisplay (SIMPLIFIED/TRADITIONAL/BOTH), transcriptionType, cardsPerSession, dailyGoal, removalThreshold, allowManualDifficulty
Account, Session, VerificationToken - NextAuth.js standard models
Learning Models
Collection
- Fields: name, description, isGlobal, createdBy, isPublic
- Relations: items (CollectionItem join table)
CollectionItem - Join table
- Fields: collectionId, hanziId, orderIndex
UserHanziProgress - Tracks learning per hanzi
- Fields: userId, hanziId, correctCount, incorrectCount, consecutiveCorrect
- SM-2 fields: easeFactor (default 2.5), interval (default 1), nextReviewDate
- Manual override: manualDifficulty (EASY/MEDIUM/HARD/SUSPENDED)
LearningSession - Track study sessions
- Fields: userId, startedAt, endedAt, cardsReviewed, correctAnswers, incorrectAnswers, collectionId
- Relations: reviews (SessionReview)
SessionReview - Individual card reviews
- Fields: sessionId, hanziId, isCorrect, responseTime
4. Server Actions API
All actions return: { success: boolean, data?: T, message?: string, errors?: Record<string, string[]> }
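The shared return shape can be written down once as a generic type. The sketch below is one possible encoding; the helper names `ok` and `fail` are hypothetical and do not appear elsewhere in this specification.

```typescript
// Shape taken from the spec: every Server Action resolves to this object.
type ActionResult<T> = {
  success: boolean;
  data?: T;
  message?: string;
  errors?: Record<string, string[]>;
};

// Hypothetical helper: wrap a successful payload.
function ok<T>(data: T): ActionResult<T> {
  return { success: true, data };
}

// Hypothetical helper: report a failure, optionally with per-field
// validation messages keyed by input name (e.g. from Zod).
function fail<T = never>(
  message: string,
  errors: Record<string, string[]> = {},
): ActionResult<T> {
  return { success: false, message, errors };
}
```

A validation failure in `register` might then return `fail("Validation failed", { email: ["Invalid email address"] })`, which lets forms show errors next to the matching field.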
Authentication (src/actions/auth.ts)
- register(email, password, name) - Create account
- login(email, password) - Authenticate
- logout() - End session
- updatePassword(current, new) - Change password
- updateProfile(name, email, image) - Update user
Collections (src/actions/collections.ts)
- createCollection(name, description, isPublic) - New collection
- updateCollection(id, data) - Modify (owner/admin only)
- deleteCollection(id) - Remove (owner/admin only)
- getCollection(id) - Get with hanzi
- getUserCollections() - List user's collections
- getGlobalCollections() - List HSK collections
- addHanziToCollection(collectionId, hanziIds[]) - Add hanzi
- removeHanziFromCollection(collectionId, hanziId) - Remove hanzi
Hanzi (src/actions/hanzi.ts)
- searchHanzi(query, hskLevel?, limit, offset) - Search database (public)
- getHanzi(id) - Get details (public)
- getHanziBySimplified(char) - Lookup by character (public)
Learning (src/actions/learning.ts)
- startLearningSession(collectionId?, cardsCount) - Begin session, returns cards
- submitAnswer(sessionId, hanziId, selected, correct, time) - Record answer, updates SM-2
- endSession(sessionId) - Complete, return summary
- getDueCards() - Get counts (now, today, week)
- updateCardDifficulty(hanziId, difficulty) - Manual override
- removeFromLearning(hanziId) - Stop learning card
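As an illustration of the summary that endSession returns, the sketch below derives totals, accuracy, and duration from a session's reviews. The function name `summarizeSession` and the result field names `accuracyPercent` and `durationSeconds` are assumptions; only the SessionReview and LearningSession field names come from the schema in section 3.

```typescript
// Fields taken from the SessionReview model.
interface ReviewResult {
  isCorrect: boolean;
  responseTime: number;
}

// Hypothetical summary shape; the spec only says "return summary".
interface SessionSummary {
  cardsReviewed: number;
  correctAnswers: number;
  incorrectAnswers: number;
  accuracyPercent: number; // whole percent, 0 when nothing was reviewed
  durationSeconds: number;
}

function summarizeSession(
  startedAt: Date,
  endedAt: Date,
  reviews: readonly ReviewResult[],
): SessionSummary {
  const cardsReviewed = reviews.length;
  const correctAnswers = reviews.filter((r) => r.isCorrect).length;
  const accuracyPercent =
    cardsReviewed === 0 ? 0 : Math.round((correctAnswers / cardsReviewed) * 100);
  return {
    cardsReviewed,
    correctAnswers,
    incorrectAnswers: cardsReviewed - correctAnswers,
    accuracyPercent,
    durationSeconds: Math.round((endedAt.getTime() - startedAt.getTime()) / 1000),
  };
}
```

Guarding the empty case avoids a division by zero when a session is ended before any card is answered.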
Progress (src/actions/progress.ts)
- getUserProgress(dateRange?) - Overall stats & charts
- getHanziProgress(hanziId) - Individual hanzi stats
- getLearningSessions(limit?) - Session history
- getStatistics() - Dashboard stats
- resetHanziProgress(hanziId) - Reset card
Preferences (src/actions/preferences.ts)
- getPreferences() - Get settings
- updatePreferences(data) - Update settings
- getAvailableLanguages() - List languages
Admin (src/actions/admin.ts)
- createGlobalCollection(name, description, hskLevel) - HSK collection
- importHanzi(fileData, format) - Bulk import (JSON/CSV)
- getImportHistory() - Past imports
- getUserManagement(page, pageSize) - List users
- updateUserRole(userId, role) - Change role
- toggleUserStatus(userId) - Activate/deactivate
5. SM-2 Algorithm Implementation
Initial Values
- easeFactor: 2.5
- interval: 1 day
- consecutiveCorrect: 0
On Correct Answer
if (consecutiveCorrect === 0) {
  interval = 1
} else if (consecutiveCorrect === 1) {
  interval = 6
} else {
  interval = Math.round(interval * easeFactor)
}
easeFactor = easeFactor + 0.1 // Can adjust based on quality
consecutiveCorrect++
nextReviewDate = now + interval days
On Incorrect Answer
interval = 1
consecutiveCorrect = 0
nextReviewDate = now + 1 day
easeFactor = Math.max(1.3, easeFactor - 0.2)
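The two update rules above can be sketched as pure functions in TypeScript. The function names match those listed under the Week 6 milestone, but the `Sm2State` type, the `addDays` helper, and the exact signatures are assumptions made for illustration. Note that the new interval is computed from the ease factor before it is incremented, following the order of the pseudocode.

```typescript
// Subset of the UserHanziProgress fields that the SM-2 update touches.
interface Sm2State {
  easeFactor: number; // default 2.5
  interval: number; // in days, default 1
  consecutiveCorrect: number; // default 0
  nextReviewDate: Date;
}

const MIN_EASE_FACTOR = 1.3;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: shift a date by a whole number of days.
function addDays(date: Date, days: number): Date {
  return new Date(date.getTime() + days * MS_PER_DAY);
}

function calculateCorrectAnswer(state: Sm2State, now: Date): Sm2State {
  let interval: number;
  if (state.consecutiveCorrect === 0) {
    interval = 1;
  } else if (state.consecutiveCorrect === 1) {
    interval = 6;
  } else {
    interval = Math.round(state.interval * state.easeFactor);
  }
  return {
    easeFactor: state.easeFactor + 0.1,
    interval,
    consecutiveCorrect: state.consecutiveCorrect + 1,
    nextReviewDate: addDays(now, interval),
  };
}

function calculateIncorrectAnswer(state: Sm2State, now: Date): Sm2State {
  return {
    // The ease factor never drops below 1.3.
    easeFactor: Math.max(MIN_EASE_FACTOR, state.easeFactor - 0.2),
    interval: 1,
    consecutiveCorrect: 0,
    nextReviewDate: addDays(now, 1),
  };
}
```

Returning a new state object instead of mutating the input keeps the functions easy to unit test: three correct answers in a row from the initial values give intervals of 1, 6, and 16 days.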
Card Selection
- Query: WHERE nextReviewDate <= now AND userId = currentUser
- Apply manual difficulty (SUSPENDED = exclude, HARD = prioritize, EASY = deprioritize)
- Sort: nextReviewDate ASC, incorrectCount DESC, consecutiveCorrect ASC
- Limit to user's cardsPerSession
- If not enough, add new cards from collections
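A minimal in-memory sketch of the selection rules follows. It assumes the per-user filter has already been applied by the database query, treats HARD as rank 0 and EASY as rank 2 (the numeric ranking is an assumption), and leaves out the final step of topping up with new cards. The `ProgressCard` type is hypothetical; its fields mirror UserHanziProgress.

```typescript
type ManualDifficulty = "EASY" | "MEDIUM" | "HARD" | "SUSPENDED";

// Hypothetical view of one UserHanziProgress row for the current user.
interface ProgressCard {
  hanziId: string;
  nextReviewDate: Date;
  incorrectCount: number;
  consecutiveCorrect: number;
  manualDifficulty: ManualDifficulty | null;
}

// Assumed ranking: HARD first, EASY last, everything else in between.
function difficultyRank(difficulty: ManualDifficulty | null): number {
  if (difficulty === "HARD") return 0;
  if (difficulty === "EASY") return 2;
  return 1;
}

function selectCardsForSession(
  cards: readonly ProgressCard[],
  now: Date,
  cardsPerSession: number,
): ProgressCard[] {
  return cards
    // Keep only due cards that are not suspended.
    .filter(
      (c) =>
        c.manualDifficulty !== "SUSPENDED" &&
        c.nextReviewDate.getTime() <= now.getTime(),
    )
    // Manual difficulty first, then the three sort keys from the spec.
    .sort(
      (a, b) =>
        difficultyRank(a.manualDifficulty) - difficultyRank(b.manualDifficulty) ||
        a.nextReviewDate.getTime() - b.nextReviewDate.getTime() ||
        b.incorrectCount - a.incorrectCount ||
        a.consecutiveCorrect - b.consecutiveCorrect,
    )
    .slice(0, cardsPerSession);
}
```

Because `filter` returns a new array, the sort does not reorder the caller's list.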
Wrong Answer Generation
- Select 3 random incorrect pinyin from same HSK level
- Ensure no duplicates
- Randomize order (Fisher-Yates shuffle)
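The three steps above might look like the following sketch. `generateWrongAnswers` is named in the Week 6 milestone, but its signature here is an assumption, as are the `shuffle` and `buildOptions` helpers and the injectable `random` parameter (added so tests can be deterministic). The caller is assumed to pass in the pinyin values of hanzi from the same HSK level.

```typescript
// Fisher-Yates shuffle on a copy of the input.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}

// Pick up to `count` distinct pinyin values that differ from the correct one.
function generateWrongAnswers(
  correct: string,
  sameLevelPinyin: readonly string[],
  count = 3,
  random: () => number = Math.random,
): string[] {
  const candidates = Array.from(new Set(sameLevelPinyin)).filter(
    (pinyin) => pinyin !== correct,
  );
  return shuffle(candidates, random).slice(0, count);
}

// Hypothetical helper: the correct answer plus three distractors, in random order.
function buildOptions(
  correct: string,
  sameLevelPinyin: readonly string[],
  random: () => number = Math.random,
): string[] {
  const wrong = generateWrongAnswers(correct, sameLevelPinyin, 3, random);
  return shuffle([correct, ...wrong], random);
}
```

If the level contains fewer than three other distinct pinyin values, this sketch returns fewer options; a real implementation would need a fallback, such as drawing from neighbouring levels.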
6. UI/UX Pages
Public
- / - Landing page
- /login - Login form
- /register - Registration form
Authenticated
- /dashboard - Due cards, progress widgets, recent activity, quick start
- /learn/[collectionId] - Learning session with cards
- /collections - List all collections (global + user's)
- /collections/[id] - Collection detail, hanzi list, edit
- /collections/new - Create collection
- /hanzi - Search hanzi (filters, pagination)
- /hanzi/[id] - Hanzi detail (all transcriptions, meanings, etc)
- /progress - Charts, stats, session history
- /settings - User preferences
Admin
- /admin/collections - Manage global collections
- /admin/hanzi - Manage hanzi database
- /admin/import - Import data (JSON/CSV upload)
- /admin/users - User management
Key UI Components
- LearningCard: Large hanzi, 4 pinyin options in 2x2 grid, progress bar
- AnswerFeedback: Green/red feedback, show correct answer, streak, removal suggestion
- CollectionCard: Name, count, progress, quick actions
- DashboardWidgets: Due cards, daily progress, streak, recent activity
- Charts: Activity heatmap, accuracy line chart, HSK breakdown bar chart
Design
- Mobile-first responsive
- Dark mode support
- Tailwind CSS
- Keyboard shortcuts (1-4 for answers, Space to continue)
- WCAG 2.1 AA accessibility
7. Data Import Formats
HSK JSON (from github.com/drkameleon/complete-hsk-vocabulary)
{
"simplified": "爱好",
"radical": "爫",
"level": ["new-1", "old-3"],
"frequency": 4902,
"pos": ["n", "v"],
"forms": [{
"traditional": "愛好",
"transcriptions": {
"pinyin": "ài hào",
"numeric": "ai4 hao4"
},
"meanings": ["to like; hobby"],
"classifiers": ["个"]
}]
}
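The entry above maps naturally onto TypeScript interfaces. The sketch below types one entry and flattens its transcriptions into rows resembling the HanziTranscription model (type, value). The interface names and the `toTranscriptionRows` function are hypothetical, and the `JSON.parse` cast is unvalidated; in the real importer a Zod schema would be the natural place for validation.

```typescript
interface HskForm {
  traditional: string;
  transcriptions: Record<string, string>; // e.g. pinyin, numeric
  meanings: string[];
  classifiers: string[];
}

interface HskEntry {
  simplified: string;
  radical: string;
  level: string[]; // e.g. "new-1", "old-3"
  frequency: number;
  pos: string[];
  forms: HskForm[];
}

// Hypothetical row shape, one per transcription type of each form.
interface TranscriptionRow {
  traditional: string;
  type: string;
  value: string;
}

function toTranscriptionRows(entry: HskEntry): TranscriptionRow[] {
  const rows: TranscriptionRow[] = [];
  for (const form of entry.forms) {
    for (const type of Object.keys(form.transcriptions)) {
      const value = form.transcriptions[type];
      if (value !== undefined) {
        rows.push({ traditional: form.traditional, type, value });
      }
    }
  }
  return rows;
}

// Unvalidated cast, for illustration only.
function parseHskEntry(json: string): HskEntry {
  return JSON.parse(json) as HskEntry;
}
```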
CSV Format
simplified,traditional,pinyin,meaning,hsk_level,radical,frequency,pos,classifiers
爱好,愛好,ài hào,"to like; hobby",new-1,爫,4902,"n,v",个
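Because the meaning and pos columns can contain commas inside double quotes, a plain `split(",")` is not enough. The following is a minimal quote-aware line parser, offered as a sketch rather than the project's actual `csv-parser.ts`; the names `parseCsvLine` and `rowToRecord` are hypothetical, and it assumes one record per line (no newlines inside quoted fields).

```typescript
// Split one CSV line into fields, honouring double-quoted fields
// and the "" escape for a literal quote.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Pair header names with field values; missing values become "".
function rowToRecord(header: readonly string[], fields: readonly string[]): Record<string, string> {
  const record: Record<string, string> = {};
  header.forEach((name, i) => {
    record[name] = fields[i] ?? "";
  });
  return record;
}
```

Multi-value columns such as pos can then be split in a second step, for example `record["pos"].split(",")` to get `["n", "v"]` from the sample row.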
8. Testing Strategy
Unit Tests (70% coverage target)
- SM-2 algorithm - All calculation paths
- Card selection logic - Sorting, filtering, limits
- Parsers - JSON/CSV parsing, error handling
- Validation schemas - Zod schemas
Integration Tests (80% of Server Actions)
- Auth actions with database
- Learning flow (start session, submit answers, end session)
- Collection CRUD
- Import process
E2E Tests (Critical paths)
- Complete learning session
- Create collection and add hanzi
- Search hanzi
- Admin import
- Auth flow
Tools: Vitest (unit/integration), Playwright (E2E)
9. Development Milestones
Week 1: Foundation ✅ COMPLETE
- ✅ Setup Next.js 16 project
- ✅ Configure Prisma + PostgreSQL
- ✅ Setup Docker Compose
- ✅ Create all data models (18 models, 3 enums)
- ✅ Configure NextAuth.js
- ✅ Middleware for route protection
- ✅ All Prisma relations implemented
- ✅ Database migrations created
- ✅ Docker containers: nginx, app, postgres
- ✅ Build successful
Week 2: Authentication ✅ COMPLETE
- ✅ Registration/login pages
- ✅ Middleware protection
- ✅ User preferences (cardsPerSession, characterDisplay, hideEnglish)
- ✅ Integration tests (10 tests for auth, 8 tests for preferences)
- ✅ Server Actions: register, login, updatePreferences, getPreferences
- ✅ Zod validation for all inputs
- ✅ Password hashing with bcrypt
- ✅ Session management with NextAuth.js v5
- ✅ Settings page with preferences form
Week 3-4: Data Import ✅ COMPLETE
- ✅ Admin role middleware
- ✅ HSK JSON parser (src/lib/import/json-parser.ts)
- ✅ Support for complete-hsk-vocabulary format
- ✅ All transcription types (pinyin, numeric, wade-giles, zhuyin, ipa)
- ✅ Multi-character hanzi support
- ✅ HSK level mapping (new-1 through old-6)
- ✅ CSV parser (src/lib/import/csv-parser.ts)
- ✅ Flexible column mapping
- ✅ Comma-separated multi-values
- ✅ Complete field validation
- ✅ Import UI and actions
- ✅ File upload and paste textarea
- ✅ Update existing or skip duplicates
- ✅ Detailed results with line-level errors
- ✅ Test with real HSK data
- ✅ 14 passing integration tests
- ✅ Admin import page at /admin/import
- ✅ Enhancement: Database initialization system
- ✅ getInitializationFiles() Server Action to list available files
- ✅ Multi-file selection for batch initialization
- ✅ SSE API endpoint (/api/admin/initialize) for long-running operations
- ✅ Real-time progress updates via Server-Sent Events
- ✅ Progress bar showing percent, current/total, and operation message
- ✅ Auto-create HSK level collections from hanzi level attributes
- ✅ Auto-populate collections with corresponding hanzi
- ✅ Optional clean data mode (delete all existing data)
- ✅ Admin initialization page at /admin/initialize with SSE integration
- ✅ No timeouts: processes complete.json (11K+ hanzi) smoothly
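For reference, a Server-Sent Events message is a `data:` line followed by a blank line. The sketch below shows how a progress update carrying percent, current/total, and a message could be serialized. The `InitProgress` type and both function names are assumptions, not the actual payload of /api/admin/initialize.

```typescript
// Hypothetical progress payload matching the fields shown in the progress bar.
interface InitProgress {
  percent: number;
  current: number;
  total: number;
  message: string;
}

function toProgress(current: number, total: number, message: string): InitProgress {
  // Treat an empty job as already finished to avoid dividing by zero.
  const percent = total === 0 ? 100 : Math.round((current / total) * 100);
  return { percent, current, total, message };
}

// Serialize one event in the SSE wire format: "data: <payload>\n\n".
function formatSseMessage(payload: InitProgress): string {
  return `data: ${JSON.stringify(payload)}\n\n`;
}
```

On the client, an `EventSource` listener would parse `event.data` with `JSON.parse` and update the progress bar from the result.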
Week 5: Collections ✅ COMPLETE
- ✅ Collections CRUD (Server Actions in src/actions/collections.ts)
- ✅ createCollection()
- ✅ getUserCollections()
- ✅ getCollectionById()
- ✅ updateCollection()
- ✅ deleteCollection()
- ✅ Add/remove hanzi
- ✅ addHanziToCollection() with multi-select
- ✅ removeHanziFromCollection() with bulk support
- ✅ Search & select interface
- ✅ Paste list interface (comma, space, newline separated)
- ✅ Global HSK collections
- ✅ isPublic flag for admin-created collections
- ✅ Read-only for regular users
- ✅ Full control for admins
- ✅ 21 passing integration tests
- ✅ Pages: /collections, /collections/[id], /collections/new
- ✅ Order preservation with orderIndex
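The paste list interface mentioned above needs to turn free-form input into an ordered list of hanzi. A minimal sketch, assuming duplicates should be dropped while first-seen order is kept (which fits orderIndex); the function name `parsePastedHanzi` is hypothetical, and accepting the full-width comma `，` is an assumption beyond the separators named in the list.

```typescript
// Split pasted text on commas, spaces, and newlines; drop blanks and
// duplicates while preserving the order of first appearance.
function parsePastedHanzi(input: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  // \s covers spaces, tabs, and newlines; "，" is the full-width comma (assumed).
  for (const token of input.split(/[\s,，]+/)) {
    if (token !== "" && !seen.has(token)) {
      seen.add(token);
      result.push(token);
    }
  }
  return result;
}
```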
Week 5: Hanzi Search ✅ COMPLETE
- ✅ Search page (/hanzi)
- ✅ Query input for simplified, traditional, pinyin, meaning
- ✅ Case-insensitive search
- ✅ Multi-character support
- ✅ Filters (HSK level)
- ✅ 12 HSK levels (new-1 through new-6, old-1 through old-6)
- ✅ Dynamic filtering on hskLevels relation
- ✅ Hanzi detail view (/hanzi/[id])
- ✅ Large character display
- ✅ All forms with isDefault indicator
- ✅ All transcriptions grouped by type
- ✅ All meanings with language codes
- ✅ HSK level badges, parts of speech
- ✅ Classifiers, radical, frequency
- ✅ Add to collection button with modal
- ✅ Pagination
- ✅ 20 results per page
- ✅ hasMore indicator (limit+1 pattern)
- ✅ Previous/Next controls
- ✅ 16 passing integration tests
- ✅ Public access (no authentication required)
- ✅ Server Actions: searchHanzi(), getHanzi(), getHanziBySimplified()
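The limit+1 pattern mentioned above asks the database for one row more than the page size: if the extra row comes back, there is a next page, and no separate COUNT query is needed. The sketch below demonstrates it against an in-memory array; `fetchRows` stands in for the real query, and all names here are hypothetical.

```typescript
interface Page<T> {
  items: T[];
  hasMore: boolean;
}

// Trim the extra row and report whether it was present.
function toPage<T>(rows: readonly T[], limit: number): Page<T> {
  return { items: rows.slice(0, limit), hasMore: rows.length > limit };
}

// Stand-in for a database query with offset/take semantics.
function fetchRows<T>(all: readonly T[], offset: number, take: number): T[] {
  return all.slice(offset, offset + take);
}

// Request limit + 1 rows, return at most `limit`.
function searchPage<T>(all: readonly T[], offset: number, limit: number): Page<T> {
  return toPage(fetchRows(all, offset, limit + 1), limit);
}
```

With 45 matching rows and 20 results per page, offsets 0 and 20 report `hasMore: true`, and offset 40 returns the last 5 rows with `hasMore: false`.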
Week 6: SM-2 Algorithm ✅ COMPLETE
- ✅ Implement algorithm (src/lib/learning/sm2.ts)
- ✅ calculateCorrectAnswer() with exact formulas
- ✅ calculateIncorrectAnswer() with exact formulas
- ✅ Initial values: easeFactor=2.5, interval=1, consecutiveCorrect=0
- ✅ Correct answer intervals: 1, 6, then interval × easeFactor
- ✅ Incorrect answer: reset to 1 day, decrease easeFactor
- ✅ Card selection logic
- ✅ selectCardsForSession() with priority sorting
- ✅ Filter SUSPENDED cards
- ✅ Priority: HARD > NORMAL > EASY
- ✅ Sort: nextReviewDate ASC, incorrectCount DESC, consecutiveCorrect ASC
- ✅ Wrong answer generation
- ✅ generateWrongAnswers() selects 3 from same HSK level
- ✅ Fisher-Yates shuffle for randomization
- ✅ shuffleOptions() for answer position randomization
- ✅ Unit tests (38 tests, 100% coverage)
- ✅ Test all calculation formulas
- ✅ Test edge cases (minimum easeFactor, large intervals, etc.)
- ✅ Test card selection with all sorting criteria
- ✅ Test wrong answer generation
- ✅ 100% statement and line coverage
- ✅ 94.11% branch coverage (exceeds 90% requirement)
Week 7-8: Learning Interface ✅ COMPLETE
- ✅ Learning session pages
- ✅ /learn/[collectionId] dynamic route
- ✅ Large hanzi display (text-9xl)
- ✅ 4 pinyin options in 2x2 grid
- ✅ Progress bar with card count
- ✅ Card component
- ✅ Auto-submit after selection
- ✅ Green/red feedback overlay
- ✅ English meaning display
- ✅ Answer submission
- ✅ submitAnswer() Server Action
- ✅ SM-2 progress updates
- ✅ Session review tracking
- ✅ Feedback UI
- ✅ Correct/incorrect indicators
- ✅ Correct answer display
- ✅ Vocabulary meaning reinforcement
- ✅ Session summary
- ✅ Total cards, accuracy, duration
- ✅ Correct/incorrect breakdown
- ✅ Keyboard shortcuts
- ✅ 1-4 for answer selection
- ✅ Space to continue
- ✅ Learning Server Actions (src/actions/learning.ts)
- ✅ startLearningSession() - Initialize with SM-2 card selection
- ✅ submitAnswer() - Record and update progress
- ✅ endSession() - Calculate summary stats
- ✅ getDueCards() - Count due cards
- ✅ updateCardDifficulty() - Manual difficulty override
- ✅ removeFromLearning() - Suspend card
- ✅ Two-stage card randomization
- ✅ Random tiebreaker during selection
- ✅ Final shuffle for presentation
- ✅ Navigation integration
- ✅ Dashboard "Start Learning" button
- ✅ Collection "Start Learning" button
- ✅ All 38 SM-2 algorithm tests passing (98.92% coverage)
Week 9: Dashboard & Progress ✅ COMPLETE
- ✅ Dashboard widgets with real statistics (due cards, total learned, daily goal, streak)
- ✅ Progress page with charts and session history
- ✅ Charts (Recharts) - Daily activity bar chart, accuracy trend line chart
- ✅ Statistics Server Actions (getStatistics, getUserProgress, getLearningSessions, getHanziProgress, resetHanziProgress)
- ✅ Recent activity section on dashboard
- ✅ Date range filtering (7/30/90/365 days)
- ✅ Session history table with complete details
- ✅ Navigation links to progress page
Week 10: UI Polish
- Responsive layouts
- Mobile navigation
- Dark mode
- Loading/empty states
- Toast notifications
- Accessibility improvements
Week 11: Testing & Docs
- Complete test coverage
- E2E tests for all critical flows
- README and documentation
- Security audit
Week 12: Deployment
- Production environment
- Docker deployment
- SSL certificates
- Database backup
- Import HSK data
- Final testing
10. Docker Configuration
docker-compose.yml
version: '3.8'
services:
  nginx:
    image: nginx:alpine
    ports: ["80:80", "443:443"]
    volumes:
      - ./docker/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./docker/ssl:/etc/nginx/ssl:ro
    depends_on: [app]
  app:
    build: .
    expose: ["3000"]
    environment:
      - DATABASE_URL=postgresql://memohanzi_user:password@postgres:5432/memohanzi_db
      - NEXTAUTH_URL=https://yourdomain.com
      - NEXTAUTH_SECRET=${NEXTAUTH_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
  postgres:
    image: postgres:18-alpine
    environment:
      POSTGRES_USER: memohanzi_user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: memohanzi_db
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U memohanzi_user"]
volumes:
  postgres-data:
Environment Variables
# .env.local
DATABASE_URL="postgresql://memohanzi_user:password@localhost:5432/memohanzi_db"
NEXTAUTH_URL="http://localhost:3000"
NEXTAUTH_SECRET="generate-with-openssl-rand-base64-32"
NODE_ENV="development"
11. Security Checklist
- Passwords hashed with bcrypt (10 rounds)
- Session tokens httpOnly, sameSite
- CSRF protection (NextAuth.js)
- Rate limiting (Nginx)
- Input validation (Zod, server-side)
- SQL injection prevented (Prisma)
- XSS prevention (React escaping)
- HTTPS enforced (Nginx)
- Secure headers (Nginx)
- Role-based access enforced server-side
- No sensitive data in logs
- Environment variables for secrets
12. Phase 2 Features
- Additional Languages - Multi-language support for meanings
- Learning Modes - Radical identification, hanzi-to-meaning, meaning-to-hanzi, tone practice
- Autocomplete Data - Auto-fill missing hanzi info from APIs
- User Suggestions - Allow users to report/suggest corrections
13. Phase 3 Ideas
- Writing practice (stroke order validation)
- Social features (public collections, sharing)
- Gamification (streaks, badges, leaderboards)
- Mobile apps (React Native)
- Audio pronunciation
- Example sentences
- Advanced SRS algorithms
14. Quick Reference Commands
Development:
# Start
docker-compose up
npm run dev
# Database
npx prisma migrate dev
npx prisma db seed
npx prisma studio
# Testing
npm run test
npm run test:e2e
Production:
# Deploy
docker-compose up -d --build
# Monitor
docker-compose logs -f
15. Success Criteria (MVP)
Technical:
- All tests passing (70%+ coverage)
- Can import complete HSK vocabulary (5000+ hanzi)
- Page load <2s
- Learning session responsive (<100ms)
- Mobile responsive
Functional:
- Complete learning session works end-to-end
- SM-2 algorithm calculates correctly
- Progress tracking accurate
- Collections management works
- Search works efficiently
User Experience:
- Can learn 20+ cards in 5-10 minutes
- Interface intuitive
- Daily use sustainable
Implementation Notes
Priority Order
- Authentication (foundational)
- Data import (need data)
- Collections (organize learning)
- Search (browse data)
- Learning algorithm (core logic)
- Learning interface (user interaction)
- Progress tracking (motivation)
- Polish & deploy
Critical Paths to Test
- Register → Login → Create Collection → Add Hanzi → Start Learning → Complete Session → View Progress
- Admin → Import HSK Data → Create Global Collection → User uses global collection
- Search Hanzi → View Detail → Add to Collection → Learn
Key Implementation Files
- prisma/schema.prisma - All data models
- src/lib/learning/sm2.ts - SM-2 algorithm
- src/lib/learning/card-selector.ts - Card selection
- src/lib/import/hsk-parser.ts - Parse HSK JSON
- src/actions/learning.ts - Learning Server Actions
- src/app/(app)/learn/[collectionId]/page.tsx - Learning UI
Resources
- HSK Data Source: https://github.com/drkameleon/complete-hsk-vocabulary
- Next.js Docs: https://nextjs.org/docs
- Prisma Docs: https://www.prisma.io/docs
- NextAuth Docs: https://authjs.dev
- SM-2 Algorithm: https://www.supermemo.com/en/archives1990-2015/english/ol/sm2
This specification is complete and ready for implementation with Claude Code.
Start with Milestone 1 (Week 1: Foundation) and proceed sequentially through the milestones.