# MemoHanzi - Implementation Specification

**Version:** 1.0
**Status:** Ready for Implementation
**Target:** Claude Code
**Application Name:** MemoHanzi (记汉字 - "Remember Hanzi")

---

## Quick Start Summary

**What:** MemoHanzi is a self-hosted web app for learning Chinese characters (hanzi) with spaced repetition (the SM-2 algorithm).

**Tech Stack:**
- Next.js 16 (TypeScript, App Router, Server Actions)
- PostgreSQL 18 + Prisma ORM
- NextAuth.js v5 for authentication
- Docker Compose deployment with Nginx reverse proxy
- Tailwind CSS, React Hook Form, Zod validation, Recharts

**MVP Timeline:** 10-12 weeks

---

## 1. Core Features (MVP)

### User Features
- ✅ Registration/Login with email & password
- ✅ Create and manage personal hanzi collections
- ✅ Browse and use global HSK-level collections
- ✅ Learning sessions with 4-choice pinyin quiz
- ✅ SM-2 spaced repetition algorithm
- ✅ Progress tracking & statistics dashboard
- ✅ Search hanzi database (by character, pinyin, meaning)
- ✅ User preferences (language, display options, learning settings)

### Admin Features
- ✅ Import hanzi data (JSON/CSV from HSK vocabulary source)
- ✅ Manage global collections
- ✅ User management (roles, activation)

---

## 2. System Architecture

### Deployment Stack
```
[Nginx Reverse Proxy:80/443]
    ↓ HTTPS/Rate Limiting/Caching
[Next.js App:3000]
    ↓ Prisma ORM
[PostgreSQL:5432]
```

### Project Structure
```
memohanzi/
├── src/
│   ├── app/              # Next.js App Router
│   │   ├── (auth)/       # Login, register
│   │   ├── (app)/        # Dashboard, learn, collections, hanzi, progress, settings
│   │   └── (admin)/      # Admin pages
│   ├── actions/          # Server Actions (auth, collections, hanzi, learning, etc.)
│   ├── components/       # React components
│   ├── lib/              # Utils (SM-2 algorithm, parsers, validation)
│   └── types/            # TypeScript types
├── prisma/
│   └── schema.prisma     # Database schema
├── docker/
│   ├── Dockerfile
│   └── nginx.conf
└── docker-compose.yml
```

---

## 3. Database Schema (Prisma)

### Core Models

**Language** - Stores supported translation languages
- Fields: code (ISO 639-1), name, nativeName, isActive

**Hanzi** - Base hanzi information
- Fields: simplified (unique), radical, frequency
- Relations: forms, hskLevels, partsOfSpeech, userProgress, collectionItems

**HanziForm** - Traditional variants
- Fields: hanziId, traditional, isDefault
- Relations: transcriptions, meanings, classifiers

**HanziTranscription** - Multiple transcription types
- Fields: formId, type (pinyin/numeric/wadegiles/etc), value

**HanziMeaning** - Multi-language meanings
- Fields: formId, languageId, meaning, orderIndex

**HanziHSKLevel** - HSK level tags
- Fields: hanziId, level (e.g., "new-1", "old-3")

**HanziPOS** - Parts of speech
- Fields: hanziId, pos (n/v/adj/etc)

**HanziClassifier** - Measure words
- Fields: formId, classifier

### User & Auth Models

**User**
- Fields: email, password (hashed), name, role (USER/ADMIN/MODERATOR), isActive
- Relations: collections, hanziProgress, preferences, sessions

**UserPreference**
- Fields: preferredLanguageId, characterDisplay (SIMPLIFIED/TRADITIONAL/BOTH), transcriptionType, cardsPerSession, dailyGoal, removalThreshold, allowManualDifficulty

**Account, Session, VerificationToken** - NextAuth.js standard models

### Learning Models

**Collection**
- Fields: name, description, isGlobal, createdBy, isPublic
- Relations: items (CollectionItem join table)

**CollectionItem** - Join table
- Fields: collectionId, hanziId, orderIndex

**UserHanziProgress** - Tracks learning per hanzi
- Fields: userId, hanziId, correctCount, incorrectCount, consecutiveCorrect
- SM-2 fields: easeFactor (default 2.5), interval (default 1), nextReviewDate
- Manual override: manualDifficulty (EASY/MEDIUM/HARD/SUSPENDED)

**LearningSession** - Track study sessions
- Fields: userId, startedAt, endedAt, cardsReviewed, correctAnswers, incorrectAnswers, collectionId
- Relations: reviews (SessionReview)

**SessionReview** - Individual card reviews
- Fields: sessionId, hanziId, isCorrect, responseTime

---

## 4. Server Actions API

All actions return: `{ success: boolean, data?: T, message?: string, errors?: Record<string, string[]> }`
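
The shared return shape could be captured once as a generic type. The sketch below is one possible way to do that; the names `ActionResult`, `ok`, and `fail` are illustrative assumptions, not part of the spec.

```typescript
// Hypothetical shared result type; field names follow the shape stated above.
type ActionResult<T> = {
  success: boolean;
  data?: T;
  message?: string;
  errors?: Record<string, string[]>;
};

// Assumed helper: wrap a successful payload.
function ok<T>(data: T): ActionResult<T> {
  return { success: true, data };
}

// Assumed helper: report a failure with optional per-field errors.
function fail(
  message: string,
  errors: Record<string, string[]> = {},
): ActionResult<never> {
  return { success: false, message, errors };
}

const created = ok({ id: "c1", name: "HSK 1" });
const rejected = fail("Validation failed", { name: ["Name is required"] });
```

Keeping the constructors in one place would make it harder for individual actions to drift from the agreed shape.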

### Authentication (`src/actions/auth.ts`)
- `register(email, password, name)` - Create account
- `login(email, password)` - Authenticate
- `logout()` - End session
- `updatePassword(currentPassword, newPassword)` - Change password
- `updateProfile(name, email, image)` - Update user

### Collections (`src/actions/collections.ts`)
- `createCollection(name, description, isPublic)` - New collection
- `updateCollection(id, data)` - Modify (owner/admin only)
- `deleteCollection(id)` - Remove (owner/admin only)
- `getCollection(id)` - Get with hanzi
- `getUserCollections()` - List user's collections
- `getGlobalCollections()` - List HSK collections
- `addHanziToCollection(collectionId, hanziIds[])` - Add hanzi
- `removeHanziFromCollection(collectionId, hanziId)` - Remove hanzi
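
The "owner/admin only" rule for `updateCollection` and `deleteCollection` could be isolated in a small pure function so it is easy to unit test. This is a sketch under assumptions: the type names are hypothetical, MODERATOR is treated like a regular user, and global collections are assumed to be editable by admins only.

```typescript
type Role = "USER" | "ADMIN" | "MODERATOR";

interface SessionUser {
  id: string;
  role: Role;
}

// Only the Collection fields relevant to the permission check.
interface CollectionOwnership {
  createdBy: string;
  isGlobal: boolean;
}

function canModifyCollection(
  user: SessionUser,
  collection: CollectionOwnership,
): boolean {
  if (user.role === "ADMIN") return true;
  // Assumption: global (HSK) collections are managed by admins only.
  if (collection.isGlobal) return false;
  return collection.createdBy === user.id;
}

const owner: SessionUser = { id: "u1", role: "USER" };
const allowed = canModifyCollection(owner, { createdBy: "u1", isGlobal: false });
```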

### Hanzi (`src/actions/hanzi.ts`)
- `searchHanzi(query, hskLevel?, limit, offset)` - Search database (public)
- `getHanzi(id)` - Get details (public)
- `getHanziBySimplified(char)` - Lookup by character (public)

### Learning (`src/actions/learning.ts`)
- `startLearningSession(collectionId?, cardsCount)` - Begin session, returns cards
- `submitAnswer(sessionId, hanziId, selected, correct, time)` - Record answer, updates SM-2
- `endSession(sessionId)` - Complete, return summary
- `getDueCards()` - Get due-card counts (due now, due today, due this week)
- `updateCardDifficulty(hanziId, difficulty)` - Manual override
- `removeFromLearning(hanziId)` - Stop learning card
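
As an illustration of the three counts returned by `getDueCards()`, the sketch below buckets a list of `nextReviewDate` values. It assumes that "today" ends at the next UTC midnight and that "week" means the next 7 days from now; a real implementation would likely use the user's time zone. The function name `countDueCards` is hypothetical.

```typescript
interface DueCounts {
  now: number;
  today: number;
  week: number;
}

const DAY_IN_MS = 24 * 60 * 60 * 1000;

function countDueCards(reviewDates: Date[], now: Date): DueCounts {
  // Assumption: the day boundary is UTC midnight.
  const endOfToday = Date.UTC(
    now.getUTCFullYear(),
    now.getUTCMonth(),
    now.getUTCDate() + 1,
  );
  const endOfWeek = now.getTime() + 7 * DAY_IN_MS;

  const counts: DueCounts = { now: 0, today: 0, week: 0 };
  for (const date of reviewDates) {
    const t = date.getTime();
    if (t <= now.getTime()) counts.now++;
    if (t < endOfToday) counts.today++;
    if (t <= endOfWeek) counts.week++;
  }
  return counts;
}

const dueCounts = countDueCards(
  [
    new Date("2025-01-09T08:00:00Z"), // overdue
    new Date("2025-01-10T18:00:00Z"), // later today
    new Date("2025-01-12T08:00:00Z"), // this week
    new Date("2025-02-01T08:00:00Z"), // outside the window
  ],
  new Date("2025-01-10T12:00:00Z"),
);
```

Each bucket includes the previous one, so overdue cards are also counted under "today" and "week".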

### Progress (`src/actions/progress.ts`)
- `getUserProgress(dateRange?)` - Overall stats & charts
- `getHanziProgress(hanziId)` - Individual hanzi stats
- `getLearningSessions(limit?)` - Session history
- `getStatistics()` - Dashboard stats
- `resetHanziProgress(hanziId)` - Reset card

### Preferences (`src/actions/preferences.ts`)
- `getPreferences()` - Get settings
- `updatePreferences(data)` - Update settings
- `getAvailableLanguages()` - List languages

### Admin (`src/actions/admin.ts`)
- `createGlobalCollection(name, description, hskLevel)` - HSK collection
- `importHanzi(fileData, format)` - Bulk import (JSON/CSV)
- `getImportHistory()` - Past imports
- `getUserManagement(page, pageSize)` - List users
- `updateUserRole(userId, role)` - Change role
- `toggleUserStatus(userId)` - Activate/deactivate

---

## 5. SM-2 Algorithm Implementation

### Initial Values
- easeFactor: 2.5
- interval: 1 day
- consecutiveCorrect: 0

### On Correct Answer
```javascript
if (consecutiveCorrect === 0) {
  interval = 1
} else if (consecutiveCorrect === 1) {
  interval = 6
} else {
  interval = Math.round(interval * easeFactor)
}

easeFactor = easeFactor + 0.1  // Fixed bonus; can be refined with an answer-quality score
consecutiveCorrect++
// `now` is a Date; 86_400_000 ms = 1 day
nextReviewDate = new Date(now.getTime() + interval * 86_400_000)
```

### On Incorrect Answer
```javascript
interval = 1
consecutiveCorrect = 0
nextReviewDate = new Date(now.getTime() + 86_400_000)  // now + 1 day
easeFactor = Math.max(1.3, easeFactor - 0.2)
```
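
The two branches above can be combined into one pure function, which keeps the scheduling logic independent of Prisma and easy to unit test. This is a minimal sketch of the simplified correct/incorrect variant described in this section, not the full quality-graded SM-2; the names `Sm2State`, `initialState`, and `review` are assumptions.

```typescript
interface Sm2State {
  easeFactor: number;
  interval: number; // days
  consecutiveCorrect: number;
  nextReviewDate: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_EASE_FACTOR = 1.3;

function initialState(now: Date): Sm2State {
  return { easeFactor: 2.5, interval: 1, consecutiveCorrect: 0, nextReviewDate: now };
}

function review(state: Sm2State, isCorrect: boolean, now: Date): Sm2State {
  if (!isCorrect) {
    // Reset the streak and review again tomorrow.
    return {
      easeFactor: Math.max(MIN_EASE_FACTOR, state.easeFactor - 0.2),
      interval: 1,
      consecutiveCorrect: 0,
      nextReviewDate: new Date(now.getTime() + DAY_MS),
    };
  }

  let interval: number;
  if (state.consecutiveCorrect === 0) {
    interval = 1;
  } else if (state.consecutiveCorrect === 1) {
    interval = 6;
  } else {
    // Uses the ease factor from before this review, as in the snippet above.
    interval = Math.round(state.interval * state.easeFactor);
  }

  return {
    easeFactor: state.easeFactor + 0.1,
    interval,
    consecutiveCorrect: state.consecutiveCorrect + 1,
    nextReviewDate: new Date(now.getTime() + interval * DAY_MS),
  };
}

// Three correct answers in a row, then one mistake.
const start = new Date("2025-01-01T00:00:00Z");
const first = review(initialState(start), true, start);
const second = review(first, true, start);
const third = review(second, true, start);
const lapsed = review(third, false, start);
```

Returning a new state object instead of mutating the input makes it straightforward to pass the result directly to a Prisma `update` call.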

### Card Selection
1. Query: `WHERE nextReviewDate <= now AND userId = currentUser`
2. Apply manual difficulty (SUSPENDED = exclude, HARD = higher priority, EASY = lower priority)
3. Sort: nextReviewDate ASC, incorrectCount DESC, consecutiveCorrect ASC
4. Limit to user's cardsPerSession
5. If not enough, add new cards from collections
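
The five steps above can be sketched as an in-memory function. It assumes the rows were already fetched for the current user, and it interprets step 2 as "manual difficulty is the primary sort key, ahead of the keys in step 3", which the spec leaves open. `ProgressRow` and `selectCards` are hypothetical names.

```typescript
type ManualDifficulty = "EASY" | "MEDIUM" | "HARD" | "SUSPENDED";

interface ProgressRow {
  hanziId: string;
  nextReviewDate: Date;
  incorrectCount: number;
  consecutiveCorrect: number;
  manualDifficulty: ManualDifficulty | null;
}

// Lower rank is served first. No override is treated like MEDIUM (assumption).
function difficultyRank(row: ProgressRow): number {
  if (row.manualDifficulty === "HARD") return 0;
  if (row.manualDifficulty === "EASY") return 2;
  return 1;
}

function selectCards(
  rows: ProgressRow[],
  newCardIds: string[],
  now: Date,
  cardsPerSession: number,
): string[] {
  const due = rows
    // Steps 1-2: due cards only, suspended cards excluded.
    .filter(
      (r) =>
        r.manualDifficulty !== "SUSPENDED" &&
        r.nextReviewDate.getTime() <= now.getTime(),
    )
    // Steps 2-3: priority, then oldest due date, most mistakes, shortest streak.
    .sort(
      (a, b) =>
        difficultyRank(a) - difficultyRank(b) ||
        a.nextReviewDate.getTime() - b.nextReviewDate.getTime() ||
        b.incorrectCount - a.incorrectCount ||
        a.consecutiveCorrect - b.consecutiveCorrect,
    )
    // Step 4: respect the session size.
    .slice(0, cardsPerSession)
    .map((r) => r.hanziId);

  // Step 5: top up with hanzi the user has no progress row for yet.
  const seen = new Set(rows.map((r) => r.hanziId));
  const fresh = newCardIds
    .filter((id) => !seen.has(id))
    .slice(0, cardsPerSession - due.length);

  return [...due, ...fresh];
}
```

In production, the filter and sort would more likely be pushed into the Prisma query (`where` and `orderBy`); the in-memory version mainly documents the intended ordering.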

### Wrong Answer Generation
- Select 3 random incorrect pinyin from same HSK level
- Ensure no duplicates
- Randomize order (Fisher-Yates shuffle)
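
A possible shape for the distractor logic is shown below. It assumes the caller has already loaded candidate pinyin strings for the same HSK level; `shuffle` and `buildChoices` are illustrative names. The random source is injectable so tests can make it deterministic.

```typescript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}

function buildChoices(
  correctPinyin: string,
  sameLevelPinyin: readonly string[],
  random: () => number = Math.random,
): string[] {
  // Drop duplicates and anything identical to the correct answer.
  const pool = Array.from(new Set(sameLevelPinyin)).filter(
    (p) => p !== correctPinyin,
  );
  if (pool.length < 3) {
    throw new Error("Need at least 3 distinct wrong answers");
  }
  const distractors = shuffle(pool, random).slice(0, 3);
  return shuffle([correctPinyin, ...distractors], random);
}

const choices = buildChoices("ài", ["ài", "hǎo", "hǎo", "nǐ", "wǒ", "tā"]);
```

Filtering out the correct pinyin matters because different hanzi at the same level can share a reading, which would otherwise produce two identical options.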

---

## 6. UI/UX Pages

### Public
- `/` - Landing page
- `/login` - Login form
- `/register` - Registration form

### Authenticated
- `/dashboard` - Due cards, progress widgets, recent activity, quick start
- `/learn/[collectionId]` - Learning session with cards
- `/collections` - List all collections (global + user's)
- `/collections/[id]` - Collection detail, hanzi list, edit
- `/collections/new` - Create collection
- `/hanzi` - Search hanzi (filters, pagination)
- `/hanzi/[id]` - Hanzi detail (all transcriptions, meanings, etc)
- `/progress` - Charts, stats, session history
- `/settings` - User preferences

### Admin
- `/admin/collections` - Manage global collections
- `/admin/hanzi` - Manage hanzi database
- `/admin/import` - Import data (JSON/CSV upload)
- `/admin/users` - User management

### Key UI Components
- **LearningCard**: Large hanzi, 4 pinyin options in 2x2 grid, progress bar
- **AnswerFeedback**: Green/red feedback, show correct answer, streak, removal suggestion
- **CollectionCard**: Name, count, progress, quick actions
- **DashboardWidgets**: Due cards, daily progress, streak, recent activity
- **Charts**: Activity heatmap, accuracy line chart, HSK breakdown bar chart

### Design
- Mobile-first responsive
- Dark mode support
- Tailwind CSS
- Keyboard shortcuts (1-4 for answers, Space to continue)
- WCAG 2.1 AA accessibility
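
The keyboard shortcuts could be handled by a small mapping function that a `keydown` listener calls with `event.key`. This is a sketch with assumed names (`LearnAction`, `keyToAction`) and an assumed two-phase card flow: a question phase, then a feedback phase.

```typescript
type LearnAction =
  | { type: "answer"; optionIndex: number }
  | { type: "continue" };

function keyToAction(
  key: string,
  phase: "question" | "feedback",
): LearnAction | null {
  // Keys 1-4 pick one of the four pinyin options while a question is shown.
  if (phase === "question" && /^[1-4]$/.test(key)) {
    return { type: "answer", optionIndex: Number(key) - 1 };
  }
  // Space (KeyboardEvent.key === " ") advances after feedback is shown.
  if (phase === "feedback" && key === " ") {
    return { type: "continue" };
  }
  return null; // Ignore every other key.
}

const picked = keyToAction("3", "question");
```

Keeping the mapping free of DOM types means it can be unit tested without a browser environment.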

---

## 7. Data Import Formats

### HSK JSON (from github.com/drkameleon/complete-hsk-vocabulary)
```json
{
  "simplified": "爱好",
  "radical": "爫",
  "level": ["new-1", "old-3"],
  "frequency": 4902,
  "pos": ["n", "v"],
  "forms": [{
    "traditional": "愛好",
    "transcriptions": {
      "pinyin": "ài hào",
      "numeric": "ai4 hao4"
    },
    "meanings": ["to like; hobby"],
    "classifiers": ["个"]
  }]
}
```
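
To show how one JSON entry relates to the models in Section 3, the sketch below maps an entry onto a per-table structure. The `HanziImport` shape and `toHanziImport` name are hypothetical, and two details are assumptions: the first form is marked as default, and `orderIndex` is the position in the `meanings` array. Meanings would still need a `languageId` before being stored, and real input should be validated (for example with Zod) rather than trusted.

```typescript
// Shape of one entry in the HSK JSON source, as in the sample above.
interface HskForm {
  traditional: string;
  transcriptions: Record<string, string>;
  meanings: string[];
  classifiers: string[];
}

interface HskEntry {
  simplified: string;
  radical: string;
  level: string[];
  frequency: number;
  pos: string[];
  forms: HskForm[];
}

// Hypothetical intermediate shape, one property per target table.
interface HanziImport {
  hanzi: { simplified: string; radical: string; frequency: number };
  hskLevels: string[];
  partsOfSpeech: string[];
  forms: Array<{
    traditional: string;
    isDefault: boolean;
    transcriptions: Array<{ type: string; value: string }>;
    meanings: Array<{ meaning: string; orderIndex: number }>;
    classifiers: string[];
  }>;
}

function toHanziImport(entry: HskEntry): HanziImport {
  return {
    hanzi: {
      simplified: entry.simplified,
      radical: entry.radical,
      frequency: entry.frequency,
    },
    hskLevels: entry.level,
    partsOfSpeech: entry.pos,
    forms: entry.forms.map((form, formIndex) => ({
      traditional: form.traditional,
      isDefault: formIndex === 0, // assumption: first form is the default
      transcriptions: Object.entries(form.transcriptions).map(
        ([type, value]) => ({ type, value }),
      ),
      meanings: form.meanings.map((meaning, orderIndex) => ({ meaning, orderIndex })),
      classifiers: form.classifiers,
    })),
  };
}

const sampleEntry: HskEntry = {
  simplified: "爱好",
  radical: "爫",
  level: ["new-1", "old-3"],
  frequency: 4902,
  pos: ["n", "v"],
  forms: [
    {
      traditional: "愛好",
      transcriptions: { pinyin: "ài hào", numeric: "ai4 hao4" },
      meanings: ["to like; hobby"],
      classifiers: ["个"],
    },
  ],
};

const importRows = toHanziImport(sampleEntry);
```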

### CSV Format
```csv
simplified,traditional,pinyin,meaning,hsk_level,radical,frequency,pos,classifiers
爱好,愛好,ài hào,"to like; hobby",new-1,爫,4902,"n,v",个
```
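
Because fields such as `meaning` and `pos` contain commas inside quotes, a plain `split(",")` is not enough. The sketch below is a minimal quote-aware parser for this format; it assumes one record per line (no line breaks inside quoted fields) and uses hypothetical names. A maintained CSV library would be a reasonable alternative.

```typescript
// Split one CSV line into fields, honouring double quotes and "" escapes.
function parseCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (inQuotes) {
      if (ch === '"' && line.charAt(i + 1) === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Parse header + rows into objects keyed by column name.
function parseCsv(text: string): Array<Record<string, string>> {
  const lines = text.trim().split(/\r?\n/);
  const header = parseCsvLine(lines[0] ?? "");
  return lines.slice(1).map((line) => {
    const values = parseCsvLine(line);
    const record: Record<string, string> = {};
    header.forEach((name, i) => {
      record[name] = values[i] ?? "";
    });
    return record;
  });
}

const csvRecords = parseCsv(
  [
    "simplified,traditional,pinyin,meaning,hsk_level,radical,frequency,pos,classifiers",
    '爱好,愛好,ài hào,"to like; hobby",new-1,爫,4902,"n,v",个',
  ].join("\n"),
);
```

Multi-valued columns such as `pos` arrive as a single string (`"n,v"`) and would need a second split before being stored as separate `HanziPOS` rows.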

---

## 8. Testing Strategy

### Unit Tests (70% coverage target)
- **SM-2 algorithm** - All calculation paths
- **Card selection logic** - Sorting, filtering, limits
- **Parsers** - JSON/CSV parsing, error handling
- **Validation schemas** - Zod schemas

### Integration Tests (80% of Server Actions)
- Auth actions with database
- Learning flow (start session, submit answers, end session)
- Collection CRUD
- Import process

### E2E Tests (Critical paths)
- Complete learning session
- Create collection and add hanzi
- Search hanzi
- Admin import
- Auth flow

**Tools:** Vitest (unit/integration), Playwright (E2E)

---

## 9. Development Milestones

### Week 1: Foundation
- Setup Next.js 16 project
- Configure Prisma + PostgreSQL
- Setup Docker Compose
- Create all data models
- Configure NextAuth.js

### Week 2: Authentication
- Registration/login pages
- Middleware protection
- User preferences
- Integration tests

### Week 3-4: Data Import
- Admin role middleware
- HSK JSON parser
- CSV parser
- Import UI and actions
- Test with real HSK data

### Week 5: Collections
- Collections CRUD
- Add/remove hanzi
- Global HSK collections

### Week 5: Hanzi Search
- Search page
- Filters (HSK level)
- Hanzi detail view
- Pagination

### Week 6: SM-2 Algorithm
- Implement algorithm
- Card selection logic
- Progress tracking
- Unit tests (90%+ coverage)

### Week 7-8: Learning Interface
- Learning session pages
- Card component
- Answer submission
- Feedback UI
- Session summary
- Keyboard shortcuts
- E2E tests

### Week 9: Dashboard & Progress
- Dashboard widgets
- Progress page
- Charts (Recharts)
- Statistics calculations

### Week 10: UI Polish
- Responsive layouts
- Mobile navigation
- Dark mode
- Loading/empty states
- Toast notifications
- Accessibility improvements

### Week 11: Testing & Docs
- Complete test coverage
- E2E tests for all critical flows
- README and documentation
- Security audit

### Week 12: Deployment
- Production environment
- Docker deployment
- SSL certificates
- Database backup
- Import HSK data
- Final testing

---

## 10. Docker Configuration

### docker-compose.yml
```yaml
version: '3.8'
services:
  nginx:
    image: nginx:alpine
    ports: ["80:80", "443:443"]
    volumes:
      - ./docker/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./docker/ssl:/etc/nginx/ssl:ro
    depends_on: [app]

  app:
    build:
      context: .
      dockerfile: docker/Dockerfile
    expose: ["3000"]
    environment:
      - DATABASE_URL=postgresql://memohanzi_user:password@postgres:5432/memohanzi_db
      - NEXTAUTH_URL=https://yourdomain.com
      - NEXTAUTH_SECRET=${NEXTAUTH_SECRET}
    depends_on:
      postgres:
        condition: service_healthy

  postgres:
    image: postgres:18-alpine
    environment:
      POSTGRES_USER: memohanzi_user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: memohanzi_db
    volumes:
      # PostgreSQL 18+ official images expect the volume at /var/lib/postgresql
      - postgres-data:/var/lib/postgresql
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U memohanzi_user -d memohanzi_db"]

volumes:
  postgres-data:
```

### Environment Variables
```bash
# .env.local
DATABASE_URL="postgresql://memohanzi_user:password@localhost:5432/memohanzi_db"
NEXTAUTH_URL="http://localhost:3000"
NEXTAUTH_SECRET="generate-with-openssl-rand-base64-32"
NODE_ENV="development"
```

---

## 11. Security Checklist

- [ ] Passwords hashed with bcrypt (10 rounds)
- [ ] Session tokens httpOnly, sameSite
- [ ] CSRF protection (NextAuth.js)
- [ ] Rate limiting (Nginx)
- [ ] Input validation (Zod, server-side)
- [ ] SQL injection prevented (Prisma)
- [ ] XSS prevention (React escaping)
- [ ] HTTPS enforced (Nginx)
- [ ] Secure headers (Nginx)
- [ ] Role-based access enforced server-side
- [ ] No sensitive data in logs
- [ ] Environment variables for secrets

---

## 12. Phase 2 Features

1. **Additional Languages** - Multi-language support for meanings
2. **Learning Modes** - Radical identification, hanzi-to-meaning, meaning-to-hanzi, tone practice
3. **Autocomplete Data** - Auto-fill missing hanzi info from APIs
4. **User Suggestions** - Allow users to report/suggest corrections

---

## 13. Phase 3 Ideas

- Writing practice (stroke order validation)
- Social features (public collections, sharing)
- Gamification (streaks, badges, leaderboards)
- Mobile apps (React Native)
- Audio pronunciation
- Example sentences
- Advanced SRS algorithms

---

## 14. Quick Reference Commands

**Development:**
```bash
# Start
docker-compose up
npm run dev

# Database
npx prisma migrate dev
npx prisma db seed
npx prisma studio

# Testing
npm run test
npm run test:e2e
```

**Production:**
```bash
# Deploy
docker-compose up -d --build

# Monitor
docker-compose logs -f
```

---

## 15. Success Criteria (MVP)

**Technical:**
- [ ] All tests passing (70%+ coverage)
- [ ] Can import complete HSK vocabulary (5000+ hanzi)
- [ ] Page load <2s
- [ ] Learning session responsive (<100ms)
- [ ] Mobile responsive

**Functional:**
- [ ] Complete learning session works end-to-end
- [ ] SM-2 algorithm calculates correctly
- [ ] Progress tracking accurate
- [ ] Collections management works
- [ ] Search works efficiently

**User Experience:**
- [ ] Can learn 20+ cards in 5-10 minutes
- [ ] Interface intuitive
- [ ] Daily use sustainable

---

## Implementation Notes

### Priority Order
1. Authentication (foundational)
2. Data import (need data)
3. Collections (organize learning)
4. Search (browse data)
5. Learning algorithm (core logic)
6. Learning interface (user interaction)
7. Progress tracking (motivation)
8. Polish & deploy

### Critical Paths to Test
1. Register → Login → Create Collection → Add Hanzi → Start Learning → Complete Session → View Progress
2. Admin → Import HSK Data → Create Global Collection → User uses global collection
3. Search Hanzi → View Detail → Add to Collection → Learn

### Key Implementation Files
- `prisma/schema.prisma` - All data models
- `src/lib/learning/sm2.ts` - SM-2 algorithm
- `src/lib/learning/card-selector.ts` - Card selection
- `src/lib/import/hsk-parser.ts` - Parse HSK JSON
- `src/actions/learning.ts` - Learning Server Actions
- `src/app/(app)/learn/[collectionId]/page.tsx` - Learning UI

---

## Resources

- **HSK Data Source**: https://github.com/drkameleon/complete-hsk-vocabulary
- **Next.js Docs**: https://nextjs.org/docs
- **Prisma Docs**: https://www.prisma.io/docs
- **NextAuth Docs**: https://authjs.dev
- **SM-2 Algorithm**: https://www.supermemo.com/en/archives1990-2015/english/ol/sm2

---

**This specification is complete and ready for implementation with Claude Code.**

Start with Milestone 1 (Week 1: Foundation) and proceed sequentially through the milestones.