memohanzi/HANZI-LEARNING-APP-SPECIFICATION.md
Stefan Hardegger 9a30d7c4e5 milestone 9
2025-11-25 14:16:25 +01:00


MemoHanzi - Implementation Specification

Version: 1.0
Status: Ready for Implementation
Target: Claude Code
Application Name: MemoHanzi (记汉字, "Remember Hanzi")


Quick Start Summary

What: MemoHanzi is a self-hosted web app for learning Chinese characters (hanzi) using spaced repetition (SM-2 algorithm)

Tech Stack:

  • Next.js 16 (TypeScript, App Router, Server Actions)
  • PostgreSQL 18 + Prisma ORM
  • NextAuth.js v5 for authentication
  • Docker Compose deployment with Nginx reverse proxy
  • Tailwind CSS, React Hook Form, Zod validation, Recharts

MVP Timeline: 10-12 weeks


1. Core Features (MVP)

User Features

  • Registration/Login with email & password
  • Create and manage personal hanzi collections
  • Browse and use global HSK-level collections
  • Learning sessions with a 4-choice pinyin quiz
  • SM-2 spaced repetition algorithm
  • Progress tracking & statistics dashboard
  • Search hanzi database (by character, pinyin, meaning)
  • User preferences (language, display options, learning settings)

Admin Features

  • Import hanzi data (JSON/CSV from HSK vocabulary source)
  • Manage global collections
  • User management (roles, activation)

2. System Architecture

Deployment Stack

[Nginx Reverse Proxy:80/443] 
    ↓ HTTPS/Rate Limiting/Caching
[Next.js App:3000]
    ↓ Prisma ORM
[PostgreSQL:5432]

Project Structure

memohanzi/
├── src/
│   ├── app/              # Next.js App Router
│   │   ├── (auth)/       # Login, register
│   │   ├── (app)/        # Dashboard, learn, collections, hanzi, progress, settings
│   │   └── (admin)/      # Admin pages
│   ├── actions/          # Server Actions (auth, collections, hanzi, learning, etc.)
│   ├── components/       # React components
│   ├── lib/              # Utils (SM-2 algorithm, parsers, validation)
│   └── types/            # TypeScript types
├── prisma/
│   └── schema.prisma     # Database schema
├── docker/
│   ├── Dockerfile
│   └── nginx.conf
└── docker-compose.yml

3. Database Schema (Prisma)

Core Models

Language - Stores supported translation languages

  • Fields: code (ISO 639-1), name, nativeName, isActive

Hanzi - Base hanzi information

  • Fields: simplified (unique), radical, frequency
  • Relations: forms, hskLevels, partsOfSpeech, userProgress, collectionItems

HanziForm - Traditional variants

  • Fields: hanziId, traditional, isDefault
  • Relations: transcriptions, meanings, classifiers

HanziTranscription - Multiple transcription types

  • Fields: formId, type (pinyin/numeric/wadegiles/etc.), value

HanziMeaning - Multi-language meanings

  • Fields: formId, languageId, meaning, orderIndex

HanziHSKLevel - HSK level tags

  • Fields: hanziId, level (e.g., "new-1", "old-3")

HanziPOS - Parts of speech

  • Fields: hanziId, pos (n/v/adj/etc.)

HanziClassifier - Measure words

  • Fields: formId, classifier

User & Auth Models

User

  • Fields: email, password (hashed), name, role (USER/ADMIN/MODERATOR), isActive
  • Relations: collections, hanziProgress, preferences, sessions

UserPreference

  • Fields: preferredLanguageId, characterDisplay (SIMPLIFIED/TRADITIONAL/BOTH), transcriptionType, cardsPerSession, dailyGoal, removalThreshold, allowManualDifficulty

Account, Session, VerificationToken - NextAuth.js standard models

Learning Models

Collection

  • Fields: name, description, isGlobal, createdBy, isPublic
  • Relations: items (CollectionItem join table)

CollectionItem - Join table

  • Fields: collectionId, hanziId, orderIndex

UserHanziProgress - Tracks learning per hanzi

  • Fields: userId, hanziId, correctCount, incorrectCount, consecutiveCorrect
  • SM-2 fields: easeFactor (default 2.5), interval (default 1), nextReviewDate
  • Manual override: manualDifficulty (EASY/MEDIUM/HARD/SUSPENDED)

LearningSession - Track study sessions

  • Fields: userId, startedAt, endedAt, cardsReviewed, correctAnswers, incorrectAnswers, collectionId
  • Relations: reviews (SessionReview)

SessionReview - Individual card reviews

  • Fields: sessionId, hanziId, isCorrect, responseTime

4. Server Actions API

All actions return: { success: boolean, data?: T, message?: string, errors?: Record<string, string[]> }
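A minimal TypeScript sketch of that shared return contract. The type name `ActionResult<T>` and the helpers `ok()` and `fail()` are hypothetical illustrations, not names taken from the codebase:

```typescript
// Shared return shape for every Server Action (type name is illustrative).
export type ActionResult<T> = {
  success: boolean;
  data?: T;
  message?: string;
  errors?: Record<string, string[]>; // field name -> validation messages
};

// Successful result carrying a payload.
export function ok<T>(data: T, message?: string): ActionResult<T> {
  return { success: true, data, message };
}

// Failed result, optionally with per-field validation errors.
export function fail<T = never>(
  message: string,
  errors?: Record<string, string[]>,
): ActionResult<T> {
  return { success: false, message, errors };
}

// Callers check `success` before reading `data`.
const created = ok({ id: "c1", name: "HSK 1" });
const rejected = fail("Validation failed", { name: ["Name is required"] });
```

The `errors` map has the same shape a form library can bind to individual fields, so validation failures can be shown next to the input that caused them.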

Authentication (src/actions/auth.ts)

  • register(email, password, name) - Create account
  • login(email, password) - Authenticate
  • logout() - End session
  • updatePassword(currentPassword, newPassword) - Change password
  • updateProfile(name, email, image) - Update user

Collections (src/actions/collections.ts)

  • createCollection(name, description, isPublic) - New collection
  • updateCollection(id, data) - Modify (owner/admin only)
  • deleteCollection(id) - Remove (owner/admin only)
  • getCollection(id) - Get with hanzi
  • getUserCollections() - List user's collections
  • getGlobalCollections() - List HSK collections
  • addHanziToCollection(collectionId, hanziIds[]) - Add hanzi
  • removeHanziFromCollection(collectionId, hanziId) - Remove hanzi
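The owner/admin rule for updateCollection and deleteCollection, combined with the read-only treatment of global collections described under Week 5, can be enforced by a small server-side guard. A sketch only: `canModifyCollection`, `Actor` and `CollectionRef` are hypothetical, and treating MODERATOR like USER is an assumption the spec does not settle.

```typescript
type Role = "USER" | "ADMIN" | "MODERATOR";

interface Actor {
  id: string;
  role: Role;
}

interface CollectionRef {
  createdBy: string | null; // user id of the creator
  isGlobal: boolean;
}

// Admins may modify anything; everyone else only their own non-global collections.
// MODERATOR is deliberately treated like USER here (assumption).
export function canModifyCollection(actor: Actor, collection: CollectionRef): boolean {
  if (actor.role === "ADMIN") return true;
  if (collection.isGlobal) return false; // global HSK collections are read-only for users
  return collection.createdBy === actor.id;
}
```

Calling a guard like this inside each action, rather than only hiding buttons in the UI, is what keeps role-based access enforced server-side.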

Hanzi (src/actions/hanzi.ts)

  • searchHanzi(query, hskLevel?, limit, offset) - Search database (public)
  • getHanzi(id) - Get details (public)
  • getHanziBySimplified(char) - Lookup by character (public)

Learning (src/actions/learning.ts)

  • startLearningSession(collectionId?, cardsCount) - Begin session, returns cards
  • submitAnswer(sessionId, hanziId, selected, correct, time) - Record answer, updates SM-2
  • endSession(sessionId) - Complete, return summary
  • getDueCards() - Count due cards (due now, due today, due this week)
  • updateCardDifficulty(hanziId, difficulty) - Manual override
  • removeFromLearning(hanziId) - Stop learning card
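A sketch of the three counts getDueCards() returns, computed from the nextReviewDate values of non-suspended cards. Interpreting "today" as "by the end of the current UTC day" and "week" as "within the next 7 days" is an assumption, as is making each count include the cards of the previous one; `countDue` is a hypothetical helper.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface DueCounts {
  now: number; // due at this moment
  today: number; // due by the end of the current UTC day
  week: number; // due within the next 7 days
}

export function countDue(nextReviewDates: Date[], now: Date = new Date()): DueCounts {
  const endOfToday = new Date(now);
  endOfToday.setUTCHours(23, 59, 59, 999);
  const weekLimit = now.getTime() + 7 * DAY_MS;

  const counts: DueCounts = { now: 0, today: 0, week: 0 };
  for (const date of nextReviewDates) {
    const t = date.getTime();
    if (t <= now.getTime()) counts.now++;
    if (t <= endOfToday.getTime()) counts.today++;
    if (t <= weekLimit) counts.week++;
  }
  return counts;
}
```

In the app this would more likely be three database count queries with `lte` filters on nextReviewDate rather than loading every date into memory; the cut-off logic is the same.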

Progress (src/actions/progress.ts)

  • getUserProgress(dateRange?) - Overall stats & charts
  • getHanziProgress(hanziId) - Individual hanzi stats
  • getLearningSessions(limit?) - Session history
  • getStatistics() - Dashboard stats
  • resetHanziProgress(hanziId) - Reset card

Preferences (src/actions/preferences.ts)

  • getPreferences() - Get settings
  • updatePreferences(data) - Update settings
  • getAvailableLanguages() - List languages

Admin (src/actions/admin.ts)

  • createGlobalCollection(name, description, hskLevel) - HSK collection
  • importHanzi(fileData, format) - Bulk import (JSON/CSV)
  • getImportHistory() - Past imports
  • getUserManagement(page, pageSize) - List users
  • updateUserRole(userId, role) - Change role
  • toggleUserStatus(userId) - Activate/deactivate

5. SM-2 Algorithm Implementation

Initial Values

  • easeFactor: 2.5
  • interval: 1 day
  • consecutiveCorrect: 0

On Correct Answer

if (consecutiveCorrect === 0) {
  interval = 1
} else if (consecutiveCorrect === 1) {
  interval = 6
} else {
  interval = Math.round(interval * easeFactor)
}

easeFactor = easeFactor + 0.1  // Fixed bonus (SM-2's value for a perfect answer); could be scaled by answer quality
consecutiveCorrect++
nextReviewDate = now + interval days

On Incorrect Answer

interval = 1
consecutiveCorrect = 0
nextReviewDate = now + 1 day
easeFactor = Math.max(1.3, easeFactor - 0.2)
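The two update rules above can be written as pure functions. This sketch reuses the calculateCorrectAnswer()/calculateIncorrectAnswer() names from the Week 6 milestone, but the `CardState` shape and the signatures are assumptions. Note that it implements the simplified pass/fail variant specified here, not the quality-graded (0-5) update of the original SM-2.

```typescript
// Subset of UserHanziProgress fields touched by the scheduler (assumed shape).
interface CardState {
  easeFactor: number;
  interval: number; // days
  consecutiveCorrect: number;
  nextReviewDate: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;
const MIN_EASE = 1.3;

function addDays(from: Date, days: number): Date {
  return new Date(from.getTime() + days * DAY_MS);
}

// Correct answer: 1 day, then 6 days, then previous interval × easeFactor.
export function calculateCorrectAnswer(card: CardState, now: Date = new Date()): CardState {
  let interval: number;
  if (card.consecutiveCorrect === 0) interval = 1;
  else if (card.consecutiveCorrect === 1) interval = 6;
  else interval = Math.round(card.interval * card.easeFactor);

  return {
    easeFactor: card.easeFactor + 0.1,
    interval,
    consecutiveCorrect: card.consecutiveCorrect + 1,
    nextReviewDate: addDays(now, interval),
  };
}

// Incorrect answer: reset streak and interval, lower easeFactor (never below 1.3).
export function calculateIncorrectAnswer(card: CardState, now: Date = new Date()): CardState {
  return {
    easeFactor: Math.max(MIN_EASE, card.easeFactor - 0.2),
    interval: 1,
    consecutiveCorrect: 0,
    nextReviewDate: addDays(now, 1),
  };
}
```

Starting from the initial values, three correct answers in a row give intervals of 1, 6 and 16 days (6 × 2.7 rounded). Keeping the functions pure (state in, new state out) lets every calculation path be unit-tested without a database.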

Card Selection

  1. Query: WHERE nextReviewDate <= now AND userId = currentUser
  2. Apply manual difficulty (SUSPENDED = exclude, HARD = prioritize, EASY = deprioritize)
  3. Sort: nextReviewDate ASC, incorrectCount DESC, consecutiveCorrect ASC
  4. Limit to user's cardsPerSession
  5. If fewer than cardsPerSession cards are due, fill the rest with new (not yet reviewed) hanzi from the session's collection(s)
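A sketch of steps 2-5 as an in-memory filter and sort, assuming the due cards have already been fetched by the query in step 1. `ReviewCard`, `selectCards` and the numeric difficulty rank are illustrative. The spec does not say where difficulty priority sits relative to nextReviewDate; making it the primary sort key, and treating an unset difficulty like MEDIUM, are assumptions.

```typescript
type ManualDifficulty = "EASY" | "MEDIUM" | "HARD" | "SUSPENDED" | null;

interface ReviewCard {
  hanziId: string;
  nextReviewDate: Date;
  incorrectCount: number;
  consecutiveCorrect: number;
  manualDifficulty: ManualDifficulty;
}

// Lower rank is shown earlier: HARD first, EASY last, everything else in between.
const rank = (d: ManualDifficulty): number => (d === "HARD" ? 0 : d === "EASY" ? 2 : 1);

export function selectCards(
  due: ReviewCard[],
  newCards: ReviewCard[],
  cardsPerSession: number,
): ReviewCard[] {
  const selected = due
    .filter((c) => c.manualDifficulty !== "SUSPENDED") // step 2: exclude suspended
    .sort(
      (a, b) =>
        rank(a.manualDifficulty) - rank(b.manualDifficulty) || // step 2: HARD up, EASY down
        a.nextReviewDate.getTime() - b.nextReviewDate.getTime() || // step 3: most overdue first
        b.incorrectCount - a.incorrectCount || // then most mistakes
        a.consecutiveCorrect - b.consecutiveCorrect, // then weakest streak
    )
    .slice(0, cardsPerSession); // step 4

  // Step 5: top up with new cards, skipping suspended ones and any already selected.
  const chosen = new Set(selected.map((c) => c.hanziId));
  const fill = newCards.filter((c) => c.manualDifficulty !== "SUSPENDED" && !chosen.has(c.hanziId));
  return selected.concat(fill.slice(0, cardsPerSession - selected.length));
}
```

`filter()` returns a new array, so the in-place `sort()` never reorders the caller's input.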

Wrong Answer Generation

  • Select 3 random incorrect pinyin readings from other hanzi at the same HSK level
  • Ensure no duplicates among the options and that no distractor equals the correct answer
  • Randomize order (Fisher-Yates shuffle)
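A sketch of distractor generation, assuming the candidate pool is the pinyin of other hanzi at the same HSK level, already loaded into memory. `shuffle` is a standard Fisher-Yates shuffle; `generateWrongAnswers` echoes the Week 6 name but its parameters, like `buildOptions`, are hypothetical. The optional `random` parameter lets tests inject a deterministic source.

```typescript
// Fisher-Yates: returns a shuffled copy; every permutation is equally likely
// when `random` is uniform on [0, 1).
export function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const a = items.slice();
  for (let i = a.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Up to `count` distinct wrong pinyin, never equal to the correct one.
export function generateWrongAnswers(
  correct: string,
  sameLevelPinyin: readonly string[],
  count = 3,
  random: () => number = Math.random,
): string[] {
  const candidates = [...new Set(sameLevelPinyin)].filter((p) => p !== correct);
  return shuffle(candidates, random).slice(0, count);
}

// Correct answer plus distractors, in random display order.
export function buildOptions(
  correct: string,
  sameLevelPinyin: readonly string[],
  random: () => number = Math.random,
): string[] {
  return shuffle([correct, ...generateWrongAnswers(correct, sameLevelPinyin, 3, random)], random);
}
```

If a level has fewer than three other distinct readings, fewer distractors come back; the caller then has to widen the pool (for example to a neighbouring level), which the spec leaves open.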

6. UI/UX Pages

Public

  • / - Landing page
  • /login - Login form
  • /register - Registration form

Authenticated

  • /dashboard - Due cards, progress widgets, recent activity, quick start
  • /learn/[collectionId] - Learning session with cards
  • /collections - List all collections (global + user's)
  • /collections/[id] - Collection detail, hanzi list, edit
  • /collections/new - Create collection
  • /hanzi - Search hanzi (filters, pagination)
  • /hanzi/[id] - Hanzi detail (all transcriptions, meanings, etc.)
  • /progress - Charts, stats, session history
  • /settings - User preferences

Admin

  • /admin/collections - Manage global collections
  • /admin/hanzi - Manage hanzi database
  • /admin/import - Import data (JSON/CSV upload)
  • /admin/users - User management

Key UI Components

  • LearningCard: Large hanzi, 4 pinyin options in a 2x2 grid, progress bar
  • AnswerFeedback: Green/red feedback, show correct answer, streak, removal suggestion
  • CollectionCard: Name, count, progress, quick actions
  • DashboardWidgets: Due cards, daily progress, streak, recent activity
  • Charts: Activity heatmap, accuracy line chart, HSK breakdown bar chart

Design

  • Mobile-first responsive
  • Dark mode support
  • Tailwind CSS
  • Keyboard shortcuts (1-4 for answers, Space to continue)
  • WCAG 2.1 AA accessibility
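The keyboard shortcuts can be kept as a pure mapping that a `keydown` listener on the learning page calls. A sketch; the `LearningPhase` and `ShortcutAction` types and `keyToAction` are assumptions, not names from the codebase.

```typescript
type LearningPhase = "answering" | "feedback";

type ShortcutAction =
  | { kind: "select"; optionIndex: number } // 0-based index into the 2x2 grid
  | { kind: "continue" }
  | null;

// Maps KeyboardEvent.key to an action for the current phase of the card.
export function keyToAction(key: string, phase: LearningPhase): ShortcutAction {
  if (phase === "answering" && ["1", "2", "3", "4"].includes(key)) {
    return { kind: "select", optionIndex: Number(key) - 1 };
  }
  if (phase === "feedback" && key === " ") {
    return { kind: "continue" };
  }
  return null; // ignore everything else so normal typing and navigation still work
}
```

In the listener, calling `event.preventDefault()` only when an action is returned stops Space from scrolling the page without swallowing other keys; because the function never touches the DOM, it can be unit-tested directly.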

7. Data Import Formats

HSK JSON (from github.com/drkameleon/complete-hsk-vocabulary)

{
  "simplified": "爱好",
  "radical": "爫",
  "level": ["new-1", "old-3"],
  "frequency": 4902,
  "pos": ["n", "v"],
  "forms": [{
    "traditional": "愛好",
    "transcriptions": {
      "pinyin": "ài hào",
      "numeric": "ai4 hao4"
    },
    "meanings": ["to like; hobby"],
    "classifiers": ["个"]
  }]
}
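A sketch of TypeScript types for one entry in that format, plus a hypothetical `toImportRows()` helper that flattens an entry into one row per form, roughly matching the Hanzi/HanziForm split in the schema. The types cover only the fields shown above; `transcriptions` is typed as an open record because the upstream file may carry more keys. Treating the first form as the default is an assumption, and `JSON.parse` performs no runtime checking, so in the app this is where Zod validation would sit.

```typescript
interface HskForm {
  traditional: string;
  transcriptions: Record<string, string>; // e.g. pinyin, numeric
  meanings: string[];
  classifiers: string[];
}

interface HskEntry {
  simplified: string;
  radical: string;
  level: string[]; // e.g. "new-1", "old-3"
  frequency: number;
  pos: string[];
  forms: HskForm[];
}

interface ImportRow {
  simplified: string;
  traditional: string;
  pinyin: string | undefined;
  meanings: string[];
  levels: string[];
  isDefault: boolean; // first form treated as the default (assumption)
}

export function toImportRows(entry: HskEntry): ImportRow[] {
  return entry.forms.map((form, index) => ({
    simplified: entry.simplified,
    traditional: form.traditional,
    pinyin: form.transcriptions["pinyin"],
    meanings: form.meanings,
    levels: entry.level,
    isDefault: index === 0,
  }));
}

const example: HskEntry = JSON.parse(`{
  "simplified": "爱好", "radical": "爫", "level": ["new-1", "old-3"],
  "frequency": 4902, "pos": ["n", "v"],
  "forms": [{ "traditional": "愛好",
    "transcriptions": { "pinyin": "ài hào", "numeric": "ai4 hao4" },
    "meanings": ["to like; hobby"], "classifiers": ["个"] }]
}`);
```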

CSV Format

simplified,traditional,pinyin,meaning,hsk_level,radical,frequency,pos,classifiers
爱好,愛好,ài hào,"to like; hobby",new-1,,4902,"n,v",
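Quoting matters in this format: `meaning` contains a semicolon, and `pos` holds a comma-separated list inside one quoted field. A minimal sketch of a single-line, RFC 4180-style splitter; `splitCsvLine` and `splitMulti` are hypothetical names, and unlike a full parser this does not handle line breaks inside quoted fields.

```typescript
// Splits one CSV line, honouring double-quoted fields and "" escapes.
export function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let current = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ",") {
      fields.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current); // last field (empty if the line ends with a comma)
  return fields;
}

// Multi-value columns (pos, classifiers, hsk_level) are comma-separated inside one field.
export function splitMulti(field: string): string[] {
  return field.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
}

const header = splitCsvLine("simplified,traditional,pinyin,meaning,hsk_level,radical,frequency,pos,classifiers");
const row = splitCsvLine('爱好,愛好,ài hào,"to like; hobby",new-1,,4902,"n,v",');
const record = Object.fromEntries(header.map((h, i) => [h, row[i] ?? ""]));
```

The example row yields nine fields, with empty strings for the omitted `radical` and `classifiers` columns.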

8. Testing Strategy

Unit Tests (70% coverage target)

  • SM-2 algorithm - All calculation paths
  • Card selection logic - Sorting, filtering, limits
  • Parsers - JSON/CSV parsing, error handling
  • Validation schemas - Zod schemas

Integration Tests (80% of Server Actions)

  • Auth actions with database
  • Learning flow (start session, submit answers, end session)
  • Collection CRUD
  • Import process

E2E Tests (Critical paths)

  • Complete learning session
  • Create collection and add hanzi
  • Search hanzi
  • Admin import
  • Auth flow

Tools: Vitest (unit/integration), Playwright (E2E)


9. Development Milestones

Week 1: Foundation COMPLETE

  • Setup Next.js 16 project
  • Configure Prisma + PostgreSQL
  • Setup Docker Compose
  • Create all data models (18 models, 3 enums)
  • Configure NextAuth.js
  • Middleware for route protection
  • All Prisma relations implemented
  • Database migrations created
  • Docker containers: nginx, app, postgres
  • Build successful

Week 2: Authentication COMPLETE

  • Registration/login pages
  • Middleware protection
  • User preferences (cardsPerSession, characterDisplay, hideEnglish)
  • Integration tests (10 tests for auth, 8 tests for preferences)
  • Server Actions: register, login, updatePreferences, getPreferences
  • Zod validation for all inputs
  • Password hashing with bcrypt
  • Session management with NextAuth.js v5
  • Settings page with preferences form

Week 3-4: Data Import COMPLETE

  • Admin role middleware
  • HSK JSON parser (src/lib/import/json-parser.ts)
    • Support for complete-hsk-vocabulary format
    • All transcription types (pinyin, numeric, wade-giles, zhuyin, ipa)
    • Multi-character hanzi support
    • HSK level mapping (new-1 through old-6)
  • CSV parser (src/lib/import/csv-parser.ts)
    • Flexible column mapping
    • Comma-separated multi-values
    • Complete field validation
  • Import UI and actions
    • File upload and paste textarea
    • Update existing or skip duplicates
    • Detailed results with line-level errors
  • Test with real HSK data
  • 14 passing integration tests
  • Admin import page at /admin/import
  • Enhancement: Database initialization system
    • getInitializationFiles() Server Action to list available files
    • Multi-file selection for batch initialization
    • SSE API endpoint (/api/admin/initialize) for long-running operations
    • Real-time progress updates via Server-Sent Events
    • Progress bar showing percent, current/total, and operation message
    • Auto-create HSK level collections from hanzi level attributes
    • Auto-populate collections with corresponding hanzi
    • Optional clean data mode (delete all existing data)
    • Admin initialization page at /admin/initialize with SSE integration
    • Processes complete.json (11K+ hanzi) without request timeouts
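A sketch of how such an endpoint can stream progress using only the standard Web `ReadableStream` and `Response` APIs, which App Router route handlers can return. The `InitProgress` payload, `formatSseEvent()` and the placeholder batch loop are assumptions; a real handler would run the import inside `start()`. Each SSE message is a `data:` line terminated by a blank line.

```typescript
interface InitProgress {
  current: number;
  total: number;
  message: string;
}

// One SSE message: "data: <json>\n\n". The blank line ends the event.
export function formatSseEvent(p: InitProgress): string {
  const percent = p.total === 0 ? 100 : Math.round((p.current / p.total) * 100);
  return `data: ${JSON.stringify({ ...p, percent })}\n\n`;
}

// Hypothetical route handler shape (e.g. src/app/api/admin/initialize/route.ts).
export async function POST(): Promise<Response> {
  const encoder = new TextEncoder();
  const total = 3; // placeholder: number of batches to process
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      for (let current = 1; current <= total; current++) {
        // ... await the import of one batch of hanzi here ...
        controller.enqueue(encoder.encode(formatSseEvent({ current, total, message: `Batch ${current}/${total}` })));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache, no-transform",
      "X-Accel-Buffering": "no", // ask Nginx not to buffer this response
    },
  });
}
```

Because the browser `EventSource` API only issues GET requests, a POST endpoint like this (which can receive the selected files and the clean-data flag in its body) would be consumed with `fetch()` and `response.body.getReader()`.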

Week 5: Collections COMPLETE

  • Collections CRUD (Server Actions in src/actions/collections.ts)
    • createCollection()
    • getUserCollections()
    • getCollectionById()
    • updateCollection()
    • deleteCollection()
  • Add/remove hanzi
    • addHanziToCollection() with multi-select
    • removeHanziFromCollection() with bulk support
    • Search & select interface
    • Paste list interface (comma, space, newline separated)
  • Global HSK collections
    • isPublic flag for admin-created collections
    • Read-only for regular users
    • Full control for admins
  • 21 passing integration tests
  • Pages: /collections, /collections/[id], /collections/new
  • Order preservation with orderIndex
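A sketch of the paste-list tokenizer, assuming entries may be separated by ASCII commas, any whitespace including newlines, and (an assumption beyond the spec) the full-width Chinese comma `，` and enumeration comma `、`. Duplicates are dropped while keeping first-seen order, so orderIndex follows the pasted order. `parsePastedHanzi` and `withOrderIndex` are hypothetical names.

```typescript
// ASCII comma, whitespace/newlines, full-width comma, enumeration comma.
const SEPARATORS = /[\s,，、]+/u;

export function parsePastedHanzi(input: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const token of input.split(SEPARATORS)) {
    if (token.length > 0 && !seen.has(token)) {
      seen.add(token);
      result.push(token);
    }
  }
  return result;
}

// Pair each token with an orderIndex for new CollectionItem rows.
export function withOrderIndex(tokens: string[], startAt = 0): { simplified: string; orderIndex: number }[] {
  return tokens.map((simplified, i) => ({ simplified, orderIndex: startAt + i }));
}
```

Each token would then be resolved with getHanziBySimplified(), and tokens with no match reported back to the user rather than silently skipped.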

Week 5: Hanzi Search COMPLETE

  • Search page (/hanzi)
    • Query input for simplified, traditional, pinyin, meaning
    • Case-insensitive search
    • Multi-character support
  • Filters (HSK level)
    • 12 HSK levels (new-1 through new-6, old-1 through old-6)
    • Dynamic filtering on hskLevels relation
  • Hanzi detail view (/hanzi/[id])
    • Large character display
    • All forms with isDefault indicator
    • All transcriptions grouped by type
    • All meanings with language codes
    • HSK level badges, parts of speech
    • Classifiers, radical, frequency
    • Add to collection button with modal
  • Pagination
    • 20 results per page
    • hasMore indicator (limit+1 pattern)
    • Previous/Next controls
  • 16 passing integration tests
  • Public access (no authentication required)
  • Server Actions: searchHanzi(), getHanzi(), getHanziBySimplified()
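The limit+1 pattern avoids a separate `COUNT(*)` query: fetch one row more than the page size, and if that extra row arrives there is a next page. A sketch with a hypothetical `fetchRows(take, skip)` callback standing in for a Prisma `findMany({ take, skip })` call; `paginate` and `pageToOffset` are also illustrative names.

```typescript
interface Page<T> {
  items: T[];
  hasMore: boolean;
}

export async function paginate<T>(
  fetchRows: (take: number, skip: number) => Promise<T[]>,
  limit = 20,
  offset = 0,
): Promise<Page<T>> {
  const rows = await fetchRows(limit + 1, offset); // one extra row as a probe
  return { items: rows.slice(0, limit), hasMore: rows.length > limit };
}

// 1-based page number to offset, for the Previous/Next controls.
export const pageToOffset = (page: number, limit = 20): number => (Math.max(1, page) - 1) * limit;
```

The trade-off is that the UI can show Previous/Next but not a total page count, which matches the controls listed above.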

Week 6: SM-2 Algorithm COMPLETE

  • Implement algorithm (src/lib/learning/sm2.ts)
    • calculateCorrectAnswer() with exact formulas
    • calculateIncorrectAnswer() with exact formulas
    • Initial values: easeFactor=2.5, interval=1, consecutiveCorrect=0
    • Correct answer intervals: 1, 6, then interval × easeFactor
    • Incorrect answer: reset to 1 day, decrease easeFactor
  • Card selection logic
    • selectCardsForSession() with priority sorting
    • Filter SUSPENDED cards
    • Priority: HARD > MEDIUM/unset > EASY
    • Sort: nextReviewDate ASC, incorrectCount DESC, consecutiveCorrect ASC
  • Wrong answer generation
    • generateWrongAnswers() selects 3 from same HSK level
    • Fisher-Yates shuffle for randomization
    • shuffleOptions() for answer position randomization
  • Unit tests (38 tests)
    • Test all calculation formulas
    • Test edge cases (minimum easeFactor, large intervals, etc.)
    • Test card selection with all sorting criteria
    • Test wrong answer generation
    • 100% statement and line coverage
    • 94.11% branch coverage (exceeds 90% requirement)

Week 7-8: Learning Interface COMPLETE

  • Learning session pages
    • /learn/[collectionId] dynamic route
    • Large hanzi display (text-9xl)
    • 4 pinyin options in a 2x2 grid
    • Progress bar with card count
  • Card component
    • Auto-submit after selection
    • Green/red feedback overlay
    • English meaning display
  • Answer submission
    • submitAnswer() Server Action
    • SM-2 progress updates
    • Session review tracking
  • Feedback UI
    • Correct/incorrect indicators
    • Correct answer display
    • Vocabulary meaning reinforcement
  • Session summary
    • Total cards, accuracy, duration
    • Correct/incorrect breakdown
  • Keyboard shortcuts
    • 1-4 for answer selection
    • Space to continue
  • Learning Server Actions (src/actions/learning.ts)
    • startLearningSession() - Initialize with SM-2 card selection
    • submitAnswer() - Record and update progress
    • endSession() - Calculate summary stats
    • getDueCards() - Count due cards
    • updateCardDifficulty() - Manual difficulty override
    • removeFromLearning() - Suspend card
  • Two-stage card randomization
    • Random tiebreaker during selection
    • Final shuffle for presentation
  • Navigation integration
    • Dashboard "Start Learning" button
    • Collection "Start Learning" button
  • All 38 SM-2 algorithm tests passing (98.92% coverage)
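A sketch of the session summary calculation behind endSession(), assuming it works from the session's SessionReview rows and its startedAt/endedAt timestamps. The `SessionSummary` shape, `summarizeSession` and the one-decimal accuracy rounding are hypothetical choices.

```typescript
interface ReviewRow {
  isCorrect: boolean;
}

interface SessionSummary {
  totalCards: number;
  correct: number;
  incorrect: number;
  accuracy: number; // percent, 0-100, rounded to one decimal
  durationSeconds: number;
}

export function summarizeSession(reviews: ReviewRow[], startedAt: Date, endedAt: Date): SessionSummary {
  const total = reviews.length;
  const correct = reviews.filter((r) => r.isCorrect).length;
  return {
    totalCards: total,
    correct,
    incorrect: total - correct,
    accuracy: total === 0 ? 0 : Math.round((correct / total) * 1000) / 10, // avoid divide-by-zero
    durationSeconds: Math.max(0, Math.round((endedAt.getTime() - startedAt.getTime()) / 1000)),
  };
}
```

Deriving the counts from the review rows, rather than trusting client-side tallies, keeps the summary consistent with what was actually recorded by submitAnswer().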

Week 9: Dashboard & Progress

  • Dashboard widgets with real statistics (due cards, total learned, daily goal, streak)
  • Progress page with charts and session history
  • Charts (Recharts) - Daily activity bar chart, accuracy trend line chart
  • Statistics Server Actions (getStatistics, getUserProgress, getLearningSessions, getHanziProgress, resetHanziProgress)
  • Recent activity section on dashboard
  • Date range filtering (7/30/90/365 days)
  • Session history table with complete details
  • Navigation links to progress page
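The dashboard streak can be derived from session timestamps alone. A sketch, assuming a streak counts consecutive calendar days with at least one session, using UTC days (an assumption; a per-user time zone would be more accurate), and ending either today or yesterday so the streak is not shown as broken before the user has studied today. `currentStreak` is a hypothetical helper.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days since the Unix epoch in UTC; equal values mean the same calendar day.
function utcDay(d: Date): number {
  return Math.floor(d.getTime() / DAY_MS);
}

export function currentStreak(sessionDates: Date[], now: Date = new Date()): number {
  const days = new Set(sessionDates.map(utcDay));
  const today = utcDay(now);
  // Start from yesterday if there is no session yet today.
  let day = days.has(today) ? today : today - 1;
  let streak = 0;
  while (days.has(day)) {
    streak++;
    day--;
  }
  return streak;
}
```

Multiple sessions on the same day collapse into one entry of the set, so they count once.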

Week 10: UI Polish

  • Responsive layouts
  • Mobile navigation
  • Dark mode
  • Loading/empty states
  • Toast notifications
  • Accessibility improvements

Week 11: Testing & Docs

  • Complete test coverage
  • E2E tests for all critical flows
  • README and documentation
  • Security audit

Week 12: Deployment

  • Production environment
  • Docker deployment
  • SSL certificates
  • Database backup
  • Import HSK data
  • Final testing

10. Docker Configuration

docker-compose.yml

version: '3.8'
services:
  nginx:
    image: nginx:alpine
    ports: ["80:80", "443:443"]
    volumes:
      - ./docker/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./docker/ssl:/etc/nginx/ssl:ro
    depends_on: [app]
  
  app:
    build:
      context: .
      dockerfile: docker/Dockerfile
    expose: ["3000"]
    environment:
      - DATABASE_URL=postgresql://memohanzi_user:password@postgres:5432/memohanzi_db
      - NEXTAUTH_URL=https://yourdomain.com
      - NEXTAUTH_SECRET=${NEXTAUTH_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
  
  postgres:
    image: postgres:18-alpine
    environment:
      POSTGRES_USER: memohanzi_user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: memohanzi_db
    volumes:
      - postgres-data:/var/lib/postgresql  # PostgreSQL 18+ images keep PGDATA under /var/lib/postgresql/18/docker
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U memohanzi_user -d memohanzi_db"]
      interval: 5s
      timeout: 5s
      retries: 5

volumes:
  postgres-data:

Environment Variables

# .env.local
DATABASE_URL="postgresql://memohanzi_user:password@localhost:5432/memohanzi_db"
NEXTAUTH_URL="http://localhost:3000"
NEXTAUTH_SECRET="generate-with-openssl-rand-base64-32"
NODE_ENV="development"

11. Security Checklist

  • Passwords hashed with bcrypt (cost factor 10)
  • Session tokens httpOnly, sameSite
  • CSRF protection (NextAuth.js)
  • Rate limiting (Nginx)
  • Input validation (Zod, server-side)
  • SQL injection prevented (Prisma)
  • XSS prevention (React escaping)
  • HTTPS enforced (Nginx)
  • Secure headers (Nginx)
  • Role-based access enforced server-side
  • No sensitive data in logs
  • Environment variables for secrets

12. Phase 2 Features

  1. Additional Languages - Multi-language support for meanings
  2. Learning Modes - Radical identification, hanzi-to-meaning, meaning-to-hanzi, tone practice
  3. Autocomplete Data - Auto-fill missing hanzi info from APIs
  4. User Suggestions - Allow users to report/suggest corrections

13. Phase 3 Ideas

  • Writing practice (stroke order validation)
  • Social features (public collections, sharing)
  • Gamification (streaks, badges, leaderboards)
  • Mobile apps (React Native)
  • Audio pronunciation
  • Example sentences
  • Advanced SRS algorithms

14. Quick Reference Commands

Development:

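# Start (PostgreSQL in Docker, Next.js dev server on the host)
docker-compose up -d postgres   # port 5432 must be published to the host for the .env.local DATABASE_URL
npm run dev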

# Database
npx prisma migrate dev
npx prisma db seed
npx prisma studio

# Testing
npm run test
npm run test:e2e

Production:

# Deploy
docker-compose up -d --build

# Monitor
docker-compose logs -f

15. Success Criteria (MVP)

Technical:

  • All tests passing (70%+ coverage)
  • Can import complete HSK vocabulary (5000+ hanzi)
  • Page load <2s
  • Learning session responsive (<100ms)
  • Mobile responsive

Functional:

  • Complete learning session works end-to-end
  • SM-2 algorithm calculates correctly
  • Progress tracking accurate
  • Collections management works
  • Search works efficiently

User Experience:

  • Can learn 20+ cards in 5-10 minutes
  • Interface intuitive
  • Daily use sustainable

Implementation Notes

Priority Order

  1. Authentication (foundational)
  2. Data import (need data)
  3. Collections (organize learning)
  4. Search (browse data)
  5. Learning algorithm (core logic)
  6. Learning interface (user interaction)
  7. Progress tracking (motivation)
  8. Polish & deploy

Critical Paths to Test

  1. Register → Login → Create Collection → Add Hanzi → Start Learning → Complete Session → View Progress
  2. Admin → Import HSK Data → Create Global Collection → User uses global collection
  3. Search Hanzi → View Detail → Add to Collection → Learn

Key Implementation Files

  • prisma/schema.prisma - All data models
  • src/lib/learning/sm2.ts - SM-2 algorithm
  • src/lib/learning/card-selector.ts - Card selection
  • src/lib/import/json-parser.ts - Parse HSK JSON
  • src/actions/learning.ts - Learning Server Actions
  • src/app/(app)/learn/[collectionId]/page.tsx - Learning UI

Resources


This specification is complete and ready for implementation with Claude Code.

Start with Milestone 1 (Week 1: Foundation) and proceed sequentially through the milestones.