Teng Lin 3de9a97197 docs: reorder Testing sections for better flow

Move E2E Fixtures and Rate Limiting after VCR Testing since VCR
tests are more commonly used (no auth required) while E2E needs
more setup.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-20 22:34:14 -05:00


Contributing Guide

Status: Active | Last Updated: 2026-01-21

This guide covers everything you need to contribute to notebooklm-py: architecture overview, testing, and releasing.

Architecture

Package Structure

```
src/notebooklm/
├── __init__.py          # Public exports
├── client.py            # NotebookLMClient main class
├── auth.py              # Authentication handling
├── types.py             # Dataclasses and type definitions
├── _core.py             # Core HTTP/RPC infrastructure
├── _notebooks.py        # NotebooksAPI implementation
├── _sources.py          # SourcesAPI implementation
├── _artifacts.py        # ArtifactsAPI implementation
├── _chat.py             # ChatAPI implementation
├── _research.py         # ResearchAPI implementation
├── _notes.py            # NotesAPI implementation
├── _settings.py         # SettingsAPI implementation
├── _sharing.py          # SharingAPI implementation
├── rpc/                 # RPC protocol layer
│   ├── __init__.py
│   ├── types.py         # RPCMethod enum and constants
│   ├── encoder.py       # Request encoding
│   └── decoder.py       # Response parsing
└── cli/                 # CLI implementation
    ├── __init__.py      # CLI package exports
    ├── helpers.py       # Shared utilities
    ├── session.py       # login, use, status, clear
    ├── notebook.py      # list, create, delete, rename
    ├── source.py        # source add, list, delete
    ├── artifact.py      # artifact list, get, delete
    ├── generate.py      # generate audio, video, etc.
    ├── download.py      # download audio, video, etc.
    ├── chat.py          # ask, configure, history
    └── ...
```

Layered Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         CLI Layer                           │
│   cli/session.py, cli/notebook.py, cli/generate.py, etc.    │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                      Client Layer                           │
│  NotebookLMClient → NotebooksAPI, SourcesAPI, ArtifactsAPI  │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                       Core Layer                            │
│              ClientCore → _rpc_call(), HTTP client          │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                        RPC Layer                            │
│        encoder.py, decoder.py, types.py (RPCMethod)         │
└─────────────────────────────────────────────────────────────┘
```

Layer Responsibilities

| Layer | Files | Responsibility |
|-------|-------|----------------|
| CLI | `cli/*.py` | User commands, input validation, Rich output |
| Client | `client.py`, `_*.py` | High-level Python API, returns typed dataclasses |
| Core | `_core.py` | HTTP client, request counter, RPC abstraction |
| RPC | `rpc/*.py` | Protocol encoding/decoding, method IDs |

Key Design Decisions

Why underscore prefixes? Files like `_notebooks.py` are internal implementation details, so the public API stays clean (`from notebooklm import NotebookLMClient`).

Why namespaced APIs? `client.notebooks.list()` instead of `client.list_notebooks()`: better organization, scales well as features are added, and is tab-completion friendly.

Why async? Google's API can be slow. Async enables concurrent operations and non-blocking downloads.
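A minimal sketch of how the namespaced and async choices fit together, assuming each namespace wraps one shared core object (the guide wires new APIs as `self.newfeature = NewFeatureAPI(self._core)`). `Client` stands in for `NotebookLMClient`; `FakeCore`, its `rpc_call()` method, and the simulated delay are hypothetical stand-ins, not the real `ClientCore` API:

```python
from __future__ import annotations

import asyncio


class FakeCore:
    """Hypothetical stand-in for ClientCore: simulates one slow RPC round trip."""

    async def rpc_call(self, method: str) -> list[str]:
        await asyncio.sleep(0.1)  # pretend the API is slow
        return [f"{method}-result"]


class NotebooksAPI:
    def __init__(self, core: FakeCore) -> None:
        self._core = core

    async def list(self) -> list[str]:
        return await self._core.rpc_call("notebooks.list")


class SourcesAPI:
    def __init__(self, core: FakeCore) -> None:
        self._core = core

    async def list(self) -> list[str]:
        return await self._core.rpc_call("sources.list")


class Client:
    """Hypothetical client: every namespace shares the same core instance."""

    def __init__(self) -> None:
        self._core = FakeCore()
        self.notebooks = NotebooksAPI(self._core)
        self.sources = SourcesAPI(self._core)


async def main() -> list[list[str]]:
    client = Client()
    # Both calls wait concurrently, so this takes about 0.1 s instead of 0.2 s.
    return await asyncio.gather(client.notebooks.list(), client.sources.list())


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because both namespaced calls spend their time waiting on I/O, `asyncio.gather` overlaps them; a synchronous client would pay for each round trip in turn.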

Adding New Features

New RPC Method:

1. Capture traffic (see RPC Development Guide)
2. Add to `rpc/types.py`: `NEW_METHOD = "AbCdEf"`
3. Implement in the appropriate `_*.py` API class
4. Add a dataclass to `types.py` if needed
5. Add a CLI command if user-facing
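A minimal sketch of the `rpc/types.py`, API-class, and dataclass steps above, assuming the core exposes an awaitable `_rpc_call(method, params)`. That signature, the `NewThing` dataclass, the `get_all()` method, and the row layout are all hypothetical; real decoding depends on the response you captured:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Any


class RPCMethod(str, Enum):
    """Sketch of rpc/types.py, using the placeholder ID from the steps above."""

    NEW_METHOD = "AbCdEf"


@dataclass
class NewThing:
    """Hypothetical dataclass that would live in types.py."""

    id: str
    title: str


class FakeCore:
    """Hypothetical stand-in for ClientCore; records calls, returns canned rows."""

    def __init__(self) -> None:
        self.calls: list[tuple[RPCMethod, list[Any]]] = []

    async def _rpc_call(self, method: RPCMethod, params: list[Any]) -> Any:
        self.calls.append((method, params))
        return [["thing-1", "My title"]]


class NewThingsAPI:
    """Hypothetical API class that would live in a _*.py module."""

    def __init__(self, core: FakeCore) -> None:
        self._core = core

    async def get_all(self, notebook_id: str) -> list[NewThing]:
        raw = await self._core._rpc_call(RPCMethod.NEW_METHOD, [notebook_id])
        # Decode positional rows into typed dataclasses (layout is invented).
        return [NewThing(id=row[0], title=row[1]) for row in raw]


if __name__ == "__main__":
    print(asyncio.run(NewThingsAPI(FakeCore()).get_all("nb-1")))
```

Keeping the method ID in one enum means a changed Google ID is a one-line fix, and returning dataclasses keeps positional decoding out of callers.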

New API Class:

1. Create `_newfeature.py` with a `NewFeatureAPI` class
2. Add to `client.py`: `self.newfeature = NewFeatureAPI(self._core)`
3. Export types from `__init__.py`

Testing

Prerequisites

Install dependencies:
```
uv pip install -e ".[dev]"
```
Authenticate:
```
notebooklm login
```
Create a read-only test notebook (required for E2E tests):
- Create a notebook in NotebookLM
- Add multiple sources (text, URL, etc.)
- Generate artifacts (audio, quiz, etc.)
- Set the env var: `export NOTEBOOKLM_READ_ONLY_NOTEBOOK_ID="your-id"`

Quick Reference

```bash
# Unit + integration tests (no auth needed)
pytest

# E2E tests (requires auth + test notebook)
pytest tests/e2e -m readonly        # Read-only tests only
pytest tests/e2e -m "not variants"  # Skip parameter variants
pytest tests/e2e --include-variants # All tests including variants
```

Test Structure

```
tests/
├── unit/               # No network, fast, mock everything
├── integration/        # Mocked HTTP responses + VCR cassettes
│   ├── test_vcr_*.py   # Client-level VCR tests
│   └── cli_vcr/        # CLI integration tests with VCR
└── e2e/                # Real API calls (requires auth)
```
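For `tests/unit/`, "mock everything" can look like the sketch below: the RPC call is replaced with `unittest.mock.AsyncMock`, so the test checks decoding without touching the network. `NotesAPI`, `titles()`, the `"NOTES_METHOD"` string, and the `_rpc_call(method, params)` shape are hypothetical. The test drives the coroutine with `asyncio.run`, so it needs no async pytest plugin (the real suite may use one; this guide does not say):

```python
from __future__ import annotations

import asyncio
from unittest.mock import AsyncMock, Mock


class NotesAPI:
    """Hypothetical API class under test; real ones live in src/notebooklm/_*.py."""

    def __init__(self, core) -> None:
        self._core = core

    async def titles(self, notebook_id: str) -> list[str]:
        raw = await self._core._rpc_call("NOTES_METHOD", [notebook_id])
        return [row[1] for row in raw]


def test_titles_decodes_rows() -> None:
    core = Mock()
    # The mocked RPC call returns canned rows instead of hitting the network.
    core._rpc_call = AsyncMock(return_value=[["n1", "First"], ["n2", "Second"]])

    result = asyncio.run(NotesAPI(core).titles("nb-1"))

    assert result == ["First", "Second"]
    core._rpc_call.assert_awaited_once_with("NOTES_METHOD", ["nb-1"])
```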

VCR Testing (Recorded HTTP)

VCR tests record HTTP interactions once and replay them offline, so runs are deterministic. There are two levels:

Client-level VCR tests (`tests/integration/test_vcr_*.py`):

- Test Python API methods directly
- Verify RPC encoding/decoding against real responses

CLI VCR tests (`tests/integration/cli_vcr/`):

- Test the full CLI → Client → RPC path
- Use Click's `CliRunner` with VCR cassettes
- Verify CLI commands work end-to-end without mocking the client

```bash
# Run all VCR tests
pytest tests/integration/

# Run only CLI VCR tests
pytest tests/integration/cli_vcr/

# Record new cassettes (sensitive data auto-scrubbed)
NOTEBOOKLM_VCR_RECORD=1 pytest tests/integration/test_vcr_*.py -v
```

Sensitive data (cookies, tokens, emails) is automatically scrubbed from cassettes.
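The scrubbing idea can be illustrated with a small, self-contained redactor. The header names, the `at`/`token` parameter names, and the flat dict shape of `interaction` are assumptions for illustration only; real cassettes (for example those written by vcrpy) nest headers and bodies differently, and the project's actual scrubber is not shown in this guide:

```python
from __future__ import annotations

import re
from typing import Any

# Hypothetical rules; the project's real scrub lists may differ.
SENSITIVE_HEADERS = {"cookie", "set-cookie", "authorization"}
SENSITIVE_PARAMS = ("at", "token")
PARAM_RE = re.compile(r"\b(" + "|".join(SENSITIVE_PARAMS) + r")=[^&\s]*")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def scrub_interaction(interaction: dict[str, Any]) -> dict[str, Any]:
    """Return a redacted copy of a recorded request/response; input is not mutated."""
    headers = {
        name: "REDACTED" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in interaction.get("headers", {}).items()
    }
    # Redact form/query tokens first, then any email addresses in the body.
    body = PARAM_RE.sub(r"\1=REDACTED", interaction.get("body", ""))
    body = EMAIL_RE.sub("redacted@example.com", body)
    return {**interaction, "headers": headers, "body": body}
```

Scrubbing at record time (rather than at replay) means secrets never reach the cassette files that get committed.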

E2E Fixtures

| Fixture | Use Case |
|---------|----------|
| `read_only_notebook_id` | List/download existing artifacts |
| `temp_notebook` | Add/delete sources (auto-cleanup) |
| `generation_notebook_id` | Generate artifacts (CI-aware cleanup) |
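The auto-cleanup behind a fixture like `temp_notebook` usually follows a create, yield, delete-in-`finally` shape. A minimal sketch of that shape as an async context manager; `FakeNotebooks` and its `create()`/`delete()` methods are hypothetical stand-ins, and the real fixture (a pytest fixture whose code is not shown here) may differ:

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


class FakeNotebooks:
    """Hypothetical stand-in for a notebooks API with create/delete."""

    def __init__(self) -> None:
        self.existing: set[str] = set()
        self._counter = 0

    async def create(self, title: str) -> str:
        self._counter += 1
        notebook_id = f"nb-{self._counter}"
        self.existing.add(notebook_id)
        return notebook_id

    async def delete(self, notebook_id: str) -> None:
        self.existing.discard(notebook_id)


@asynccontextmanager
async def temp_notebook(notebooks: FakeNotebooks) -> AsyncIterator[str]:
    notebook_id = await notebooks.create("e2e-temp")
    try:
        yield notebook_id
    finally:
        # Cleanup runs even if the test body raises.
        await notebooks.delete(notebook_id)


async def demo() -> None:
    notebooks = FakeNotebooks()
    async with temp_notebook(notebooks) as notebook_id:
        print("using", notebook_id)
    print("left over:", notebooks.existing)


if __name__ == "__main__":
    asyncio.run(demo())
```

A pytest yield fixture has the same structure: everything after `yield` is teardown, which is why CRUD tests can use `temp_notebook` without leaving notebooks behind.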

Rate Limiting

NotebookLM has undocumented rate limits, so generation tests may be skipped when the API rate-limits them:

- Use `pytest tests/e2e -m readonly` for quick validation
- Wait a few minutes between full test runs
- `SKIPPED (Rate limited by API)` is expected behavior, not a failure
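One way to get that skip-instead-of-fail behavior is a decorator around async tests. `RateLimitError` and `skip_if_rate_limited` are hypothetical names (the client's real rate-limit exception is not documented here). pytest reports a raised `unittest.SkipTest` as a skip; inside a pytest suite, `pytest.skip()` is the more common spelling:

```python
from __future__ import annotations

import asyncio
import functools
import unittest
from collections.abc import Awaitable, Callable
from typing import Any, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception; the client's real rate-limit error may differ."""


def skip_if_rate_limited(
    func: Callable[..., Awaitable[T]],
) -> Callable[..., Awaitable[T]]:
    """Wrap an async test so a rate-limit error becomes a skip, not a failure."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> T:
        try:
            return await func(*args, **kwargs)
        except RateLimitError as exc:
            raise unittest.SkipTest("Rate limited by API") from exc

    return wrapper


if __name__ == "__main__":

    @skip_if_rate_limited
    async def demo() -> None:
        raise RateLimitError("HTTP 429")

    try:
        asyncio.run(demo())
    except unittest.SkipTest as exc:
        print(f"SKIPPED ({exc})")  # SKIPPED (Rate limited by API)
```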

Writing New Tests

```
Need network?
├── No → tests/unit/
├── Mocked → tests/integration/
└── Real API → tests/e2e/
    └── What notebook?
        ├── Read-only → read_only_notebook_id + @pytest.mark.readonly
        ├── CRUD → temp_notebook
        └── Generation → generation_notebook_id
            └── Parameter variant? → add @pytest.mark.variants
```

CI/CD

Workflows

| Workflow | Trigger | Purpose |
|----------|---------|---------|
| `test.yml` | Push/PR | Unit tests, linting, type checking |
| `nightly.yml` | Daily 6 AM UTC | E2E tests with real API |
| `rpc-health.yml` | Daily 7 AM UTC | RPC method ID monitoring (see stability.md) |
| `testpypi-publish.yml` | Manual dispatch | Publish to TestPyPI |
| `verify-package.yml` | Manual dispatch | Verify TestPyPI or PyPI install + E2E |
| `publish.yml` | Tag push | Publish to PyPI |

Setting Up Nightly E2E Tests

1. Get the storage state: `cat ~/.notebooklm/storage_state.json`
2. Add GitHub secrets:
   - `NOTEBOOKLM_AUTH_JSON`: Storage state JSON
   - `NOTEBOOKLM_READ_ONLY_NOTEBOOK_ID`: Your test notebook ID

Maintaining Secrets

| Task | Frequency |
|------|-----------|
| Refresh credentials | Every 1-2 weeks |
| Check nightly results | Daily |

Troubleshooting CI/CD Auth

First step: Run `notebooklm auth check --json` in your workflow to diagnose issues.

"NOTEBOOKLM_AUTH_JSON environment variable is set but empty"

Cause: The `NOTEBOOKLM_AUTH_JSON` env var is set to an empty string.

Solution:

- Ensure the GitHub secret is properly configured
- Check that the secret isn't empty or whitespace-only
- Verify the workflow syntax: `${{ secrets.NOTEBOOKLM_AUTH_JSON }}`

"must contain valid Playwright storage state with a 'cookies' key"

Cause: The JSON in `NOTEBOOKLM_AUTH_JSON` is missing the required structure.

Solution: Ensure your secret contains valid Playwright storage state JSON (abbreviated below; `...` marks omitted fields and entries):

```
{
  "cookies": [
    {"name": "SID", "value": "...", "domain": ".google.com", ...},
    ...
  ],
  "origins": []
}
```
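The two error messages quoted above correspond to checks along these lines. This is a hypothetical sketch for understanding the failure modes (or pre-checking a secret locally), not the library's actual loader; the function name, the extra invalid-JSON message, and returning `None` when the variable is unset (so a caller could fall back to `storage_state.json`) are assumptions:

```python
from __future__ import annotations

import json
import os
from typing import Any


def load_auth_from_env(var: str = "NOTEBOOKLM_AUTH_JSON") -> dict[str, Any] | None:
    """Hypothetical validator mirroring the error messages in this guide."""
    raw = os.environ.get(var)
    if raw is None:
        return None  # unset: caller would use the storage file instead (assumed)
    if not raw.strip():
        raise ValueError(f"{var} environment variable is set but empty")
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"{var} is not valid JSON: {exc}") from exc
    if not isinstance(state, dict) or "cookies" not in state:
        raise ValueError(
            f"{var} must contain valid Playwright storage state with a 'cookies' key"
        )
    return state
```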

"Cannot run 'login' when NOTEBOOKLM_AUTH_JSON is set"

Cause: You're trying to run `notebooklm login` in CI/CD where `NOTEBOOKLM_AUTH_JSON` is set.

Why: The `login` command saves to a file, which conflicts with environment-based auth.

Solution:

- Don't run `login` in CI/CD; use the env var for auth instead
- If you need to refresh auth, do it locally and update the secret

Session expired in CI/CD

Cause: Google sessions expire periodically (typically every 1-2 weeks).

Solution:

1. Re-run `notebooklm login` locally
2. Copy the contents of `~/.notebooklm/storage_state.json`
3. Update your GitHub secret

Multiple accounts in CI/CD

Use separate secrets and set `NOTEBOOKLM_AUTH_JSON` per job:

```yaml
jobs:
  account-1:
    runs-on: ubuntu-latest
    env:
      NOTEBOOKLM_AUTH_JSON: ${{ secrets.NOTEBOOKLM_AUTH_ACCOUNT1 }}
    steps:
      # Assumes notebooklm is installed by an earlier step
      - run: notebooklm list

  account-2:
    runs-on: ubuntu-latest
    env:
      NOTEBOOKLM_AUTH_JSON: ${{ secrets.NOTEBOOKLM_AUTH_ACCOUNT2 }}
    steps:
      - run: notebooklm list
```

Debugging CI/CD auth issues

Add diagnostic steps to your workflow:

```yaml
- name: Debug auth
  run: |
    # Check if env var is set (without revealing content)
    if [ -n "$NOTEBOOKLM_AUTH_JSON" ]; then
      echo "NOTEBOOKLM_AUTH_JSON is set (length: ${#NOTEBOOKLM_AUTH_JSON})"
    else
      echo "NOTEBOOKLM_AUTH_JSON is not set"
    fi

    # Comprehensive auth check (preferred). Runs last because the default
    # bash shell uses -e, so a nonzero exit would skip anything after it.
    notebooklm auth check --json
```

The `auth check --json` output shows:

- Whether storage/env var is being used
- Which cookies are present
- Cookie domains (important for regional users)
- Any validation errors
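To eyeball similar information locally before updating a secret, a small helper can summarize a storage state file by cookie domain. This is a hypothetical local aid, not the output format of `auth check --json`; it deliberately reports cookie names only, never values, so the output is safe to paste into an issue:

```python
from __future__ import annotations

import json
import sys
from collections import defaultdict
from pathlib import Path
from typing import Any


def cookies_by_domain(state: dict[str, Any]) -> dict[str, list[str]]:
    """Map each cookie domain to the sorted cookie names set on it (no values)."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for cookie in state.get("cookies", []):
        grouped[cookie.get("domain", "?")].append(cookie.get("name", "?"))
    return {domain: sorted(names) for domain, names in sorted(grouped.items())}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python cookie_summary.py ~/.notebooklm/storage_state.json
    path = Path(sys.argv[1]).expanduser()
    print(json.dumps(cookies_by_domain(json.loads(path.read_text())), indent=2))
```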

Getting Help

- Check existing implementations in `_*.py` files
- Look at test files for expected structures
- See RPC Development Guide for protocol details
- Open an issue with captured request/response (sanitized)
