# AI Data Aggregation — Automated Ergonode Product Enrichment (PoC)

A Swiss-hosted, detached PoC that enriches Ergonode (PIM) products from status
`New` to `Done` or `Review` with no manual effort. Per product it runs a
per-attribute **research → manipulation → quality-scoring** loop, sources and
generates **images**, writes accepted values back to Ergonode together with a
transparent **`ai-product-log` CSV**, and is fully governed from an **Admin
Backoffice + Observability Dashboard**.

The whole stack runs **today on deterministic mocks** — no external credentials
needed. Google **Vertex AI** sits behind a clean interface; going live is just
**filling in credentials** (see [Going live with Vertex AI](#-going-live-with-vertex-ai)).

---

## Architecture

```
                 ┌──────────────────────────┐        ┌───────────────────────┐
   Operators ───▶│  Next.js Admin frontend  │──proxy▶│   Symfony backend     │
                 │  (Backoffice + Dashboard)│        │   (FrankenPHP)        │
                 │  Google OAuth / dev login│        │                       │
                 └──────────────────────────┘        │  ┌─────────────────┐  │
                                                      │  │ Pipeline        │  │
   Ergonode ◀──poll / write / CSV / images───────────┼──│ Ingestion▸AI▸   │  │
   (REST, mock or staging)                            │  │ Images▸Score▸   │  │
                                                      │  │ Push            │  │
   Vertex AI ◀──research / manipulate / score / ──────┼──│ (Story 2/3/4)   │  │
   image / segmentation / alt-text (mock or real)     │  └─────────────────┘  │
                                                      │  Postgres · Messenger │
   Slack ◀──terminal-failure alerts (mock/webhook)────┤  worker queue         │
                                                      └───────────────────────┘
```

- **backend/** — Symfony 6.4 service: Ergonode integration, AI enrichment engine,
  image pipeline, config store, Admin API, queue worker, CLI commands.
- **frontend/** — Next.js 14 (App Router, TypeScript, Tailwind): Admin Backoffice
  + Observability Dashboard, Google OAuth (+ dev login), server-side API proxy.

### Key decisions
- **Ergonode transport: REST.** The epic nominated GraphQL, but the provided
  `digtag-staging` instance only exposes the **REST API** (`/api/v1/...`; there is
  no GraphQL endpoint), confirmed against its OpenAPI spec at `/api/doc`. The real
  client (`backend/src/Ergonode/Rest/`) authenticates via `POST /api/v1/login` and
  uses the documented product/multimedia endpoints; a file-backed mock (`…/Mock/`)
  is the default so the pipeline runs offline. Auth + product listing are
  confirmed against the live spec; the exact write payloads + workflow status
  codes are validated with real credentials during the Ergonode Integration story.
- **Atomic status-claim**: the Ergonode API does not guarantee optimistic
  locking, so the service uses a **local claim table** (`product_claim`) with a
  UNIQUE constraint on the product id plus a conditional `UPDATE`. A second
  worker's claim fails, so each product transitions to `AI Research`
  exactly once. See `backend/src/Ergonode/StatusClaimService.php`.
- **Quality Score**: `BRC` is a KO gate (`BRC = 0 → AQS = 0`); otherwise
  `AQS = 0.35·TSQ + 0.35·FPV + 0.25·CSC + 0.05·VFQ`.
- **Config-driven**: every runtime knob, manipulation/image rule, trusted source,
  feature flag and image spec lives in the config store and is edited in the
  Admin Backoffice — **no redeploy**, applied on the next product.
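
The Quality Score rule above can be written out as a small function. This is a minimal sketch, not the backend's implementation: the `SubScores` shape and the function name are hypothetical, and the sub-scores are assumed to lie in `[0, 1]`, which the formula implies but this README does not state.

```typescript
// Hypothetical shape: one value per sub-score named in the formula.
interface SubScores {
  brc: number; // KO gate: a value of 0 forces the whole score to 0
  tsq: number;
  fpv: number;
  csc: number;
  vfq: number;
}

// Weights taken from the formula in this README; they sum to 1.
const AQS_WEIGHTS = { tsq: 0.35, fpv: 0.35, csc: 0.25, vfq: 0.05 };

function computeAqs(s: SubScores): number {
  // KO gate first: BRC = 0 means AQS = 0 regardless of the other sub-scores.
  if (s.brc === 0) {
    return 0;
  }
  return (
    AQS_WEIGHTS.tsq * s.tsq +
    AQS_WEIGHTS.fpv * s.fpv +
    AQS_WEIGHTS.csc * s.csc +
    AQS_WEIGHTS.vfq * s.vfq
  );
}
```

For example, `computeAqs({ brc: 1, tsq: 0.8, fpv: 0.6, csc: 1, vfq: 0 })` evaluates to roughly `0.74` (0.28 + 0.21 + 0.25 + 0), while the same input with `brc: 0` yields `0`.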

---

## Quick start (Docker)

```bash
cp .env.example .env            # defaults run everything on mocks
docker compose up --build       # db + backend + worker + poller + frontend
```

- Admin Backoffice + Dashboard: **http://localhost:3000** (dev login: any email)
- Backend API: **http://localhost:8000/api/health**

Seed demo products and watch them flow through the pipeline:

```bash
docker compose exec backend php bin/console app:simulate --reset --count=5 --variants --edge
# the poller picks them up within ~60s; or force a cycle:
docker compose exec backend php bin/console app:poll
```

Open the Dashboard — queue depth, throughput, average AQS per attribute, cost,
failure rate and the kill-switch update live.

## Quick start (local, no Docker)

Requires PHP 8.1+ (gd+webp, intl, zip, pdo_sqlite), Composer and Node 20+.

```bash
# Backend (SQLite, mocks)
cd backend
composer install
php bin/console doctrine:schema:create
php bin/console app:seed
php bin/console app:simulate --reset --count=5 --variants --edge
php bin/console app:poll
php bin/console app:worker --limit=50 --time-limit=120     # process the queue
php -S 127.0.0.1:8000 -t public public/index.php           # API server

# Frontend (new terminal)
cd frontend
npm install
npm run dev        # http://localhost:3000  (BACKEND_INTERNAL_URL=http://127.0.0.1:8000)
```

---

## 🔑 Going live with Vertex AI

Everything is wired; only credentials are missing.

1. In Google Cloud, create a **service account** with the **Vertex AI User** role
   and download its **JSON key**; use a Swiss region (`europe-west6`) for Vertex AI.
2. Save the key as **`./secrets/vertex-sa.json`** (mounted read-only into the
   backend + worker containers; git-ignored).
3. Set the **project id**, **region**, **per-step models** and flip the provider to
   **vertex** — either in `.env` (`VERTEX_PROJECT_ID`, `AI_PROVIDER=vertex`) or, with
   no redeploy, in the dashboard's **Vertex AI** page (env values are the fallback;
   the config store wins). That page also shows live **credential status** and a
   one-click **smoke test**. The service-account key is never entered in the UI.
4. Restart (if you used `.env`) and smoke-test:
   ```bash
   docker compose up -d
   docker compose exec backend php bin/console app:vertex:smoke   # or the dashboard button
   ```
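
The precedence described in step 3 (config store first, env values as fallback) can be sketched as below. This is illustrative only: the `VertexSettings` shape, the function name and the `VERTEX_REGION` variable are assumptions. Only `AI_PROVIDER` and `VERTEX_PROJECT_ID` are named in this README.

```typescript
type Provider = "mock" | "vertex";

// Hypothetical settings shape; the real config store schema may differ.
interface VertexSettings {
  provider: Provider;
  projectId: string;
  region: string;
}

function resolveVertexSettings(
  store: Partial<VertexSettings>,
  env: Record<string, string | undefined>
): VertexSettings {
  // Anything other than "vertex" in the environment falls back to the mock provider.
  const envProvider: Provider = env["AI_PROVIDER"] === "vertex" ? "vertex" : "mock";
  return {
    // A value set in the config store wins; the env value is only the fallback.
    provider: store.provider ?? envProvider,
    projectId: store.projectId ?? env["VERTEX_PROJECT_ID"] ?? "",
    // VERTEX_REGION is an assumed variable name; europe-west6 is the Swiss region.
    region: store.region ?? env["VERTEX_REGION"] ?? "europe-west6",
  };
}
```

With an empty store and `AI_PROVIDER=vertex` in the environment the provider resolves to `vertex`; once the dashboard stores `provider: "mock"`, that value takes over without touching `.env`.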

The same single switch (`AI_PROVIDER=vertex`) moves research, manipulation,
scoring, alt-text, image creation and segmentation onto Vertex. The auth (RS256
service-account JWT → access token) and the REST calls are implemented in
`backend/src/Ergonode/.../Vertex/`. To connect **real Ergonode staging** instead
of the mock, set `ERGONODE_MODE=real` + `ERGONODE_USERNAME/PASSWORD`. To use the
real **Slack** module, set `SLACK_MODE=webhook` + `SLACK_WEBHOOK_URL`.

---

## CLI commands (backend)

| Command | What it does |
| --- | --- |
| `app:seed [--force]` | Seed config-store defaults + reference data (idempotent). |
| `app:simulate [--count] [--variants] [--edge] [--reset]` | Create synthetic `New` products in the mock Ergonode (demos, QA, load). |
| `app:poll` | One polling cycle: claim `New`/changed products, enqueue them. |
| `app:worker [--limit] [--time-limit]` | Consume the queue (runs the full pipeline). |
| `app:vertex:smoke` | Smoke-test the active AI provider (mock or Vertex). |
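
`app:poll` claims each product before enqueueing it. The claim logic from the Key decisions section (a UNIQUE product id plus a conditional `UPDATE`) can be modelled in memory as below. This is a sketch under stated assumptions: the class, the column names and the SQL in the comments are hypothetical, and the real atomicity comes from the database, not from this single-threaded stand-in.

```typescript
type ClaimStatus = "New" | "AI Research";

// Hypothetical row layout for the product_claim table.
interface ClaimRow {
  productId: string;
  status: ClaimStatus;
  workerId: string | null;
}

class InMemoryClaimTable {
  private rows: Record<string, ClaimRow | undefined> = {};

  // Stands in for an INSERT guarded by UNIQUE(product_id):
  // a second insert for the same product id is rejected.
  register(productId: string): boolean {
    if (this.rows[productId] !== undefined) {
      return false;
    }
    this.rows[productId] = { productId, status: "New", workerId: null };
    return true;
  }

  // Stands in for a conditional update along the lines of
  //   UPDATE product_claim SET status = 'AI Research', worker_id = ?
  //   WHERE product_id = ? AND status = 'New'
  // and returns true only when the row was actually changed.
  tryClaim(productId: string, workerId: string): boolean {
    const row = this.rows[productId];
    if (row === undefined || row.status !== "New") {
      return false;
    }
    row.status = "AI Research";
    row.workerId = workerId;
    return true;
  }
}
```

If two workers call `tryClaim` for the same product, the first call returns `true` and the second returns `false`, because the row no longer matches the `status = 'New'` condition.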

## Admin Backoffice (Story 5)

`Dashboard · Manipulation Rules (incl. Image Specs) · Per-Category Rules ·
Trusted Sources · Vertex AI · Runtime Settings · Feature Flags · Audit Log`

The **Vertex AI** page manages the provider mode (mock ⇄ vertex), GCP project,
region and per-step models, shows credential status (never the key) and runs a
smoke test — all applied without a redeploy.

- All writes are **versioned** (new active version, previous read-only) and
  **audited** (actor, target, before/after).
- **Feature flags** gate each pipeline phase (a disabled phase is a no-op and
  logs `phase_skipped`); **kill-switch** halts new AI work when the daily cost
  ceiling is reached (in-flight products finish) and can be reset from the
  Dashboard.
- Screens use semantic markup, labelled controls, visible focus and an
  AA-contrast palette (WCAG 2.2 AA); add `axe-core` in CI per Story 6.
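
The feature-flag and kill-switch behaviour described above can be sketched as follows. All names here are hypothetical, a missing flag is assumed to mean "disabled", and the reset only clears the switch. How the backend tracks cost and resets the daily counter is not described in this README.

```typescript
// Hypothetical runtime state; the real values live in the config store.
interface PipelineState {
  flags: Record<string, boolean | undefined>; // phase name -> enabled
  dailyCost: number;
  dailyCostCeiling: number;
  killSwitchActive: boolean;
}

interface PhaseEvent {
  event: string;
  phase: string;
}

// A disabled phase is a no-op that only logs `phase_skipped`.
function runPhase(state: PipelineState, phase: string, work: () => void, log: PhaseEvent[]): void {
  if (state.flags[phase] !== true) {
    log.push({ event: "phase_skipped", phase });
    return;
  }
  work();
}

// Accumulate cost and trip the kill-switch once the daily ceiling is reached.
function recordCost(state: PipelineState, amount: number): void {
  state.dailyCost += amount;
  if (state.dailyCost >= state.dailyCostCeiling) {
    state.killSwitchActive = true;
  }
}

// Checked only before starting a new product, so in-flight products finish.
function canStartNewProduct(state: PipelineState): boolean {
  return !state.killSwitchActive;
}

// Manual reset, as offered on the Dashboard.
function resetKillSwitch(state: PipelineState): void {
  state.killSwitchActive = false;
}
```

Gating only at product start is what lets in-flight products complete: the switch blocks new work without interrupting phases that are already running.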

## `ai-product-log` CSV schema

UTF-8 with header, one row per attribute and per image, **sorted by AQS
ascending**, named `ai-product-log_{productId}_{ISO8601}.csv`:

```
row_type, attribute_name|image_position, candidate_value|image_filename,
aqs, brc, tsq, fpv, csc, vfq, source_url, source_excerpt, model_id,
prompt_version, manipulation_rule_version, alt_text_de, alt_text_en,
alt_text_fr, alt_text_it, rejection_reason, timestamp
```
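
The ordering and naming rules for the log can be sketched as below. This is a simplified illustration, not the backend's writer: it emits only the first four columns, the `row_type` values `attribute` and `image` are assumptions, the header names are copied literally from the schema, and the exact ISO 8601 variant used in the filename is not specified here (the sketch uses JavaScript's `toISOString()`).

```typescript
// Hypothetical, reduced row shape covering only the first four schema columns.
interface LogRow {
  rowType: "attribute" | "image";
  name: string;  // attribute_name or image_position
  value: string; // candidate_value or image_filename
  aqs: number;
}

// Quote a field only when it contains a comma, quote or line break (RFC 4180 style).
function csvEscape(field: string): string {
  return /[",\r\n]/.test(field) ? '"' + field.replace(/"/g, '""') + '"' : field;
}

function logFilename(productId: string, at: Date): string {
  return "ai-product-log_" + productId + "_" + at.toISOString() + ".csv";
}

function renderLog(rows: LogRow[]): string {
  // Lowest AQS first, so the weakest values are at the top of the file.
  const sorted = rows.slice().sort((a, b) => a.aqs - b.aqs);
  const header = ["row_type", "attribute_name|image_position", "candidate_value|image_filename", "aqs"];
  const lines = sorted.map((r) =>
    [r.rowType, r.name, r.value, r.aqs.toString()].map(csvEscape).join(",")
  );
  return [header.join(",")].concat(lines).join("\n");
}
```

Sorting ascending puts the lowest-scoring rows first, which is convenient when a reviewer opens the log for a product that ended in `Review`.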

---

## Tests

```bash
cd backend && vendor/bin/phpunit          # unit + functional (deterministic mocks)
cd frontend && npm run build              # type-check + production build
```

See `backend/GO_LIVE_CHECKLIST.md` for the Story 6 readiness checklist.

## Status vs. the Epic

| Story | Status |
| --- | --- |
| 1 — Foundation, Vertex wrapper, Slack contract | ✅ (Vertex needs credentials) |
| 2 — Ergonode integration end-to-end | ✅ (REST client ready; mock default) |
| 3 — AI enrichment engine | ✅ |
| 4 — Image pipeline (sourcing + always-on creation + normalisation) | ✅ |
| 5 — Admin Backoffice + Observability Dashboard | ✅ |
| 6 — QA suite & pilot | ✅ core suite + checklist; 48h load run is operational |

### PoC notes
- The mock AI/segmentation/Ergonode/Slack are deterministic so the pipeline and
  QA run offline; swapping in the real services is config-only.
- The white-object-on-white-background segmentation case is handled by the real
  Vertex segmentation model; the mock uses a pre-masked fixture so the test
  stays deterministic.
- Docker images target PHP 8.3 / Node 22; the code is verified locally on PHP 8.1.