# Running KomoonWebCrawler on cPanel / CloudLinux (shared hosting)

Mirrors the **KomoonRSSCrawler** layout on the same cPanel account.

## Server layout (webcrawler.komoon.app)

| Purpose | Server path | Public URL |
|---------|-------------|------------|
| Subdomain root | `/home/bytescorp/webcrawler.komoon.app` | `https://webcrawler.komoon.app/` |
| Node.js app | `/home/bytescorp/webcrawler.komoon.app/app` | `https://webcrawler.komoon.app/app` |
| JSON data | `/home/bytescorp/webcrawler.komoon.app/data` | `https://webcrawler.komoon.app/data/` |

- Upload this repo into **`app/`** (not the subdomain root).
- Create an empty **`data/`** folder as a **sibling** of `app/` on the server (same layout as the RSS crawler). The crawler writes its JSON output there; `DATA_ROOT` defaults to `../data` relative to `app/`.
- Keep the Firebase service-account key inside `app/` (default: `app/config/firebase-service-account.json`), never in `data/` or any other folder the web server serves.
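To confirm where the JSON will actually land, the shell function below resolves the data folder the way the bullets above describe. It is a sketch, not the app's own code: it assumes a relative `DATA_ROOT` is resolved against `app/` (as the `../data` default implies) and that an absolute one is used as-is. `resolve_data_root` is a hypothetical helper name.

```bash
# Sketch: print the absolute folder the crawler should write JSON into.
# Assumption: a relative DATA_ROOT is anchored at app/, an absolute one is used as-is.
resolve_data_root() {
  local app_dir="$1" root
  root="${DATA_ROOT:-../data}"
  case "$root" in
    /*) ;;                         # absolute path: use as-is
    *) root="$app_dir/$root" ;;    # relative path: anchor at app/ (assumption)
  esac
  # cd + pwd -P normalises ".." and symlinks; it fails if the folder is missing
  if ! (cd "$root" 2>/dev/null && pwd -P); then
    echo "data folder not found: $root" >&2
    return 1
  fi
}
```

For example, `resolve_data_root /home/bytescorp/webcrawler.komoon.app/app` should print the path of the sibling `data/` folder once it exists, and exits non-zero while it is still missing.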

## cPanel Node.js app settings

| Setting | Value |
|---------|--------|
| Node.js version | **20.x** (e.g. 20.20.2) |
| Application mode | **Production** |
| Application root | `webcrawler.komoon.app/app` |
| Application URL | `webcrawler.komoon.app` + URI **`app`** |
| Application startup file | **`app.js`** |
| Virtualenv command | `source /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/activate && cd /home/bytescorp/webcrawler.komoon.app/app` |

> **Note:** This host uses **`nodevenv`** (not `nodeenv`) for the web crawler app.

`app.js` is a lightweight health endpoint so Passenger/cPanel can keep the app running. **Crawl work** is triggered by **cron** (`src/runBatch.js`), not by HTTP.

## First-time server setup

### Firebase Admin SDK key

The RSS crawler on this host typically stores the key under **`.config/`**, not `config/`; its `.env` shows the exact path.

**Step 1 — find the existing key on the server:**

```bash
# What path does the RSS crawler use?
grep FIREBASE /home/bytescorp/rsscrawler.komoon.app/app/.env 2>/dev/null

# Or search under your home directory (skip node_modules, which holds thousands of .json files)
find /home/bytescorp -name '*firebase*adminsdk*.json' 2>/dev/null
find /home/bytescorp/rsscrawler.komoon.app -name '*.json' ! -path '*/node_modules/*' 2>/dev/null
```

**Step 2 — wire the web crawler (pick one):**

**Option A (recommended):** copy the key into the same `.config/` layout the RSS crawler uses and point the web crawler's `.env` at it:

```bash
cd /home/bytescorp/webcrawler.komoon.app/app
# Note: this overwrites any existing app/.env
cat > .env <<'EOF'
FIREBASE_PROJECT_ID=bytescorp-komoon
FIREBASE_SERVICE_ACCOUNT_PATH=.config/bytescorp-komoon-firebase-adminsdk.json
EOF
mkdir -p .config
cp /home/bytescorp/rsscrawler.komoon.app/app/.config/bytescorp-komoon-firebase-adminsdk.json .config/
chmod 600 .config/bytescorp-komoon-firebase-adminsdk.json
```

(Adjust the `cp` source if `find` shows a different path.)

**Option B:** copy the key to the default path, `config/firebase-service-account.json`:

```bash
cd /home/bytescorp/webcrawler.komoon.app/app
mkdir -p config
cp /home/bytescorp/rsscrawler.komoon.app/app/.config/bytescorp-komoon-firebase-adminsdk.json config/firebase-service-account.json
chmod 600 config/firebase-service-account.json
```

**Option C:** symlink so both apps share one file:

```bash
cd /home/bytescorp/webcrawler.komoon.app/app
mkdir -p config
ln -s /home/bytescorp/rsscrawler.komoon.app/app/.config/bytescorp-komoon-firebase-adminsdk.json config/firebase-service-account.json
```

If no key exists anywhere on the server, generate a new **Firebase Admin SDK** JSON key in the [Firebase Console](https://console.firebase.google.com/) → Project **bytescorp-komoon** → Project settings → Service accounts → Generate new private key. Upload it to `app/.config/` (and set `FIREBASE_SERVICE_ACCOUNT_PATH` as in Option A) or to `app/config/firebase-service-account.json`. Never commit it to git or place it under `public_html`.
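Whichever option you pick, a quick check before the first cron run can save a failed batch. The sketch below assumes (this is not confirmed by the app's code) that `FIREBASE_SERVICE_ACCOUNT_PATH` in `app/.env` is relative to `app/` and that the app falls back to `config/firebase-service-account.json` when it is unset. `check_firebase_key` is a hypothetical helper; it uses GNU `stat`, which CloudLinux ships.

```bash
# Sketch: verify the Firebase Admin SDK key the web crawler should load.
# Assumptions: FIREBASE_SERVICE_ACCOUNT_PATH is relative to app/, and the
# default when it is unset is config/firebase-service-account.json.
check_firebase_key() {
  local app_dir="$1" key_path="" mode

  # Take the last FIREBASE_SERVICE_ACCOUNT_PATH= line from .env, minus quotes/CR
  if [ -f "$app_dir/.env" ]; then
    key_path="$(grep -E '^FIREBASE_SERVICE_ACCOUNT_PATH=' "$app_dir/.env" \
      | tail -n 1 | cut -d= -f2- | tr -d "\"'\r")"
  fi
  key_path="${key_path:-config/firebase-service-account.json}"
  case "$key_path" in
    /*) ;;                              # absolute path: use as-is
    *) key_path="$app_dir/$key_path" ;; # relative path: anchor at app/
  esac

  # -e follows symlinks, so a broken Option C link counts as missing
  if [ ! -e "$key_path" ]; then
    echo "missing: $key_path" >&2
    return 1
  fi
  # Admin SDK key files contain "type": "service_account"
  if ! grep -q '"type": *"service_account"' "$key_path"; then
    echo "not a service-account key: $key_path" >&2
    return 1
  fi
  # -L reports the target's mode when the key is a symlink (GNU stat syntax)
  mode="$(stat -L -c %a "$key_path")"
  if [ "$mode" != "600" ]; then
    echo "warning: $key_path has mode $mode; consider chmod 600" >&2
  fi
  echo "ok: $key_path"
}
```

Run it as `check_firebase_key /home/bytescorp/webcrawler.komoon.app/app`. It only looks for the `"type": "service_account"` marker that Admin SDK key files contain; it cannot tell whether the key is still enabled in Firebase.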

### Data output folder

```bash
mkdir -p /home/bytescorp/webcrawler.komoon.app/data
chmod 755 /home/bytescorp/webcrawler.komoon.app/data
```
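The crawler, running as your cPanel user, must be able to write into `data/`, and the web server must be able to read it for `https://webcrawler.komoon.app/data/` to work. A sketch of that check follows, assuming the web server reads static files as a different user ("others"), which is why the folder gets `chmod 755`. `check_data_dir` is a hypothetical helper and uses GNU `stat`/`find`.

```bash
# Sketch: confirm data/ is writable by the crawler and readable by the web server.
# Assumption: the web server reads static files as "others" (hence chmod 755).
check_data_dir() {
  local dir="$1" mode others
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable by $(id -un): $dir" >&2
    return 1
  fi
  mode="$(stat -c %a "$dir")"
  others="${mode#"${mode%?}"}"     # last octal digit = permissions for "others"
  case "$others" in
    5|7) ;;                        # r-x or rwx: the web server can enter and list it
    *)
      echo "others lack r-x on $dir (mode $mode)" >&2
      return 1
      ;;
  esac
  # JSON files already written must be world-readable too (warning only)
  if [ -n "$(find "$dir" -type f ! -perm -o=r | head -n 1)" ]; then
    echo "warning: some files in $dir are not world-readable" >&2
  fi
  echo "ok: $dir ($mode)"
}
```

Run it as `check_data_dir /home/bytescorp/webcrawler.komoon.app/data` after the `mkdir`/`chmod` above.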

Also make sure each Firestore provider document has a **`country: "MX"`** field in addition to its `state`, `municipality`, and `webURL` fields.

## CloudLinux WASM / VMEM fix

On shared hosting, batch runs may fail at startup with:

```text
RangeError: WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for new instance
```

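The likely cause: by default V8 reserves a large block of virtual address space for each WebAssembly instance so it can use trap-based bounds checks, and the CloudLinux per-account virtual-memory (VMEM) limit is lower than that reservation. The undici HTTP client (behind Node's built-in `fetch`) compiles its HTTP parser to WebAssembly, so a crawler that uses it can hit this error. Disabling the trap handler skips the reservation. Set the flag for every batch run, manual or cron: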

```bash
NODE_OPTIONS="--disable-wasm-trap-handler"
```
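If you keep the flag in a shared wrapper script (hypothetical; nothing in this repo provides one), appending it rather than assigning it avoids clobbering other `NODE_OPTIONS` values such as a heap limit, and avoids duplicates if the script is sourced twice. A minimal sketch; `add_wasm_flag` is a hypothetical helper name:

```bash
# Sketch: append the WASM flag to NODE_OPTIONS without clobbering or duplicating it.
add_wasm_flag() {
  local flag="--disable-wasm-trap-handler"
  case " ${NODE_OPTIONS:-} " in
    *" $flag "*) ;;                                         # already present: leave as-is
    *) NODE_OPTIONS="${NODE_OPTIONS:+$NODE_OPTIONS }$flag" ;;
  esac
  export NODE_OPTIONS
}
```

Usage: `add_wasm_flag && node src/runBatch.js`.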

### Manual run (SSH)

Activate the **nodevenv** virtualenv first:

```bash
source /home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/activate
cd /home/bytescorp/webcrawler.komoon.app/app

NODE_OPTIONS="--disable-wasm-trap-handler" CRAWL_COUNTRY=MX CRAWL_STATE=Zacatecas node src/runBatch.js
```

### Cron

See **[cron-jobs-hosting.md](./cron-jobs-hosting.md)** for copy-paste cron lines.

Cron does not activate the virtualenv, so each cron command must call the full Node binary from **nodevenv**:

```text
/home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node
```
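The cron guide linked above has the actual lines. As a sketch of how one fits together, the function below prints a crontab entry for a single state: it `cd`s into `app/` (in case the crawler resolves paths from the working directory), sets the same variables as the manual run, and calls the nodevenv binary by full path. The schedule, the `logs/` folder, and the log file name are hypothetical; create `logs/` first if you use them. `cron_line` is a hypothetical helper name.

```bash
# Sketch: print one crontab line for a single state.
# Hypothetical: the schedule you pass in and the logs/batch-<state>.log path.
APP_DIR=/home/bytescorp/webcrawler.komoon.app/app
NODE_BIN=/home/bytescorp/nodevenv/webcrawler.komoon.app/app/20/bin/node

cron_line() {
  local schedule="$1" state="$2" log_name
  # Spaces in state names ("Baja California") are quoted in the command
  # and replaced with "-" in the log file name.
  log_name="$(printf '%s' "$state" | tr ' ' '-')"
  printf '%s cd %s && NODE_OPTIONS="--disable-wasm-trap-handler" CRAWL_COUNTRY=MX CRAWL_STATE="%s" %s src/runBatch.js >> %s/logs/batch-%s.log 2>&1\n' \
    "$schedule" "$APP_DIR" "$state" "$NODE_BIN" "$APP_DIR" "$log_name"
}
```

Example: `cron_line "0 3 * * *" Zacatecas`, then paste the printed line into `crontab -e`. In cPanel → Cron Jobs, enter the schedule in its own fields and paste only the part after the five schedule values into **Command**.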

## Deploy

1. Upload project files to `/home/bytescorp/webcrawler.komoon.app/app`
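2. Activate the nodevenv virtualenv as in **Manual run (SSH)**, then run `npm install` in that directory (or use **Run NPM Install** on the cPanel Node.js app page)
3. Install the Firebase key using one of the options under **Firebase Admin SDK key**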
4. Ensure `/home/bytescorp/webcrawler.komoon.app/data` exists and is readable at `https://webcrawler.komoon.app/data/`
5. **Restart** the Node.js app in cPanel

## Reference

Same VMEM / undici constraints as KomoonRSSCrawler; see `docs/hosting.md` in the KomoonRSSCrawler repository.
