Observability & Monitoring
Monitoring a cloud-native ERP system is critical for maintaining uptime and quickly diagnosing issues. Atlas ERP provides an optional, but highly recommended, integration with the Grafana observability stack.
The Observability Stack
We use the Grafana "LGTM" stack (Loki, Grafana, Tempo, Mimir), focusing primarily on Loki for logs and Grafana for visualization.
| Tool | Purpose |
|---|---|
| Grafana | Visualization dashboard for logs, metrics, and traces. |
| Loki | Log aggregation system designed to be cost-effective and highly scalable. |
| Prometheus | (Optional) Metrics aggregation for deep infrastructure insights. |
Architecture

```mermaid
graph TD
    NestAPI[NestJS API]
    NextApp[Next.js Web App]
    BullMQ[BullMQ Workers]
    Winston[Winston Logger]
    Loki[(Grafana Loki)]
    Grafana[Grafana Dashboards]

    NestAPI --> Winston
    BullMQ --> Winston
    NextApp -.->|Server-side logs| Winston
    Winston -->|HTTP Push| Loki
    Grafana -->|Query LogQL| Loki

    classDef primary fill:#4f46e5,stroke:#3730a3,stroke-width:2px,color:#fff;
    classDef database fill:#059669,stroke:#312e81,stroke-width:2px,color:#fff;
    class NestAPI,NextApp,BullMQ primary;
    class Loki database;
```
Logging Implementation
In the NestJS backend, we use winston as the custom logger. Winston is configured to write to the console (for local development) and stream logs directly to Loki (for production).
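To illustrate what "streaming to Loki" means on the wire, the sketch below builds the JSON body that Loki's HTTP push endpoint (`POST /loki/api/v1/push`) accepts. The helper name `buildLokiPushBody` and the sample labels are hypothetical. In practice a Winston transport would assemble and send this body for you.

```typescript
// Shape of the JSON body accepted by Loki's push endpoint.
interface LokiStream {
  stream: Record<string, string>; // label set, e.g. { app: "atlas-api" }
  values: [string, string][];     // [unix epoch in nanoseconds, log line]
}

interface LokiPushBody {
  streams: LokiStream[];
}

// Hypothetical helper: wraps a single log line in a push body.
function buildLokiPushBody(
  labels: Record<string, string>,
  line: string,
  timestampMs: number,
): LokiPushBody {
  // Loki expects nanosecond timestamps encoded as strings. Appending six
  // zeros converts integer milliseconds without losing numeric precision.
  const timestampNs = `${Math.trunc(timestampMs)}000000`;
  return { streams: [{ stream: labels, values: [[timestampNs, line]] }] };
}

const pushBody = buildLokiPushBody(
  { app: "atlas-api", level: "error" },
  JSON.stringify({ message: "Authentication failed", userId: "u_123" }),
  1700000000000,
);

console.log(JSON.stringify(pushBody, null, 2));
```

The resulting object would be sent with a `Content-Type: application/json` header. Keeping the label set small (application, level) and putting everything else inside the log line is the usual way to avoid creating too many streams in Loki.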
Winston Configuration
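A minimal sketch of what such a configuration might look like, assuming the community `winston-loki` transport handles the push to Loki (the transport actually used by Atlas ERP is not stated here). The `LOKI_HOST` variable name, the log levels, and the `NODE_ENV` check are illustrative assumptions, not the project's real settings.

```typescript
import { createLogger, format, transports } from "winston";
import LokiTransport from "winston-loki"; // assumption: community Loki transport

// Assumption: production is detected through NODE_ENV.
const isProduction = process.env.NODE_ENV === "production";

export const logger = createLogger({
  level: isProduction ? "info" : "debug",
  // Timestamped JSON so every line reaches Loki as a structured record.
  format: format.combine(format.timestamp(), format.json()),
  transports: [
    // Console output for local development.
    new transports.Console(),
    // Push to Loki only in production; LOKI_HOST is a hypothetical variable name.
    ...(isProduction
      ? [
          new LokiTransport({
            host: process.env.LOKI_HOST ?? "http://localhost:3100",
            labels: { app: "atlas-api" },
            json: true, // send JSON rather than protobuf
            onConnectionError: (err) => console.error("Loki connection error", err),
          }),
        ]
      : []),
  ],
});
```

Registering the Loki transport only in production keeps local runs free of network dependencies while leaving the log format identical in both environments.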
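Structured Logging
Logs are formatted as JSON before being sent to Loki. This lets LogQL parse individual fields (for example with the `json` parser) and filter or aggregate on them, rather than relying on substring matches against free text.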
Bad Log:
Good Log (Structured):
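To make the contrast concrete, here is a hypothetical sketch that formats the same event both ways. The event shape, field names, and sample values are invented for illustration and are not taken from the Atlas ERP codebase.

```typescript
// Hypothetical event shape; the field names are illustrative only.
interface AuthFailureEvent {
  userId: string;
  ip: string;
  reason: string;
}

// Unstructured: the fields are baked into free text, so queries can only
// match substrings of the line.
function formatUnstructured(evt: AuthFailureEvent): string {
  return `Authentication failed for user ${evt.userId} from ${evt.ip}: ${evt.reason}`;
}

// Structured: one JSON object per line, so each field can be extracted
// and filtered on individually.
function formatStructured(evt: AuthFailureEvent, timestamp: string): string {
  return JSON.stringify({
    level: "warn",
    message: "Authentication failed",
    timestamp,
    userId: evt.userId,
    ip: evt.ip,
    reason: evt.reason,
  });
}

const sampleEvent: AuthFailureEvent = {
  userId: "u_123",
  ip: "203.0.113.7",
  reason: "invalid password",
};

const textLine = formatUnstructured(sampleEvent);
const jsonLine = formatStructured(sampleEvent, "2024-01-01T00:00:00.000Z");

console.log(textLine);
console.log(jsonLine);
```

With the structured variant, a LogQL query such as `{app="atlas-api"} | json | userId="u_123"` can select all events for one user, which the text-only line cannot support without fragile pattern matching.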
By using structured logging, you can create Grafana alerts such as "more than 50 authentication failures within 5 minutes", which in LogQL reads: `sum(count_over_time({app="atlas-api"} |= "Authentication failed" [5m])) > 50`
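As a rough sketch, the alert condition above can be assembled as a LogQL metric query and wrapped in a URL for Loki's instant-query endpoint (`GET /loki/api/v1/query`). The helper names `buildCountQuery` and `buildInstantQueryUrl` and the `http://localhost:3100` base URL are assumptions for illustration. Using `JSON.stringify` to quote the search string is an approximation of LogQL string escaping that works for simple text.

```typescript
// Hypothetical helper: counts lines containing `needle` over a time range,
// summed across all streams matched by the selector.
function buildCountQuery(selector: string, needle: string, range: string): string {
  return `sum(count_over_time(${selector} |= ${JSON.stringify(needle)} [${range}]))`;
}

// Hypothetical helper: URL for a Loki instant query.
function buildInstantQueryUrl(baseUrl: string, query: string): string {
  return `${baseUrl}/loki/api/v1/query?query=${encodeURIComponent(query)}`;
}

const failureThreshold = 50;
const countExpr = buildCountQuery('{app="atlas-api"}', "Authentication failed", "5m");

// The comparison turns the count into an alert condition.
const alertExpr = `${countExpr} > ${failureThreshold}`;
const queryUrl = buildInstantQueryUrl("http://localhost:3100", alertExpr);

console.log(alertExpr);
console.log(queryUrl);
```

The same expression can be pasted into a Grafana alert rule that uses the Loki data source. Running it through the query endpoint first is a cheap way to check that the selector matches the intended streams.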