Skip to content

Commit 523443e

Browse files
authored
fix: aggregator should return 413 (Content Too Large) to make ingestor ARC work properly (#461)
500 errors will cause ingestors to slow down due to ARC (https://vector.dev/docs/reference/configuration/sinks/http/#request.adaptive_concurrency), which is good. However, for case like 413 (due to aggregator payload size limit), ingestors shouldn't slow down and be blocked. The fix here is to return the original status code so ARC will tune up the concurrency properly Before: Events buffer will be queued up and ingestors will be blocked (concurrency stays) <img width="1170" alt="Screenshot 2024-07-08 at 4 46 10 PM" src="https://github.com/hyperdxio/hyperdx/assets/5959690/a149092a-b062-4508-b62b-a85116ac74b2"> Now: Events buffer will be cleared out properly (concurrency increases) <img width="1237" alt="Screenshot 2024-07-08 at 4 43 22 PM" src="https://github.com/hyperdxio/hyperdx/assets/5959690/7871593b-0e2f-4a1e-960c-c27d90869514"> TODO: we want to revisit the error global handler to make sure the proper status code is returned
1 parent 16c0796 commit 523443e

File tree

4 files changed

+44
-8
lines changed

4 files changed

+44
-8
lines changed

.changeset/honest-ears-allow.md

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
---
2+
'@hyperdx/api': patch
3+
'@hyperdx/app': patch
4+
---
5+
6+
fix: aggregator should return 413 (Content Too Large) to make ingestor ARC work
7+
properly

docker-compose.ci.yml

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,7 @@ services:
8181
# ports:
8282
# - 9000:9000
8383
environment:
84+
AGGREGATOR_PAYLOAD_SIZE_LIMIT: '64mb'
8485
APP_TYPE: 'api'
8586
CLICKHOUSE_HOST: http://ch_server:8123
8687
CLICKHOUSE_PASSWORD: api

packages/api/src/aggregator-app.ts

Lines changed: 35 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,4 @@
1+
import { recordException } from '@hyperdx/node-opentelemetry';
12
import compression from 'compression';
23
import type { NextFunction, Request, Response } from 'express';
34
import express from 'express';
@@ -10,6 +11,10 @@ import routers from './routers/aggregator';
1011
import { BaseError, StatusCode } from './utils/errors';
1112
import logger, { expressLogger } from './utils/logger';
1213

14+
if (!config.AGGREGATOR_PAYLOAD_SIZE_LIMIT) {
15+
throw new Error('AGGREGATOR_PAYLOAD_SIZE_LIMIT is not defined');
16+
}
17+
1318
const app: express.Application = express();
1419

1520
const healthCheckMiddleware = async (
@@ -51,13 +56,35 @@ app.use('/', healthCheckMiddleware, routers.rootRouter);
5156
// ---------------------------------------------------------
5257

5358
// error handling
54-
app.use((err: BaseError, _: Request, res: Response, next: NextFunction) => {
55-
logger.error({
56-
location: 'appErrorHandler',
57-
error: serializeError(err),
58-
});
59-
// WARNING: should always return 500 so the ingestor will queue logs
60-
res.status(StatusCode.INTERNAL_SERVER).send('Something broke!');
61-
});
59+
app.use(
60+
(
61+
err: BaseError & { status?: number },
62+
_: Request,
63+
res: Response,
64+
next: NextFunction,
65+
) => {
66+
void recordException(err, {
67+
mechanism: {
68+
type: 'generic',
69+
handled: false,
70+
},
71+
});
72+
73+
// TODO: REPLACED WITH recordException once we enable tracing SDK
74+
logger.error({
75+
location: 'appErrorHandler',
76+
error: serializeError(err),
77+
});
78+
79+
// TODO: for all non 5xx errors
80+
if (err.status === StatusCode.CONTENT_TOO_LARGE) {
81+
// WARNING: return origin status code so the ingestor will tune up the adaptive concurrency properly
82+
return res.sendStatus(err.status);
83+
}
84+
85+
// WARNING: should always return 500 so the ingestor will queue logs
86+
res.status(StatusCode.INTERNAL_SERVER).send('Something broke!');
87+
},
88+
);
6289

6390
export default app;

packages/api/src/utils/errors.ts

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
export enum StatusCode {
22
BAD_REQUEST = 400,
33
CONFLICT = 409,
4+
CONTENT_TOO_LARGE = 413,
45
FORBIDDEN = 403,
56
INTERNAL_SERVER = 500,
67
NOT_FOUND = 404,

0 commit comments

Comments
 (0)