Hey guys,
Just wanna say that you are on the right track!
I had an idea similar to yours - to store logs on S3 and then query them in place - and I've spent some time analyzing it.
Maybe I'll use your tool instead of reinventing the wheel on my next project :)
For now I just wanted to share some info that may be useful to you:
My motivation for building such a tool was that you effectively can't store AWS logs outside of AWS, because the data transfer costs for moving logs to cheaper storage are prohibitive.
Because of this, all SaaS log solutions for AWS are extremely expensive.
I think S3 is the only decent option for storing logs cost-effectively on AWS.
However, when I want to find something in those terabytes of logs on S3, I run into a problem: too much data has to be scanned.
Some queries simply can't leverage Parquet predicate push-down (for example, a substring search over a free-text message field) and end up scanning all the data on S3.
If you have the same issue, I would suggest adding a powerful feature to parseable: a Lambda Scanner.
You can delegate filtering to thousands of Lambdas in parallel and scan files on S3 at speeds of 100 Gb/s or more.
That way I can run a cheap instance for log ingestion and delegate the infrequent, heavy query processing to Lambda.
I've already tested this on Lambda with simple JSON parsing in Go, and it works pretty well.
I was able to achieve impressive speeds with this approach, and you might want to try it too.
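Roughly what my test looked like, as a minimal sketch (the event shape, the bucket/key fields, and the substring filter are my own assumptions for illustration, not anything from parseable): a handler that streams one gzipped JSON log file from S3 and counts the matching lines.

```go
// Hypothetical Lambda "scanner" handler: streams one gzipped JSON log file
// from S3 and counts the lines that contain a given substring.
// The request/response shapes here are assumptions for illustration.
package main

import (
	"bufio"
	"compress/gzip"
	"context"
	"strings"

	"github.com/aws/aws-lambda-go/lambda"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

type ScanRequest struct {
	Bucket  string `json:"bucket"`
	Key     string `json:"key"`
	Pattern string `json:"pattern"`
}

type ScanResult struct {
	Key     string `json:"key"`
	Matches int    `json:"matches"`
}

func handler(ctx context.Context, req ScanRequest) (ScanResult, error) {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return ScanResult{}, err
	}
	client := s3.NewFromConfig(cfg)

	// Stream the object instead of buffering the whole file in memory.
	obj, err := client.GetObject(ctx, &s3.GetObjectInput{Bucket: &req.Bucket, Key: &req.Key})
	if err != nil {
		return ScanResult{}, err
	}
	defer obj.Body.Close()

	gz, err := gzip.NewReader(obj.Body)
	if err != nil {
		return ScanResult{}, err
	}
	defer gz.Close()

	matches := 0
	scanner := bufio.NewScanner(gz)
	scanner.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // allow long log lines
	for scanner.Scan() {
		if strings.Contains(scanner.Text(), req.Pattern) {
			matches++
		}
	}
	return ScanResult{Key: req.Key, Matches: matches}, scanner.Err()
}

func main() {
	lambda.Start(handler)
}
```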
I ran 500 Lambdas, each scanning a 100 MB compressed JSON file on S3 (273 MB raw).
Each Lambda took about 2 seconds on average to process its file, and all 500 finished in 11 seconds. (It would have been around 3 seconds if I had started the job from an AWS server instead of my local machine; my network couldn't handle 500 simultaneous requests.)
So I was able to scan 50 GB of compressed (136 GB raw) JSON data almost instantly.
And I paid only about $0.01 for the whole query.
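For reference, the fan-out side is just a batch of concurrent Invoke calls. A rough sketch, assuming the handler above is deployed under a hypothetical function name "log-scanner" and that the list of S3 keys is obtained elsewhere:

```go
// Hypothetical fan-out driver: invokes one scanner Lambda per S3 object
// concurrently and sums the match counts. The function name "log-scanner",
// bucket name, and payload shape are assumptions for illustration.
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"
	"sync"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/lambda"
)

type scanResult struct {
	Key     string `json:"key"`
	Matches int    `json:"matches"`
}

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := lambda.NewFromConfig(cfg)

	// Normally these keys would be listed from S3; hard-coded here for brevity.
	keys := []string{
		"logs/2024-01-01/part-000.json.gz",
		"logs/2024-01-01/part-001.json.gz",
	}

	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	for _, key := range keys {
		wg.Add(1)
		go func(key string) {
			defer wg.Done()
			payload, _ := json.Marshal(map[string]string{
				"bucket":  "my-log-bucket",
				"key":     key,
				"pattern": "ERROR",
			})
			// Synchronous (RequestResponse) invocation; each call returns the
			// scanner's JSON result in the response payload.
			out, err := client.Invoke(ctx, &lambda.InvokeInput{
				FunctionName: aws.String("log-scanner"),
				Payload:      payload,
			})
			if err != nil {
				log.Printf("%s: %v", key, err)
				return
			}
			var res scanResult
			if err := json.Unmarshal(out.Payload, &res); err != nil {
				log.Printf("%s: %v", key, err)
				return
			}
			mu.Lock()
			total += res.Matches
			mu.Unlock()
		}(key)
	}
	wg.Wait()
	fmt.Println("total matches:", total)
}
```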
In theory, raw JSON scanning speeds of up to 250 GB/s could be reached with this approach (for example, roughly 2,000 concurrent Lambdas at the ~135 MB/s per-function raw throughput observed above).
With that kind of power and cost-efficiency, you could sell your SaaS solution to big clients with terabytes of logs who are currently paying millions to Datadog, Splunk, etc.