Node crash on Crawlee running fs.stat on a request_queue lock file #2606
Comments
Have you really seen that on the latest beta?
That feature is about something else. What you see are local file locks in memory storage, which are an implementation detail of working with the file system. cc @vladfrangu, not sure if #2603 was supposed to help with this one too. Also, doesn't this mean the lock is not acquirable and we are missing a retry somewhere?
That PR wasn't supposed to help with that; they are unrelated things. I don't think I've ever seen a stat error like that. Also note that, if the stack trace is anything to go by, the path the user provided isn't being used for the storages, which semi hints at a wrong variable being passed somewhere. I'm also semi certain we try to lock 3 times before giving up. @Clearmist can you get us a full stack trace please? 🙏
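To make the "try a few times before giving up" idea concrete, here is a minimal sketch of retrying a `stat` call when it fails with an error code that may be transient. It is not Crawlee's implementation: `statWithRetry`, its option names, and the list of codes treated as transient are all assumptions made for illustration.

```javascript
const fs = require('node:fs/promises');

// Codes treated as "maybe transient" in this sketch. This list is an assumption,
// not something taken from Crawlee.
const TRANSIENT_CODES = new Set(['EPERM', 'EBUSY', 'EACCES']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: call statFn(filePath) up to `attempts` times.
// Errors with any other code (for example ENOENT) are rethrown immediately.
async function statWithRetry(filePath, { attempts = 3, delayMs = 50, statFn = fs.stat } = {}) {
    let lastError;
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await statFn(filePath);
        } catch (error) {
            if (!TRANSIENT_CODES.has(error.code)) throw error;
            lastError = error;
            // Linear backoff between attempts; no wait after the final one.
            if (attempt < attempts) await sleep(delayMs * attempt);
        }
    }
    throw lastError;
}
```

The `statFn` parameter exists only so the helper can be exercised without touching the file system; by default it uses `fs.stat` from `node:fs/promises`.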
@B4nan, yes I am getting this issue on the @next branch. I was using latest, but moved to @next after seeing the field on the bug report form.
@vladfrangu Good catch about the path being different from what I've set using the
I have the request_queues directory open and see this. Are the json.lock files supposed to be seen as directories by the host OS? Maybe Node running fs.stat on a directory is the reason for the crash.
I'd love to get a stack trace, but I tried these three and none of these callbacks were called.
I even tried wrapping crawler.run() in try/catch.
try {
    await crawler.run();
} catch (error) {
    ...
}
Do you know of other ways I can generate a full stack trace when the Node process crashes? Maybe somewhere in Crawlee where I can regularly print a stack trace (if that would help).
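One debugging approach for errors that arrive without a useful stack is to wrap a suspect async method so that any rejection carries the stack of whoever called it. The sketch below is a generic illustration: `traceAsyncMethod` is a hypothetical helper, not a Node or Crawlee API, and it only helps if the failing call actually goes through the method being wrapped.

```javascript
// Hypothetical helper: replace target[methodName] with a wrapper that records
// the caller's stack before awaiting, and appends it to the error on rejection.
// Returns a function that restores the original method.
function traceAsyncMethod(target, methodName) {
    const original = target[methodName];

    target[methodName] = async function traced(...args) {
        // Captured synchronously, so it still points at the code that made the call.
        const callSite = new Error(`${methodName} was called from:`);
        try {
            return await original.apply(this, args);
        } catch (error) {
            if (error instanceof Error) {
                error.stack = `${error.stack}\n${callSite.stack}`;
            }
            throw error;
        }
    };

    return function restore() {
        target[methodName] = original;
    };
}
```

A possible use, assuming the failing `stat` goes through the promise-based fs API, would be `traceAsyncMethod(require('node:fs').promises, 'stat')` at program start; if the call is made through a different fs entry point, that one would have to be wrapped instead.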
I updated to 3.11.3 and this issue is still present.
[Error: EPERM: operation not permitted, stat]
{
    errno: -4048,
    code: 'EPERM',
    syscall: 'stat',
    path: 'C:\\Users\\{username}\\Repositories\\crawler-app\\storage\\request_queues\\nasa3d.arc.nasa.gov\\4fC3CInttKDsieR.json.lock'
}
I can see that the path in the error is not where I told crawlee to store the local data. Here is my configuration object:
const config = new Configuration({
    storageClientOptions: {
        localDataDirectory: path.join(app.getPath('userData'), 'crawlerStorage'),
    },
});
What crawlee uses
What I told it to use
The datasets are stored in the right place, but the request_queues are being stored in the incorrect directory. Also, the .lock files are showing up as directories in Windows 10.
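On the `.lock` entries showing up as directories: some file-locking libraries take a lock by creating a directory next to the target file, because `mkdir` either succeeds or fails with `EEXIST`. Whether Crawlee's local storage locks work this way is an assumption here, not something confirmed in this thread. The sketch below shows the general technique with hypothetical helper names; it is not Crawlee's code.

```javascript
const fs = require('node:fs');

// Hypothetical helper: try to take a lock for `file` by creating `<file>.lock`
// as a directory. Returns true if the lock was taken, false if it is already held.
function tryAcquireLock(file) {
    try {
        fs.mkdirSync(`${file}.lock`);
        return true;
    } catch (error) {
        if (error.code === 'EEXIST') return false; // someone else holds the lock
        throw error; // anything else (EPERM, ENOENT, ...) is a real failure
    }
}

// Hypothetical helper: release the lock by removing the lock directory.
function releaseLock(file) {
    fs.rmdirSync(`${file}.lock`);
}

// With this scheme, stat on the lock path reports a directory, not a file.
function lockIsDirectory(file) {
    return fs.statSync(`${file}.lock`).isDirectory();
}
```

Under a scheme like this, a `.json.lock` entry appearing as a directory in the file explorer would be expected behaviour rather than a sign of corruption, and a `stat` call on it is how a holder might check the lock's state.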
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/core
Issue description
The crawler, while running, will randomly crash Node. I tried using the experimental option of disabling locking, but it still happens. I doubt this is a permission issue because my user has write permission to this entire directory structure and I've also tried running as administrator.
I'm okay if I don't get the root of this issue fixed. At the least I'd like to know where I can put a try/catch so this error doesn't crash Node and the crawler can continue.
Obviously Node is trying to get file information from a lock file and dies.
Code sample
Package version
3.11.1
Node.js version
20.10.0
Operating system
Windows 10
Apify platform
I have tested this on the next release
3.11.2-beta.17
Other context
No response