wangyinz
Collaborator

I recently ran into an issue where the client may segfault when doing IO outside of /myfs. With debugging prints turned on, we can see that it fails when calling the real_write function, which is essentially the trampoline.

On most Linux systems, the __read and __write functions have the following code at the beginning:

Disassemble __read
               0 833d7dd42d0000   cmp dword [rip+0x2dd47d], 0x0
Disassemble __write
               0 833d1dd42d0000   cmp dword [rip+0x2dd41d], 0x0

but on some others, they begin with:

Disassemble __read
               0 488d05b1092e00   lea rax, [rip+0x2e09b1] 
               7 8b00             mov eax, [rax]          
Disassemble __write
               0 488d05e1082e00   lea rax, [rip+0x2e08e1] 
               7 8b00             mov eax, [rax]  

The original MAX_JMP_LEN (5) truncates the code at the lea instruction; increasing it to 8 includes the mov as well. I still don't quite understand why that makes a difference, because the additional jmp should still restore the original instruction flow, but the change here did fix the segfault. Maybe it means that we need to investigate further what can and cannot be truncated in x86 instructions.
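
To make the byte counts concrete, here is a minimal sketch of the arithmetic. It assumes the patcher copies whole instructions until at least MAX_JMP_LEN bytes are covered, and it uses made-up addresses (`ORIG`, `TRAMP`). The helper names are hypothetical and not taken from the ThemisIO source.

```python
import struct


def bytes_to_copy(insn_lengths, min_len):
    """Sum whole-instruction lengths until at least min_len bytes are covered.

    Assumption: the patcher never splits an instruction when relocating code.
    """
    total = 0
    for length in insn_lengths:
        if total >= min_len:
            break
        total += length
    return total


def rip_relative_target(insn_addr, insn_bytes, disp_offset):
    """Absolute address referenced by a RIP-relative instruction.

    disp_offset is the index of the 4-byte little-endian displacement.
    RIP holds the address of the *next* instruction when the displacement
    is applied, hence the + len(insn_bytes).
    """
    (disp,) = struct.unpack_from("<i", insn_bytes, disp_offset)
    return insn_addr + len(insn_bytes) + disp


# Prologue from the second listing: 7-byte lea followed by 2-byte mov.
LEA = bytes.fromhex("488d05b1092e00")  # lea rax, [rip+0x2e09b1]
MOV = bytes.fromhex("8b00")            # mov eax, [rax]
lengths = [len(LEA), len(MOV)]

print(bytes_to_copy(lengths, 5))  # 7: only the lea is relocated
print(bytes_to_copy(lengths, 8))  # 9: the lea and the mov are relocated

# Hypothetical addresses, for illustration only.
ORIG = 0x1000   # where __read starts
TRAMP = 0x9000  # where the copied bytes would live

# The same bytes resolve to different targets at different addresses.
print(hex(rip_relative_target(ORIG, LEA, 3)))
print(hex(rip_relative_target(TRAMP, LEA, 3)))
```

With these inputs, a minimum of 5 covers only the 7-byte lea, while 8 also pulls in the 2-byte mov (9 bytes in total). The second helper shows that a RIP-relative operand copied verbatim resolves to a different address at the new location, which may be worth checking when deciding what can safely be relocated.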

@wiliamhuang
Collaborator

@wangyinz Sorry for my late reply. Do you have the procedure to reproduce the user's issue? Thank you!

@wangyinz
Collaborator Author

wangyinz commented Mar 2, 2023

I think this is related to a specific glibc version. I am running ThemisIO in different containers, and this problem only occurs in this one. The glibc in this container is 2.27. However, it works fine in other containers that have glibc 2.17 and 2.31. I am not sure what makes 2.27 unique.

@wiliamhuang
Collaborator

Thank you! I will try it later.

@wangyinz
Collaborator Author

wangyinz commented Mar 2, 2023

Thanks! On second thought, I guess this also depends on how glibc is compiled, so it may be related to the operating system too. It is Ubuntu 18.04.5.

@wiliamhuang
Collaborator

Right. That's possible. I have never tested it on Ubuntu so far. Thank you for sharing your thoughts!

@wangyinz
Collaborator Author

wangyinz commented Mar 2, 2023

In that case, just FYI, it runs fine in the two other containers, Ubuntu 20.04 and CentOS 7.9.2009.
