Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

management interface: Reverted management_io to its original state to prevent the following problem: #46

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

amatveyev-gh
Copy link

man_read -> ... -> man_output_list_push_finalize -> management_io ->
man_read

After some digging I found this commit made by James Yonan in 2010:
https://sourceforge.net/p/openvpn/openvpn-testing/ci/3cf6c9328250061600b78c8a7deb0edc850e739b

In this commit he put both writing to and reading from the socket into the
same "if" branch, which leads to the problem I've mentioned above.
man_read after having been called recursively goes totally nuts and screws its input
buffer, as result, openvpn can processes the same command several times
and then fail to read the rest of them correctly.

I'm not sure that what I did here is the best way to solve this problem,
but it seems to work for me.

problem:
man_read -> ... -> man_output_list_push_finalize -> management_io ->
man_read

After some digging I found this commit made by James Yonan in 2010:
https://sourceforge.net/p/openvpn/openvpn-testing/ci/3cf6c9328250061600b78c8a7deb0edc850e739b

In this commit he put both writing to and reading from the socket into the
same "if" branch, which leads to the problem I've mentioned above.
man_read after having been called recursively goes nuts and screws its input
buffer, as result openvpn can processes the same command several times
and then fail to read the rest of them correctly.

I'm not sure that what I did here is the best way to solve this problem,
but it seems to work for me.
@amatveyev-gh
Copy link
Author

Here is a small testcase I used to reproduce and debug the problem:

  1. run openvpn.exe, obviously. I ran it with --management-hold to make it wait for input, but I don't think it matters
  2. Use this little program to send a bunch of commands quickly. In my case it's just a series of alternating "state on" and "state of".
#include <stdio.h>
#include <stdlib.h>
#include <winsock2.h>

#pragma comment(lib,"ws2_32.lib") //Winsock

int main(int argc , char *argv[])
{
    WSADATA wsa;
    SOCKET s;
    struct sockaddr_in server;
    char buf[256] = { 0 };
    int received;
    int counter;

    WSAStartup(MAKEWORD(2,2), &wsa);

    s = socket(AF_INET, SOCK_STREAM, 0 );

    server.sin_addr.s_addr = inet_addr("127.0.0.1");
    server.sin_family = AF_INET;
    server.sin_port = htons(12345);

    connect(s, (struct sockaddr *)&server, sizeof(server));

    recv(s, buf, sizeof(buf), 0);
    // For some reason, without this line the bug doesn't reproduce on my machine
    printf("Received %s\n", buf); 

    for (counter = 0; counter < 10; ++counter)
    {
        const char* on = "state on\n";
        const char* off = "state off\n";
        const char* ptr;
        int result;
        if (counter % 2)
            ptr = on;
        else
            ptr = off;
        result = send(s, ptr, strlen(ptr), 0);
        printf("%d bytes were sent\n", result);
    }

    Sleep(1000*1000); // need this for debugging

    shutdown(s, SD_BOTH);
    closesocket(s);
    WSACleanup();

    return 0;
}
  1. See something like this:
    bug

Of course, as with any race condition, your ability to reproduce this bug might depend on your particular machine and/or luck.

@amatveyev-gh
Copy link
Author

And in terms of code the problem looks like this:

002fefd0 0126d7a9 openvpn!man_read+0x27
002fefe8 01271ef8 openvpn!management_io+0xc9
002feff4 0126b194 openvpn!man_output_standalone+0x28
002ff010 0126af4b openvpn!man_output_list_push_finalize+0x54
002ff054 0125201c openvpn!virtual_output_callback_func+0x17b
002ff06c 01251db3 openvpn!virtual_output_print+0x1c
002ff0c8 01251b62 openvpn!x_msg_va+0x233
002ff0e0 01270621 openvpn!x_msg+0x22
002ff124 0127086a openvpn!man_history+0xe1
002ff144 0126ea1b openvpn!man_state+0x2a
002ff17c 0126e06a openvpn!man_dispatch_command+0x7ab
002ff1fc 0126dabc openvpn!man_process_command+0x18a
002ff32c 0126d7a9 openvpn!man_read+0x16c
002ff344 01272358 openvpn!management_io+0xc9

It goes like this:

  1. openvpn receives the first "state off" and starts processing it. During processing it modifies the input buffer (man->connection.in): '\n' is removed and '\0' is added at the end of the command. But the man->connection.in->buf.offset is still 0, it points to the beginning of the buffer.
  2. While processing the command, openvpn wants to write something and goes to management_io
  3. management_io says: 'Ok, I see you want to write, but you know what, why don't you also read something and try to process it, so go to man_read'
  4. We get to man_read again, and read a few more commands that came in while we were processing the first one. But here is the thing: the input buffer is in inconsistent state, and after we add new commands to it it looks like this: "state off\0state on\n" and the offset is still 0. How many commands do we have here? I guess you can get the rest of the story from here.

@cron2
Copy link
Contributor

cron2 commented May 18, 2016

Hi,

On Tue, May 17, 2016 at 12:13:04PM -0700, Alexander Matveyev wrote:

I'm not sure that what I did here is the best way to solve this problem,
but it seems to work for me.

Thanks for the patch and the detailed description. I've asked James and
Arne to review the issue because they know the management code best
(and James is the author of the original offending commit).

gert

USENET is not the non-clickable part of WWW!
//www.muc.de/~gert/
Gert Doering - Munich, Germany [email protected]
fax: +49-89-35655025 [email protected]

@mattock
Copy link
Member

mattock commented May 19, 2016

Notifying @jamesyonan

@amatveyev-gh
Copy link
Author

Aaand, of course I made a stupid mistake when fixing the bug in my local fork. One closing '}' brace is in the wrong place. I debugged and worked on one source tree and sent you changes from another, typical me :) It doesn't change anything about the bug description, just disregard my patch, the fix is trivial anyway, you can easily do it from scratch or even decide to fix the problem in a totally different place.

@dsommers dsommers changed the title Reverted management_io to its original state to prevent the following problem: management interface: Reverted management_io to its original state to prevent the following problem: Aug 17, 2016
@ordex
Copy link
Member

ordex commented Sep 16, 2022

@schwabe is this still an issue?

@cron2
Copy link
Contributor

cron2 commented Sep 17, 2022

@selvanair might also have some insights here, having fixed one of the management recursion issues recently...

@selvanair
Copy link
Contributor

This does look related to the bug for which we merged a quick-fix (allow a few levels of recursion in virtual_output_callback(): commit 4dfd592).

However, I can't reproduce the behaviour reported here. I tried with and without my patch: in both cases repeatedly sending "state on/off" as suggested here works correctly. Also checked that the daemon does respond with "SUCCESS: " for each command (tested on Linux). This is not definitive as such race-condition/timing related bugs are hard to reproduce.

@amatveyev-gh could you test again whether this is still an issue? I'm also curious to know whether the fix proposed here could also handle the lost messages problem without allowing recursion.

In the mean time I stumbled up on commit b2b6617 which seems to indicate that allowing recursion as I did may have undesirable side effects.

@dsommers Does commit 4dfd592 break your bug-fix in commit b2b6617 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants