Optimize window update logic #108
AFAICS, the lower receive speed of Watt-32 programs (vs. Winsock or Linux) comes from the lack of SACK. And FACK (Forward Acknowledgement), but I'm not sure how common this is nowadays. There is an option |
I read a bit about SACK before. Doesn't it only help for retransmitted segments (which should be rare)? On that topic, I noticed something. If you let Now we send a dup-ACK for every packet received after the gap. Limiting this to 3 would already save some time spent in |
BTW, if we use a giant buffer size in wget:
diff --git a/bin/WGET.182/retr.c b/bin/WGET.182/retr.c
index ebe1fe5..f78aad0 100644
--- a/bin/WGET.182/retr.c
+++ b/bin/WGET.182/retr.c
@@ -141,7 +141,7 @@ get_contents (int fd, FILE *fp, long *len, long restval, long expected,
#ifdef __LARGE__
static char c[2048];
#else
- static char c[8192];
+ static char c[131072];
#endif
void *progress = NULL;
struct wget_timer *timer = wtimer_allocate ();
We can get a better estimate of maximum possible throughput. The difference then becomes: Before: ~5.3 MB/s (current Which is pretty close to maximum link speed! |
Pretty impressive work there! 🥇 But all this is with a djgpp version of Wget on your LAN? |
Yes, this needs to be tested in different scenarios. I feel like I might be optimizing for one specific use case. On high-latency connections I expect it will fall apart, but haven't really tried yet. I did try building wget with watcom but it complains about the makefile syntax. I'm really not familiar with the watcom toolset at all, would appreciate some help here (if anyone else happens to listen in on this). |
Found a server on the other side of the planet (ping 290ms).
$ wget http://mirror.linux.org.au/debian/dists/unstable/main/Contents-source.gz -T 0 -O /dev/null
Windows (MSYS2): Constant 850 KB/s
Oops :) |
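For reference, on a path like this the receive window alone puts a ceiling on throughput: the sender can have at most one window of data in flight per round trip. A minimal sketch of that arithmetic follows. The function names are made up for illustration, and it ignores loss, slow start and per-packet processing time.

```c
#include <stdint.h>

/* Upper bound on throughput (bytes per second) when at most `window_bytes`
 * may be in flight and one round trip takes `rtt_ms` milliseconds.
 * Returns 0 for rtt_ms == 0, where the bound is meaningless. */
uint64_t max_throughput(uint32_t window_bytes, uint32_t rtt_ms)
{
    if (rtt_ms == 0)
        return 0;
    /* 64-bit intermediate so window_bytes * 1000 cannot overflow. */
    return ((uint64_t)window_bytes * 1000u) / rtt_ms;
}

/* Window (bytes) that must be in flight to sustain `bytes_per_sec`
 * over a path with the given round-trip time. */
uint64_t window_for_rate(uint32_t bytes_per_sec, uint32_t rtt_ms)
{
    return ((uint64_t)bytes_per_sec * rtt_ms) / 1000u;
}
```

With the 290 ms ping above, a 64K window caps out at about 226,000 bytes/s no matter how fast the receiver is. Taking KB as 1024 bytes, sustaining 850 KB/s needs roughly 250 KB in flight.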
What's your max WAN link-speed? I'm getting 5.09MB/s with wget/Windows (MSVC) on my 100 MBit/s WAN.
Nice. Some tricks inside Linux that do this? |
I'm on a gigabit link, so no bottleneck there. Also, I tested on Windows 7; maybe 10/11 has a better TCP implementation (I should upgrade at some point). For Linux I guess they have accurate round-trip timing to know when to send earlier window updates. But that's just a guess. |
Looking at Wireshark on the Linux side: It uses the TCP timestamp option and window size of 3MB. Window size does always stay constant. ACKs are sent fairly late. Also note the effect of TCP offloading, the network card bundles up multiple packets and presents them to the OS as one large block. We don't have this luxury here :) |
If that includes checksum offloading, I've disabled that to see the calculated checksums in Wireshark etc.
Wmake is really weird and ugly compared to GNU-make. I'll make an |
Thinking out loud: Problem is the delay between receiving the last packet and sending a window update. We can receive packets pretty quick, but then it takes a while before we get to process it. In theory sending ACKs faster with reduced window size will allow the sender to adapt and send at a more constant rate. But that's not happening, so something on our end must be causing a fairly constant delay. If I were to redesign TCP, I would add some sort of window pre-announce mechanism. So a sender could say with the first packet "I will be transmitting N full segments now". And the receiver immediately replies "after those N segments, my window size will be at least X". Then the sender can always keep going without ever having to wait for a window update. |
Does this have something to do with Nagle's Algorithm? |
Hm, not likely. I do think the Nagle implementation is slightly wrong, but that is another topic. I've done some more experimenting. If I remove the window update threshold, the high-latency case improves to ~75 KB/s without hurting LAN too much (6.3 MB/s). Doing fast-ACK from FSM, I can get 100 KB/s (or even 200 with 64K window). But LAN really suffers. What I think is happening then: |
Limit the number of dup-ACKs to 3, and send fast-ACKs after each retransmitted segment.
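A rough sketch of what that policy could look like on the receive side. This is not the code from the patch: `rx_state` and `on_segment` are hypothetical names, the out-of-order reassembly queue is left out, and in-order data is simply left to the delayed-ACK path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DUP_ACKS 3   /* three duplicates are enough to trigger fast retransmit */

/* Hypothetical per-connection receive state (not Watt-32's tcp_Socket). */
struct rx_state {
    uint32_t rcv_nxt;    /* next in-order sequence number expected */
    unsigned dup_acks;   /* dup-ACKs already sent for the current gap */
};

/* Called for each arriving data segment.
 * Returns true if an ACK should go out immediately,
 * false if it can be left to the delayed-ACK timer. */
bool on_segment(struct rx_state *rx, uint32_t seq, uint32_t len)
{
    int32_t offset;

    assert(rx != NULL);
    offset = (int32_t)(seq - rx->rcv_nxt);   /* wrap-safe distance from rcv_nxt */

    if (offset == 0) {
        /* In-order data. If we were signalling a gap, this is the
         * retransmitted segment: ACK it at once so the sender resumes. */
        bool was_in_gap = (rx->dup_acks > 0);
        rx->rcv_nxt += len;
        rx->dup_acks = 0;
        return was_in_gap;
    }
    if (offset > 0) {
        /* Data beyond a gap: send a dup-ACK, but only the first three times. */
        if (rx->dup_acks < MAX_DUP_ACKS) {
            rx->dup_acks++;
            return true;
        }
        return false;
    }
    /* Old data we already have: ACK so the sender can resynchronise. */
    return true;
}
```

A real implementation would also advance `rcv_nxt` over any segments already queued behind the gap once it is filled.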
To elaborate on the "slightly wrong" Nagle mode: While there is still unacked data, it's supposed to send only whole segments. But if we have a whole segment and a few bytes, Watt32 will also send a partial one. Small difference, but it is "slightly wrong". There is also a modification to Nagle's algorithm described here. Could take a look at that sometime. |
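To make the difference concrete, here is a minimal sketch of the rule as stated above: while data is unacknowledged, only whole segments go out and the tail waits. `nagle_sendable` is a made-up name, not a Watt-32 function. The send window and congestion window are ignored, and the idle case (nothing in flight) simply sends everything that is queued, which is an assumption on my part.

```c
#include <stdint.h>

/* Number of bytes that may be sent right now under the rule above.
 *   queued  - bytes waiting in the send buffer, not yet transmitted
 *   unacked - bytes transmitted but not yet acknowledged
 *   mss     - maximum segment size */
uint32_t nagle_sendable(uint32_t queued, uint32_t unacked, uint32_t mss)
{
    if (mss == 0)
        return 0;
    if (unacked == 0)
        return queued;               /* nothing in flight: send it all, short tail included */
    return (queued / mss) * mss;     /* data in flight: whole segments only, hold the tail */
}
```

With an MSS of 1460, 1470 bytes queued and data still in flight, this returns 1460: the full segment goes out and the 10 leftover bytes wait for the ACK. Sending those 10 bytes as well is the "slightly wrong" behaviour.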
Did you use the RDTSC or the 8254 timer as you mentioned in #99? |
Oh now you're on to something. I had been using the default, which I thought should be rdtsc. I saw you can toggle it with environment var So I set I'm very confused now. Need to figure out how this works. |
This would all make sense if the timer code was using |
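I did some benchmarking:
#include <stdio.h>
#include <stdint.h>
#include <tcp.h>

/* Read the CPU time-stamp counter. The "=A" constraint yields EDX:EAX as one
   64-bit value on 32-bit x86 (e.g. djgpp); it is not correct on x86-64. */
static uint64_t rdtsc(void)
{
  uint64_t count;
  asm volatile ("rdtsc" : "=A" (count));
  return count;
}

int main(void)
{
  uint64_t begin, end;

  init_misc();
  //hires_timer(0);
  begin = rdtsc();
  /* Time 100000 calls to set_timeout() in CPU cycles. */
  for (int i = 0; i < 100000; ++i)
  {
    set_timeout(0);
  }
  end = rdtsc();
  printf("%llu\n", (unsigned long long)(end - begin));
  return 0;
}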
This shows that the timer code is a serious bottleneck - The default option, at least for djgpp, is equivalent to RDTSC is decently fast on average, but it still calls Question is, do we need exact UNIX/UTC time anywhere? For basic timeouts you can just start counting from 0 at program startup, and avoid all the slow time.h stuff from libc. |
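A sketch of the "count from 0 at startup" idea, assuming some free-running millisecond counter is available (for example one derived from RDTSC or the 8254). That tick source is hypothetical here and is passed in as `now`. The names `timeout_set` and `timeout_expired` are for illustration only and are not the existing Watt-32 API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Deadline `msec` milliseconds after `now`. `now` is a free-running
 * millisecond counter that started at 0 at program startup; unsigned
 * arithmetic lets it wrap around harmlessly. */
uint32_t timeout_set(uint32_t now, uint32_t msec)
{
    return now + msec;
}

/* True once `now` has reached or passed `deadline`. Comparing the signed
 * difference stays correct across counter wrap-around, as long as no
 * timeout is longer than 2^31 ms (about 24.8 days). */
bool timeout_expired(uint32_t now, uint32_t deadline)
{
    return (int32_t)(now - deadline) >= 0;
}
```

Nothing in here needs wall-clock time, so no libc time.h calls are involved. Whether anything else in the library depends on real UNIX/UTC time is still the open question.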
Agreed. I'll try to make a modified version of |
I'm experimenting with the delayed-ACK / window update code. Some radical changes here, so this needs proper testing to ensure that it helps in all scenarios. And there may be bugs still.
RFC 1122 says:
But in our case, the packet driver's "send" function is a major bottleneck. We should avoid sending small packets at all costs. The solution: remove fast-ACK altogether, delay as long as possible.
In addition, we should never advertise a receive window smaller than MSS. Ideally, keep it wide open. So, the window should be updated only from tcp_read(). It makes no sense to do so from FSM or retransmitter - they can't solve the problem.
In this patch I also allocate a receive buffer that is 2x the window size. Not sure yet if it's necessary.
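One way to picture the window rule, as a sketch only: `advertised_window` and its parameters are hypothetical and this is not the code in the patch. When less than one MSS of space is left, it falls back to a zero window, as in classic receiver-side silly-window avoidance. That fallback is my assumption; the description above does not say what should happen in that case. The rule that an already advertised window must not shrink is also left out.

```c
#include <stdint.h>

/* Window to advertise after the application has read data.
 *   buf_size - capacity of the receive buffer
 *   buf_used - bytes received but not yet consumed by the application
 *   win_max  - configured receive window (tcp.recv_win)
 *   mss      - maximum segment size */
uint32_t advertised_window(uint32_t buf_size, uint32_t buf_used,
                           uint32_t win_max, uint32_t mss)
{
    uint32_t free_space = (buf_used < buf_size) ? buf_size - buf_used : 0;
    uint32_t win = (free_space < win_max) ? free_space : win_max;

    /* Never offer a sliver smaller than one segment (assumed fallback: 0). */
    return (win < mss) ? 0 : win;
}
```

With a 131072-byte buffer and tcp.recv_win = 65536, the window stays fully open until more than 65536 bytes are waiting unread. That is the effect the 2x buffer is meant to have.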
Things to check:
tcp_Socket.recv_win is only updated for in-order data. That is likely not correct.
Test procedure with the bundled wget:
And in wattcp.cfg:
tcp.recv_win = 65536
Results so far look promising:
Before: ~4.0MB/s
After: ~6.4MB/s