This repository was archived by the owner on Jul 3, 2025. It is now read-only.

feat: implement DNS over QUIC #18

Merged
bassosimone merged 16 commits into rbmk-project:main from roopeshsn:feat-DoQ
Mar 8, 2025

Conversation

@roopeshsn
Contributor

@roopeshsn roopeshsn commented Dec 26, 2024

Implemented DNS over QUIC.

Part of https://github.com/rbmk-project/issues/issues/3

@roopeshsn
Contributor Author

Hi @bassosimone! I made changes as per RFC 9250. Could you test once?

One more change is pending: the edns-tcp-keepalive option (section 5.5.2) should not be set over DoQ. If NewQuery receives this option as an argument, we need to handle it.
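
One way to handle this could be to filter the option out before the query is sent. The following is only a minimal sketch under stated assumptions: `ednsOption` and `dropTCPKeepalive` are hypothetical names that do not come from this PR, and the options are modelled as a plain slice rather than with the DNS library the project uses. The only external fact relied upon is that edns-tcp-keepalive has EDNS(0) option code 11.

```go
package main

import "fmt"

// ednsOption is a hypothetical, simplified EDNS(0) option: a code plus raw data.
type ednsOption struct {
	Code uint16
	Data []byte
}

// optionTCPKeepalive is the EDNS(0) option code of edns-tcp-keepalive.
const optionTCPKeepalive = 11

// dropTCPKeepalive returns a copy of opts without any edns-tcp-keepalive
// option, leaving the order of the remaining options unchanged.
func dropTCPKeepalive(opts []ednsOption) []ednsOption {
	out := make([]ednsOption, 0, len(opts))
	for _, opt := range opts {
		if opt.Code == optionTCPKeepalive {
			continue // this option should not be set over DoQ
		}
		out = append(out, opt)
	}
	return out
}

func main() {
	opts := []ednsOption{
		{Code: 12, Data: make([]byte, 8)}, // padding option
		{Code: optionTCPKeepalive},
	}
	fmt.Println(len(dropTCPKeepalive(opts)))
}
```

Silently dropping the option is only one possible policy; returning an error from the query constructor would be an equally plausible choice.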

@bassosimone
Member

bassosimone commented Dec 27, 2024

Nice, it seems the code is now working as intended with dns0.eu! I would like to suggest we collect some more DoQ servers and check whether the code is also working as intended with them (the intent being to have more data points).

@bassosimone
Member

bassosimone commented Dec 27, 2024

More specifically, I think there are two criteria we should meet for merging an MVP of DNS over QUIC:

  • the code works with 3 public DoQ servers
  • the code feels okay with respect to the RFC

I will allocate some time to study the RFC and review the code over the weekend!

Thank you for working on DoQ, @roopeshsn! 🙏 ✨

@roopeshsn
Contributor Author

I tried to resolve the following domain,

var (
	serverAddr = flag.String("server", "dns0.eu:853", "DNS server address")
	domain     = flag.String("domain", "www.roopeshsn.com", "Domain to query")
	qtype      = flag.String("type", "A", "Query type (A, AAAA, CNAME, etc.)")
	protocol   = flag.String("protocol", "doq", "DNS protocol (udp, tcp, dot, doh)")
)

I got the following error,

roopesh@Roopeshs-MacBook-Pro dnscore % go run internal/cmd/transport/main.go
;; Query:
;; opcode: QUERY, status: NOERROR, id: 19161
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;www.roopeshsn.com.     IN       A

{"time":"2024-12-27T15:36:21.801539+05:30","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABA3d3dwlyb29wZXNoc24DY29tAAABAAEAACkQAAAAAAAAAA==","serverAddr":"dns0.eu:853","serverProtocol":"doq","t":"2024-12-27T15:36:21.801471+05:30","protocol":""}

;; Response:
;; opcode: QUERY, status: NXDOMAIN, id: 0
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;www.roopeshsn.com.     IN       A

;; AUTHORITY SECTION:
roopeshsn.com.  1800    IN      SOA     johnathan.ns.cloudflare.com. dns.cloudflare.com. 2358959324 10000 2400 604800 1800


panic: Try0: no such host

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
        /Users/roopesh/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try0(...)
        /Users/roopesh/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:35
main.main()
        /Users/roopesh/Desktop/projects/dnscore/internal/cmd/transport/main.go:74 +0x57c
exit status 2

I'll look into this.

@roopeshsn
Contributor Author

More specifically, I think there are two criteria we should meet for merging an MVP of DNS over QUIC:

  • the code works with 3 public DoQ servers
  • the code feels okay with respect to the RFC

I will allocate some time to study the RFC and review the code over the weekend!

Thank you for working on DoQ, @roopeshsn! 🙏 ✨

Thanks to you!

@roopeshsn
Contributor Author

From response.go file, I came to know that if a domain is not valid, then we'll get an error with a suffix "no such host". This is expected, correct? But in this case, roopeshsn.com is valid. So dns0.eu is not able to resolve this domain correct?

@bassosimone
Member

bassosimone commented Dec 27, 2024

From response.go file, I came to know that if a domain is not valid, then we'll get an error with a suffix "no such host". This is expected, correct?

Let me try and run the same command you were running with other protocols.

So, I tried using DNS-over-TLS first:

% ./transport -protocol dot -domain www.roopeshsn.com
;; Query:
;; opcode: QUERY, status: NOERROR, id: 48820
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags: do; udp: 4096
; PADDING: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

;; QUESTION SECTION:
;www.roopeshsn.com.	IN	 A

panic: Try1: EOF

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try1[...](...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:40
main.main()
	/Users/simone/src/github.com/rbmk-project/dnscore/internal/cmd/transport/main.go:67 +0x664

It's curious that I get an EOF error here. I wonder whether this is the expected result; I'll check in more detail over the weekend. (Side note: panicking is not good in general for a command line tool, but this tool is just meant for quick and dirty testing, so panicking is fine in this case.)

Then I tried using DNS-over-UDP:

% ./transport -protocol udp -domain www.roopeshsn.com
;; Query:
;; opcode: QUERY, status: NOERROR, id: 24191
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;www.roopeshsn.com.	IN	 A

{"time":"2024-12-27T13:32:16.363934+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"Xn8BAAABAAAAAAABA3d3dwlyb29wZXNoc24DY29tAAABAAEAACkE0AAAAAAAAA==","serverAddr":"8.8.8.8:53","serverProtocol":"udp","t":"2024-12-27T13:32:16.363844+01:00","protocol":"udp"}
{"time":"2024-12-27T13:32:16.431423+01:00","level":"INFO","msg":"dnsResponse","localAddr":"192.168.1.113:64193","dnsRawQuery":"Xn8BAAABAAAAAAABA3d3dwlyb29wZXNoc24DY29tAAABAAEAACkE0AAAAAAAAA==","dnsRawResponse":"Xn+BgwABAAAAAQABA3d3dwlyb29wZXNoc24DY29tAAABAAHAEAAGAAEAAAcIADQJam9obmF0aGFuAm5zCmNsb3VkZmxhcmXAGgNkbnPAPIya3NwAACcQAAAJYAAJOoAAAAcIAAApAgAAAAAAAAA=","remoteAddr":"8.8.8.8:53","serverAddr":"8.8.8.8:53","serverProtocol":"udp","t0":"2024-12-27T13:32:16.363844+01:00","t":"2024-12-27T13:32:16.4314+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NXDOMAIN, id: 24191
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 512

;; QUESTION SECTION:
;www.roopeshsn.com.	IN	 A

;; AUTHORITY SECTION:
roopeshsn.com.	1800	IN	SOA	johnathan.ns.cloudflare.com. dns.cloudflare.com. 2358959324 10000 2400 604800 1800


panic: Try0: no such host

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try0(...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:35
main.main()
	/Users/simone/src/github.com/rbmk-project/dnscore/internal/cmd/transport/main.go:74 +0x57c

This result seems to suggest www.roopeshsn.com does not exist. Such a result is further confirmed by:

% dig www.roopeshsn.com

; <<>> DiG 9.10.6 <<>> www.roopeshsn.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 13121
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;www.roopeshsn.com.		IN	A

;; AUTHORITY SECTION:
roopeshsn.com.		1800	IN	SOA	johnathan.ns.cloudflare.com. dns.cloudflare.com. 2358959324 10000 2400 604800 1800

;; Query time: 88 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Fri Dec 27 13:35:47 CET 2024
;; MSG SIZE  rcvd: 110

(See how the answer is NXDOMAIN, which is == no such host.)

But in this case, roopeshsn.com is valid. So dns0.eu is not able to resolve this domain correct?

So, roopeshsn.com seems to be a valid domain:

% ./transport -protocol udp -domain roopeshsn.com
;; Query:
;; opcode: QUERY, status: NOERROR, id: 33460
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

{"time":"2024-12-27T13:37:07.718682+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"grQBAAABAAAAAAABCXJvb3Blc2hzbgNjb20AAAEAAQAAKQTQAAAAAAAA","serverAddr":"8.8.8.8:53","serverProtocol":"udp","t":"2024-12-27T13:37:07.718669+01:00","protocol":"udp"}
{"time":"2024-12-27T13:37:07.775325+01:00","level":"INFO","msg":"dnsResponse","localAddr":"192.168.1.113:51059","dnsRawQuery":"grQBAAABAAAAAAABCXJvb3Blc2hzbgNjb20AAAEAAQAAKQTQAAAAAAAA","dnsRawResponse":"grSBgAABAAIAAAABCXJvb3Blc2hzbgNjb20AAAEAAcAMAAEAAQAAAB0ABGgVYAfADAABAAEAAAAdAASsQ5YdAAApAgAAAAAAAAA=","remoteAddr":"8.8.8.8:53","serverAddr":"8.8.8.8:53","serverProtocol":"udp","t0":"2024-12-27T13:37:07.718669+01:00","t":"2024-12-27T13:37:07.77529+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 33460
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 512

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

;; ANSWER SECTION:
roopeshsn.com.	29	IN	A	104.21.96.7
roopeshsn.com.	29	IN	A	172.67.150.29

However, www.roopeshsn.com does not seem to exist. Is this your personal domain? Is it possible that you have just added the www. subdomain and it has not yet propagated through the DNS system? From my vantage point, only the base domain roopeshsn.com seems to exist. 🤔

Anyway, back onto the DoQ topic, I think the DoQ code is working as intended, since it shows NXDOMAIN where other protocols are showing NXDOMAIN and (through other tests such as the following) returns consistent answers:

% ./transport -protocol doq -domain roopeshsn.com -server dns0.eu:853
;; Query:
;; opcode: QUERY, status: NOERROR, id: 31070
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

{"time":"2024-12-27T13:39:57.268949+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABCXJvb3Blc2hzbgNjb20AAAEAAQAAKRAAAAAAAAAA","serverAddr":"dns0.eu:853","serverProtocol":"doq","t":"2024-12-27T13:39:57.268892+01:00","protocol":""}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 0
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

;; ANSWER SECTION:
roopeshsn.com.	300	IN	A	172.67.150.29
roopeshsn.com.	300	IN	A	104.21.96.7

However, I am still a bit confused by the following:

% ./transport -protocol doq -domain roopeshsn.com -server 1.1.1.1:853
;; Query:
;; opcode: QUERY, status: NOERROR, id: 25618
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;roopeshsn.com.	IN	 A

{"time":"2024-12-27T13:40:42.660301+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABCXJvb3Blc2hzbgNjb20AAAEAAQAAKRAAAAAAAAAA","serverAddr":"1.1.1.1:853","serverProtocol":"doq","t":"2024-12-27T13:40:42.660283+01:00","protocol":""}
panic: Try1: timeout: no recent network activity

goroutine 1 [running]:
github.com/rbmk-project/common/runtimex.PanicOnError(...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:21
github.com/rbmk-project/common/runtimex.Try1[...](...)
	/Users/simone/go/pkg/mod/github.com/rbmk-project/common@v0.16.0/runtimex/runtimex.go:40
main.main()
	/Users/simone/src/github.com/rbmk-project/dnscore/internal/cmd/transport/main.go:67 +0x664

I would have expected 1.1.1.1 to implement DoQ, but maybe it does not. I wonder which other public DNS servers implement DoQ. I suppose there must be an online list, and I guess trying with a few of them is necessary to ensure the client code you wrote is working as intended.

Member

@bassosimone bassosimone left a comment

Thank you again for preparing this initial pull request! I have finally managed to grok the related RFC, which means I am able to provide suggestions regarding improving this work. Please, take a look and let me know what you think! 🙏 🚀

@roopeshsn
Contributor Author

Hi @bassosimone! Thanks for taking the time to review my PR! I'll go through your comments and make the changes this weekend.

Changed queryStream's signature to take a dnsStream, so that it accepts both net.Conn and quic.Stream. Created a wrapper for quic.Stream that adds two methods, LocalAddr and RemoteAddr.

@roopeshsn
Contributor Author

Hi @bassosimone! Apologies for the delay. Due to some personal matters, I couldn't work on this earlier. I've resumed now.

@bassosimone
Member

Hi @roopeshsn, no worries! I am taking a look at your changes now! 💪

@bassosimone
Member

bassosimone commented Feb 15, 2025

@roopeshsn I am starting to wonder whether dns0.eu:853 is working as intended.

I tried using dns.adguard.com:853 and got the following output instead:

% go run ./internal/cmd/transport -server dns.adguard.com:853

;; Query:
;; opcode: QUERY, status: NOERROR, id: 31909
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;example.com.	IN	 AAAA

{"time":"2025-02-15T12:10:23.725996+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAAAAAAA==","serverAddr":"dns.adguard.com:853","serverProtocol":"doq","t":"2025-02-15T12:10:23.725935+01:00","protocol":"udp"}
{"time":"2025-02-15T12:10:25.775133+01:00","level":"INFO","msg":"dnsResponse","localAddr":"[::]:55076","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAAAAAAA==","dnsRawResponse":"AACBgAABAAYAAAABB2V4YW1wbGUDY29tAAAcAAHADAAcAAEAAAA8ABAmABQGOgAAIQAAAAAXPi5mwAwAHAABAAAAPAAQJgAUBrwAAFMAAAAAuB6UyMAMABwAAQAAADwAECYAFAa8AABTAAAAALgelM7ADAAcAAEAAAA8ABAmABQI7AAANgAAAAAXNn8kwAwAHAABAAAAPAAQJgAUCOwAADYAAAAAFzZ/McAMABwAAQAAADwAECYAFAY6AAAhAAAAABc+LmUAACkAAAAAAAAAAA==","remoteAddr":"94.140.14.14:853","serverAddr":"dns.adguard.com:853","serverProtocol":"doq","t0":"2025-02-15T12:10:23.725935+01:00","t":"2025-02-15T12:10:25.775105+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 0
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 0

;; QUESTION SECTION:
;example.com.	IN	 AAAA

;; ANSWER SECTION:
example.com.	60	IN	AAAA	2600:1406:3a00:21::173e:2e66
example.com.	60	IN	AAAA	2600:1406:bc00:53::b81e:94c8
example.com.	60	IN	AAAA	2600:1406:bc00:53::b81e:94ce
example.com.	60	IN	AAAA	2600:1408:ec00:36::1736:7f24
example.com.	60	IN	AAAA	2600:1408:ec00:36::1736:7f31
example.com.	60	IN	AAAA	2600:1406:3a00:21::173e:2e65

I also tried with dns.alidns.com:

% go run ./internal/cmd/transport -server dns.alidns.com:853

;; Query:
;; opcode: QUERY, status: NOERROR, id: 46355
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;example.com.	IN	 AAAA

{"time":"2025-02-15T12:19:16.205066+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAAAAAAA==","serverAddr":"dns.alidns.com:853","serverProtocol":"doq","t":"2025-02-15T12:19:16.204977+01:00","protocol":"udp"}
{"time":"2025-02-15T12:19:16.34147+01:00","level":"INFO","msg":"dnsResponse","localAddr":"[::]:64024","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAAAAAAA==","dnsRawResponse":"AACBAAABAAYAAAABB2V4YW1wbGUDY29tAAAcAAEHZXhhbXBsZQNjb20AABwAAQAAADQAECYAFAY6AAAhAAAAABc+LmYHZXhhbXBsZQNjb20AABwAAQAAADQAECYAFAa8AABTAAAAALgelMgHZXhhbXBsZQNjb20AABwAAQAAADQAECYAFAa8AABTAAAAALgelM4HZXhhbXBsZQNjb20AABwAAQAAADQAECYAFAjsAAA2AAAAABc2fyQHZXhhbXBsZQNjb20AABwAAQAAADQAECYAFAjsAAA2AAAAABc2fzEHZXhhbXBsZQNjb20AABwAAQAAADQAECYAFAY6AAAhAAAAABc+LmUAACkE0AAAAAAAXwAMAFsAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA","remoteAddr":"223.6.6.6:853","serverAddr":"dns.alidns.com:853","serverProtocol":"doq","t0":"2025-02-15T12:19:16.204977+01:00","t":"2025-02-15T12:19:16.341451+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 0
;; flags: qr rd; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232
; PADDING: 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

;; QUESTION SECTION:
;example.com.	IN	 AAAA

;; ANSWER SECTION:
example.com.	52	IN	AAAA	2600:1406:3a00:21::173e:2e66
example.com.	52	IN	AAAA	2600:1406:bc00:53::b81e:94c8
example.com.	52	IN	AAAA	2600:1406:bc00:53::b81e:94ce
example.com.	52	IN	AAAA	2600:1408:ec00:36::1736:7f24
example.com.	52	IN	AAAA	2600:1408:ec00:36::1736:7f31
example.com.	52	IN	AAAA	2600:1406:3a00:21::173e:2e65

I also tried with comss.dns.controld.com:

% go run ./internal/cmd/transport -server comss.dns.controld.com:853
;; Query:
;; opcode: QUERY, status: NOERROR, id: 28493
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096

;; QUESTION SECTION:
;example.com.	IN	 AAAA

{"time":"2025-02-15T12:23:03.221813+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAAAAAAA==","serverAddr":"comss.dns.controld.com:853","serverProtocol":"doq","t":"2025-02-15T12:23:03.221705+01:00","protocol":"udp"}
{"time":"2025-02-15T12:23:03.357736+01:00","level":"INFO","msg":"dnsResponse","localAddr":"[::]:49719","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAAAAAAA==","dnsRawResponse":"AACBgAABAAYAAAABB2V4YW1wbGUDY29tAAAcAAHADAAcAAEAAAEsABAmABQGvAAAUwAAAAC4HpTIwAwAHAABAAABLAAQJgAUCOwAADYAAAAAFzZ/JMAMABwAAQAAASwAECYAFAY6AAAhAAAAABc+LmXADAAcAAEAAAEsABAmABQI7AAANgAAAAAXNn8xwAwAHAABAAABLAAQJgAUBjoAACEAAAAAFz4uZsAMABwAAQAAASwAECYAFAa8AABTAAAAALgelM4AACkQAAAAAAAACwAIAAcAARgAuSjq","remoteAddr":"76.76.2.22:853","serverAddr":"comss.dns.controld.com:853","serverProtocol":"doq","t0":"2025-02-15T12:23:03.221705+01:00","t":"2025-02-15T12:23:03.35771+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 0
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 4096
; SUBNET: 185.40.234.0/24/0

;; QUESTION SECTION:
;example.com.	IN	 AAAA

;; ANSWER SECTION:
example.com.	300	IN	AAAA	2600:1406:bc00:53::b81e:94c8
example.com.	300	IN	AAAA	2600:1408:ec00:36::1736:7f24
example.com.	300	IN	AAAA	2600:1406:3a00:21::173e:2e65
example.com.	300	IN	AAAA	2600:1408:ec00:36::1736:7f31
example.com.	300	IN	AAAA	2600:1406:3a00:21::173e:2e66
example.com.	300	IN	AAAA	2600:1406:bc00:53::b81e:94ce

I will continue investigating more resolvers from this list: https://adguard-dns.io/kb/general/dns-providers/.

I will also need to re-read the RFC but I think we're now doing things correctly.

I will also see whether I am able to install DoQ enabled servers ~locally to test against them.

@bassosimone
Member

Previously, I wrote:

@roopeshsn I am starting to wonder whether dns0.eu:853 is working as intended.

But, after a bit I had an a-ha moment. The current diff does not correctly close the stream. We need this instead:

diff --git a/dotcp.go b/dotcp.go
index 7d84afa..8d5a3c7 100644
--- a/dotcp.go
+++ b/dotcp.go
@@ -17,7 +17,6 @@ import (
        "math"

        "github.com/miekg/dns"
-       "github.com/quic-go/quic-go"
 )

 // queryTCP implements [*Transport.Query] for DNS over TCP.
@@ -101,7 +100,7 @@ func (t *Transport) queryStream(ctx context.Context,
        // The client MUST send the DNS query over the selected stream and MUST
        // indicate through the STREAM FIN mechanism that no further data will
        // be sent on that stream.
-       if _, ok := conn.(quic.Stream); ok {
+       if _, ok := conn.(*quicStreamWrapper); ok {
                _ = conn.Close()
        }

The issue is the following: the dnsStream does not implement quic.Stream, since many methods are missing. What we actually need is to ask the question whether the dnsStream is actually a *quicStreamWrapper.
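
The failed assertion can be reproduced in isolation. This is a minimal sketch with stand-in types: `fullStream` plays the role of the large quic.Stream interface (reduced here to three methods), `streamWrapper` plays the role of *quicStreamWrapper, and the shape of `dnsStream` is assumed. None of these definitions are copied from the PR.

```go
package main

import "fmt"

// fullStream stands in for a large interface such as quic.Stream.
type fullStream interface {
	Write(p []byte) (int, error)
	Close() error
	CancelRead(code uint64)
}

// dnsStream is the narrower interface the query code works with (assumed shape).
type dnsStream interface {
	Write(p []byte) (int, error)
	Close() error
}

// fakeStream is a trivial fullStream implementation used for the demo.
type fakeStream struct{}

func (fakeStream) Write(p []byte) (int, error) { return len(p), nil }
func (fakeStream) Close() error                { return nil }
func (fakeStream) CancelRead(uint64)           {}

// streamWrapper keeps the stream in a named field and forwards only some of
// its methods, so the wrapper itself does not implement fullStream.
type streamWrapper struct {
	stream fullStream
}

func (w *streamWrapper) Write(p []byte) (int, error) { return w.stream.Write(p) }
func (w *streamWrapper) Close() error                { return w.stream.Close() }

// assertions reports which of the two type assertions succeed for conn.
func assertions(conn dnsStream) (isFull, isWrapper bool) {
	_, isFull = conn.(fullStream)        // false: CancelRead is missing
	_, isWrapper = conn.(*streamWrapper) // true: this is the dynamic type
	return
}

func main() {
	var conn dnsStream = &streamWrapper{stream: fakeStream{}}
	isFull, isWrapper := assertions(conn)
	fmt.Println(isFull, isWrapper)
}
```

Because the first assertion is false, a close guarded by it never runs, which is consistent with the stream not being closed before the fix.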

With the diff above applied, we have:

% go run ./internal/cmd/transport -server dns0.eu:853
;; Query:
;; opcode: QUERY, status: NOERROR, id: 2576
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags: do; udp: 4096
; PADDING: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

;; QUESTION SECTION:
;example.com.	IN	 AAAA

{"time":"2025-02-15T13:20:26.374317+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"ChABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAgAAAWAAMAFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","serverAddr":"dns0.eu:853","serverProtocol":"doq","t":"2025-02-15T13:20:26.374253+01:00","protocol":"udp"}
{"time":"2025-02-15T13:20:26.576597+01:00","level":"INFO","msg":"dnsResponse","localAddr":"[::]:60758","dnsRawQuery":"ChABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAgAAAWAAMAFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","dnsRawResponse":"ChCBoAABAAcAAAABB2V4YW1wbGUDY29tAAAcAAHADAAcAAEAAAEsABAmABQI7AAANgAAAAAXNn8kwAwAHAABAAABLAAQJgAUCOwAADYAAAAAFzZ/McAMABwAAQAAASwAECYAFAY6AAAhAAAAABc+LmXADAAcAAEAAAEsABAmABQGOgAAIQAAAAAXPi5mwAwAHAABAAABLAAQJgAUBrwAAFMAAAAAuB6UyMAMABwAAQAAASwAECYAFAa8AABTAAAAALgelM7ADAAuAAEAAAEsAF8AHA0CAAABLGfKKnpnrsUYNM0HZXhhbXBsZQNjb20ATrTUEtgaq2o7oUaVGC6D+xiJ1Ep3quAcokD7UE8qEmXkXhfKIAIm7XbTHY80tNFT615rb9JY0jJJCx5gJPvB6gAAKQTQAACAAAAA","remoteAddr":"192.71.244.75:853","serverAddr":"dns0.eu:853","serverProtocol":"doq","t0":"2025-02-15T13:20:26.374253+01:00","t":"2025-02-15T13:20:26.576563+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 2576
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags: do; udp: 1232

;; QUESTION SECTION:
;example.com.	IN	 AAAA

;; ANSWER SECTION:
example.com.	300	IN	AAAA	2600:1408:ec00:36::1736:7f24
example.com.	300	IN	AAAA	2600:1408:ec00:36::1736:7f31
example.com.	300	IN	AAAA	2600:1406:3a00:21::173e:2e65
example.com.	300	IN	AAAA	2600:1406:3a00:21::173e:2e66
example.com.	300	IN	AAAA	2600:1406:bc00:53::b81e:94c8
example.com.	300	IN	AAAA	2600:1406:bc00:53::b81e:94ce
example.com.	300	IN	RRSIG	AAAA 13 2 300 20250306230634 20250214042248 13517 example.com. TrTUEtgaq2o7oUaVGC6D+xiJ1Ep3quAcokD7UE8qEmXkXhfKIAIm7XbTHY80tNFT615rb9JY0jJJCx5gJPvB6g==

What's more, if you check the timings I posted in #18 (comment), you would see (with many log messages snipped):

% go run ./internal/cmd/transport -server dns.adguard.com:853

[...]

{"time":"2025-02-15T12:10:23.725996+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAAAAAAA==","serverAddr":"dns.adguard.com:853","serverProtocol":"doq","t":"2025-02-15T12:10:23.725935+01:00","protocol":"udp"}
{"time":"2025-02-15T12:10:25.775133+01:00","level":"INFO","msg":"dnsResponse","localAddr":"[::]:55076","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAAAAAAA==","dnsRawResponse":"AACBgAABAAYAAAABB2V4YW1wbGUDY29tAAAcAAHADAAcAAEAAAA8ABAmABQGOgAAIQAAAAAXPi5mwAwAHAABAAAAPAAQJgAUBrwAAFMAAAAAuB6UyMAMABwAAQAAADwAECYAFAa8AABTAAAAALgelM7ADAAcAAEAAAA8ABAmABQI7AAANgAAAAAXNn8kwAwAHAABAAAAPAAQJgAUCOwAADYAAAAAFzZ/McAMABwAAQAAADwAECYAFAY6AAAhAAAAABc+LmUAACkAAAAAAAAAAA==","remoteAddr":"94.140.14.14:853","serverAddr":"dns.adguard.com:853","serverProtocol":"doq","t0":"2025-02-15T12:10:23.725935+01:00","t":"2025-02-15T12:10:25.775105+01:00","protocol":"udp"}

[...]

Note that we send the query at 12:10:23.725 and receive the response at 12:10:25.775. That is roughly two seconds, a delay I did not observe with other servers. It is informative, though: we should not have to wait two seconds for a response. Look at what happens if I re-run the above command with the diff applied:

% go run ./internal/cmd/transport -server dns.adguard.com:853
;; Query:
;; opcode: QUERY, status: NOERROR, id: 37973
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags: do; udp: 4096
; PADDING: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

;; QUESTION SECTION:
;example.com.	IN	 AAAA

{"time":"2025-02-15T13:24:26.491278+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"lFUBAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAgAAAWAAMAFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","serverAddr":"dns.adguard.com:853","serverProtocol":"doq","t":"2025-02-15T13:24:26.491177+01:00","protocol":"udp"}
{"time":"2025-02-15T13:24:26.596521+01:00","level":"INFO","msg":"dnsResponse","localAddr":"[::]:55462","dnsRawQuery":"lFUBAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAgAAAWAAMAFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","dnsRawResponse":"lFWBoAABAAcAAAABB2V4YW1wbGUDY29tAAAcAAHADAAcAAEAAABwABAmABQGOgAAIQAAAAAXPi5lwAwAHAABAAAAcAAQJgAUBjoAACEAAAAAFz4uZsAMABwAAQAAAHAAECYAFAa8AABTAAAAALgelMjADAAcAAEAAABwABAmABQGvAAAUwAAAAC4HpTOwAwAHAABAAAAcAAQJgAUCOwAADYAAAAAFzZ/JMAMABwAAQAAAHAAECYAFAjsAAA2AAAAABc2fzHADAAuAAEAAABwAF8AHA0CAAABLGfKKnpnrsUYNM0HZXhhbXBsZQNjb20ATrTUEtgaq2o7oUaVGC6D+xiJ1Ep3quAcokD7UE8qEmXkXhfKIAIm7XbTHY80tNFT615rb9JY0jJJCx5gJPvB6gAAKQAAAAAAAAAjAAwAHwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","remoteAddr":"94.140.14.14:853","serverAddr":"dns.adguard.com:853","serverProtocol":"doq","t0":"2025-02-15T13:24:26.491177+01:00","t":"2025-02-15T13:24:26.596459+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 37973
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 7, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 0
; PADDING: 00000000000000000000000000000000000000000000000000000000000000

;; QUESTION SECTION:
;example.com.	IN	 AAAA

;; ANSWER SECTION:
example.com.	112	IN	AAAA	2600:1406:3a00:21::173e:2e65
example.com.	112	IN	AAAA	2600:1406:3a00:21::173e:2e66
example.com.	112	IN	AAAA	2600:1406:bc00:53::b81e:94c8
example.com.	112	IN	AAAA	2600:1406:bc00:53::b81e:94ce
example.com.	112	IN	AAAA	2600:1408:ec00:36::1736:7f24
example.com.	112	IN	AAAA	2600:1408:ec00:36::1736:7f31
example.com.	112	IN	RRSIG	AAAA 13 2 300 20250306230634 20250214042248 13517 example.com. TrTUEtgaq2o7oUaVGC6D+xiJ1Ep3quAcokD7UE8qEmXkXhfKIAIm7XbTHY80tNFT615rb9JY0jJJCx5gJPvB6g==

Now we send the query at 13:24:26.491 and we receive the response at 13:24:26.596, which is ~100ms.

Based on all of this, here's my analysis of what is going on:

  1. some servers (e.g., comss.dns.controld.com:853) don't care whether we close the stream for writing (as we ought to do) and just return the answer
  2. other servers (e.g., dns.adguard.com:853) have a ~2s timeout for receiving the close-for-writing signal and eventually send a response anyway
  3. other servers (e.g., dns0.eu:853) require a close-for-writing signal and otherwise close the connection

We might want to abstract the check slightly more (i.e., check for an interface rather than checking for *quicStreamWrapper), or maybe YAGNI and just checking for the concrete type is fine. At the end of the day, while being general is important, this is a private type, therefore it's probably fine to avoid abstracting until we need abstraction.

I am now going to test with all the DoQ servers I can find and report back in a subsequent message.

However, I think this finding is already good enough to warrant some success emojis: 🥳 🥳 🥳 🥳

@bassosimone
Member

bassosimone commented Feb 15, 2025

So, here are the results of testing with all the DoQ servers in this list: https://adguard-dns.io/kb/general/dns-providers/. I am also going to try and test additional DoQ servers that I may find or guess. I am going to just report back whether we did receive a response or not using emojis. I will add logs only if necessary.

The command I am using is go run ./internal/cmd/transport -server $server where $server is the one indicated in the first column of the following table:

Server Endpoint                       Result
dns.adguard-dns.com:853               ✔️
family.adguard-dns.com:853            ✔️
unfiltered.adguard-dns.com:853        ✔️
dns.alidns.com:853                    ❌
dns.de.futuredns.eu.org:853           ✔️
dns.us.futuredns.eu.org:853           ✔️
zero.dns0.eu:853                      ✔️
comss.dns.controld.com:853            ✔️
dns.jupitrdns.com:853                 ✔️
dandelionsprout.asuscomm.com:48582    ❌
ibksturm.synology.me:853              ✔️
doh.tiar.app:853                      ✔️
rx.techomespace.com:853               ❌
dns.nextdns.io:853                    ✔️

The error with dns.alidns.com:853 is Application error 0x2 (remote). It's unclear to me why this happens.

For dandelionsprout.asuscomm.com, I cannot properly resolve the domain (see: https://dns.google/query?name=dandelionsprout.asuscomm.com).

With rx.techomespace.com:853, I get a timeout with "no recent network activity".

@roopeshsn Would you mind applying the patch suggested above and repeating my test to see if you get the same results?

@bassosimone
Member

OK, @roopeshsn, I also figured out what's going on with dns.alidns.com. The requirement that the query ID be zero should be applied when the query is created. This DoQ server seems to be strict and rejects queries whose ID is not zero, while others are more lenient. The actual error stemmed from another bit of diff that I had locally, where I was trying to copy/clone the query to avoid modifying it and causing a data race. Reflecting on this, I think the best default is for NewQuery to know the protocol for which it's creating a query, so that we can use ID = 0 for DoH (something we were previously not doing) and DoQ. This is the complete diff I have locally for testing:

diff --git a/doquic.go b/doquic.go
index c683161..9e3f760 100644
--- a/doquic.go
+++ b/doquic.go
@@ -20,9 +20,8 @@ import (
 	"github.com/quic-go/quic-go"
 )
 
-func (t *Transport) createQUICStream(ctx context.Context, addr *ServerAddr,
-	query *dns.Msg) (stream *quicStreamWrapper, err error) {
-
+func (t *Transport) createQUICStream(
+	ctx context.Context, addr *ServerAddr) (stream *quicStreamWrapper, err error) {
 	udpAddr, err := net.ResolveUDPAddr("udp", addr.Address)
 	if err != nil {
 		return
@@ -56,12 +55,6 @@ func (t *Transport) createQUICStream(ctx context.Context, addr *ServerAddr,
 		_ = udpConn.SetDeadline(deadline)
 	}
 
-	// RFC 9250
-	// 4.2.1.  DNS Message IDs
-	// When sending queries over a QUIC connection, the DNS Message ID MUST
-	// be set to 0.
-	query.Id = 0
-
 	quicConn, err := tr.Dial(ctx, udpAddr, tlsConfig, quicConfig)
 	if err != nil {
 		return
@@ -89,7 +82,7 @@ func (t *Transport) queryQUIC(ctx context.Context, addr *ServerAddr, query *dns.
 	}
 
 	// Send the query and log the query if needed.
-	stream, err := t.createQUICStream(ctx, addr, query)
+	stream, err := t.createQUICStream(ctx, addr)
 	if err != nil {
 		return nil, err
 	}
diff --git a/dotcp.go b/dotcp.go
index 7d84afa..8d5a3c7 100644
--- a/dotcp.go
+++ b/dotcp.go
@@ -17,7 +17,6 @@ import (
 	"math"
 
 	"github.com/miekg/dns"
-	"github.com/quic-go/quic-go"
 )
 
 // queryTCP implements [*Transport.Query] for DNS over TCP.
@@ -101,7 +100,7 @@ func (t *Transport) queryStream(ctx context.Context,
 	// The client MUST send the DNS query over the selected stream and MUST
 	// indicate through the STREAM FIN mechanism that no further data will
 	// be sent on that stream.
-	if _, ok := conn.(quic.Stream); ok {
+	if _, ok := conn.(*quicStreamWrapper); ok {
 		_ = conn.Close()
 	}
 
diff --git a/go.mod b/go.mod
index 97eada1..7cfbcf9 100644
--- a/go.mod
+++ b/go.mod
@@ -4,6 +4,7 @@ go 1.23.3
 
 require (
 	github.com/miekg/dns v1.1.62
+	github.com/quic-go/quic-go v0.48.2
 	github.com/rbmk-project/common v0.16.0
 	github.com/stretchr/testify v1.10.0
 	golang.org/x/net v0.33.0
@@ -15,7 +16,6 @@ require (
 	github.com/google/pprof v0.0.0-20210407192527-94a9f03dee38 // indirect
 	github.com/onsi/ginkgo/v2 v2.9.5 // indirect
 	github.com/pmezard/go-difflib v1.0.0 // indirect
-	github.com/quic-go/quic-go v0.48.2 // indirect
 	go.uber.org/mock v0.4.0 // indirect
 	golang.org/x/crypto v0.31.0 // indirect
 	golang.org/x/exp v0.0.0-20240506185415-9bf2ced13842 // indirect
diff --git a/go.sum b/go.sum
index 0a8f2a8..e0b614a 100644
--- a/go.sum
+++ b/go.sum
@@ -4,8 +4,12 @@ github.com/chzyer/test v0.0.0-20180213035817-a1ea475d72b1/go.mod h1:Q3SI9o4m/ZMn
 github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
 github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
 github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/go-logr/logr v1.2.4 h1:g01GSCwiDw2xSZfjJ2/T9M+S6pFdcNtFYsp+Y43HYDQ=
+github.com/go-logr/logr v1.2.4/go.mod h1:jdQByPbusPIv2/zmleS9BjJVeZ6kBagPoEUsqbVz/1A=
 github.com/go-task/slim-sprig v0.0.0-20230315185526-52ccab3ef572 h1:tfuBGBXKqDEevZMzYi5KSi8KkcZtzBcTgAUUtapy0OI=
 github.com/go-task/slim-sprig v0.0.0-20230315185526-52ccab3ef572/go.mod h1:9Pwr4B2jHnOSGXyyzV8ROjYa2ojvAY6HCGYYfMoC3Ls=
+github.com/golang/protobuf v1.5.3 h1:KhyjKVUg7Usr/dYsdSqoFveMYd5ko72D+zANwlG1mmg=
+github.com/golang/protobuf v1.5.3/go.mod h1:XVQd3VNwM+JqD3oG2Ue2ip4fOMUkwXdXDdiuN0vRsmY=
 github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
 github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY=
 github.com/google/pprof v0.0.0-20210407192527-94a9f03dee38 h1:yAJXTCF9TqKcTiHJAE8dj7HMvPfh66eeA2JYW7eFpSE=
@@ -15,6 +19,8 @@ github.com/miekg/dns v1.1.62 h1:cN8OuEF1/x5Rq6Np+h1epln8OiyPWV+lROx9LxcGgIQ=
 github.com/miekg/dns v1.1.62/go.mod h1:mvDlcItzm+br7MToIKqkglaGhlFMHJ9DTNNWONWXbNQ=
 github.com/onsi/ginkgo/v2 v2.9.5 h1:+6Hr4uxzP4XIUyAkg61dWBw8lb/gc4/X5luuxN/EC+Q=
 github.com/onsi/ginkgo/v2 v2.9.5/go.mod h1:tvAoo1QUJwNEU2ITftXTpR7R1RbCzoZUOs3RonqW57k=
+github.com/onsi/gomega v1.27.6 h1:ENqfyGeS5AX/rlXDd/ETokDz93u0YufY1Pgxuy/PvWE=
+github.com/onsi/gomega v1.27.6/go.mod h1:PIQNjfQwkP3aQAH7lf7j87O/5FiNr+ZR8+ipb+qQlhg=
 github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
 github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
 github.com/quic-go/quic-go v0.48.2 h1:wsKXZPeGWpMpCGSWqOcqpW2wZYic/8T3aqiOID0/KWE=
@@ -42,8 +48,12 @@ golang.org/x/sys v0.28.0 h1:Fksou7UEQUWlKvIdsqzJmUmCX3cZuD2+P3XyyzwMhlA=
 golang.org/x/sys v0.28.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
 golang.org/x/text v0.21.0 h1:zyQAAkrwaneQ066sspRyJaG9VNi/YJ1NfzcGB3hZ/qo=
 golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
+golang.org/x/time v0.5.0 h1:o7cqy6amK/52YcAKIPlM3a+Fpj35zvRj2TP+e1xFSfk=
+golang.org/x/time v0.5.0/go.mod h1:3BpzKBy/shNhVucY/MWOyx10tF3SFh9QdLuxbVysPQM=
 golang.org/x/tools v0.28.0 h1:WuB6qZ4RPCQo5aP3WdKZS7i595EdWqWR8vqJTlwTVK8=
 golang.org/x/tools v0.28.0/go.mod h1:dcIOrVd3mfQKTgrDVQHqCPMWy6lnhfhtX3hLXYVLfRw=
+google.golang.org/protobuf v1.33.0 h1:uNO2rsAINq/JlFpSdYEKIZ0uKD/R9cpdv0T+yoGwGmI=
+google.golang.org/protobuf v1.33.0/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
 gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
 gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
diff --git a/internal/cmd/transport/main.go b/internal/cmd/transport/main.go
index 7d7c8af..24e5a59 100644
--- a/internal/cmd/transport/main.go
+++ b/internal/cmd/transport/main.go
@@ -52,7 +52,7 @@ func main() {
 	server := dnscore.NewServerAddr(dnscore.Protocol(*protocol), *serverAddr)
 	flags := 0
 	maxlength := uint16(dnscore.EDNS0SuggestedMaxResponseSizeUDP)
-	if *protocol == string(dnscore.ProtocolDoT) || *protocol == string(dnscore.ProtocolDoH) {
+	if *protocol == string(dnscore.ProtocolDoT) || *protocol == string(dnscore.ProtocolDoH) || *protocol == string(dnscore.ProtocolDoQ) {
 		flags |= dnscore.EDNS0FlagDO | dnscore.EDNS0FlagBlockLengthPadding
 	}
 	if *protocol != string(dnscore.ProtocolUDP) {
@@ -62,6 +62,12 @@ func main() {
 	// Create the DNS query
 	optEDNS0 := dnscore.QueryOptionEDNS0(maxlength, flags)
 	query := runtimex.Try1(dnscore.NewQuery(*domain, dnsType, optEDNS0))
+	// TODO(bassosimone): this is a temporary hack to ensure the query
+	// actually successfully validates. We should instead ensure that we
+	// specify the protocol for which we are creating the query.
+	if server.Protocol == dnscore.ProtocolDoH || server.Protocol == dnscore.ProtocolDoQ {
+		query.Id = 0
+	}
 	fmt.Printf(";; Query:\n%s\n", query.String())
 
 	// Perform the DNS query

With this diff applied:

% go run ./internal/cmd/transport -server dns.alidns.com:853
;; Query:
;; opcode: QUERY, status: NOERROR, id: 0
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags: do; udp: 4096
; PADDING: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

;; QUESTION SECTION:
;example.com.	IN	 AAAA

{"time":"2025-02-15T14:59:35.91331+01:00","level":"INFO","msg":"dnsQuery","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAgAAAWAAMAFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","serverAddr":"dns.alidns.com:853","serverProtocol":"doq","t":"2025-02-15T14:59:35.913222+01:00","protocol":"udp"}
{"time":"2025-02-15T14:59:35.96982+01:00","level":"INFO","msg":"dnsResponse","localAddr":"[::]:51278","dnsRawQuery":"AAABAAABAAAAAAABB2V4YW1wbGUDY29tAAAcAAEAACkQAAAAgAAAWAAMAFQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","dnsRawResponse":"AACBAAABAAYAAAABB2V4YW1wbGUDY29tAAAcAAEHZXhhbXBsZQNjb20AABwAAQAAAAEAECYAFAa8AABTAAAAALgelMgHZXhhbXBsZQNjb20AABwAAQAAAAEAECYAFAa8AABTAAAAALgelM4HZXhhbXBsZQNjb20AABwAAQAAAAEAECYAFAjsAAA2AAAAABc2fyQHZXhhbXBsZQNjb20AABwAAQAAAAEAECYAFAjsAAA2AAAAABc2fzEHZXhhbXBsZQNjb20AABwAAQAAAAEAECYAFAY6AAAhAAAAABc+LmUHZXhhbXBsZQNjb20AABwAAQAAAAEAECYAFAY6AAAhAAAAABc+LmYAACkE0AAAAAAAZwAMAGMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=","remoteAddr":"223.5.5.5:853","serverAddr":"dns.alidns.com:853","serverProtocol":"doq","t0":"2025-02-15T14:59:35.913222+01:00","t":"2025-02-15T14:59:35.969812+01:00","protocol":"udp"}

;; Response:
;; opcode: QUERY, status: NOERROR, id: 0
;; flags: qr rd; QUERY: 1, ANSWER: 6, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version 0; flags:; udp: 1232
; PADDING: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

;; QUESTION SECTION:
;example.com.	IN	 AAAA

;; ANSWER SECTION:
example.com.	1	IN	AAAA	2600:1406:bc00:53::b81e:94c8
example.com.	1	IN	AAAA	2600:1406:bc00:53::b81e:94ce
example.com.	1	IN	AAAA	2600:1408:ec00:36::1736:7f24
example.com.	1	IN	AAAA	2600:1408:ec00:36::1736:7f31
example.com.	1	IN	AAAA	2600:1406:3a00:21::173e:2e65
example.com.	1	IN	AAAA	2600:1406:3a00:21::173e:2e66
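
As a cross-check on the logs above: the `dnsRawQuery` and `dnsRawResponse` fields are base64-encoded wire-format messages, so their first 12 bytes are the DNS header. The sketch below decodes only that header prefix (the first 16 base64 characters of each field). The names `header` and `parseHeader` are illustrative and are not part of dnscore.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"fmt"
)

// header holds the six 16-bit fields of the fixed DNS header
// (RFC 1035, section 4.1.1). Illustrative type, not a dnscore type.
type header struct {
	ID, Flags, QDCount, ANCount, NSCount, ARCount uint16
}

// parseHeader decodes a base64 string and reads the DNS header
// from the first 12 bytes of the decoded message.
func parseHeader(b64 string) (header, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return header{}, err
	}
	if len(raw) < 12 {
		return header{}, fmt.Errorf("message too short: %d bytes", len(raw))
	}
	return header{
		ID:      binary.BigEndian.Uint16(raw[0:2]),
		Flags:   binary.BigEndian.Uint16(raw[2:4]),
		QDCount: binary.BigEndian.Uint16(raw[4:6]),
		ANCount: binary.BigEndian.Uint16(raw[6:8]),
		NSCount: binary.BigEndian.Uint16(raw[8:10]),
		ARCount: binary.BigEndian.Uint16(raw[10:12]),
	}, nil
}

func main() {
	// Header prefixes copied from dnsRawQuery and dnsRawResponse above.
	for _, s := range []string{"AAABAAABAAAAAAAB", "AACBAAABAAYAAAAB"} {
		h, err := parseHeader(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("id=%d flags=0x%04x qd=%d an=%d ns=%d ar=%d\n",
			h.ID, h.Flags, h.QDCount, h.ANCount, h.NSCount, h.ARCount)
	}
}
```

Both headers decode to ID 0. The query has only RD set (flags 0x0100), and the response has QR and RD set (flags 0x8100) with six answers, which matches the textual dumps.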

So, with this change, I think the patch really starts to look good. I'll wait for you to test the same servers and to hear whether you have any design comments on what I mentioned above. If it's all green, we can move towards actually merging this diff, which will mostly boil down to polishing the diff a bit, applying my changes or similar ones, and documenting what remains to be done with TODO comments and possibly issues. More success emojis: 🥳 🎈

Member

@bassosimone bassosimone left a comment

Thank you for moving this forward! I have investigated the issue you requested me to investigate and tried to explain my current understanding. I think it's all looking good in this regard and we could soon move towards polishing and preparing for merging, which will mostly boil down to cosmetic changes, tweaks, and noting TODOs. Before this, please, make sure to double check my understanding and investigation and let me know if you have any specific comments regarding my analysis! 🙏

@roopeshsn
Contributor Author

Hi @bassosimone! Thank you very much for spending your time on this! 🙏 You did a lot of testing and also gave me a lot of insights. Sure, I'll apply your patch and retest.

@bassosimone
Member

Thank you, @roopeshsn! 🙏

@roopeshsn
Contributor Author

roopeshsn commented Feb 16, 2025

	// TODO(bassosimone): this is a temporary hack to ensure the query
	// actually successfully validates. We should instead ensure that we
	// specify the protocol for which we are creating the query.
	if server.Protocol == dnscore.ProtocolDoH || server.Protocol == dnscore.ProtocolDoQ {
		query.Id = 0
	}

We could also pass the protocol as an argument to the NewQuery function and add a check inside it that sets the ID to 0 for DoQ.
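
A minimal sketch of that idea, assuming a simplified stand-in for the protocol type. The names `protocol`, `protoDoQ`, and `queryIDFor` are hypothetical and do not reflect dnscore's actual API. The ID is zero for DoH and DoQ and random otherwise:

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// protocol is a hypothetical stand-in for the package's protocol type.
type protocol string

const (
	protoUDP protocol = "udp"
	protoTCP protocol = "tcp"
	protoDoT protocol = "dot"
	protoDoH protocol = "doh"
	protoDoQ protocol = "doq"
)

// queryIDFor picks the DNS message ID for a new query: zero for DoH
// (RFC 8484 says SHOULD) and DoQ (RFC 9250 says MUST), random otherwise.
func queryIDFor(p protocol) (uint16, error) {
	switch p {
	case protoDoH, protoDoQ:
		return 0, nil
	default:
		var b [2]byte
		if _, err := rand.Read(b[:]); err != nil {
			return 0, err
		}
		return binary.BigEndian.Uint16(b[:]), nil
	}
}

func main() {
	for _, p := range []protocol{protoUDP, protoDoH, protoDoQ} {
		id, err := queryIDFor(p)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: id=%d\n", p, id)
	}
}
```

Keeping the decision in one place means callers cannot forget to zero the ID after building the query.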

@roopeshsn
Contributor Author

I retested the DoQ endpoints, and I am getting the same errors you mentioned for dandelionsprout.asuscomm.com:48582 and rx.techomespace.com:853.

For dandelionsprout.asuscomm.com, the ping didn't work.

For rx.techomespace.com, the ping worked. But when I ran the main program and took captures, I saw an ICMP error message come back with type 3 and code 10. Some policy is probably blocking the traffic to port 853.
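
For reference, ICMPv4 type 3 is "destination unreachable", and code 10 means that communication with the destination host is administratively prohibited, which is consistent with a filtering policy. The lookup below is a small, non-exhaustive sketch. The names `destUnreachable` and `describeICMP` are illustrative only.

```go
package main

import "fmt"

// destUnreachable maps a few ICMPv4 type-3 (destination unreachable)
// codes to their meaning (RFC 792, RFC 1122, RFC 1812). Not exhaustive.
var destUnreachable = map[int]string{
	0:  "network unreachable",
	1:  "host unreachable",
	3:  "port unreachable",
	9:  "communication with destination network administratively prohibited",
	10: "communication with destination host administratively prohibited",
	13: "communication administratively prohibited",
}

// describeICMP returns a short description for an ICMPv4 type/code pair,
// covering only the destination-unreachable codes listed above.
func describeICMP(typ, code int) string {
	if typ != 3 {
		return fmt.Sprintf("type %d: not a destination-unreachable message", typ)
	}
	if s, ok := destUnreachable[code]; ok {
		return s
	}
	return fmt.Sprintf("destination unreachable, code %d", code)
}

func main() {
	fmt.Println(describeICMP(3, 10))
}
```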

@roopeshsn
Contributor Author

Cloudflare DNS doesn't support DoQ (cloudflare-dns.com:853), right? I tried to telnet to port 853 to check whether the port is listening, but I didn't get any ICMP echo reply. Firewalls might be dropping the packets.

@bassosimone
Member

I retested the DoQ endpoints, and I am getting the same errors you mentioned for dandelionsprout.asuscomm.com:48582 and rx.techomespace.com:853.

Thank you!

For dandelionsprout.asuscomm.com, the ping didn't work.

The domain is now available again (the previous error I noticed had something to do with the authoritative DNS server not being able to serve the query, but this seems to have been resolved). That said, neither port 853 nor port 48582 seem to work with DoQ.

For rx.techomespace.com, the ping worked. But when I ran the main program and took captures, I saw an ICMP error message come back with type 3 and code 10. Some policy is probably blocking the traffic to port 853.

Interesting! In any case, I think we have already tested with enough servers.

Cloudflare DNS doesn't support DoQ (cloudflare-dns.com:853), right?

Yeah, I think it does not.

I tried to telnet to port 853 to check whether the port is listening, but I didn't get any ICMP echo reply. Firewalls might be dropping the packets.

I am not sure this kind of test is conclusive, though. As far as I know, the telnet command only uses TCP. Because we're working with DoQ, which uses QUIC, which in turn uses UDP, it's trickier to know whether a port is open and listening. UDP has no equivalent of the TCP three-way handshake, so I think the only way to know is to attempt a QUIC handshake on that port. That is roughly what happens when we use the internal transport command, and the timeout we get suggests that no one is listening on that port. In any case, using telnet on port 853/tcp does reveal whether there's support for DoT (which uses that port). To do some dogfooding, I used rbmk nc to check whether that port was available, and I can confirm it is not:

% rbmk nc --logs - cloudflare-dns.com 853
[...]
{"time":"2025-02-19T06:56:36.376955+01:00","level":"INFO","msg":"connectStart","protocol":"tcp","remoteAddr":"104.16.249.249:853","t":"2025-02-19T06:56:36.376951+01:00"}
{"time":"2025-02-19T06:57:51.375315+01:00","level":"INFO","msg":"connectDone","err":"dial tcp 104.16.249.249:853: connect: operation timed out","errClass":"ETIMEDOUT","localAddr":"","protocol":"tcp","remoteAddr":"104.16.249.249:853","t0":"2025-02-19T06:56:36.376951+01:00","t":"2025-02-19T06:57:51.375276+01:00"}
[...]
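
To illustrate why a TCP-style reachability check does not carry over to UDP, here is a small sketch using only the Go standard library. It assumes a loopback port that was just released and has nothing listening on it. `closedPort` and `probe` are hypothetical helper names. Dialing TCP to that port fails, while "dialing" UDP succeeds because no packet is exchanged, so only an application-level exchange (for DoQ, a QUIC handshake) can tell whether a server is there.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// closedPort returns a loopback address on which nothing is listening,
// obtained by binding an ephemeral TCP port and releasing it immediately.
func closedPort() (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	addr := ln.Addr().String()
	return addr, ln.Close()
}

// probe dials addr over the given network and returns the dial error, if any.
func probe(network, addr string) error {
	conn, err := net.DialTimeout(network, addr, 2*time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	addr, err := closedPort()
	if err != nil {
		panic(err)
	}
	// TCP needs a handshake, so the dial reports that nobody is listening.
	fmt.Println("tcp:", probe("tcp", addr))
	// UDP dialing only sets the default peer; no packet is sent yet.
	fmt.Println("udp:", probe("udp", addr))
}
```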

Anyways, I am marking this pull request as ready for review and I'm going to do a review and explain what is missing to get this diff merged, which mainly boils down to minor changes and adaptations.

Thank you!

@bassosimone bassosimone marked this pull request as ready for review February 19, 2025 06:03
bassosimone added a commit that referenced this pull request Feb 20, 2025
RFC9250 Sect. 4.2.1 says:

```
When sending queries over a QUIC connection, the DNS Message ID MUST
be set to 0.  The stream mapping for DoQ allows for unambiguous
correlation of queries and responses, so the Message ID field is not
required.

This has implications for proxying DoQ messages to and from other
transports.  For example, proxies may have to manage the fact that
DoQ can support a larger number of outstanding queries on a single
connection than, for example, DNS over TCP, because DoQ is not
limited by the Message ID space.  This issue already exists for DoH,
where a Message ID of 0 is recommended.
```

We noticed this issue in #18.

This diff aims at addressing the issue by adding support for
generating a protocol-specific query by default.

We do this by adding a new constructor: NewQueryWithServerAddr.

From the provided ServerAddr, we obtain the protocol, which, in
turn determines whether we should use a zero query ID.

The existing NewQuery function is deprecated and becomes a
wrapper around the new NewQueryWithServerAddr function.

Because we recognise the value of customising the actual query
ID beyond what the RFC says, we also introduce a new QueryOption
called QueryOptionID that allows setting an arbitrary ID.

We also update tests to ensure full coverage.

We also update the `internal` testing commands accordingly.
bassosimone added a commit that referenced this pull request Feb 20, 2025
RFC9250 Sect. 4.2.1 says:

```
When sending queries over a QUIC connection, the DNS Message ID MUST
be set to 0.  The stream mapping for DoQ allows for unambiguous
correlation of queries and responses, so the Message ID field is not
required.

This has implications for proxying DoQ messages to and from other
transports.  For example, proxies may have to manage the fact that
DoQ can support a larger number of outstanding queries on a single
connection than, for example, DNS over TCP, because DoQ is not
limited by the Message ID space.  This issue already exists for DoH,
where a Message ID of 0 is recommended.
```

RFC 8484 Sect. 4.1 says:

```
In order to maximize HTTP cache friendliness, DoH clients using media
formats that include the ID field from the DNS message header, such
as "application/dns-message", SHOULD use a DNS ID of 0 in every DNS
request.  HTTP correlates the request and response, thus eliminating
the need for the ID in a media type such as "application/dns-
message".  The use of a varying DNS ID can cause semantically
equivalent DNS queries to be cached separately.
```

We noticed this issue in #18,
where DoQ queries consistently failed with `dns.alidns.com` when
not using a zero DNS query ID.

This diff aims at addressing the issue by adding support for
generating a protocol-specific query by default.

We do this by adding a new constructor: NewQueryWithServerAddr.

From the provided ServerAddr, we obtain the protocol, which, in
turn determines whether we should use a zero query ID.

The existing NewQuery function is deprecated and becomes a
wrapper around the new NewQueryWithServerAddr function.

Because we recognise the value of customising the actual query
ID beyond what the RFC says, we also introduce a new QueryOption
called QueryOptionID that allows setting an arbitrary ID.

We also update tests to ensure full coverage.

We also update the `internal` testing commands accordingly.
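
The design described in this commit message (a protocol-derived default ID plus an option that can override it) can be sketched with the functional-options pattern. This is an illustration under assumptions, not the dnscore implementation: `serverAddr`, `queryConfig`, `queryOption`, `withID`, and `newQueryConfig` are hypothetical names, and the random-ID source is injected so the example stays deterministic.

```go
package main

import "fmt"

// serverAddr is a reduced, hypothetical stand-in for a server address
// carrying the protocol name ("udp", "doh", "doq", ...).
type serverAddr struct {
	Protocol string
	Address  string
}

// queryConfig collects the settings used to build a query.
type queryConfig struct {
	id uint16
}

// queryOption mutates the configuration after defaults are applied.
type queryOption func(*queryConfig)

// withID forces a specific query ID, overriding the protocol default.
func withID(id uint16) queryOption {
	return func(c *queryConfig) { c.id = id }
}

// newQueryConfig derives the default ID from the protocol (zero for DoH
// and DoQ, otherwise whatever randomID returns) and then applies options
// in order, so explicit options always win over the default.
func newQueryConfig(addr serverAddr, randomID func() uint16, opts ...queryOption) queryConfig {
	var cfg queryConfig
	if addr.Protocol != "doh" && addr.Protocol != "doq" {
		cfg.id = randomID()
	}
	for _, opt := range opts {
		opt(&cfg)
	}
	return cfg
}

func main() {
	fixed := func() uint16 { return 4321 } // stand-in for a random source
	doq := serverAddr{Protocol: "doq", Address: "dns.adguard-dns.com:853"}
	fmt.Println(newQueryConfig(doq, fixed).id)
	fmt.Println(newQueryConfig(doq, fixed, withID(7)).id)
}
```

Applying the options after the default is what lets a caller deliberately send a non-zero ID, for example to test how a strict server reacts.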
bassosimone added a commit to rbmk-project/x that referenced this pull request Mar 1, 2025
Related to rbmk-project/dnscore#18, where we
introduce DoQ support.

While there, ensure `tlsconfig.go` has good test coverage.
bassosimone added a commit to rbmk-project/common that referenced this pull request Mar 1, 2025
TLSConn is required to write tests for the TLS handshake
inside the rbmk-project/x/netcore package.

PacketConn will be required to write tests for DoQ once
we've merged rbmk-project/dnscore#18.
@bassosimone
Member

@roopeshsn we should also investigate why tests are failing 🤔

@roopeshsn
Contributor Author

@roopeshsn we should also investigate why tests are failing 🤔

Noted.

@roopeshsn
Contributor Author

I ran go test ./... to check all the test files. It seems they're fine.

roopesh:dnscore/ (feat-DoQ✗) $ go test ./...
ok      github.com/rbmk-project/dnscore 1.873s
ok      github.com/rbmk-project/dnscore/dnscoretest     1.732s
ok      github.com/rbmk-project/dnscore/internal/cmd/lookup     0.868s
ok      github.com/rbmk-project/dnscore/internal/cmd/mkcert     1.498s
ok      github.com/rbmk-project/dnscore/internal/cmd/transport  3.018s

Are you referring to the workflow or action checks on GitHub?

@roopeshsn
Contributor Author

If so, then judging from the build logs, it seems we need to add the go mod tidy command too.

@roopeshsn
Contributor Author

I added the following test case for DoQ in query_test.go by adapting the DoH test case.

{
	name:       "DoQ query should have zero ID",
	serverAddr: NewServerAddr(ProtocolQUIC, "dns.adguard-dns.com:853"),
	qname:      "example.com",
	qtype:      dns.TypeAAAA,
	wantName:   "example.com.",
	wantId:     0,
},

Is this fine or do we need one more test case?

@roopeshsn
Contributor Author

Apart from cosmetic changes, we have the following things left, correct?

  • Refactor code (closing connections)
  • Test case for DoQ (doquic_test.go)

Feel free to add anything else you have in mind.

Question:
Do we need to add a field similar to DialTLSContext for QUIC too (DialQUICContext) in the Transport struct?

type Transport struct {
...
	DialTLSContext func(ctx context.Context, network, address string) (net.Conn, error)
...

@bassosimone
Member

bassosimone commented Mar 2, 2025

Dear @roopeshsn, thanks for following up with this!

Apart from cosmetic changes, we have the following things left, correct?

  • Refactor code (closing connections)

Yes

  • Test case for DoQ (doquic_test.go)

Yes, in the sense of making sure the CI is green for a doq integration test.

Feel free to add anything else you have in mind.

Nothing comes to my mind. (As you mentioned, we have cosmetics but those are easy.)

Question: Do we need to add a field similar to DialTLSContext for QUIC too (DialQUICContext) in the Transport struct?

type Transport struct {
...
	DialTLSContext func(ctx context.Context, network, address string) (net.Conn, error)
...

I can see how this could be beneficial but, at the same time, I think we are not going to need it for now, so we can defer it until we add testing. (My plan for moving forward is to add DoQ support to the rbmk-project/rbmk code and then circle back to ensure higher coverage. I think trying to cover the possible cases will tell us which extra functions we need to ease writing tests.)

@bassosimone
Member

bassosimone commented Mar 2, 2025

To clarify on this:

Yes, in the sense of making sure the CI is green for a doq integration test.

I think this PR has been open for quite some time and we don't need to add unit tests as part of it, even though this is certainly something we could do later, as a follow-up.

Also, the unit test case you added for the ID being zero looks fine.

@roopeshsn
Contributor Author

All done! Let me know if we need to do any other changes.

@bassosimone
Member

All done! Let me know if we need to do any other changes.

Thank you! I will take a look!

It seems my previous attempt at doing this slightly broke the
expectations of go1.24, so let's repair this.
@codecov

codecov bot commented Mar 7, 2025

Codecov Report

Attention: Patch coverage is 78.21782% with 22 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| doquic.go | 75.55% | 16 Missing and 6 partials ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| dotcp.go | 100.00% <100.00%> (ø) |
| internal/cmd/transport/main.go | 100.00% <100.00%> (ø) |
| query.go | 100.00% <100.00%> (ø) |
| serveraddr.go | 100.00% <ø> (ø) |
| slog.go | 100.00% <100.00%> (ø) |
| transport.go | 100.00% <100.00%> (ø) |
| doquic.go | 75.55% <75.55%> (ø) |

Member

@bassosimone bassosimone left a comment

This branch now looks good to me! I have pushed some editorial changes on top of the branch, under the reasoning that they were mostly cosmetic changes. If you have time, please take a look at the overall diff to double check the end result. I am otherwise going to merge this over the weekend. Thank you for helping out with DoQ!!! It has been fun, it has been great, and I am happy that I have learned new things about the internet! 🥳 🥳 🥳 🥳

@bassosimone
Member

We're going to merge this branch, which has been open for quite some time and now looks good! There is obviously a coverage hit, because there are no unit tests. We will need to work on them in a follow-up PR.

@bassosimone bassosimone merged commit 57e8800 into rbmk-project:main Mar 8, 2025
3 of 4 checks passed
bassosimone added a commit to rbmk-project/rbmk that referenced this pull request Jul 3, 2025
This diff is based on rbmk-project/dnscore@2bb1c48.

This diff is based on rbmk-project/common@443e41e.

This diff includes code written by @roopeshsn as part of
rbmk-project/dnscore#18.

The overall purpose for this diff is to simplify the development
environment and reduce chores for me. My original setup was OK for
a professionally maintained project, however, it is overkill for
something I maintain in my free time and as a hobby.
bassosimone added a commit to rbmk-project/rbmk that referenced this pull request Jul 3, 2025
The overall purpose for this diff is to simplify the development
environment and reduce chores for me.

My original setup was OK for a professionally maintained project.
However, it is overkill for a hobby project.

This diff is based on
rbmk-project/dnscore@2bb1c48.

This diff is based on
rbmk-project/common@443e41e.

This diff includes code written by @roopeshsn as part of
rbmk-project/dnscore#18.