When Tailscale Meets Alibaba Cloud: Why DNS Stops Working and How to Fix It
One afternoon, our small dev-ops team noticed that a production server on Alibaba Cloud ECS could no longer reach the public Internet—yet we could still SSH into it through Tailscale.
A quick run-through of the usual suspects—routing tables, security-group rules, even a reboot—did nothing.
After two hours of packet tracing, log spelunking, and mild panic, we discovered that the root cause was surprisingly simple: the Alibaba Cloud DNS resolver happens to live inside the same IP range that Tailscale blocks by default.
Below you will find a complete, step-by-step account of what went wrong, why it happens, and five tested solutions that let you keep both Tailscale and Alibaba Cloud DNS working together—without hacks that break on the next update.
Table of contents
- Symptoms: a server that loses the Internet
- First clues: DNS, not routing
- Deep dive: Tailscale’s hidden firewall rule
- The standards clash: RFC 6598 vs Alibaba Cloud
- Five practical fixes—ranked by risk
- How to test each fix in under 60 seconds
- Key takeaways
1. Symptoms: a server that loses the Internet
What we observed:
- SSH still works—but only via the Tailscale IP.
- apt update, curl, git fetch, and any outbound HTTPS calls time out.
- Rebooting the ECS instance or restarting Tailscale brings connectivity back for roughly three minutes, then it dies again.
2. First clues: DNS, not routing
2.1 Ping proves IP works
$ ping 8.8.8.8
64 bytes from 8.8.8.8: icmp_seq=1 ttl=115 time=3.05 ms
The server can reach any public IP address, so routing is fine.
2.2 DNS lookup fails
Alibaba Cloud’s VPC uses 100.100.2.136 and 100.100.2.138 as the default internal DNS resolvers.
$ dig @100.100.2.136 example.com
;; connection timed out; no servers could be reached
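As a quick cross-check (our addition, not part of the original trace), querying a public resolver directly still succeeds, which narrows the failure down to the Alibaba resolvers themselves rather than DNS in general:
$ dig @8.8.8.8 example.com +short
# Returns an A record, so UDP/53 to the public Internet is fine;
# the failure is specific to the Alibaba resolvers.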
3. Deep dive: Tailscale’s hidden firewall rule
3.1 Logging every dropped packet
We added a one-liner to log packets before they hit any rule:
sudo iptables -I INPUT 1 -j LOG --log-prefix "PKT_TRACE: "
Within seconds the log showed:
PKT_TRACE: IN=eth0 OUT= MAC=... SRC=100.100.2.136 DST=172.31.x.x PROTO=UDP SPT=53 DPT=...
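Once the trace has served its purpose, remove the logging rule so it does not flood the kernel log (the cleanup command is our addition):
sudo iptables -D INPUT -j LOG --log-prefix "PKT_TRACE: "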
3.2 The rule that blocks Alibaba DNS
$ sudo iptables -S | grep 100.64
-A ts-input -s 100.64.0.0/10 ! -i tailscale0 -j DROP
Tailscale inserts this rule automatically. It means: “Drop every packet whose source address is inside the Carrier-Grade NAT range (100.64.0.0/10) unless it arrived on the Tailscale interface.”
Because Alibaba’s DNS resolvers also live inside 100.64.0.0/10, the reply packets from 100.100.2.136 are silently discarded.
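If you want to watch the collision live, a packet capture (our suggestion, not from the original debugging session) shows the replies arriving on eth0 and then never reaching the application:
sudo tcpdump -ni eth0 udp port 53 and host 100.100.2.136
# Replies show up on the wire, yet dig still times out,
# because ts-input drops them before local delivery.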
4. The standards clash: RFC 6598 vs Alibaba Cloud
4.1 What RFC 6598 says
RFC 6598 reserves 100.64.0.0/10 for Carrier-Grade NAT (CGNAT)—a range that must never appear on the public Internet.
Tailscale treats the range as “Tailscale-only”, which is technically correct.
4.2 Why Alibaba Cloud uses the same range
- Not routable on the Internet—so packets never leak.
- Does not overlap with classic private ranges (10.0.0.0/8, 192.168.0.0/16).
- Keeps DNS close to the hypervisor, reducing latency.
The result: two perfectly valid decisions that collide on the same machine.
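You can see the overlap directly on the instance; both commands below are standard, and the exact addresses will differ per tailnet and region:
tailscale ip -4        # the node's Tailscale address, e.g. 100.x.y.z, inside 100.64.0.0/10
cat /etc/resolv.conf   # on ECS this typically lists 100.100.2.136 and 100.100.2.138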
5. Five practical fixes—ranked by risk
| # | Fix | One-line summary | Pros | Cons | When to use |
|---|---|---|---|---|---|
| 1 | Delete the rule | Manually remove the DROP rule | Immediate relief | Rule returns on every Tailscale restart | Emergency debugging |
| 2 | Whitelist DNS IPs | Insert two ACCEPT rules above the DROP | Transparent to apps | Rule order may shift after updates | Single server, rare reboots |
| 3 | Automated watchdog | Cron or systemd keeps the whitelist alive | Hands-off | Extra moving part | Fleet of servers |
| 4 | Switch to public DNS | Point /etc/resolv.conf at 8.8.8.8 or 1.1.1.1 | Zero iptables changes | Breaks Alibaba internal domains (OSS, RDS) | No internal cloud services |
| 5 | Disable Tailscale firewall | tailscale up --netfilter-mode=off | Removes every Tailscale rule | Loses subnet routing, exit-node, ACLs | Tailscale for point-to-point only |
5.1 Fix 1: Delete the rule (one-off)
sudo iptables -D ts-input -s 100.64.0.0/10 ! -i tailscale0 -j DROP
Test:
dig @100.100.2.136 example.com
Caveat: Every time Tailscale restarts (systemctl restart tailscaled or a reboot), the rule is recreated.
5.2 Fix 2: Whitelist Alibaba DNS IPs
Add two ACCEPT rules before the Tailscale DROP:
sudo iptables -I ts-input 1 -s 100.100.2.136/32 -j ACCEPT
sudo iptables -I ts-input 1 -s 100.100.2.138/32 -j ACCEPT
Save the rules so they survive reboots:
# Ubuntu / Debian
sudo apt install iptables-persistent
sudo netfilter-persistent save
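On RHEL-family images (Alibaba Cloud Linux, CentOS) there is no netfilter-persistent; assuming the iptables-services package is installed and enabled to restore rules on boot, one common equivalent is:
# RHEL family (assumption: iptables-services restores this file at boot)
sudo iptables-save | sudo tee /etc/sysconfig/iptables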
Edge case: Future Tailscale updates may reorder the chain, causing the ACCEPT rules to fall below the DROP.
Mitigation: combine with Fix 3.
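Whichever route you take, you can confirm that the ACCEPT rules still sit above the DROP by listing the chain with line numbers (this check is our addition):
sudo iptables -L ts-input -n --line-numbers
# The ACCEPT lines for 100.100.2.136/32 and 100.100.2.138/32
# must appear before the DROP for 100.64.0.0/10.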
5.3 Fix 3: Automated watchdog script
Create /usr/local/bin/fix-ts-dns.sh:
#!/bin/bash
# Watchdog: re-insert the ACCEPT rules for Alibaba's internal DNS resolvers
# whenever Tailscale rebuilds its ts-input chain and removes them.
while true; do
    if ! iptables -C ts-input -s 100.100.2.136/32 -j ACCEPT 2>/dev/null; then
        iptables -I ts-input 1 -s 100.100.2.136/32 -j ACCEPT
    fi
    if ! iptables -C ts-input -s 100.100.2.138/32 -j ACCEPT 2>/dev/null; then
        iptables -I ts-input 1 -s 100.100.2.138/32 -j ACCEPT
    fi
    sleep 30
done
Make it executable:
chmod +x /usr/local/bin/fix-ts-dns.sh
Wrap it in a systemd service so it starts after tailscaled:
# /etc/systemd/system/fix-ts-dns.service
[Unit]
Description=Keep Alibaba DNS whitelisted in Tailscale chain
After=tailscaled.service
[Service]
Type=simple
ExecStart=/usr/local/bin/fix-ts-dns.sh
Restart=always
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl daemon-reload
sudo systemctl enable --now fix-ts-dns.service
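A quick sanity check (our addition) confirms both that the watchdog is running and that the rule it maintains is in place:
systemctl status fix-ts-dns.service --no-pager
sudo iptables -C ts-input -s 100.100.2.136/32 -j ACCEPT && echo "whitelist present"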
5.4 Fix 4: Use public DNS resolvers
Replace the existing /etc/resolv.conf:
sudo rm -f /etc/resolv.conf
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 1.1.1.1" | sudo tee -a /etc/resolv.conf
If you use NetworkManager, set DNS via:
nmcli connection modify eth0 ipv4.dns "8.8.8.8 1.1.1.1"
nmcli connection up eth0
Trade-off:
- All Alibaba internal endpoints (like bucket-name.oss-internal.aliyuncs.com) will resolve to public IPs, potentially incurring bandwidth charges and higher latency.
- If your workload never touches Alibaba internal services, this is the simplest permanent fix.
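One extra caveat worth noting (not part of the original fix): on some ECS images, cloud-init or the network service may rewrite /etc/resolv.conf on the next boot or DHCP renewal. If your change keeps being reverted, locking the file is a blunt but effective option:
sudo chattr +i /etc/resolv.conf    # make the file immutable
# undo later with: sudo chattr -i /etc/resolv.conf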
5.5 Fix 5: Disable Tailscale’s netfilter mode
sudo tailscale up --netfilter-mode=off
This tells Tailscale not to touch iptables at all.
Verify:
sudo iptables -S | grep ts-
# Should print nothing
Consequences:
| Feature | Status |
|---|---|
| Tailscale subnet routing | Broken |
| Exit-node capability | Broken |
| ACL enforcement | Broken |
| Point-to-point WireGuard tunnel | Still works |
Use this only if you run Tailscale purely for machine-to-machine access and do not need subnet relay or exit-node features.
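If you later need those features back, the same flag restores Tailscale's firewall management (verify the accepted values against your installed Tailscale version):
sudo tailscale up --netfilter-mode=on
sudo iptables -S | grep ts-    # the ts-* chains should reappear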
6. How to test each fix in under 60 seconds
1. Check DNS
time dig @100.100.2.136 example.com +short
A non-empty answer list means DNS is alive.
2. Check outbound HTTPS
curl -s -o /dev/null -w "%{http_code}\n" https://example.com
Should return 200.
3. Check Tailscale health
tailscale status
tailscale ping some-other-node
4. Reboot test
sudo reboot
After the machine comes back, rerun steps 1-3 to ensure the fix is persistent.
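For convenience, steps 1-3 can be bundled into a throwaway script; this is a minimal sketch reusing the exact commands above (replace some-other-node with a real peer in your tailnet):
#!/bin/bash
# Post-fix health check: Alibaba DNS, outbound HTTPS, Tailscale connectivity.
echo "== DNS ==";       dig @100.100.2.136 example.com +short
echo "== HTTPS ==";     curl -s -o /dev/null -w "%{http_code}\n" https://example.com
echo "== Tailscale =="; tailscale status
tailscale ping some-other-node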
7. Key takeaways
- The root cause is not a bug in either Tailscale or Alibaba Cloud—both follow their own correct assumptions.
- Fastest relief: whitelist the two Alibaba DNS IPs in the Tailscale ts-input chain.
- Cleanest long-term fix: if your project does not rely on Alibaba internal endpoints, switch to public DNS and forget the conflict ever existed.
- Automate the workaround with a tiny systemd service if you manage more than a handful of hosts.
If you try one of the fixes above—or invent a better one—let the community know.
Sharing the exact commands you used helps the next engineer spend less time debugging and more time building.