AZ-104 · Skill Domain

Network Architecture

Hub-spoke VNet topology, NSG design, Standard Load Balancer, private endpoints with split-horizon DNS, Application Gateway with WAF, and Azure Bastion — production network patterns built and debugged from scratch using Azure CLI.

4 Projects
8 Subnets
0 Public IPs on VMs
P3–P13 · Phase 2 + 4
01

Projects

03
VNet · NSG · Peering
Hub-Spoke Network Topology
Two VNets · bidirectional peering · subnet-level NSGs · no public IPs on VMs

Built a hub-spoke VNet topology with two networks connected via bidirectional peering. The hub holds shared services — Bastion, DNS — while each spoke is an independent blast radius. All VMs deployed with no public IPs; access via az vm run-command for automation and Bastion for interactive sessions. NSGs attached at subnet level, not NIC level, so all future resources in the subnet inherit the same rules automatically.

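VNet peering is not a single connection. It requires two separate peering objects: Hub→Spoke and Spoke→Hub. A single-direction peering shows peeringState: Initiated, not Connected, and traffic does not flow until both exist. Direct VM-to-VM traffic depends on allowVirtualNetworkAccess (enabled by default); allowForwardedTraffic: true is additionally needed when a peer must accept traffic that did not originate in the other VNet, such as packets relayed through the hub by a network virtual appliance.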
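The two-object requirement can be sketched with Azure CLI as follows. This is a minimal sketch, not the exact commands used in the project: the resource group name and the peering names are hypothetical placeholders, and the functions assume both VNets live in the same resource group.

```shell
# Sketch: create both halves of a VNet peering and read back the state.
# RG and the peering names are hypothetical; vnet-hub / vnet-spoke follow the lab naming.
RG="${RG:-rg-network-lab}"

create_peering() {  # usage: create_peering <peering-name> <local-vnet> <remote-vnet>
  az network vnet peering create \
    --resource-group "$RG" \
    --name "$1" \
    --vnet-name "$2" \
    --remote-vnet "$3" \
    --allow-vnet-access \
    --allow-forwarded-traffic
}

peering_state() {   # usage: peering_state <peering-name> <local-vnet>
  az network vnet peering show \
    --resource-group "$RG" \
    --name "$1" \
    --vnet-name "$2" \
    --query peeringState --output tsv
}

# Example run (needs an authenticated Azure CLI session):
#   create_peering hub-to-spoke vnet-hub vnet-spoke
#   create_peering spoke-to-hub vnet-spoke vnet-hub
#   peering_state hub-to-spoke vnet-hub   # Initiated until the second half exists
```

Checking peeringState on both sides after the second create is a quick way to confirm the pair is complete before testing connectivity.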

⚠ Gotcha — ip route show is not a valid peering diagnostic

Azure SDN handles peered VNet routes at the hypervisor layer. The guest OS routing table will never show peering routes — they will always appear missing even when peering is fully functional. Always test with actual connectivity: ping a real VM IP across the peering, never the subnet gateway.
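Since the guest routing table is not useful here, the checks have to run from the Azure side. The sketch below assumes a hypothetical resource group and NIC name; it lists the platform-computed effective routes for a NIC (peered prefixes appear there with a peering next hop type) and pings a real VM IP through run-command. The VM must be running for either call to work.

```shell
# Sketch: diagnose peering from the platform, not from `ip route show`.
# RG and the NIC/VM names passed in are hypothetical placeholders.
RG="${RG:-rg-network-lab}"

peering_routes() {  # usage: peering_routes <nic-name>
  # Effective routes are computed by Azure for the NIC; filter to peering next hops.
  az network nic show-effective-route-table \
    --resource-group "$RG" --name "$1" \
    --query "value[?contains(nextHopType, 'Peering')].{prefix: addressPrefix[0], nextHop: nextHopType}" \
    --output table
}

ping_from_vm() {    # usage: ping_from_vm <vm-name> <target-private-ip>
  # Runs ping inside the VM without needing a public IP or an SSH session.
  az vm run-command invoke \
    --resource-group "$RG" --name "$1" \
    --command-id RunShellScript \
    --scripts "ping -c 3 $2"
}

# Example run:
#   peering_routes nic-vm-web-01
#   ping_from_vm vm-web-01 <private-ip-of-a-hub-vm>
```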

Topology diagram: project3-network-topology.svg
NSG rules & traffic flows: project3-nsg-rules.svg
  • No public IPs on VMs: --public-ip-address "" at creation time. Removes direct internet exposure of the VMs. All access routes through Bastion (interactive) or run-command (automation).
  • NSG at subnet level, not NIC level: az network vnet subnet create --nsg attaches the NSG to the subnet. Protects all resources in the subnet including future additions. Single management point vs. per-VM NIC rules.
  • Service tags over IP ranges: AzureLoadBalancer, VirtualNetwork, Internet rather than hardcoded IP ranges. Microsoft maintains service tags automatically; hardcoded IPs break when Azure infrastructure changes.
  • Internet vs * in NSG deny rules: Internet blocks public traffic only. * blocks everything, including Bastion connections arriving from inside the VNet. Using * to deny RDP/SSH would silently block Bastion.
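The Internet-versus-* distinction comes down to one argument on the deny rule. A minimal sketch, assuming a hypothetical resource group, rule name, and priority:

```shell
# Sketch: deny SSH/RDP from the public internet only, leaving in-VNet sources
# (such as Bastion) unaffected. RG, the rule name, and priority 4000 are hypothetical.
RG="${RG:-rg-network-lab}"

deny_mgmt_from_internet() {  # usage: deny_mgmt_from_internet <nsg-name>
  az network nsg rule create \
    --resource-group "$RG" --nsg-name "$1" \
    --name Deny-Mgmt-From-Internet \
    --priority 4000 --direction Inbound --access Deny --protocol Tcp \
    --source-address-prefixes Internet \
    --source-port-ranges '*' \
    --destination-address-prefixes '*' \
    --destination-port-ranges 22 3389
}

# Example run:
#   deny_mgmt_from_internet nsg-web
```

Swapping Internet for '*' in --source-address-prefixes is the change that would also cut off Bastion.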
Virtual Networks · VNet Peering · NSG Service Tags · Azure Bastion · Azure CLI
Hub → Spoke peering
Spoke → Hub peering
NSG web subnet rules
NSG DB subnet rules
05
Load Balancer · SNAT · Availability
Standard Load Balancer
Standard SKU · health probes · explicit outbound SNAT · availability set · round-robin distribution

Deployed a Standard SKU Load Balancer distributing HTTP traffic across two Ubuntu VMs running nginx, placed in separate fault domains via an availability set. The Standard SKU required explicit configuration of things Basic SKU provided implicitly: NSG rules for inbound and probe traffic, and outbound SNAT. Four separate issues were encountered and resolved during this project:

  • Browsers may try HTTPS first: With no scheme typed, Chrome and Edge can attempt HTTPS before HTTP. The LB only had port 80 configured, so the HTTPS attempt timed out and looked like an LB misconfiguration. Always type http:// explicitly when testing HTTP-only backends.
  • Health probes blocked by NSG: Probes arrive from the AzureLoadBalancer service tag. The default rule AllowAzureLoadBalancerInBound (priority 65001) permits them, but a custom deny rule with a lower priority number overrides it and the probes are silently dropped. Both VMs then show as Unhealthy and the LB stops sending them traffic. Fix: an explicit allow rule for AzureLoadBalancer on port 80 above the deny rule. Standard SKU is also closed to inbound client traffic until an NSG allows it; Basic SKU was open by default.
  • No implicit outbound SNAT: VMs in the backend pool with no public IP had zero internet access. apt install nginx appeared to succeed but silently failed. systemctl status nginx revealed the service didn't exist. Fix: second public IP, frontend IP config, outbound rule.
  • Frontend IP config is an intermediate layer: Outbound rules reference frontend IP configurations, not public IPs directly. The dependency chain is: Public IP → Frontend IP Config → Rule. Portal abstracts this; CLI does not.
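The probe fix and the Public IP → Frontend IP Config → Rule chain could look roughly like this in Azure CLI. It is a sketch under assumptions: the resource group, the rule, public IP, and frontend names, and priority 100 are hypothetical, and the load balancer and backend pool are expected to exist already.

```shell
# Sketch: allow LB health probes, then build the outbound SNAT chain in dependency order.
# All resource names below are hypothetical placeholders.
RG="${RG:-rg-network-lab}"

allow_lb_probes() {  # usage: allow_lb_probes <nsg-name>
  az network nsg rule create \
    --resource-group "$RG" --nsg-name "$1" \
    --name Allow-LB-Probe \
    --priority 100 --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes AzureLoadBalancer \
    --destination-port-ranges 80
}

add_outbound_snat() {  # usage: add_outbound_snat <lb-name> <backend-pool-name>
  # 1. Dedicated Standard public IP for outbound traffic.
  az network public-ip create \
    --resource-group "$RG" --name pip-lb-outbound \
    --sku Standard --allocation-method Static &&
  # 2. Frontend IP configuration that wraps the public IP.
  az network lb frontend-ip create \
    --resource-group "$RG" --lb-name "$1" \
    --name fe-outbound --public-ip-address pip-lb-outbound &&
  # 3. Outbound rule that references the frontend config, not the public IP.
  az network lb outbound-rule create \
    --resource-group "$RG" --lb-name "$1" \
    --name outbound-snat \
    --frontend-ip-configs fe-outbound \
    --address-pool "$2" \
    --protocol All
}

# Example run:
#   allow_lb_probes nsg-web
#   add_outbound_snat lb-web bepool-web
```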
Verify round-robin distribution
# Replace <LB-IP> with the load balancer's public frontend IP
LB_IP="<LB-IP>"; for i in {1..20}; do curl -s "http://${LB_IP}" | grep "Served by"; done | sort | uniq -c
View architecture diagram — project5-lb-architecture.svg
  • Standard SKU — no exceptions — Basic SKU Load Balancer is being retired. Standard is zone-redundant, supports outbound rules and multiple frontend IPs. Every new deployment uses Standard.
  • Availability set for fault domain separation — Two VMs in separate fault domains means a single rack failure doesn't take both offline. Minimum viable HA for a two-VM pool.
  • Dedicated outbound public IP — Separate from the inbound frontend IP. Keeps outbound SNAT traffic on a distinct IP, simplifying firewall rules and preserving the inbound IP for load-balanced traffic only.
  • Round-robin is not strictly alternating — Standard LB uses a 5-tuple hash. Browser HTTP keep-alive reuses TCP connections, so the same VM serves multiple requests in a row. Expect uneven distribution (e.g., 13/7), not exactly 10/10.
⚠ Gotcha — apt install can silently fail

Without outbound SNAT, apt install nginx printed connection warnings but reported success — exit code 0. Never trust package manager exit status alone on a newly provisioned VM. Always verify with systemctl status nginx.
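One way to make this check routine is a small helper that looks at the package database and the service state instead of the installer's exit code. This is a sketch for Debian/Ubuntu guests; the function name is hypothetical.

```shell
# Sketch: confirm a package is really installed and its service is running.
# verify_service is a hypothetical helper name; it assumes dpkg and systemd.
verify_service() {  # usage: verify_service <package> <systemd-unit>
  if ! dpkg -s "$1" 2>/dev/null | grep -q '^Status: install ok installed'; then
    echo "package $1 is not installed" >&2
    return 1
  fi
  if ! systemctl is-active --quiet "$2"; then
    echo "unit $2 is not active" >&2
    return 2
  fi
  echo "$1 installed and $2 active"
}

# Example run on the VM, right after provisioning:
#   verify_service nginx nginx || echo "install did not take effect"
```

The distinct return codes separate "never installed" (1) from "installed but not running" (2), which point at different root causes.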

Standard Load Balancer · Health Probes · Backend Pools · Outbound SNAT · Frontend IP Config · Availability Sets · NSG Service Tags
LB rule + health probe
Availability set + round-robin
NSG blocking LB probes
VM unhealthy — backend pool
06
DNS · Private Endpoints · Zero Trust
Private Endpoints & Split-Horizon DNS
Private Link · Private DNS zone · VNet DNS links · public access disabled · same hostname, different resolution

Removed Azure Blob Storage from the public internet without changing the application hostname, using a private endpoint and split-horizon DNS. VMs inside the VNet resolve the storage hostname to a private IP (10.0.1.6) via the private DNS zone. In this lab, public internet clients received a 404 WebContentNotFound response rather than a 403. A 403 would confirm the resource is there; a 404 reveals nothing. The exact status can vary by endpoint and request type (blocked requests can also surface as 403 AuthorizationFailure), so treat the code as an observation to verify rather than a guarantee.

Before — Public Resolution
stlabeastus001.blob.core.windows.net → resolves to public IP → traverses public internet → hits storage firewall
After — Private Resolution
stlabeastus001.blob.core.windows.net → private DNS zone override → resolves to 10.0.1.6 → stays inside VNet ✓
Source | Resolves to | Result
vm-web-01 (vnet-spoke) | 10.0.1.6 | ✓ Private access
vm-hub-test (vnet-hub) | 10.0.1.6 | ✓ Private access
Public internet | Public IP (overridden) | ✗ 404, resource not acknowledged
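The resolution results above can be re-checked from any Linux host with a small script. This is a sketch, not the nslookup session from the lab: the helper names are hypothetical, and it uses getent so the answer comes from whatever resolver the host is configured with.

```shell
# Sketch: report whether a hostname currently resolves to a private (RFC 1918) address.
# is_private_ip and check_resolution are hypothetical helper names.
is_private_ip() {  # usage: is_private_ip <ipv4>
  case "$1" in
    10.*|192.168.*|172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *) return 1 ;;
  esac
}

check_resolution() {  # usage: check_resolution <hostname>
  ip="$(getent hosts "$1" | awk '{print $1; exit}')"
  if [ -z "$ip" ]; then
    echo "$1 -> no answer"
    return 2
  fi
  if is_private_ip "$ip"; then
    echo "$1 -> $ip (private)"
  else
    echo "$1 -> $ip (public)"
    return 1
  fi
}

# Example run from a VM in a linked VNet:
#   check_resolution stlabeastus001.blob.core.windows.net
```

Running the same command from a linked VNet and from outside Azure shows the split-horizon behavior directly: same hostname, different answer.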
View architecture diagram — project6-dns-private-endpoint.svg
  • Private DNS zone linked to ALL VNets: Linking only to vnet-hub means vnet-spoke VMs query Azure DNS without the private zone override and receive the public IP. DNS resolution and network routing are independent: peering establishes the route; VNet DNS zone links determine which DNS response is returned. Both must be configured.
  • Private DNS zone names are fixed: The zone name for each Private Link service is mandated by Azure. Any other name prevents automatic A record creation. Blob Storage must be privatelink.blob.core.windows.net.
  • Disable public access only after private connectivity confirmed: The order is deploy endpoint → verify private IP resolution via nslookup → disable public access → verify public is blocked (confirm-then-lockdown). Re-enable before teardown if cleanup touches blob data: data-plane commands run from Cloud Shell cannot reach storage with public access disabled.
  • 404 is the stronger security posture: With public network access disabled, the storage endpoint returned 404 here, not 403. A 403 would reveal the resource exists. A 404 reveals nothing. The status code can differ by endpoint and request type, so confirm it per deployment rather than assuming it.
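The confirm-then-lockdown order maps onto two Azure CLI steps with a verification pause between them. A minimal sketch, assuming a hypothetical resource group, subnet name, and endpoint naming scheme:

```shell
# Sketch: create the blob private endpoint first, toggle public access only afterwards.
# RG, the subnet name, and the pe-/conn- naming are hypothetical placeholders.
RG="${RG:-rg-network-lab}"

create_blob_private_endpoint() {  # usage: create_blob_private_endpoint <storage-account> <vnet> <subnet>
  account_id="$(az storage account show \
    --resource-group "$RG" --name "$1" --query id --output tsv)" || return 1
  az network private-endpoint create \
    --resource-group "$RG" --name "pe-$1-blob" \
    --vnet-name "$2" --subnet "$3" \
    --private-connection-resource-id "$account_id" \
    --group-id blob \
    --connection-name "conn-$1-blob"
}

set_public_access() {  # usage: set_public_access <storage-account> <Enabled|Disabled>
  az storage account update \
    --resource-group "$RG" --name "$1" \
    --public-network-access "$2"
}

# Example run:
#   create_blob_private_endpoint stlabeastus001 vnet-spoke snet-web
#   ...verify private resolution from a VM, then:
#   set_public_access stlabeastus001 Disabled
#   set_public_access stlabeastus001 Enabled    # before teardown, if needed
```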
⚠ Gotcha — Private DNS zone name is not configurable

Common mistake: creating the private DNS zone with a custom name. Azure requires the exact mandated name per service type or the private endpoint NIC's A record will not be auto-registered. Always verify the zone name matches the Private Link DNS zone table before deployment.
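A sketch of the zone setup with the mandated name, one link per VNet, and a DNS zone group so the endpoint's A record is registered automatically. The resource group, link names, and zone group name are hypothetical; the zone name is the one Azure mandates for Blob Storage.

```shell
# Sketch: private DNS zone for blob Private Link, linked to every VNet that must resolve it.
# RG and the link/zone-group names are hypothetical placeholders.
RG="${RG:-rg-network-lab}"
ZONE="privatelink.blob.core.windows.net"   # mandated name for Blob Storage

create_blob_zone() {
  az network private-dns zone create --resource-group "$RG" --name "$ZONE"
}

link_zone_to_vnet() {  # usage: link_zone_to_vnet <vnet-name>
  az network private-dns link vnet create \
    --resource-group "$RG" --zone-name "$ZONE" \
    --name "link-$1" --virtual-network "$1" \
    --registration-enabled false
}

attach_zone_group() {  # usage: attach_zone_group <private-endpoint-name>
  az network private-endpoint dns-zone-group create \
    --resource-group "$RG" --endpoint-name "$1" \
    --name default --private-dns-zone "$ZONE" --zone-name blob
}

# Example run:
#   create_blob_zone
#   link_zone_to_vnet vnet-hub
#   link_zone_to_vnet vnet-spoke
#   attach_zone_group pe-stlabeastus001-blob
```

Keeping the zone name in a single variable makes it harder for a custom name to slip into one of the three commands.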

Private Endpoints · Private Link · Private DNS Zones · DNS Split-Horizon · VNet DNS Links · Storage Firewall · Zero-Trust Networking
nslookup — private IP resolution
Private DNS zone — A record
13
App Gateway · WAF · Bastion
Application Gateway & Bastion
Standard_v2 · WAF policy · dedicated subnet · AzureBastionSubnet · inline NSG rules

Deployed Azure Application Gateway (v2) as an L7 load balancer with WAF policy in front of the web subnet, and Azure Bastion for secure browser-based VM access with no public IPs on the VMs. Both services have strict subnet requirements: Application Gateway requires a dedicated subnet, and any NSG on that subnet must carry three specific rules or provisioning fails; Bastion requires a subnet named exactly AzureBastionSubnet with inline NSG rules. Note that WAF policy enforcement belongs to the WAF_v2 tier; Standard_v2 on its own provides L7 routing without WAF.

  • GatewayManager → ports 65200–65535 (priority 100) — Control plane traffic from Azure infrastructure. Without this, the gateway fails health checks and will not provision.
  • AzureLoadBalancer → any (priority 200) — Azure internal health probing. Required for Standard_v2.
  • Internet → ports 80, 443 (priority 300) — Inbound client traffic. Restrict source to known ranges in production.
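The three rules above could be created with Azure CLI roughly as follows. This is a sketch under assumptions: the resource group and rule names are hypothetical, and the NSG is expected to exist and be attached to the Application Gateway subnet.

```shell
# Sketch: the three inbound rules for an Application Gateway v2 subnet NSG.
# RG and the rule names are hypothetical; priorities follow the list above.
RG="${RG:-rg-network-lab}"

allow_inbound() {  # usage: allow_inbound <nsg> <rule-name> <priority> <protocol> <source> <port>...
  nsg="$1"; rule="$2"; priority="$3"; protocol="$4"; src="$5"
  shift 5
  az network nsg rule create \
    --resource-group "$RG" --nsg-name "$nsg" --name "$rule" \
    --priority "$priority" --direction Inbound --access Allow \
    --protocol "$protocol" --source-address-prefixes "$src" \
    --destination-port-ranges "$@"
}

appgw_subnet_rules() {  # usage: appgw_subnet_rules <nsg-name>
  allow_inbound "$1" Allow-GatewayManager    100 Tcp GatewayManager    65200-65535 &&
  allow_inbound "$1" Allow-AzureLoadBalancer 200 '*' AzureLoadBalancer '*' &&
  allow_inbound "$1" Allow-Web-Inbound       300 Tcp Internet          80 443
}

# Example run:
#   appgw_subnet_rules nsg-appgw
```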
Bastion — inline NSG rules are required

NSG rules for AzureBastionSubnet must be defined inside properties.securityRules: [...] on the NSG resource, not as separate child resources. Separate child resources deploy in parallel and may not be complete when the compliance check runs at subnet attachment time.

⚠ Gotcha — AzureBastionSubnet exact name required

The subnet must be named exactly AzureBastionSubnet — no CAF naming, no variations. Minimum size /26. Any deviation and Bastion will not provision regardless of all other configuration being correct.
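A small guard in front of the subnet create call can catch the size mistake before Azure does, and hardcoding the name removes the naming mistake entirely. A sketch with a hypothetical function name and resource group:

```shell
# Sketch: create the Bastion subnet with the fixed name and a /26-or-larger check.
# RG and create_bastion_subnet are hypothetical; the subnet name is the mandated one.
RG="${RG:-rg-network-lab}"

create_bastion_subnet() {  # usage: create_bastion_subnet <vnet> <cidr> <nsg>
  prefix_len="${2##*/}"    # text after the slash, e.g. 26
  if [ "$prefix_len" -gt 26 ]; then
    echo "AzureBastionSubnet needs /26 or larger, got /$prefix_len" >&2
    return 1
  fi
  az network vnet subnet create \
    --resource-group "$RG" --vnet-name "$1" \
    --name AzureBastionSubnet \
    --address-prefixes "$2" \
    --network-security-group "$3"
}

# Example run:
#   create_bastion_subnet vnet-hub 10.0.7.0/26 nsg-bastion
```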

  • App Gateway requires a dedicated subnet: Cannot share a subnet with any other resource type. Minimum /27 for lab, /24 for production. The dedicated subnet is a hard platform requirement, not a recommendation.
  • deployBastionHost condition flag: Bastion Standard SKU costs ~$0.19/hr (~$140/mo). A deployBastionHost bool parameter in Bicep allows the NSG, subnet, and all rules to deploy in place while the host itself is only created when needed. Cost control pattern for lab environments.
  • Bastion target VM NSG rule required: Bastion can reach a VM only if the VM's NSG allows inbound SSH (22) or RDP (3389) from the AzureBastionSubnet range (10.0.7.0/26). No clear error is surfaced when the rule is missing; the connection simply fails.
  • WAF over NSG for L7 protection: NSGs operate at L3/L4; they can block IPs and ports but cannot inspect HTTP payloads. WAF policy on Application Gateway provides OWASP ruleset enforcement, SQL injection protection, and custom rules at the application layer.
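The target-side rule for Bastion sessions might be added like this. It is a sketch: the resource group, rule name, and priority are hypothetical, and the default source range is the AzureBastionSubnet range from this lab.

```shell
# Sketch: allow SSH/RDP into a workload subnet from the Bastion subnet only.
# RG, the rule name, and priority 110 are hypothetical placeholders.
RG="${RG:-rg-network-lab}"

allow_from_bastion() {  # usage: allow_from_bastion <nsg-name> [bastion-subnet-cidr]
  az network nsg rule create \
    --resource-group "$RG" --nsg-name "$1" \
    --name Allow-Bastion-Mgmt \
    --priority 110 --direction Inbound --access Allow --protocol Tcp \
    --source-address-prefixes "${2:-10.0.7.0/26}" \
    --destination-port-ranges 22 3389
}

# Example run:
#   allow_from_bastion nsg-web
```

Scoping the source to the Bastion subnet range, rather than VirtualNetwork, keeps management ports closed to every other host in the peered VNets.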
Application Gateway · WAF Policy · Azure Bastion · Standard_v2 · NSG Service Tags · Bicep
NSG rules — AppGW failure
NSG blocking RDP — Bastion path
02

Lessons Learned

DNS and routing are independent layers
VNet peering establishes the network route. Private DNS zone links determine what IP a hostname resolves to. Configuring one without the other produces confusing failures — traffic can route correctly but DNS returns the wrong IP, or DNS resolves correctly but there's no route to the private IP. Both must be verified independently.
Standard SKU requires explicit configuration
Basic SKU Load Balancer provided implicit outbound SNAT and was open to inbound traffic by default. Standard SKU does neither: outbound access needs an explicit outbound rule, and inbound traffic must be allowed by an NSG, including health probes whenever custom deny rules would otherwise catch them. This is correct behavior: implicit allowances obscure what's actually happening on your network. Microsoft is retiring Basic SKU; all new deployments use Standard.
Subnet gateway IPs never respond to ping
The .1 address in each subnet (e.g., 10.0.1.1) is an Azure infrastructure address that never responds to ICMP. A common first debugging step is pinging the gateway to test connectivity — it will always time out even when everything is working. Always test peering by pinging a VM's private IP.
Bastion NSG rules must be inline
Defining Bastion NSG rules as separate child resources triggers a race condition — the compliance check at subnet attachment time may run before all rules are deployed. Rules inside securityRules: [...] on the NSG resource deploy atomically and pass the compliance check reliably every time.
Verify installation, don't trust exit codes
apt install can exit 0 while silently failing to download packages when there's no outbound internet access. The Standard LB SNAT issue was only discovered via systemctl status nginx — the install appeared successful. Always verify service state after package installation on a newly provisioned VM.
404 is stronger than 403 for locked storage
With public access disabled, the storage endpoint in this lab returned 404, not 403. A 403 Forbidden tells an attacker the resource exists and they lack permission. A 404 reveals nothing. The status can vary by endpoint and request type, so verify it rather than assume it; either way, understanding the difference between "access denied" and "does not exist" matters when hardening PaaS services.