# Content Delivery Networks

## DNS

### Definition:

- Domain Name System
- Intended use: to translate domain names to IP addresses
- Other uses: load distribution. A replicated web server has many IPs, and DNS can redirect a client to the closest replica
- A distributed system of interconnected servers
- Centralizing is hard because of the huge traffic volume, the distance to clients, and the single point of failure
- Many applications rely on DNS

### Hierarchy

- Root DNS Server: Root name server
  - First point of contact
  - Directly query authoritative name server
    - Get the domain-name-to-IP mapping
  - Query for the IP addresses of TLD DNS servers
- TLD (Top Level Domain) DNS server: `.com`, `.org`, `.edu`
  - Query for the IP address of the authoritative DNS server
- Authoritative DNS Server: owned by the site owner, like `amazon.com`

### Local DNS Server:

- Actually a client, not a part of the hierarchy
- Each ISP (Internet Service Provider) has one
- Workings:
  - When a host makes a DNS query, it is sent to the local DNS server
  - The server may have a local cache of name-to-address pairs
  - Otherwise it forwards the query to the DNS hierarchy

### DNS Caching

- Once the server learns a mapping, it is **cached**
- Cache entries time out after a set time (TTL); until then, an entry may be out of date
- TLD server addresses are typically cached in local DNS servers, so root name servers are not visited often
- Benefits
  - Reduces network traffic on **root servers** and **across the internet**
  - Improves network performance because DNS responses are much faster

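
The lookup path and the TTL cache described in this section can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `ROOT`, `TLD`, and `AUTH` tables, the server labels, the `LocalResolver` class, and the address `192.0.2.10` (from a range reserved for documentation) are all hypothetical, and no real DNS traffic is involved.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical toy data standing in for the three levels of the hierarchy.
ROOT = {"com": "tld-com"}                                   # TLD label -> TLD server
TLD = {"tld-com": {"example.com": "auth-example"}}          # domain -> authoritative server
AUTH = {"auth-example": {"www.example.com": "192.0.2.10"}}  # host name -> IP address


class LocalResolver:
    """Toy local DNS server: answers from its cache, else walks the hierarchy."""

    def __init__(self, ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl                # lifetime of a cache entry, in seconds
        self.clock = clock            # injectable so expiry can be tested
        self.cache: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry time)
        self.upstream_queries = 0     # queries sent into the hierarchy

    def resolve(self, name: str) -> str:
        now = self.clock()
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]           # fresh cache hit: no upstream traffic
        ip = self._walk_hierarchy(name)
        self.cache[name] = (ip, now + self.ttl)
        return ip

    def _walk_hierarchy(self, name: str) -> str:
        labels = name.split(".")
        # 1. Ask the root which server handles the TLD.
        tld_server = ROOT[labels[-1]]
        self.upstream_queries += 1
        # 2. Ask the TLD server which server is authoritative for the domain.
        auth_server = TLD[tld_server][".".join(labels[-2:])]
        self.upstream_queries += 1
        # 3. Ask the authoritative server for the name-to-address mapping.
        ip = AUTH[auth_server][name]
        self.upstream_queries += 1
        return ip
```

Calling `LocalResolver().resolve("www.example.com")` twice within the TTL sends three upstream queries for the first call and none for the second, which is the traffic reduction listed under Benefits. The sketch caches only final answers; a real local server would also cache the TLD server addresses.
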
## P2P

### Definition

- A **distributed** network architecture
- Every node is both a **client** and a **server**
- Advantages:
  - Scalable:
    - As the number of clients increases, the number of servers also increases
    - Nodes both consume and donate resources
  - Less cost: cost is at the edge of the network
  - More privacy: no centralized source of data
  - Reliability:
    - Distributed geographically
    - Has replicas
    - No single point of failure
  - All of the above make it easy to share content

### Categories

- Unstructured:
  - No restriction on overlay structure or data placement
  - Examples:
    - Napster, BitTorrent, FreeNet
- Structured:
  - Uses a Distributed Hash Table, with an interface like `put(k, v)` and `get(k)`
  - Has restrictions on overlay structure and data placement
  - Examples:
    - Chord, Pastry, and CAN

### Server Selection

- For BitTorrent, a tracker is used, which informs clients about the available peers
- TODO: See diagram at page 26

### Issues with P2P

- Reliability
- Performance
- Control: a lot of the shared content is copyrighted

## Content Delivery Networks

### History of Content Delivery

- Web 1.0: Pre-CDN, infrastructure development
- CDN 1.0: First generation of CDN, replication, intelligent routing, edge computing
- CDN 2.0: P2P, cloud computing, energy awareness
- CDN 3.0: Autonomic composition

### Web Caches

- The precursor to CDN
- Improve efficiency by caching
- Caching proxy:
  - Receives an HTTP request from the client
  - If the object is in the cache, sends the cached content
  - Otherwise requests the object from the origin server
- Works as both client and server:
  - Client: requests content from the origin
  - Server: serves content to downstream clients
- Usually installed by the ISP
- Reason:
  - Reduce response time for client requests
  - Reduce traffic across the network
- Problem:
  - Can't serve all web users, since the web is too large
  - Web content is dynamic and customized, which means much of it is not cacheable
  - Origin (upstream) web servers shouldn't rely on downstream caching proxies
  - Upstream web servers can't see the real statistics of their site, since requests answered by the proxy are not sent to their servers

### Definition

- Also called _Content Distribution Network_
- **Infra**: a large distributed system of servers deployed in multiple data centers across the internet
- **Goal**: distribute content to end users on a large scale with high **availability** and high **performance**
- A mechanism to **replicate** content on multiple servers on the internet, giving clients a way to choose a server that can deliver the content fast
- Content providers are the CDN customers:
  - They pay CDN companies to deliver their content
  - The CDN pays ISPs, carriers, and network operators for hosting its servers
- Usually used by large web platforms

### What CDN do

- Serve a large fraction of internet content
  - Web objects (text, JavaScript, graphics)
  - Downloadable objects
  - Applications
  - Streaming media
- Most of the web uses a CDN

### The model

- TODO: See the slide p41

### CDN Deployment

- The CDN company deploys hundreds of servers around the world, often inside ISP networks, so that they are close to users
- CDN customer side:
  - The CDN replicates the customer's content on its servers
  - When the provider updates content, the CDN updates its servers with the new content
- User side:
  - Sends a request to the origin server
  - The request is intercepted by the redirection service
  - The redirection service forwards the user's request to the best CDN server
  - Content is served from the CDN server

### Companies

- Akamai
- Limelight
- ChinaCache
- Edgecast

### Benefits

- Reduce latency to users
- Reduce load on the origin server
- Increase security against Denial of Service attacks
- Scalability
- Cheaper, easier to manage
- Bypass traffic jams on the web:
  - Requested data is close to clients
  - Avoid bottleneck links

### Optimizations on the CDN side

- Content is cached at various locations for faster access
- Use data compression
- Use load balancing to reduce traffic
- Security features like DDoS protection
- Use network peering for shorter data paths

### Examples and Usage

- Netflix:
  - Low-latency, high-definition media can be played
  - Handles peak traffic
  - Content has consistent quality
- Alibaba:
  - Rapid page loads for product listings
  - Supports large-scale events
  - Stability and scalability

### CDN Routing

#### Server Selection

- Load: to balance load
- Performance: improve client performance, based on:
  - Geography
  - RTT
  - Throughput
  - Load
- Any Node Alive: provide fault tolerance

#### Ways of redirecting

- As a part of routing: anycast (a single IP address is shared by many devices in multiple locations), cluster, load balancing
  - Pros: transparent to clients, works when the browser has cached failed addresses, circumvents many routing problems
  - Cons: little control over server selection, complex, scalability, and can't recover TCP
- Part of application: HTTP Redirect
  - Pros: application level, has more control
  - Cons: adds load and an extra RTT, and is hard to cache
- Part of naming: DNS
  - Pros: suitable for caching, DNS can redirect to any IP
  - Cons: implemented at the resolver, the request is for a domain rather than a URL, and the load factor of each resolver's client population is hidden
    - Can estimate the stats

#### More on DNS redirection

- DNS redirection is used to redirect a client to a nearby server.
- Based on:
  - Latency to client
  - Load balancing
    - Try to balance clients across many servers to avoid hotspots
  - Available servers
- Process:
  - The client's DNS request comes to the CDN's name server (see below for how it is reached)
  - The DNS request is resolved to a nearby server by accessing CDN-controlled name servers
  - The CDN measures the state of the network in its infrastructure
- Two types of DNS redirection
  - Full:
    - The origin server is controlled by the CDN
    - Pros: all requests are automatically redirected
    - Cons: may send a lot of traffic to the CDN, so it's expensive
  - Partial:
    - The content provider marks what to serve through the CDN
      - Usually larger objects
      - Refer to images as ``
    - When the website is accessed, the CDN serves the marked data
    - Pros: better control
    - Cons: have to mark content

## Deployment

### Hosting your stuff

- Where: rely on measurements
  - Sample popular hostnames on alexa.com
  - Ask DNS from multiple vantage points
  - Categorize by type:
    - Hostnames
    - Files
    - Unpopular

### Examples

- ChinaCache

## Future

### Challenges

- Mobile networks: latency to the cell is higher, opaque internal network structure
- Video: large bandwidth
  - 16M - 30M bps compressed
  - When combined can be 25K TBps
  - Even data centers don't have that much
  - Using multicast from end systems as a potential solution

### CDN2.0

- Hybrid CDN: Akamai
- Cloud Based Video: Netflix
- Meta CDN: Conviva
- Virtual CDN: ISP micro-datacenters