EBU6502_cloud_computing_notes/4-2-cdn.md
2024-12-31 20:24:24 +08:00

Content Delivery Networks

DNS

Definition:

  • Domain name system
  • Intended use: translate domain names to IP addresses
  • Other uses: load distribution. A replicated web server has many IPs, and DNS can direct each client to the closest replica
  • Distributed system of interconnected servers
    • Centralizing is hard because of the huge traffic volume, the distance to far-away clients, and the single point of failure
  • Many applications rely on DNS

Hierarchy

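  • Root DNS Server: root name server
    • First point of contact when a local DNS server cannot resolve a name
    • May contact the authoritative name server itself, get the domain-name-to-IP mapping, and return it; in practice root servers usually just refer the query to the TLD servers
    • Is queried for the IP addresses of the TLD DNS servers
  • TLD (Top Level Domain) DNS server: responsible for .com, .org, .edu, and so on
    • Is queried for the IP address of the authoritative DNS server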
  • Authoritative DNS Server: Owned by site owner like amazon.com

Local DNS Server:

  • Acts as a client of the hierarchy; it is not itself part of it
  • Each ISP (Internet Service Provider) has one
  • Workings:
    • When a host makes a DNS query, it is sent to the local DNS server
    • The server may have a local cache of name-to-address pairs
    • Otherwise it forwards the query into the DNS hierarchy
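
The lookup path described above can be sketched with a toy, in-memory model. This is only an illustration: the server names, the `resolve_via_hierarchy` helper, the `LocalDNSServer` class, and the address `192.0.2.10` (from a documentation-only range) are all assumptions, and no real DNS traffic is involved.

```python
# Toy model of the DNS hierarchy. Every name and address here is made up.
ROOT = {"com": "tld-com"}                          # root knows the TLD servers
TLD = {"tld-com": {"amazon.com": "auth-amazon"}}   # TLD knows authoritative servers
AUTHORITATIVE = {"auth-amazon": {"www.amazon.com": "192.0.2.10"}}


def resolve_via_hierarchy(hostname: str) -> str:
    """Walk root -> TLD -> authoritative, as a local DNS server would."""
    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]                  # 1. ask root for the TLD server
    domain = ".".join(labels[-2:])
    auth_server = TLD[tld_server][domain]          # 2. ask TLD for the authoritative server
    return AUTHORITATIVE[auth_server][hostname]    # 3. ask authoritative for the address


class LocalDNSServer:
    """Answers from its cache when it can, otherwise queries the hierarchy."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def query(self, hostname: str) -> str:
        if hostname in self.cache:
            return self.cache[hostname]
        self.upstream_queries += 1
        address = resolve_via_hierarchy(hostname)
        self.cache[hostname] = address
        return address
```

Calling `query("www.amazon.com")` twice on the same `LocalDNSServer` walks the hierarchy only once; the second answer comes from the local cache.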

DNS Caching

  • Once a server learns a mapping, it caches it
  • Cache entries time out after a set time (TTL); until then, a cached entry may be out of date
  • TLD server addresses are typically cached in local DNS servers, so root name servers are not visited often
  • Benefits
    • Reduces network traffic at the root servers and across the internet
    • Improves performance, because a cached DNS response is much faster
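
A minimal sketch of TTL-based expiry, assuming a hypothetical `TTLCache` class with an injectable clock so that expiry can be shown without waiting. Real resolvers are more involved; this only shows why a cached entry is served until its TTL runs out and is dropped afterwards.

```python
import time


class TTLCache:
    """Name-to-address cache whose entries expire after their TTL (seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # name -> (address, expiry timestamp)

    def put(self, name: str, address: str, ttl: float) -> None:
        self._entries[name] = (address, self._clock() + ttl)

    def get(self, name: str):
        entry = self._entries.get(name)
        if entry is None:
            return None              # never cached: caller must query upstream
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: drop it and force a fresh lookup
            return None
        return address
```

A short TTL keeps answers fresh at the cost of more upstream queries; a long TTL does the opposite.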

P2P

Definition

  • A Distributed network architecture
  • Every node acts as both client and server
  • Advantages:
    • Scalable:
      • As the number of clients increases, the number of servers also increases
      • Each node both consumes and donates resources
    • Lower cost: costs are borne at the edge of the network
    • More privacy: No centralized source of data
    • Reliability:
      • Distributed geographically
      • Has Replicas
      • No single point of failure
    • All of the above make it easy to share content

Categories

  • Unstructured:
    • No restriction on overlay structures and data placement
    • Examples:
      • Napster, BitTorrent, FreeNet
  • Structured
    • Uses a Distributed Hash Table (DHT), which exposes an interface like put(k, v) and get(k)
    • Restricts the overlay structure and data placement
    • Examples:
      • Chord, Pastry and CAN
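
To make the put(k, v) / get(k) interface concrete, here is a heavily simplified, single-process sketch. It places nodes and keys on a hash ring and stores each key on the first node at or after the key's position, which is the placement idea behind Chord-style systems. The `TinyDHT` class and node names are assumptions for illustration; real DHTs route between machines and handle nodes joining and leaving.

```python
import hashlib
from bisect import bisect_left


def ring_position(value: str, bits: int = 16) -> int:
    """Map a string to a position on a ring of size 2**bits."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class TinyDHT:
    def __init__(self, node_names):
        self.ring = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]
        self.storage = {n: {} for n in node_names}   # per-node key/value store

    def owner(self, key: str) -> str:
        # First node at or after the key's position, wrapping around the ring.
        index = bisect_left(self.positions, ring_position(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key: str, value) -> None:
        self.storage[self.owner(key)][key] = value

    def get(self, key: str):
        return self.storage[self.owner(key)].get(key)
```

Because the key alone determines which node is responsible, any participant can locate data without a central index, which is the "restriction on data placement" that structured overlays rely on.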

Server Selection

  • BitTorrent uses a tracker, which tells clients which peers are available
    • TODO: See diagram at page 26
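
The tracker's role can be sketched as a small in-memory registry. This is an assumption-laden illustration, not the BitTorrent tracker protocol: the `Tracker` class, the `announce` method, and the peer address strings are hypothetical.

```python
class Tracker:
    """Keeps, per torrent, the set of peers that have announced themselves."""

    def __init__(self):
        self.swarms = {}   # torrent id -> set of peer addresses

    def announce(self, torrent_id: str, peer: str) -> list:
        """Register a peer and return the other peers it can contact."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return others
```

The tracker only introduces peers to each other; the content itself is exchanged directly between the peers.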

Issues with P2P

  • Reliability
  • Performance
  • Control: it is hard to control what is shared, so a lot of copyrighted content circulates

Content Delivery Networks

History of Content Delivery

  • Web 1.0: Pre-CDN, Infrastructure development
  • CDN 1.0: First generation of CDN, replication, intelligent routing, edge computing
  • CDN 2.0: P2P, Cloud Computing, Energy Awareness
  • CDN 3.0: Autonomic composition

Web Caches

  • The precursor to CDN
  • Improve efficiency by caching
  • Caching proxy:
    • Receive HTTP request from client
    • If object in cache, then send cached content
    • Otherwise request the object from origin server
  • Works as both client and server:
    • Client: request content from origin
    • Server: serve content to downstream client
  • Usually installed by ISP
  • Reason:
    • Reduce response time for client request
    • Reduce traffic across network
  • Problem:
    • Can't serve all web users, since the web is too large
    • Web content is dynamic and customized, so much of it is not cacheable
    • Origin (upstream) web servers shouldn't rely on downstream caching proxies
    • Upstream web servers can't see the real statistics of their site, since requests answered from the cache never reach them
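
The caching-proxy behaviour above can be sketched as follows. The `CachingProxy` class and the `fetch_from_origin` callback are assumptions for illustration; a real proxy would also honour HTTP cache-control headers and expiry.

```python
class CachingProxy:
    """Serves from cache on a hit; fetches from the origin on a miss."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin   # acts as a client upstream
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def handle_request(self, url: str) -> str:
        if url in self.cache:                # object in cache: serve it directly
            self.hits += 1
            return self.cache[url]
        self.misses += 1
        body = self.fetch_from_origin(url)   # otherwise ask the origin server
        self.cache[url] = body
        return body
```

If three clients request the same URL, the origin sees a single request. That is the traffic saving, and it is also why the origin loses visibility into its real audience.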

Definition

  • Also called Content Distribution Network
  • Infrastructure: a large distributed system of servers deployed in multiple data centers across the internet
  • Goal: distribute content to end users on a large scale with high availability and high performance
  • A mechanism to replicate content on multiple servers across the internet and give clients a way to choose a server that can deliver the content fast
  • Content providers are the CDN customers:
    • They pay CDN companies to deliver their content
    • CDN pays ISPs, carriers, and network operators for hosting their servers
  • Usually used by large web platforms

What CDNs do

  • Serve a large fraction of internet content
    • Web objects (Text, JavaScript, graphics)
    • Downloadable objects
    • Applications
    • Streaming media
  • Much of the web is served through CDNs

The model

  • TODO: See the slide p41

CDN Deployment

  • A CDN company deploys hundreds of servers around the world, often inside ISP networks, so that they are close to users
  • CDN customer side:
    • The CDN replicates the customer's content on its servers
    • When the provider updates content, the CDN updates its servers with the new content
  • User side:
    • The user sends a request to the origin server
    • The request is intercepted by the redirection service
    • The redirection service forwards the request to the best CDN server
    • The content is served from the CDN server

Companies

  • Akamai
  • Limelight
  • ChinaCache
  • Edgecast

Benefits

  • Reduce latency to users
  • Reduce load on the origin server
    • Increase security against Denial of Service Attacks
  • Scalability
  • Cheaper, easier to manage
  • Bypass traffic jams on the web:
    • Requested data is close to clients
    • Avoid bottleneck links

Optimizations on the CDN side

  • Content is cached at various locations, for faster access
  • Use data compression
  • Use load balancing to spread traffic across servers
  • Security features like DDoS protection
  • Use network peering, for shorter data paths

Examples and Usage

  • Netflix:
    • Low-latency, high-definition media can be played
    • Handles peak traffic
    • Content has consistent quality
  • Alibaba:
    • Rapid page loads for product listing
    • Support large scale events
    • Stability and scalability

CDN Routing

Server Selection

  • Load: To balance load
  • Performance: improve client performance, based on:
    • Geography
    • RTT
    • Throughput
    • Load
  • Any Node Alive: provide fault tolerance
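
One way to combine these criteria is a simple score. The sketch below is an assumption, not how any particular CDN does it: it drops dead replicas first (fault tolerance), then picks the lowest score of RTT plus a load penalty. The `Replica` fields and the `load_weight_ms` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    rtt_ms: float        # measured round-trip time to the client
    load: float          # 0.0 = idle, 1.0 = saturated
    alive: bool = True


def select_server(replicas, load_weight_ms: float = 100.0) -> Replica:
    """Pick the live replica with the lowest RTT-plus-load score."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        raise RuntimeError("no replica alive")
    # load_weight_ms converts load into "equivalent milliseconds" of penalty.
    return min(candidates, key=lambda r: r.rtt_ms + load_weight_ms * r.load)
```

With the default weight, a nearby but nearly saturated replica can lose to a more distant idle one; setting `load_weight_ms=0` reduces the policy to pure RTT.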

Ways of redirecting

  • As part of routing: anycast (a single IP address is shared by many devices in multiple locations), cluster, load balancing
    • Pros: transparent to clients, works even when browsers cache failed addresses, circumvents many routing problems
    • Cons: little control over which server is selected, complex, scalability concerns, and TCP connections cannot recover if the route changes
  • As part of the application: HTTP redirect
    • Pros: application level, so it has more control
    • Cons: adds load and an extra RTT, and is hard to cache
  • As part of naming: DNS
    • Pros: suitable for caching, and DNS can redirect to any IP
    • Cons: the request comes from the resolver rather than the client, it asks for a domain rather than a URL, and the load behind each resolver (its client population) is hidden
      • The CDN can estimate these statistics
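
The application-level option can be sketched as a function that builds an HTTP 302 response pointing at a chosen replica. The function name `redirect_response` and the hostnames are assumptions for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_response(request_url: str, replica_host: str):
    """Return (status, headers) redirecting the client to the same path on a replica."""
    parts = urlsplit(request_url)
    location = urlunsplit(
        (parts.scheme, replica_host, parts.path, parts.query, parts.fragment)
    )
    return 302, {"Location": location}
```

The client has to issue a second request to the `Location` URL, which is where the additional load and RTT come from. In exchange, the decision is made per URL rather than per domain.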

More on DNS redirection

  • DNS redirection is used to redirect client to a nearby server.
  • Based on:
    • Latency to client
    • Load balancing
      • Try to balance client across many servers to avoid hotspot
    • Available servers
  • Process:
    • A client's DNS request reaches the CDN's name server (see below for how it gets there)
    • The request is resolved to a nearby server by the CDN-controlled name servers
    • The CDN measures the state of the network across its infrastructure
  • Two types of DNS redirection
    • Full:
      • The origin server is controlled by the CDN
      • Pros: All requests are automatically redirected
      • Cons: May send a lot of traffic to the CDN, so it's expensive
    • Partial:
      • The content provider marks which objects to serve from the CDN
      • Usually larger objects
      • Refers to images as <img src=http://cdn.com/foo/bar/img.gif>
      • When a user accesses the website, the CDN serves those objects
      • Pros: Better control
      • Cons: Content has to be marked
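
Partial redirection amounts to rewriting the URLs of the marked objects so that they point at the CDN's hostname. A minimal sketch, assuming the marking rule is "by file extension"; the `to_cdn_url` helper, the extension list, and the default `cdn_base` (taken from the example URL above) are illustrative assumptions.

```python
def to_cdn_url(path: str,
               marked_extensions=(".gif", ".jpg", ".mp4"),
               cdn_base: str = "http://cdn.com") -> str:
    """Point marked (typically large) objects at the CDN; leave the rest on the origin."""
    if path.lower().endswith(marked_extensions):
        return cdn_base + path
    return path


def img_tag(path: str) -> str:
    """Render an image tag whose source goes through to_cdn_url."""
    return f'<img src="{to_cdn_url(path)}">'
```

Here `/foo/bar/img.gif` is rewritten to the CDN host, while `/index.html` stays on the origin, so the provider decides exactly what the CDN serves.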

Deployment

Hosting your stuff

  • Where: rely on measurements
    • Sample popular hostnames from alexa.com
    • Query DNS from multiple vantage points
    • Categorize by type:
      • Hostnames
      • Files
      • Unpopular

Examples

  • ChinaCache

Future

Challenges

  • Mobile networks: latency to the cell is higher, and the internal network structure is opaque
  • Video: needs large bandwidth
    • 16M - 30M bps compressed
    • Combined, this can reach 25K TBps
    • Even data centers don't have that much
    • Multicast from end systems is a potential solution

CDN 2.0

  • Hybrid CDN: Akamai
  • Cloud Based Video: Netflix
  • Meta CDN: Conviva
  • Virtual CDN: ISP micro-datacenters