Content Delivery Networks
DNS
Definition:
- Domain name system
- Intended use: to translate domain name to IP addresses
- Other use: load distribution. A replicated web server has many IP addresses, and DNS can direct each client to the closest replica
- Distributed system: the name servers are interconnected
- Centralizing is hard because of the huge traffic volume, the distance to far-away clients, and the single point of failure
- Many applications rely on DNS
Hierarchy
- Root DNS Server: Root name server
- First point of contact
- Directly query authoritative name server
- Get Domain-name - IP mapping
- Queried for the IP addresses of the TLD DNS servers
- TLD (Top Level Domain) DNS server: responsible for domains such as .com, .org, .edu
- Queried for the IP address of the Authoritative DNS Server
- Authoritative DNS Server: owned by the site owner, like amazon.com
Local DNS Server:
- Acts as a client of the hierarchy; it is not part of the hierarchy itself
- Each ISP (Internet Service Provider) has one
- Workings:
- When host makes DNS query, it's sent to local DNS server
- The server may have a local cache of name-to-address pairs
- Otherwise it forwards the query into the DNS hierarchy
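The lookup path described above (local cache first, then root, TLD, and authoritative server) can be sketched with a toy in-memory hierarchy. This is only an illustration: the server names, the `resolve` function, and the addresses (taken from the documentation range 192.0.2.0/24) are assumptions, not real DNS data or a real resolver API.

```python
# Toy model of the DNS hierarchy; every name and address here is made up.
ROOT = {"com": "tld-com", "edu": "tld-edu"}            # TLD label -> TLD server
TLD = {
    "tld-com": {"amazon.com": "auth-amazon"},          # domain -> authoritative server
    "tld-edu": {"example.edu": "auth-example-edu"},
}
AUTHORITATIVE = {
    "auth-amazon": {"www.amazon.com": "192.0.2.10"},   # hostname -> IP address
    "auth-example-edu": {"www.example.edu": "192.0.2.20"},
}


def resolve(hostname, cache):
    """Resolve hostname the way a local DNS server would.

    Returns (address, servers_contacted). The cache is a plain dict
    owned by the local DNS server.
    """
    if hostname in cache:
        return cache[hostname], ["cache"]

    labels = hostname.split(".")
    tld_server = ROOT[labels[-1]]                  # ask the root for the TLD server
    domain = ".".join(labels[-2:])
    auth_server = TLD[tld_server][domain]          # ask the TLD server
    address = AUTHORITATIVE[auth_server][hostname] # ask the authoritative server

    cache[hostname] = address                      # remember the mapping
    return address, ["root", tld_server, auth_server]
```

A second call to `resolve` with the same cache is answered locally and contacts no server in the hierarchy.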
DNS Caching
- Once the server knows about the mapping, it is cached
- A cache entry times out after some time (TTL); until then, the cached mapping may be out of date
- TLD server addresses are typically cached in local DNS servers, so the root name servers are not visited often
- Benefits
- Reduces network traffic to the root servers and across the internet
- Improves performance, because a cached DNS response is much faster
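A minimal sketch of TTL-based caching as described above. The class name `DnsCache` and the injectable `clock` parameter are assumptions made for illustration (the clock makes expiry easy to test); real resolvers are considerably more involved.

```python
import time


class DnsCache:
    """Name-to-address cache whose entries expire after a TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None                      # never cached
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]          # entry timed out
            return None
        return address                       # may be stale if the site moved
```

Until the TTL runs out, `get` keeps returning the cached address even if the authoritative mapping has changed, which is the staleness trade-off noted above.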
P2P
Definition
- A Distributed network architecture
- Every node is both the Client and the Server
- Advantages:
- Scalable:
- As the number of clients increases, the number of servers also increases
- Each node both consumes and donates resources
- Less cost: Cost at the edge of network
- More privacy: No centralized source of data
- Reliability:
- Distributed geographically
- Has Replicas
- No single point of failure
- All of the above make it easy to share content
Categories
- Unstructured:
- No restriction on overlay structures and data placement
- Examples:
- Napster, BitTorrent, FreeNet
- Structured
- Uses a Distributed Hash Table (DHT), which exposes an interface like put(k, v) and get(k)
- Has restrictions on overlay structure and data placement
- Examples:
- Chord, Pastry and CAN
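To make the put(k, v) / get(k) interface concrete, here is a simplified sketch in which keys and nodes are hashed onto one ring and each key is stored on the first node at or after its position. This follows the general consistent-hashing idea used by Chord-style systems, but it is not Chord itself: there is no routing between nodes, no finger tables, and no node joins or failures. `ToyDht`, `ring_position`, and the node names are hypothetical.

```python
import bisect
import hashlib


def ring_position(key, bits=16):
    """Hash a string onto a ring of 2**bits positions."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDht:
    def __init__(self, node_names):
        # Data placement is restricted: a node's ring position decides
        # which keys it is responsible for.
        self._ring = sorted((ring_position(n), n) for n in node_names)
        self._positions = [position for position, _ in self._ring]
        self._stores = {n: {} for n in node_names}

    def owner(self, key):
        """Return the first node at or after the key's ring position."""
        i = bisect.bisect_left(self._positions, ring_position(key))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring

    def put(self, key, value):
        self._stores[self.owner(key)][key] = value

    def get(self, key):
        return self._stores[self.owner(key)].get(key)
```

Because every participant computes the same `owner(key)`, any node can locate a key without a central index, which is what distinguishes the structured category from the unstructured one.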
Server Selection
- For BitTorrent, a Tracker is used, which informs the clients about the available peers
- TODO: See diagram at page 26
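The tracker's role can be sketched as a small registry that maps each torrent to the set of peers that have announced themselves. This is an illustrative model only: `ToyTracker`, `announce`, and the peer addresses are made-up names, and the real BitTorrent tracker protocol (an HTTP or UDP exchange) carries more information.

```python
from collections import defaultdict


class ToyTracker:
    """Remembers which peers take part in each torrent."""

    def __init__(self):
        self._swarms = defaultdict(set)  # torrent id -> peer addresses

    def announce(self, torrent_id, peer_address):
        """Register a peer and return the other peers it can contact."""
        others = sorted(self._swarms[torrent_id] - {peer_address})
        self._swarms[torrent_id].add(peer_address)
        return others
```

The tracker only hands out peer lists; the content itself is then exchanged directly between the peers.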
Issues with P2P
- Reliability
- Performance
- Control: P2P networks carry a lot of copyrighted content, which is hard to police
Content Delivery Networks
History of Content Delivery
- Web 1.0: Pre-CDN, Infrastructure development
- CDN 1.0: First generation of CDN, replication, intelligent routing, edge computing
- CDN 2.0: P2P, Cloud Computing, Energy Awareness
- CDN 3.0: Autonomic composition
Web Caches
- The precursor to CDN
- Improve efficiency by caching
- Caching proxy:
- Receive HTTP request from client
- If object in cache, then send cached content
- Otherwise request the object from origin server
- Works as both client and server:
- Client: request content from origin
- Server: serve content to downstream client
- Usually installed by ISP
- Reason:
- Reduce response time for client request
- Reduce traffic across network
- Problem:
- Can't serve all web users, since the web is too large
- Web content is dynamic and customized, so much of it is not cacheable
- Origin upstream web servers shouldn't rely on downstream caching proxy
- Upstream web servers can't see the real statistics of their site, since the user data is not sent to their servers
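The caching-proxy behaviour listed above (serve from cache on a hit, otherwise fetch from the origin and keep a copy) can be sketched as follows. `CachingProxy` and the `fetch_from_origin` callback are hypothetical names; a real proxy would also honour HTTP cache-control headers and expiry, which this sketch ignores.

```python
class CachingProxy:
    """Acts as a server to clients and as a client to the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> response body
        self._cache = {}
        self.origin_requests = 0         # what the origin actually sees

    def handle(self, url):
        if url in self._cache:
            return self._cache[url]      # hit: the origin is not contacted
        self.origin_requests += 1
        body = self._fetch(url)          # miss: request the object upstream
        self._cache[url] = body
        return body
```

The `origin_requests` counter also illustrates the last problem in the list: however many clients ask for a cached object, the origin sees a single request and so cannot observe its real traffic.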
Definition
- Also called Content Distribution Network
- Infra: large distributed system of servers deployed in multiple data centers across the internet
- Goal: distribute content to end users on a large scale with high availability and high performance
- A mechanism to replicate content on multiple servers across the internet and to give each client a way to choose a server that can deliver the content fast
- Content providers are the CDN customers:
- They pay CDN companies to deliver their content
- CDN pays ISPs, carriers, and network operators for hosting their servers
- Usually used by large web platforms
What CDNs do
- Serve a large fraction of internet content
- Web objects (Text, JavaScript, graphics)
- Downloadable objects
- Applications
- Streaming media
- Most of the web uses CDN
The model
- TODO: See the slide p41
CDN Deployment
- A CDN company deploys hundreds of servers around the world, often inside ISP networks, so that they are close to users
- CDN Customer side:
- The CDN replicates the customer's content on its servers
- When the provider updates content, the CDN updates its servers with the new content
- User side:
- Send request to origin server
- Intercepted by redirection service
- Forward user's request to best CDN server
- Content served from CDN server
Companies
- Akamai
- Limelight
- ChinaCache
- Edgecast
Benefits
- Reduce latency to users
- Reduce load on original server
- Increase security against Denial of Service Attacks
- Scalability
- Cheaper, easier to manage
- Bypass traffic jams on the web:
- Requested data is close to clients
- Avoid bottleneck links
Optimizations in CDN side
- Content is cached at various locations, for faster access
- Use data compression
- Use load balancing to spread traffic across servers
- Security features like DDoS protection
- Use network peering, for shorter data paths
Examples and Usage
- Netflix:
- Low-latency, high-definition media can be played
- Handles peak traffic
- Content has consistent quality
- Alibaba:
- Rapid page loads for product listing
- Support large scale events
- Stability and scalability
CDN Routing
Server Selection
- Load: To balance load
- Performance: improve client performance, based on:
- Geography
- RTT
- Throughput
- Load
- Any Node Alive: provide fault tolerance
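One way to combine the three criteria above is sketched here: drop dead servers, avoid overloaded ones when possible, then pick the lowest RTT with a penalty for load. The scoring formula, the thresholds, and the names `EdgeServer` and `select_server` are assumptions chosen for illustration; the notes do not say how any particular CDN weighs these factors.

```python
from dataclasses import dataclass


@dataclass
class EdgeServer:
    name: str
    rtt_ms: float      # measured round-trip time to the client
    load: float        # 0.0 (idle) to 1.0 (saturated)
    alive: bool = True


def select_server(servers, max_load=0.9, load_penalty_ms=100.0):
    """Pick a server for one client request."""
    # Any node alive: fault tolerance comes first.
    candidates = [s for s in servers if s.alive]
    if not candidates:
        raise LookupError("no CDN server is alive")

    # Load: prefer servers below the load limit, if there are any.
    pool = [s for s in candidates if s.load <= max_load] or candidates

    # Performance: lowest RTT, penalised by current load.
    return min(pool, key=lambda s: s.rtt_ms + load_penalty_ms * s.load)
```

Setting `load_penalty_ms=0` reduces the choice to pure RTT, while a large value makes the selection behave more like a load balancer.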
Ways of redirecting
- As a part of routing: anycast (a single IP address is shared by many devices in multiple locations), cluster, load balancing
- Pros: transparent to clients, works even when the browser has cached the address of a failed server, circumvents many routing problems
- Cons: little control over server selection, complex, hard to scale, and an established TCP connection cannot recover if the route changes mid-connection
- Part of application: HTTP Redirect
- Pros: Application level, has more control
- Cons: Has Additional load and RTT, and is hard to cache
- Part of naming: DNS
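- Pros: Suitable for caching; DNS can redirect to any IP address
- Cons: Redirection is decided per resolver rather than per client, the request names a domain and not a URL, and the load behind each resolver (the size of its client population) is hidden
- The hidden load can be estimated from statistics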
More on DNS redirection
- DNS redirection is used to redirect client to a nearby server.
- Based on:
- Latency to client
- Load balancing
- Try to balance client across many servers to avoid hotspot
- Available servers
- Process:
- The client's DNS request comes to the CDN's name server (see below for how it is reached)
- The CDN-controlled name servers resolve the request to a nearby server
- The CDN measures the state of the network across its infrastructure
- Two types of DNS redirection
- Full:
- The origin server is controlled by the CDN
- Pros: All requests are automatically redirected
- Cons: May send a lot of traffic to the CDN, so it is expensive
- Partial:
- The content provider marks which objects the CDN should serve
- Usually larger objects
- Refer to images as
<img src=http://cdn.com/foo/bar/img.gif>
- When a user accesses the website, the CDN serves the marked objects
- Pros: Better control
- Cons: Have to mark content
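The "mark what to provide" step of partial redirection can be sketched as a helper that decides, per object, whether its URL should point at the CDN hostname or at the origin. The size threshold, the origin hostname `www.example.com`, and the function name `object_url` are assumptions for illustration; `cdn.com` is the hostname used in the example above.

```python
def object_url(path, size_bytes,
               origin_host="www.example.com",
               cdn_host="cdn.com",
               min_cdn_size=100_000):
    """Return the URL a page should use to reference one object.

    Objects at or above min_cdn_size are marked for CDN delivery;
    everything else stays on the origin server.
    """
    host = cdn_host if size_bytes >= min_cdn_size else origin_host
    return f"http://{host}/{path.lstrip('/')}"
```

For instance, a 500 kB image at `/foo/bar/img.gif` would be referenced through `cdn.com`, while a small HTML page would keep its origin URL. This is where the extra control comes from, and also why the provider has to touch its content.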
Deployment
Hosting your stuff
- Where: rely on measurements
- Sample popular hostnames from alexa.com
- Query DNS from multiple vantage points
- Categorize by type:
- Hostnames
- Files
- Unpopular
Examples
- ChinaCache
Future
Challenges
- Mobile networks: latency to cell is higher, opaque internal network structure
- Video: needs large bandwidth
- 16 - 30 Mbps compressed
- Combined, this can reach 25K TBps
- Even data centers don't have that much capacity
- Using multicast from end systems as potential solution
CDN2.0
- Hybrid CDN: Akamai
- Cloud Based Video: Netflix
- Meta CDN: Conviva
- Virtual CDN: ISP micro-datacenters