# Content Delivery Networks

## DNS

### Definition

- Domain Name System
- Intended use: to translate domain names to IP addresses
- Other uses: load distribution. A replicated web server has many IPs, and DNS
  can redirect each client to the closest replica
- Distributed system: the servers are interconnected
  - Centralizing is hard because of the huge traffic volume, the distance to
    clients, and the single point of failure
- Many applications rely on DNS

### Hierarchy

- Root DNS server (root name server)
  - First point of contact in the hierarchy
  - Queried for the IP addresses of the TLD DNS servers
  - The domain-name-to-IP mapping itself comes from querying the authoritative
    name server directly
- TLD (Top Level Domain) DNS server, for `.com`, `.org`, `.edu`
  - Queried for the IP address of the authoritative DNS server
- Authoritative DNS server: owned by the site owner, like `amazon.com`

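
The walk down the hierarchy can be sketched with three lookup tables. This is a toy model, not a real resolver: the server names (`tld-com`, `ns-amazon`, ...) and the addresses are made up for illustration, and each table only knows where to look next.

```python
# Toy model of the DNS hierarchy. All server names and IPs are hypothetical.
ROOT = {"com": "tld-com", "org": "tld-org"}          # root -> TLD server
TLD = {
    "tld-com": {"amazon.com": "ns-amazon"},          # TLD -> authoritative server
    "tld-org": {"example.org": "ns-example"},
}
AUTHORITATIVE = {
    "ns-amazon": {"www.amazon.com": "192.0.2.10"},   # authoritative -> final mapping
    "ns-example": {"www.example.org": "192.0.2.20"},
}


def resolve(hostname):
    """Walk root -> TLD -> authoritative; return (ip, servers_contacted)."""
    labels = hostname.split(".")
    tld_label = labels[-1]
    domain = ".".join(labels[-2:])

    path = ["root"]
    tld_server = ROOT.get(tld_label)
    if tld_server is None:
        return None, path            # root knows no TLD server for this name

    path.append(tld_server)
    auth_server = TLD[tld_server].get(domain)
    if auth_server is None:
        return None, path            # TLD knows no authoritative server

    path.append(auth_server)
    return AUTHORITATIVE[auth_server].get(hostname), path


ip, path = resolve("www.amazon.com")
print(ip, path)
```

Each step only returns a pointer to the next level; only the authoritative table holds the final name-to-IP mapping.
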
### Local DNS Server

- Actually a client, not part of the hierarchy
- Each ISP (Internet Service Provider) has one
- Workings:
  - When a host makes a DNS query, it is sent to the local DNS server
  - The server may have a local cache of name-to-address pairs
  - Otherwise it forwards the query to the DNS hierarchy

### DNS Caching

- Once a server learns a mapping, it **caches** it
- Cache entries time out after a set time (TTL); until then, a cached entry may
  be out of date
- TLD server addresses are typically cached in local DNS servers, so the root
  name servers are not visited often
- Benefits
  - Reduces network traffic on the **root servers** and **across the internet**
  - Improves network performance, because DNS responses are much faster

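
A minimal sketch of a local DNS server with a TTL cache, assuming a hypothetical `upstream` callable that stands in for the DNS hierarchy. The class name, the fixed TTL, and the injectable clock are illustration choices, not how any particular resolver is built.

```python
import time


class CachingResolver:
    """Local DNS server sketch: answer from cache, else ask the hierarchy."""

    def __init__(self, upstream, ttl_seconds, clock=time.monotonic):
        self.upstream = upstream        # hypothetical callable: hostname -> ip
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}                 # hostname -> (ip, expiry time)
        self.upstream_queries = 0

    def lookup(self, hostname):
        entry = self.cache.get(hostname)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]             # cache hit: no traffic to the hierarchy
        self.upstream_queries += 1      # miss or expired: forward the query
        ip = self.upstream(hostname)
        self.cache[hostname] = (ip, self.clock() + self.ttl)
        return ip


# Usage with a fake clock so that expiry is deterministic.
now = [0.0]
resolver = CachingResolver(lambda name: "192.0.2.10", ttl_seconds=60,
                           clock=lambda: now[0])
resolver.lookup("www.amazon.com")   # miss, forwarded to the hierarchy
resolver.lookup("www.amazon.com")   # hit, served from the cache
now[0] = 61.0
resolver.lookup("www.amazon.com")   # entry expired, forwarded again
print(resolver.upstream_queries)    # 2
```

Between the second lookup and the expiry, the cache would keep returning the old address even if the real mapping had changed, which is the staleness trade-off noted above.
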
## P2P

### Definition

- A **distributed** network architecture
- Every node is both a **client** and a **server**
- Advantages:
  - Scalable:
    - As the number of clients increases, the number of servers also
      increases
    - Nodes both consume and donate resources
  - Less cost: cost is at the edge of the network
  - More privacy: no centralized source of data
  - Reliability:
    - Distributed geographically
    - Has replicas
    - No single point of failure
  - All of the above make it easy to share content

### Categories

- Unstructured:
  - No restriction on overlay structure or data placement
  - Examples:
    - Napster, BitTorrent, Freenet
- Structured:
  - Uses a Distributed Hash Table (DHT) with an interface like `put(k, v)` and
    `get(k)`
  - Restricts overlay structure and data placement
  - Examples:
    - Chord, Pastry and CAN

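
To make the `put(k, v)` / `get(k)` interface concrete, here is a single-process sketch of a structured overlay. It assumes a Chord-like placement rule (a key is stored on the first node clockwise from the key's ID on a ring); routing between nodes, joins, and failures are left out, and the class and node names are hypothetical.

```python
import bisect
import hashlib


def ring_id(text, bits=16):
    """Map a string onto a small identifier ring of size 2**bits."""
    digest = hashlib.sha1(text.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    def __init__(self, node_names):
        # Sorted (id, name) pairs form the ring.
        self.ring = sorted((ring_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.storage = {name: {} for name in node_names}

    def owner(self, key):
        # First node at or after the key's id, wrapping past the largest id.
        index = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        self.storage[self.owner(key)][key] = value

    def get(self, key):
        return self.storage[self.owner(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("song.mp3", "peer 10.0.0.5")
print(dht.owner("song.mp3"), dht.get("song.mp3"))
```

Because the owner is computed from the key alone, any node can find where a key lives without flooding the network, which is the restriction on data placement that unstructured systems do not have.
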
### Server Selection

- For BitTorrent, a tracker is used, which informs clients about the available
  peers
- TODO: See diagram at page 26

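
The tracker's role can be sketched as a small registry. This is only the idea of "tell each client which peers are available"; it does not follow the real BitTorrent tracker protocol, and the `Tracker` class, its `announce` method, and the peer addresses are hypothetical.

```python
class Tracker:
    """Toy tracker: remembers which peers announced for each torrent."""

    def __init__(self):
        self.swarms = {}    # torrent id -> set of peer addresses

    def announce(self, torrent_id, peer):
        """Register `peer` and return the other peers known so far."""
        swarm = self.swarms.setdefault(torrent_id, set())
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return others


tracker = Tracker()
print(tracker.announce("ubuntu.iso", "10.0.0.1:6881"))  # []
print(tracker.announce("ubuntu.iso", "10.0.0.2:6881"))  # ['10.0.0.1:6881']
```

The tracker only hands out peer lists; the content itself moves directly between peers.
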
### Issues with P2P

- Reliability
- Performance
- Control: a lot of the shared content is copyrighted

## Content Delivery Networks

### History of Content Delivery

- Web 1.0: pre-CDN, infrastructure development
- CDN 1.0: first generation of CDN, replication, intelligent routing, edge
  computing
- CDN 2.0: P2P, cloud computing, energy awareness
- CDN 3.0: autonomic composition

### Web Caches

- The precursor to CDNs
- Improve efficiency by caching
- Caching proxy:
  - Receives HTTP requests from clients
  - If the object is in the cache, sends the cached content
  - Otherwise requests the object from the origin server
  - Works as both client and server:
    - Client: requests content from the origin
    - Server: serves content to downstream clients
  - Usually installed by an ISP
- Reason:
  - Reduce response time for client requests
  - Reduce traffic across the network
- Problem:
  - Can't serve all web users, since the web is too large
  - Web content is dynamic and customized, so much of it is not cacheable
  - Origin (upstream) web servers shouldn't rely on downstream caching proxies
  - Upstream web servers can't see the real statistics of their site, since
    requests served from the cache are not sent to their servers

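
The caching-proxy logic, and the statistics problem it causes for the origin, can be sketched as follows. `fetch_from_origin` is a hypothetical stand-in for a real HTTP request, and the sketch ignores expiry and non-cacheable responses.

```python
class CachingProxy:
    """Serve from the cache when possible, otherwise fetch from the origin."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin   # callable: url -> body
        self.cache = {}                              # url -> cached body
        self.client_requests = 0
        self.origin_requests = 0

    def handle(self, url):
        self.client_requests += 1
        if url in self.cache:
            return self.cache[url]        # acts as a server to the client
        self.origin_requests += 1         # acts as a client to the origin
        body = self.fetch_from_origin(url)
        self.cache[url] = body
        return body


proxy = CachingProxy(lambda url: f"<contents of {url}>")
for _ in range(5):
    proxy.handle("http://origin.example/index.html")
print(proxy.client_requests, proxy.origin_requests)  # 5 1
```

Five clients were served, but the origin saw a single request, which is why upstream servers cannot see real usage statistics behind a proxy.
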
### Definition

- Also called _Content Distribution Network_
- **Infrastructure**: a large distributed system of servers deployed in
  multiple data centers across the internet
- **Goal**: distribute content to end users on a large scale with high
  **availability** and high **performance**
- A mechanism to **replicate** content on multiple servers on the internet,
  giving clients a way to choose a server that can provide the content fast
- Content providers are the CDN customers:
  - They pay CDN companies to deliver their content
  - CDNs pay ISPs, carriers, and network operators for hosting their servers
- Usually used by large web platforms

### What CDNs do

- Serve a large fraction of internet content:
  - Web objects (text, JavaScript, graphics)
  - Downloadable objects
  - Applications
  - Streaming media
- Most of the web uses CDNs

### The model

- TODO: See the slide p41

### CDN Deployment

- CDN companies deploy hundreds of servers around the world, often inside ISP
  networks, so that they are close to users
- CDN customer side:
  - The CDN replicates the customer's content on its servers
  - When the provider updates content, the CDN updates its servers with the
    new content
- User side:
  - The user sends a request to the origin server
  - The request is intercepted by the redirection service
  - The user's request is forwarded to the best CDN server
  - The content is served from the CDN server

### Companies

- Akamai
- Limelight
- ChinaCache
- Edgecast

### Benefits

- Reduce latency to users
- Reduce load on the origin server
- Increase security against Denial of Service attacks
- Scalability
- Cheaper, easier to manage
- Bypass traffic jams on the web:
  - Requested data is close to clients
  - Avoid bottleneck links

### Optimizations on the CDN side

- Content is cached at various locations, for faster access
- Use data compression
- Use load balancing to reduce traffic on any single server
- Security features like DDoS protection
- Use network peering, for shorter data paths

### Examples and Usage

- Netflix:
  - Low-latency, high-definition media playback
  - Handles peak traffic
  - Content has consistent quality
- Alibaba:
  - Rapid page loads for product listings
  - Supports large-scale events
  - Stability and scalability

### CDN Routing

#### Server Selection

- Load: to balance load
- Performance: improve client performance, based on:
  - Geography
  - RTT
  - Throughput
  - Load
- Any node alive: provides fault tolerance

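
One way to combine these criteria is sketched below. The policy (drop dead servers, prefer servers under a load threshold, then pick the lowest RTT) and the `max_load` value are assumptions for illustration; real CDNs use their own, more elaborate policies. The server names and measurements are made up.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    alive: bool
    rtt_ms: float   # measured round-trip time to the client
    load: float     # 0.0 idle .. 1.0 saturated


def select_server(servers, max_load=0.8):
    """Pick a server: alive first, then not overloaded, then lowest RTT."""
    alive = [s for s in servers if s.alive]
    if not alive:
        return None
    not_overloaded = [s for s in alive if s.load <= max_load]
    # "Any node alive": fall back to overloaded servers rather than fail.
    candidates = not_overloaded or alive
    return min(candidates, key=lambda s: (s.rtt_ms, s.load))


servers = [
    Server("edge-a", alive=True, rtt_ms=12, load=0.95),   # close but overloaded
    Server("edge-b", alive=True, rtt_ms=30, load=0.40),
    Server("edge-c", alive=False, rtt_ms=5, load=0.10),   # closest but down
]
print(select_server(servers).name)  # edge-b
```
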
#### Ways of redirecting

- As a part of routing: anycast (a single IP address is shared by many devices
  in multiple locations), cluster, load balancing
  - Pros: transparent to clients, works even when browsers have cached failed
    addresses, circumvents many routing problems
  - Cons: little control over server selection, complex, scalability, and
    can't recover TCP connections
- As a part of the application: HTTP redirect
  - Pros: application level, has more control
  - Cons: adds load and an extra RTT, and is hard to cache
- As a part of naming: DNS
  - Pros: suitable for caching, DNS can redirect to any IP
  - Cons: the request comes from the resolver, not the client; it asks for a
    domain, not a URL; and the load factor of the resolver's client population
    is hidden
    - The hidden load can be estimated from statistics

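
The application-level option can be sketched as a function that builds a 302 response. The region-to-host table and the `*.cdn.example` host names are hypothetical; the point is that the origin answers every request first and the client then makes a second request, which is the additional load and RTT listed as a con.

```python
from http import HTTPStatus

# Hypothetical mapping from client region to a CDN server host.
EDGE_BY_REGION = {"eu": "eu.cdn.example", "us": "us.cdn.example"}
DEFAULT_EDGE = "us.cdn.example"


def redirect_response(path, client_region):
    """Build the raw HTTP response that sends the client to a CDN server."""
    edge = EDGE_BY_REGION.get(client_region, DEFAULT_EDGE)
    status = HTTPStatus.FOUND   # 302
    return (
        f"HTTP/1.1 {status.value} {status.phrase}\r\n"
        f"Location: http://{edge}{path}\r\n"
        "Content-Length: 0\r\n"
        "\r\n"
    )


print(redirect_response("/foo/bar/img.gif", "eu"))
```

Because the decision is made per request, the origin can use the full URL and client details, which is the extra control this method offers over DNS.
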
#### More on DNS redirection

- DNS redirection is used to redirect a client to a nearby server
- Based on:
  - Latency to the client
  - Load balancing
    - Try to balance clients across many servers to avoid hotspots
  - Available servers
- Process:
  - The client's DNS request reaches the CDN's name server (see below for how
    it is reached)
  - The DNS request is resolved to a nearby server by the CDN-controlled name
    servers
  - The CDN measures the state of the network in its infrastructure
- Two types of DNS redirection
  - Full:
    - The origin server is controlled by the CDN
    - Pros: all requests are automatically redirected
    - Cons: may send a lot of traffic to the CDN, so it is expensive
  - Partial:
    - The content provider marks what to provide to the CDN
      - Usually larger objects
      - Refer to images as `<img src=http://cdn.com/foo/bar/img.gif>`
    - When the website is accessed, the CDN serves the marked data
    - Pros: better control
    - Cons: content has to be marked

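
Partial redirection amounts to the content provider rewriting selected URLs to point at the CDN. A minimal sketch, reusing the `cdn.com` host from the `<img>` example; the origin host and the size cut-off for "larger objects" are assumptions for illustration.

```python
CDN_BASE = "http://cdn.com"              # host from the <img> example in the notes
ORIGIN_BASE = "http://origin.example"    # hypothetical origin host
SIZE_THRESHOLD = 100_000                 # bytes; arbitrary cut-off for "larger objects"


def url_for(path, size_bytes):
    """Return the URL to embed: CDN for marked (large) objects, else origin."""
    base = CDN_BASE if size_bytes >= SIZE_THRESHOLD else ORIGIN_BASE
    return f"{base}{path}"


def img_tag(path, size_bytes):
    return f'<img src="{url_for(path, size_bytes)}">'


print(img_tag("/foo/bar/img.gif", 250_000))   # served by the CDN
print(img_tag("/icons/dot.gif", 300))         # stays on the origin
```

Only the marked objects resolve through the CDN's name servers; everything else keeps going to the origin, which is where the better control (and the marking effort) comes from.
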
## Deployment

### Hosting your stuff

- Where: rely on measurements
  - Sample popular hostnames on alexa.com
  - Ask DNS from multiple vantage points
- Categorize by type:
  - Hostnames
  - Files
  - Unpopular

### Examples

- ChinaCache

## Future

### Challenges

- Mobile networks: latency to the cell is higher, and the internal network
  structure is opaque
- Video: large bandwidth
  - 16-30 Mbps compressed
  - Combined, this can reach 25K TBps
  - Even data centers don't have that much
  - Multicast from end systems is a potential solution

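
As a rough sanity check on the video figures, the sketch below computes how many simultaneous streams the aggregate would correspond to. It assumes "25K TBps" means 25,000 terabits per second (the unit in the notes is ambiguous; reading it as terabytes would multiply the results by 8) and uses decimal prefixes.

```python
MBPS = 10**6     # bits per second in one megabit per second
TBPS = 10**12    # bits per second in one terabit per second

per_stream_low = 16 * MBPS
per_stream_high = 30 * MBPS
aggregate = 25_000 * TBPS    # "25K TBps", read here as terabits per second

# Number of simultaneous streams that would add up to the aggregate.
streams_at_high_rate = aggregate // per_stream_high
streams_at_low_rate = aggregate // per_stream_low
print(streams_at_high_rate, streams_at_low_rate)  # 833333333 1562500000
```

Under that reading, the aggregate corresponds to roughly 0.8 to 1.6 billion concurrent streams.
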
### CDN 2.0

- Hybrid CDN: Akamai
- Cloud-based video: Netflix
- Meta CDN: Conviva
- Virtual CDN: ISP micro-datacenters