
How to cache YouTube videos with Squid + nginx on Ubuntu

This post is really just a copy and paste from the aacable blog.

The world of proxying is always changing, and tends to change too fast. I clearly remember when YouTube was still very easy to cache on a proxy machine, but lately YouTube has become a bit 'stingy' about sharing its videos, perhaps because of copyright protection on them. The problem is that users keep looking for ways to download video files from YouTube, or simply send a large number of requests to it. You probably remember how, over the last few years, certain legendary videos kept causing a stir :). As network administrators we usually get a headache from the traffic spikes to particular sites such as YouTube, even though the users are really all watching the same video link. Just imagine around 150 users watching the same video link at the same moment: the bandwidth could be swallowed up by that one video. This is where building a proxy server comes in, to store the Internet files that users have already accessed, which in the case above are YouTube video files 🙂

Enough of that. Here is the step-by-step guide to building a Squid proxy paired with nginx to store YouTube video files.

Advantages of Youtube Caching !!!

In most parts of the world bandwidth is very expensive, so in some scenarios it is very useful to cache YouTube videos or other Flash videos. If one user downloads a video or Flash file, why shouldn't the same user, or another user, get the same file from the cache instead of pulling the same content through the Internet pipe again and again?
People on the same LAN sometimes watch the same videos. If I post a YouTube video link on Facebook, Twitter or a similar site, all my friends will watch that video, and it gets viewed many times within a few hours. Videos are usually shared over Facebook or other social networking sites, so the chances of multiple hits on a popular video from my LAN users / friends are high.
This is the reason why I wrote this article.

Disadvantages of Youtube Caching !!!
The chance that another user will watch the same video is really slim. If I search for something specific on YouTube, I get hundreds of search results for the same video. What is the chance that another user will search for the same thing and click on the same link / result? YouTube hosts more than 10 million videos, which is too much to cache anyway. You need a lot of space to cache videos, and accordingly you will need fast, modern hardware with tons of space to handle such a giant cache. Anyhow, try it.
AFAIK you are not supposed to cache YouTube videos; YouTube doesn't like it. I don't understand why. Probably because their ranking mechanism relies on views, and possibly completed views, which wouldn't be measurable if the content was served from a local cache.
After struggling unsuccessfully with the storeurl.pl method, I searched for an alternative way to cache YouTube videos. Finally I found a Ruby-based method that uses Nginx to cache YT. Using this method I was able to cache all YouTube videos almost perfectly (not 100%, but it works fine in most cases with some modification; I am sure there will be some improvement in the near future).

Thanks to Mr. Eliezer Croitoru, Mr. Christian Loth and others for their kind guidance.
The following components were used in this guide.
Proxy Server Configuration:
Ubuntu Desktop 10.04
Nginx version: nginx/0.7.65
Squid Cache: Version 2.7.STABLE7
Client Configuration for testing videos:
Windows XP with Internet Explorer 6
Windows 7 with Internet Explorer 8
Let's start with the Proxy Server Configuration:

1) Update Ubuntu
First install Ubuntu. After installation, configure its networking components, then update it with the following command
apt-get update
2) Install SSH Server [Optional]
Now install the SSH server so that you can manage your server remotely using PuTTY or any other SSH tool.
apt-get install openssh-server
3) Install Squid Server
Now install Squid Server with the following command
apt-get install squid
[This will install squid 2.7 by default]
Now edit the Squid configuration file with the following command
nano /etc/squid/squid.conf
Remove all lines and paste the following data
# SQUID 2.7/ Nginx TEST CONFIG FILE
# Email: aacable@hotmail.com
# Web : http://aacable.wordpress.com
# PORT and Transparent Option
http_port 8080 transparent
server_http11 on
icp_port 0

# Cache is set to 5GB in this example (zaib)
store_dir_select_algorithm round-robin
cache_dir aufs /cache1 5000 16 256
cache_replacement_policy heap LFUDA
memory_replacement_policy heap LFUDA

# If you want date/time in the Squid logs, use the following
emulate_httpd_log on
logformat squid %tl %6tr %>a %Ss/%03Hs %<st %rm %ru %un %Sh/%<A %mt
log_fqdn off

# How many days to keep users' web access logs
# You need to rotate your log files with a cron job. For example:
# 0 0 * * * /usr/local/squid/bin/squid -k rotate
logfile_rotate 14
debug_options ALL,1
cache_access_log /var/log/squid/access.log
cache_log /var/log/squid/cache.log
cache_store_log /var/log/squid/store.log

#[zaib] I used the DNSMASQ service for fast DNS resolving
#so install it with "apt-get install dnsmasq" first
dns_nameservers 127.0.0.1 221.132.112.8

#ACL Section
acl all src 0.0.0.0/0.0.0.0
acl manager proto cache_object
acl localhost src 127.0.0.1/255.255.255.255
acl to_localhost dst 127.0.0.0/8
acl SSL_ports port 443 563 # https, snews
acl SSL_ports port 873 # rsync
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 563 # https, snews
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl Safe_ports port 631 # cups
acl Safe_ports port 873 # rsync
acl Safe_ports port 901 # SWAT
acl purge method PURGE
acl CONNECT method CONNECT
http_access allow manager localhost
http_access deny manager
http_access allow purge localhost
http_access deny purge
http_access deny !Safe_ports
http_access deny CONNECT !SSL_ports
http_access allow localhost
http_access allow all
http_reply_access allow all
icp_access allow all

#[zaib] I used UBUNTU so the user is proxy, in FEDORA you may use squid
cache_effective_user proxy
cache_effective_group proxy
cache_mgr aacable@hotmail.com
visible_hostname proxy.aacable.net
unique_hostname aacable@hotmail.com

cache_mem 8 MB
minimum_object_size 0 bytes
maximum_object_size 100 MB
maximum_object_size_in_memory 128 KB

refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern (Release|Packages(.gz)*)$ 0 20% 2880
refresh_pattern . 0 50% 4320
acl apache rep_header Server ^Apache
broken_vary_encoding allow apache

# Youtube Cache Section [zaib]
url_rewrite_program /etc/nginx/nginx.rb
url_rewrite_host_header off
acl youtube_videos url_regex -i ^http://[^/]+\.youtube\.com/videoplayback\?
acl range_request req_header Range .
acl begin_param url_regex -i [?&]begin=
acl id_param url_regex -i [?&]id=
acl itag_param url_regex -i [?&]itag=
acl sver3_param url_regex -i [?&]sver=3
cache_peer 127.0.0.1 parent 8081 0 proxy-only no-query connect-timeout=10
cache_peer_access 127.0.0.1 allow youtube_videos id_param itag_param sver3_param !begin_param !range_request
cache_peer_access 127.0.0.1 deny all
Save & Exit.
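To make the routing rule at the end of the config easier to follow, here is a minimal Ruby sketch of the decision that the cache_peer_access line describes. It is only an illustration and not part of the setup: the method name route_to_nginx? and the sample URL are made up for this example, and Squid evaluates its ACLs itself.

```ruby
# Illustrative only: mirrors the ACLs used by cache_peer_access in squid.conf.
# route_to_nginx? is a hypothetical helper, not something Squid calls.

YOUTUBE_VIDEOS = %r{^http://[^/]+\.youtube\.com/videoplayback\?}i
BEGIN_PARAM    = /[?&]begin=/i
ID_PARAM       = /[?&]id=/i
ITAG_PARAM     = /[?&]itag=/i
SVER3_PARAM    = /[?&]sver=3/i

# headers is assumed to be a hash of request header names to values.
def route_to_nginx?(url, headers = {})
  # acl range_request: any non-empty Range header counts as a range request
  range_request = headers.any? do |name, value|
    name.to_s.casecmp("Range").zero? && !value.to_s.empty?
  end

  # allow youtube_videos id_param itag_param sver3_param !begin_param !range_request
  !!(url =~ YOUTUBE_VIDEOS) &&
    !!(url =~ ID_PARAM) &&
    !!(url =~ ITAG_PARAM) &&
    !!(url =~ SVER3_PARAM) &&
    url !~ BEGIN_PARAM &&
    !range_request
end

# Sample URL (made up): a plain videoplayback request goes to the nginx peer.
sample = "http://v1.lscache1.c.youtube.com/videoplayback?id=abc&itag=34&sver=3"
puts route_to_nginx?(sample)                          # prints true
puts route_to_nginx?(sample, "Range" => "bytes=0-")   # prints false
```

All conditions on the allow line must hold at once, so a request that seeks into the video (begin=) or sends a Range header bypasses the nginx peer and is handled by Squid as usual.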
4) Install Nginx
Now install Nginx with
apt-get install nginx
Now edit its config file with the following command
nano /etc/nginx/nginx.conf
Remove all lines and paste the following data
# This config file is not written by me,
# My Email address is inserted Just for tracking purposes
# For more info, visit http://code.google.com/p/youtube-cache/
# Syed Jahanzaib / aacable [at] hotmail.com
user www-data;
worker_processes 4;
pid /var/run/nginx.pid;
events {
worker_connections 768;
}
http {
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65;
types_hash_max_size 2048;
include /etc/nginx/mime.types;
default_type application/octet-stream;
access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;
gzip on;
gzip_static on;
gzip_comp_level 6;
gzip_disable "msie6";
gzip_vary on;
gzip_types text/plain text/css text/xml text/javascript application/json application/x-javascript application/xml application/xml+rss;
gzip_proxied expired no-cache no-store private auth;
gzip_buffers 16 8k;
gzip_http_version 1.1;
include /etc/nginx/conf.d/*.conf;
include /etc/nginx/sites-enabled/*;
# starting youtube section
server {
listen 127.0.0.1:8081;
location / {
root /usr/local/www/nginx_cache/files;
#try_files "/id=$arg_id.itag=$arg_itag" @proxy_youtube; # Old one
#try_files "$uri" "/id=$arg_id.itag=$arg_itag.flv" "/id=$arg_id-range=$arg_range.itag=$arg_itag.flv" @proxy_youtube; #old2
try_files "/id=$arg_id.itag=$arg_itag.range=$arg_range.algo=$arg_algorithm" @proxy_youtube;
}
location @proxy_youtube {
resolver 8.8.8.8;
proxy_pass http://$host$request_uri;
proxy_temp_path "/usr/local/www/nginx_cache/tmp";

# workaround for YouTube file ranges 🙂
#proxy_store "/usr/local/www/nginx_cache/files/id=$arg_id.itag=$arg_itag"; # Old 1
proxy_store "/usr/local/www/nginx_cache/files/id=$arg_id.itag=$arg_itag.range=$arg_range.algo=$arg_algorithm";
proxy_ignore_client_abort off;
proxy_method GET;
proxy_set_header X-YouTube-Cache "mattcurly@yahoo.com";
proxy_set_header Accept "video/*";
proxy_set_header User-Agent "YouTube Cacher (nginx)";
proxy_set_header Accept-Encoding "";
proxy_set_header Accept-Language "";
proxy_set_header Accept-Charset "";
proxy_set_header Cache-Control "";
}
}
}
Save & Exit.
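The try_files and proxy_store lines both build the cache file name from the id, itag, range and algorithm query arguments. The Ruby sketch below rebuilds that name for a given URL, to show which requests end up sharing one cache file. It is an approximation for illustration: cache_file_for and raw_arg are hypothetical helpers, the sample URL is made up, and nginx does this substitution internally.

```ruby
# Illustrative only: rebuilds the file name used by try_files / proxy_store.
# cache_file_for and raw_arg are hypothetical helpers.

CACHE_ROOT = "/usr/local/www/nginx_cache/files"

# Returns the raw value of a query argument, or "" when it is absent,
# which is roughly how nginx's $arg_NAME variables behave.
def raw_arg(query, name)
  query.to_s.split("&").each do |pair|
    key, value = pair.split("=", 2)
    return value.to_s if key == name
  end
  ""
end

def cache_file_for(url)
  query = url.split("?", 2)[1]
  id, itag, range, algo = %w[id itag range algorithm].map { |n| raw_arg(query, n) }
  "#{CACHE_ROOT}/id=#{id}.itag=#{itag}.range=#{range}.algo=#{algo}"
end

# Sample URL (made up); other arguments such as sver are ignored for the file name.
puts cache_file_for("http://v1.youtube.com/videoplayback?id=abc123&itag=34&sver=3&range=0-999")
```

Because range is part of the name, every chunk of a video is stored as its own file, and two requests only hit the same cached file when all four arguments match.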

Now Create directories to hold cache files

mkdir /usr/local/www
mkdir /usr/local/www/nginx_cache
mkdir /usr/local/www/nginx_cache/tmp
mkdir /usr/local/www/nginx_cache/files
chown www-data /usr/local/www/nginx_cache/files/ -Rf
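As an optional sanity check, a few lines of Ruby can confirm that the directories from the commands above exist before nginx is started. This is only a sketch: missing_cache_dirs is a hypothetical helper name, and it checks for the presence of the directories only, not their ownership.

```ruby
# Illustrative only: lists required cache sub-directories that do not exist yet.
# missing_cache_dirs is a hypothetical helper.
require "fileutils"
require "tmpdir"

REQUIRED_SUBDIRS = %w[tmp files]

def missing_cache_dirs(base)
  REQUIRED_SUBDIRS.map { |d| File.join(base, d) }
                  .reject { |path| File.directory?(path) }
end

# Prints nothing when both directories from the mkdir commands are present.
puts missing_cache_dirs("/usr/local/www/nginx_cache")
```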

Now create nginx .rb file

touch /etc/nginx/nginx.rb
chmod 755 /etc/nginx/nginx.rb
nano /etc/nginx/nginx.rb

Paste the following data in this newly created file

#!/usr/bin/env ruby1.8
# This script is not written by me,
# My Email address is inserted Just for tracking purposes
# For more info, visit http://code.google.com/p/youtube-cache/
# Syed Jahanzaib / aacable [at] hotmail.com
# url_rewrite_program <path>/nginx.rb
# url_rewrite_host_header off

require "syslog"
require "base64"

# One request line as handed to the helper by Squid
class SquidRequest
  attr_accessor :url, :user
  attr_reader :client_ip, :method

  def method=(s)
    @method = s.downcase
  end

  def client_ip=(s)
    @client_ip = s.split('/').first
  end
end

def read_requests
  # URL <SP> client_ip "/" fqdn <SP> user <SP> method [<SP> kvpairs]<NL>
  STDIN.each_line do |ln|
    r = SquidRequest.new
    r.url, r.client_ip, r.user, r.method, *dummy = ln.rstrip.split(' ')
    # Answer with one line per request and flush so Squid is not kept waiting
    (STDOUT << "#{yield r}\n").flush
  end
end

def log(msg)
  Syslog.log(Syslog::LOG_ERR, "%s", msg)
end

def main
  Syslog.open('nginx.rb', Syslog::LOG_PID)
  log("Started")

  read_requests do |r|
    # Rewrite GET requests for YouTube videoplayback URLs (without begin=) to the local nginx
    if r.method == 'get' && r.url !~ /[?&]begin=/ && r.url =~ %r{\Ahttp://[^/]+\.youtube\.com/(videoplayback\?.*)\z}
      log("YouTube Video [#{r.url}].")
      "http://127.0.0.1:8081/#{$1}"
    else
      r.url
    end
  end
end
main
Save & Exit.
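To see what the rewriter does with a single request line without starting Squid, the same rule can be written as a small pure function. This is a sketch for experimenting, not a replacement for nginx.rb: rewrite_line is a hypothetical name, the sample lines are made up, and the input layout follows the format comment in read_requests.

```ruby
# Illustrative only: the rewrite rule from nginx.rb as a pure function.
# rewrite_line is a hypothetical helper; input is one helper line:
#   URL <SP> client_ip/fqdn <SP> user <SP> method

YOUTUBE_RE = %r{\Ahttp://[^/]+\.youtube\.com/(videoplayback\?.*)\z}

def rewrite_line(line)
  url, _client, _user, method = line.rstrip.split(" ")
  if method.to_s.downcase == "get" && url !~ /[?&]begin=/ && (m = YOUTUBE_RE.match(url))
    # Point the request at the local nginx, keeping path and query string
    "http://127.0.0.1:8081/#{m[1]}"
  else
    # Anything else is passed through unchanged
    url
  end
end

# Sample lines (made up)
puts rewrite_line("http://v1.lscache1.c.youtube.com/videoplayback?id=abc&itag=34&sver=3 192.168.0.10/- - GET")
puts rewrite_line("http://example.com/index.html 192.168.0.10/- - GET")
```

The first call prints a URL on 127.0.0.1:8081 with the original query string, while the second prints the URL unchanged.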
5) Install RUBY
What is RUBY?
Ruby is a dynamic, open source programming language with a focus on simplicity and productivity. It has an elegant syntax that is natural to read and easy to write.
Now install Ruby with the following command
apt-get install ruby
6) Configure Squid Cache DIR and Permissions
Now create the cache directory and assign proper permissions to the proxy user
mkdir /cache1
chown proxy:proxy /cache1
chmod -R 777 /cache1
Now initialize the Squid cache directories with
squid -z

You should see the following message
Creating Swap Directories
7) Finally Start/restart SQUID & Nginx
service squid start
service nginx restart

Now, from a test PC, open YouTube and play any video. After it has downloaded completely, delete the browser cache and play the same video again. This time it will be served from the cache. You can verify this by monitoring your WAN link utilization while the cached file plays.
The WAN utilization graph below was taken while watching a clip that was not yet in the cache.

Figure: WAN utilization of the proxy while watching a new clip (not in cache)
The WAN utilization graph below was taken while watching the clip, which is now in the cache.

Figure: WAN utilization of the proxy while watching an already cached clip

Figure: Playing a video, loaded from the cache chunk by chunk
It loads the first chunk from the cache; if the user keeps watching the clip, it loads the next chunk at the end of the first one, and continues to do so.

Video cache files can be found in the following location.
/usr/local/www/nginx_cache/files
e.g.:
ls -lh /usr/local/www/nginx_cache/files

The file listed above shows that the clip is in 360p quality and that the clip is 5:54 long.
itag=34 shows that the video quality is 360p.
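Since the file names follow the pattern set in proxy_store, they can be split back into their parts when inspecting the cache. The Ruby sketch below does that for one name. It is illustrative only: parse_cache_name is a hypothetical helper, the sample file name is made up, and the quality table contains only the itag=34 entry mentioned above.

```ruby
# Illustrative only: splits a cache file name back into its parts.
# parse_cache_name is a hypothetical helper. Only the itag 34 => 360p
# mapping comes from this article; other itags are reported as "unknown".

ITAG_QUALITY = { "34" => "360p" }

def parse_cache_name(name)
  pattern = /\Aid=(.*?)\.itag=(.*?)\.range=(.*?)\.algo=(.*)\z/
  m = pattern.match(File.basename(name))
  return nil unless m

  {
    id: m[1],
    itag: m[2],
    range: m[3],
    algo: m[4],
    quality: ITAG_QUALITY.fetch(m[2], "unknown")
  }
end

# Sample file name (made up)
info = parse_cache_name("/usr/local/www/nginx_cache/files/id=abc123.itag=34.range=0-999.algo=throttle-factor")
puts info[:quality]   # prints 360p
```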
Credits: Thanks to Mr. Eliezer Croitoru, Mr. Christian Loth and others for their kind guidance.
