Profiling PHP

Fetch the source from Git

git clone https://github.com/facebook/xhprof.git

Compile

cd xhprof/extension/
phpize
./configure
make
sudo make install

Add the module

sudo vi /etc/php5/conf.d/xhprof.ini
; configuration for php XHprof module
extension=xhprof.so
xhprof.output_dir="/var/www/xhprof.sheeps.me/log"
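
XHProf only saves runs if the output directory exists and is writable by the web server. The path matches the ini setting above; www-data is an assumed server user:

sudo mkdir -p /var/www/xhprof.sheeps.me/log
sudo chown www-data:www-data /var/www/xhprof.sheeps.me/log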

Install the call-graph generation tool

sudo aptitude -y install graphviz

Set up the web interface

cp -rf xhprof/xhprof_html /var/www/xhprof

Set things up so the contents of xhprof_html can be browsed, for example as below.
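
One quick way, assuming PHP 5.4+ for the built-in web server (a regular Apache or nginx vhost pointed at /var/www/xhprof works just as well):

php -S 0.0.0.0:8080 -t /var/www/xhprof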

XHProf execution sample
[php]
<?php
function bar($x) {
    if ($x > 0) {
        bar($x - 1);
    }
}

function foo() {
    for ($idx = 0; $idx < 5; $idx++) {
        bar($idx);
        $x = strlen("abc");
    }
}

// start profiling
xhprof_enable();

// run program
foo();

// stop profiler
$xhprof_data = xhprof_disable();

// display raw xhprof data for the profiler run
print_r($xhprof_data);

$XHPROF_ROOT = realpath(dirname(__FILE__) . '/..');
include_once $XHPROF_ROOT . "/xhprof_lib/utils/xhprof_lib.php";
include_once $XHPROF_ROOT . "/xhprof_lib/utils/xhprof_runs.php";

// save raw data for this profiler run using default
// implementation of iXHProfRuns.
$xhprof_runs = new XHProfRuns_Default();

// save the run under a namespace "xhprof_foo"
$run_id = $xhprof_runs->save_run($xhprof_data, "xhprof_foo");

echo "---------------\n".
     "Assuming you have set up the http based UI for \n".
     "XHProf at some address, you can view run at \n".
     "http://<xhprof-ui-address>/index.php?run=$run_id&source=xhprof_foo\n".
     "---------------\n";
?>
[/php]


MySQL import and export

Dump table by table

mysqldump --quick \
          --single-transaction \
          --add-locks \
          --no-autocommit \
          --user=mysql \
          --password=passwd \
          --host=localhost \
          --default-character-set=utf8 \
          smpldb \
          smpltbl > ./tbldump.sql
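
To restore, feed the dump back through the mysql client with the same credentials:

mysql --user=mysql --password=passwd --host=localhost --default-character-set=utf8 smpldb < ./tbldump.sql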

Export to a CSV file

SELECT * FROM `smpltbl` INTO OUTFILE "/var/tmp/smpltbl.csv" FIELDS TERMINATED BY ',';

Import from a CSV file

LOAD DATA LOCAL INFILE "/var/tmp/smpltbl.csv" INTO TABLE `smpltbl` FIELDS TERMINATED BY ',';

Convert the binary log

mysqlbinlog /var/lib/mysql/groonga.log > /tmp/groonga.sql
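
mysqlbinlog can also restrict the range it converts, which helps for point-in-time recovery; the datetimes below are placeholders:

mysqlbinlog --start-datetime="2012-12-01 00:00:00" --stop-datetime="2012-12-02 00:00:00" /var/lib/mysql/groonga.log > /tmp/groonga.sql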

Java development support tools

Install Maven

sudo aptitude -y install maven2

Set environment variables

export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CLASSPATH=".:/usr/lib/jvm/java-6-sun/lib" 

Install libraries into Maven

Hadoop-related

export HADOOP_HOME=/usr/lib/hadoop-0.20
export HBASE_HOME=/usr/lib/hbase


mvn install:install-file -DgroupId=org.apache.hadoop -DartifactId=hadoop-core -Dversion=1.2.1 -Dpackaging=jar -Dfile=${HADOOP_HOME}/hadoop-core.jar
mvn install:install-file -DgroupId=org.apache.zookeeper -DartifactId=zookeeper -Dversion=3.4.2 -Dpackaging=jar -Dfile=${HBASE_HOME}/lib/zookeeper.jar
mvn install:install-file -DgroupId=org.apache.hadoop -DartifactId=hbase -Dversion=0.90.6 -Dpackaging=jar -Dfile=${HBASE_HOME}/hbase.jar

Sun library-related

Download the JMS version 1.1 "API Documentation, Jar and Source" and the Java Management Extension (JMX) 1.2.1 reference implementation from Oracle, then register the jars:

mvn install:install-file -DgroupId=javax.jms -DartifactId=jms -Dversion=1.1 -Dpackaging=jar -Dfile=/usr/lib/jvm/java-6-sun/lib/jms.jar
mvn install:install-file -DgroupId=com.sun.jmx -DartifactId=jmxri -Dversion=1.2.1 -Dpackaging=jar -Dfile=/usr/lib/jvm/java-6-sun/lib/jmxri.jar
mvn install:install-file -DgroupId=com.sun.jdmk -DartifactId=jmxtools -Dversion=1.2.1 -Dpackaging=jar -Dfile=/usr/lib/jvm/java-6-sun/lib/jmxtools.jar

Maven repositories

Developing with Maven

Create a project

mkdir projects
cd projects

mvn archetype:create -DgroupId=me.sheeps.hdfs -DartifactId=sample
cd sample
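
As an aside, archetype:create is deprecated in newer Maven releases; the rough equivalent there is:

mvn archetype:generate -DgroupId=me.sheeps.hdfs -DartifactId=sample -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false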

Generated files

./sample
    src/
        main/
            java/
                me/
                    sheeps/
                        hdfs/
                            App.java
        test/
            java/
                me/
                    sheeps/
                        hdfs/
                            AppTest.java
    pom.xml

Add the libraries you use to pom.xml

vi pom.xml

[xml]
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>me.sheeps.hdfs</groupId>
  <artifactId>sample</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>sample</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <configuration>
          <archive>
            <manifest>
              <mainClass>me.sheeps.hdfs.App</mainClass>
              <packageName>me.sheeps.hdfs</packageName>
              <addClasspath>true</addClasspath>
              <addExtensions>true</addExtensions>
              <classpathPrefix>lib</classpathPrefix>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>com.sun.jmx</groupId>
      <artifactId>jmxri</artifactId>
      <version>1.2.1</version>
    </dependency>

    <!-- version must match the jar registered with install:install-file above -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>1.2.1</version>
    </dependency>

    <dependency>
      <groupId>org.apache.zookeeper</groupId>
      <artifactId>zookeeper</artifactId>
      <version>3.4.2</version>
    </dependency>

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hbase</artifactId>
      <version>0.90.6</version>
    </dependency>

    <dependency>
      <groupId>commons-cli</groupId>
      <artifactId>commons-cli</artifactId>
      <version>1.1</version>
    </dependency>

  </dependencies>
</project>
[/xml]

HBase sample

[java]
package me.sheeps.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.GenericOptionsParser;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.BinaryPrefixComparator;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.DependentColumnFilter;
import org.apache.hadoop.hbase.util.Bytes;

/**
 * App class
 *
 * @package Sample
 * @author Yujiro Takahashi <yujiro3@gmail.com>
 */
public class App {
    /**
     * Main entry point
     *
     * @access public
     * @param String[] args
     * @return void
     */
    public static void main(String[] args) throws Exception {
        // Load the configuration
        Configuration conf = HBaseConfiguration.create();
        conf.addResource("/etc/hbase/conf/hbase-site.xml");
        conf.set("hbase.client.scanner.caching", "3");

        // Parse the arguments
        new GenericOptionsParser(conf, args);

        HTable table = new HTable(conf, Bytes.toBytes("accesslog")); // specify the table

        // Set the scan conditions
        Scan scan = new Scan();
        Filter filter = new DependentColumnFilter(
            Bytes.toBytes("user"),    // column family
            Bytes.toBytes("id"),      // qualifier
            false,
            CompareOp.EQUAL,
            new BinaryPrefixComparator(Bytes.toBytes(args[0]))
        );
        scan.setFilter(filter);
        ResultScanner scanner = table.getScanner(scan);

        System.out.println("/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/_/");

        for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
            String row = Bytes.toString(rr.getValue(Bytes.toBytes(args[1]), Bytes.toBytes(args[2])));
            System.out.println(row);
        }
    }
}
[/java]

Compile

mvn compile

Package

mvn clean package

Run

hadoop jar ./target/sample-1.0-SNAPSHOT.jar 1258878 log timestamp

The arguments map onto App.java above: args[0] (1258878) is the prefix matched against the user:id column, and args[1]/args[2] (log and timestamp) are the column family and qualifier whose value gets printed.

Install Pyrus, the PEAR package manager

sqlite3 is required

sudo aptitude -y install php5-sqlite

Download the Pyrus PHAR file

mkdir ~/.pear
cd ~/.pear/
wget http://pear2.php.net/pyrus.phar

Initial Pyrus setup

php pyrus.phar install

php pyrus.phar mypear ~/.pear/vendor
php pyrus.phar set bin_dir ~/.pear/vendor/bin

Register a pyrus command

vi ~/.pear/pyrus
#!/usr/bin/env bash

/usr/bin/env php -q ~/.pear/pyrus.phar "$@"
chmod +x ~/.pear/pyrus

Register the pyrus command with an alias

alias pyrus='php -q ~/.pear/pyrus.phar'
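
To keep the alias across sessions, append it to your shell profile, for example:

echo "alias pyrus='php -q ~/.pear/pyrus.phar'" >> ~/.bashrc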

Show the help

pyrus --help
Pyrus, the PHP manager

Usage:
  php pyrus.phar [/path/to/pyrus] [options]
  php pyrus.phar [/path/to/pyrus] [options]  [options] [args]

Options:
  -v, --verbose   increase verbosity
  -p, --paranoid  set or increase paranoia level
  -h, --help      show this help message and exit
  --version       show the program version and exit

Commands:
  install             Install a package.  Use install --plugin to install
                      plugins
  upgrade             Upgrade a package.  Use upgrade --plugin to upgrade
                      plugins
  uninstall           Uninstall a package.  Use uninstall --plugin to
                      uninstall plugins
  info                Display information about a package
  build               Build a PHP extension package from source and install
                      the compiled extension
  list-upgrades       List packages with upgrades available
  remote-list         List all remote packages in a channel, organized by
                      category
  download            Download a remote package to the current directory
  list-packages       List all installed packages in all channels
  list-channels       List all discovered channels
  channel-discover    Discover a new channel
  channel-del         Remove a channel from the registry
  upgrade-registry    Upgrade an old PEAR installation to the new registry
                      format
  run-scripts         Run all post-install scripts for a package
  set                 Set a configuration value
  get                 Get configuration value(s). Leave blank for all
                      values
  mypear              Set a configuration value
  help                Get help on a particular command, or all commands
  search              Search a registry of PEAR channels for packages
  make                Create or update a package.xml from a standard PEAR2
                      directory layout
  pickle              Create or update a package.xml and then package a
                      PECL extension release
  package             Create a release from an existing package.xml
  run-phpt            Run PHPT tests
  generate-pear2      Generate the source layout for a new
                      Pyrus-installable package
  generate-ext        Generate the source layout for a new PHP extension
                      that is PECL-ready
  scs-update          Simple channel server: Update all releases of a
                      within the get/ directory.
  scs-create          Simple channel server: Create a channel.xml, get/ and
                      rest/ directory for a channel
  scs-add-maintainer  Simple Channel Server: Add a new maintaing developer
                      to the channel
  scs-add-category    Simple Channel Server: Add a new category to the
                      channel
  scs-categorize      Simple Channel Server: Categorize a package
  scs-release         Simple Channel Server: Release a package


The main part of this is installing Japanese fonts

Install PhantomJS

cd /usr/local/share/
sudo wget http://phantomjs.googlecode.com/files/phantomjs-1.7.0-linux-x86_64.tar.bz2
sudo tar xjvf phantomjs-1.7.0-linux-x86_64.tar.bz2
sudo rm -rf phantomjs-1.7.0-linux-x86_64.tar.bz2
sudo mv phantomjs-1.7.0-linux-x86_64 phantomjs
sudo ln -s /usr/local/share/phantomjs/bin/phantomjs /usr/bin/phantomjs
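
A quick check that the binary is wired up:

phantomjs --version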

Install Japanese fonts

Install the font configuration tool

sudo aptitude -y install fontconfig

Check the list of fonts

fc-list
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSerif.ttf: DejaVu Serif:style=Book
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf: DejaVu Sans:style=Book
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSansMono-Bold.ttf: DejaVu Sans Mono:style=Bold
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSerif-Bold.ttf: DejaVu Serif:style=Bold
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans-Bold.ttf: DejaVu Sans:style=Bold
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSansMono.ttf: DejaVu Sans Mono:style=Book

Install whichever fonts you like.

Takao fonts

sudo aptitude -y install fonts-takao

IPA fonts

sudo aptitude -y install fonts-ipafont

IPAex fonts

sudo aptitude -y install otf-ipaexfont-gothic otf-ipaexfont-mincho

Ume fonts

sudo aptitude -y install fonts-horai-umefont

UmePlus fonts

sudo aptitude -y install fonts-umeplus

Sazanami fonts

sudo aptitude -y install ttf-sazanami-gothic ttf-sazanami-mincho

Kochi fonts

sudo aptitude -y install ttf-kochi-gothic ttf-kochi-mincho

VL fonts

sudo aptitude -y install ttf-vlgothic

If you installed the IPA fonts

sudo aptitude -y install fonts-ipafont
fc-list
/usr/share/fonts/opentype/ipafont-mincho/ipam.ttf: IPAMincho,IPA明朝:style=Regular
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSerif.ttf: DejaVu Serif:style=Book
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans.ttf: DejaVu Sans:style=Book
/usr/share/fonts/opentype/ipafont-gothic/ipagp.ttf: IPAPGothic,IPA Pゴシック:style=Regular
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSansMono-Bold.ttf: DejaVu Sans Mono:style=Bold
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSerif-Bold.ttf: DejaVu Serif:style=Bold
/usr/share/fonts/opentype/ipafont-mincho/ipamp.ttf: IPAPMincho,IPA P明朝:style=Regular
/usr/share/fonts/opentype/ipafont-gothic/ipag.ttf: IPAGothic,IPAゴシック:style=Regular
/usr/share/fonts/truetype/fonts-japanese-mincho.ttf: IPAMincho,IPA明朝:style=Regular
/usr/share/fonts/truetype/fonts-japanese-gothic.ttf: IPAGothic,IPAゴシック:style=Regular
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSans-Bold.ttf: DejaVu Sans:style=Bold
/usr/share/fonts/truetype/ttf-dejavu/DejaVuSansMono.ttf: DejaVu Sans Mono:style=Book

Taking a screenshot of Yahoo! Japan

phantomjs /usr/local/share/phantomjs/examples/rasterize.js http://www.yahoo.co.jp/ yahoo.png

The screenshot should now come out without garbled Japanese text.

Useful for things like distributing script files for Hadoop Streaming.

lsyncd

Install lsyncd

sudo aptitude -y install lsyncd

lsyncd configuration file

sudo vi /etc/lsyncd/lsyncd.conf.lua
----
-- Streaming configuration file for lsyncd.
--
settings = {
    statusFile = "/var/run/lsyncd.stat",
    statusInterval = 30,
}

sync { 
    default.rsync, 
    source="/home/mapred/",
    target="slaves000:/home/mapred/",
    rsyncOps={"-aruz", "--delete"}, 
    delay=10 
}
sync { 
    default.rsync, 
    source="/home/mapred/",
    target="slaves001:/home/mapred/",
    rsyncOps={"-aruz", "--delete"}, 
    delay=10 
}
sync { 
    default.rsync, 
    source="/home/mapred/",
    target="slaves002:/home/mapred/",
    rsyncOps={"-aruz", "--delete"}, 
    delay=10 
}

Start the lsyncd daemon

sudo /etc/init.d/lsyncd start
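
lsyncd writes its state to the statusFile configured above, so a quick health check is:

cat /var/run/lsyncd.stat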

rsync

rsyncd configuration file

sudo vi /etc/rsyncd.conf
# GLOBAL OPTIONS

# pid file = /var/run/rsync.pid
# log file = /var/log/rsync.log

timeout = 600
hosts allow = *.sheeps.me
read only = yes

max connections = 2
dont compress = *.gz *.tgz *.zip *.z *.rpm *.deb *.iso *.bz2 *.tbz

[MapReduce]
comment = PHP for Hadoop streaming
path = /home/mapred
uid = mapred
gid = mapred

rsyncd defaults file

sudo vi /etc/default/rsync
# start rsync in daemon mode from init.d script?
#  only allowed values are "true", "false", and "inetd"
#  Use "inetd" if you want to start the rsyncd from inetd,
#  all this does is prevent the init.d script from printing a message
#  about not starting rsyncd (you still need to modify inetd's config yourself).
RSYNC_ENABLE=true

Start the rsync daemon

sudo /etc/init.d/rsync start
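
Once the daemon is up you can list the exported modules from another machine (slaves000 standing in for whichever host runs rsyncd):

rsync slaves000::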

lsyncd configuration over SSH

sudo vi /etc/lsyncd/lsyncd.conf.lua
----
-- Streaming configuration file for lsyncd.
--
settings = {
    statusFile = "/var/run/lsyncd.stat",
    statusInterval = 30,
}

sync {
    default.rsyncssh,
    source="/home/mapred/",
    host="hdfs@slaves000",
    targetdir="/home/mapred/",
    rsyncOps={"-aruz", "--delete"}, 
    delay=10
}

sync {
    default.rsyncssh,
    source="/home/mapred/",
    host="hdfs@slaves001",
    targetdir="/home/mapred/",
    rsyncOps={"-aruz", "--delete"}, 
    delay=10
}

sync {
default.rsyncssh,
    source="/home/mapred/",
    host="hdfs@slaves002",
    targetdir="/home/mapred/",
    rsyncOps={"-aruz", "--delete"}, 
    delay=10
}

SSH configuration for root

sudo vi /root/.ssh/config
Host slaves000
    HostName            slaves000.sheeps.me
    IdentityFile        /root/.ssh/id_rsa
    User                hdfs

Host slaves001
    HostName            slaves001.sheeps.me
    IdentityFile        /root/.ssh/id_rsa
    User                hdfs

Host slaves002
    HostName            slaves002.sheeps.me
    IdentityFile        /root/.ssh/id_rsa
    User                hdfs
sudo cp $HADOOP_HOME/.ssh/id_rsa /root/.ssh/id_rsa
sudo chmod 0600 /root/.ssh/id_rsa

I could not find a way to change the user lsyncd runs rsync as,
so it runs as the default, root.

If the destination is not registered in /root/.ssh/known_hosts, the sync fails with an error.
It seems you need to connect once beforehand so the hosts get registered.
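
ssh-keyscan can register the host keys ahead of time, using the host names from the config above:

ssh-keyscan slaves000 slaves001 slaves002 | sudo tee -a /root/.ssh/known_hosts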

Opening and closing ports

Using ufw

Disable the firewall

sudo ufw disable

Set the default policy to deny all access

sudo ufw default deny

Allow SSH

sudo ufw allow ssh

Allow HTTP

sudo ufw allow http

Allow HTTPS

sudo ufw allow https

Allow MySQL from the local network only

sudo ufw allow from 192.168.11.0/24 to any port mysql

Enable the firewall

sudo ufw enable

Block the proxy server

sudo ufw deny 3128/tcp

Delete a rule

sudo ufw delete deny 3128/tcp

Check the settings

sudo ufw status

Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       Anywhere
80                         ALLOW       Anywhere
443                        ALLOW       Anywhere
3306                       ALLOW       192.168.11.0/24

ufw help

Commands:
 enable                          enables the firewall
 disable                         disables the firewall
 default ARG                     set default policy
 logging LEVEL                   set logging to LEVEL
 allow ARGS                      add allow rule
 deny ARGS                       add deny rule
 reject ARGS                     add reject rule
 limit ARGS                      add limit rule
 delete RULE|NUM                 delete RULE
 insert NUM RULE                 insert RULE at NUM
 reset                           reset firewall
 status                          show firewall status
 status numbered                 show firewall status as numbered list of RULES
 status verbose                  show verbose firewall status
 show ARG                        show firewall report
 version                         display version information

Application profile commands:
 app list                        list application profiles
 app info PROFILE                show information on PROFILE
 app update PROFILE              update PROFILE
 app default ARG                 set default application policy

Verification

Check the currently active connections

netstat -antu

Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:6379          0.0.0.0:*               LISTEN

MySQL (3306) and Redis (6379) are bound to 127.0.0.1 because their
configuration files allow local connections only.

Numbered rule listing

iptables -L --line-numbers

Check with tcpdump

sudo tcpdump -s 1600 -X -i eth0 src port 80
sudo tcpdump -s 1600 -X -i eth0 dst port 80

Running MapReduce with PHP

Placing the sample data

Create the sample data

echo Hello World Bye World > file01
echo Hello Hadoop Goodbye Hadoop > file02

ls
file01  file02

Create an input directory on HDFS

sudo -u hdfs hadoop fs -mkdir /user/hdfs/input

Put the sample data on HDFS

sudo -u hdfs hadoop fs -put file01 /user/hdfs/input/file01
sudo -u hdfs hadoop fs -put file02 /user/hdfs/input/file02

sudo -u hdfs hadoop fs -cat /user/hdfs/input/file01
Hello World Bye World

sudo -u hdfs hadoop fs -cat /user/hdfs/input/file02
Hello Hadoop Goodbye Hadoop

Create the map step

vi map.php

[php]
<?php
// split each input line on spaces and emit "word<TAB>1" for every word
while (($row = fgetcsv(STDIN, 1024, " ")) !== FALSE) {
    foreach ($row as $word) {
        if ($word !== '') {
            echo "${word}\t1\n";
        }
    }
}
?>
[/php]

Local test of map.php

cat file01 file02 | php ./map.php

Hello   1
World   1
Bye     1
World   1
Hello   1
Hadoop  1
Goodbye 1
Hadoop  1

A key-value pair is emitted for each word.
The value is fixed at 1, representing a single occurrence of the word.

Emulating the sorted map output

cat file01 file02 | php ./map.php | sort

Bye     1
Goodbye 1
Hadoop  1
Hadoop  1
Hello   1
Hello   1
World   1
World   1

Hadoop sorts the map output by key before the reduce phase, so we pipe through the sort command to emulate that.

Create the reduce step

vi reduce.php

[php]
<?php
// tally the sorted "key<TAB>value" lines produced by the map step
$count = array();
while ((list($key, $value) = fgetcsv(STDIN, 1024, "\t")) !== FALSE) {
    $count[$key] = empty($count[$key]) ? 1 : $count[$key] + 1;
}

foreach ($count as $key => $value) {
    echo "${key}\t${value}\n";
}
?>
[/php]

Local test of reduce.php

cat file01 file02 | php ./map.php | sort | php ./reduce.php

Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2

The key-value pairs are tallied into an array to produce the counts.

Running Hadoop Streaming

Distribute the files

scp -r /home/mapred hdfs@slaves000:/home/
scp -r /home/mapred hdfs@slaves001:/home/
scp -r /home/mapred hdfs@slaves002:/home/

Run MapReduce using the streaming module

sudo su hdfs

/usr/lib/hadoop-0.20/bin/hadoop \
  jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u5.jar \
  -input /user/hdfs/input \
  -output /user/hdfs/output \
  -mapper '/usr/bin/php /home/mapred/map.php' \
  -reducer '/usr/bin/php /home/mapred/reduce.php'

If /user/hdfs/output already exists, the job fails with an error.
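
In that case, delete the old output directory first (0.20-era syntax):

sudo -u hdfs hadoop fs -rmr /user/hdfs/output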

Check the results

sudo -u hdfs hadoop fs -ls /user/hdfs/output
Found 3 items
-rw-r--r--   1 hdfs supergroup          0 2012-12-02 04:23 /user/hdfs/output/_SUCCESS
drwxr-xr-x   - hdfs supergroup          0 2012-12-02 04:24 /user/hdfs/output/_logs
-rw-r--r--   1 hdfs supergroup         41 2012-12-02 04:25 /user/hdfs/output/part-00000

sudo -u hdfs hadoop fs -cat /user/hdfs/output/part-00000

Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2

The results are saved under /user/hdfs/output/ on HDFS.

A setup with one HDFS master and three HDFS slaves

Install the DataNode and TaskTracker

sudo aptitude -y install hadoop-0.20 hadoop-0.20-datanode hadoop-0.20-tasktracker

Synchronize the configuration files

Register the SSH public key on all slaves (slaves000 shown; repeat for slaves001 and slaves002)

ssh root@slaves000 mkdir /usr/lib/hadoop-0.20/.ssh
scp /usr/lib/hadoop-0.20/.ssh/authorized_keys root@slaves000:/usr/lib/hadoop-0.20/.ssh/
ssh root@slaves000 chown -R hdfs:hdfs /usr/lib/hadoop-0.20/.ssh/
ssh root@slaves000 chmod 0600 /usr/lib/hadoop-0.20/.ssh/authorized_keys

Distribute the configuration files

rsync -av /etc/hadoop-0.20/conf hdfs@slaves000:/etc/hadoop-0.20/conf
rsync -av /etc/hadoop-0.20/conf hdfs@slaves001:/etc/hadoop-0.20/conf
rsync -av /etc/hadoop-0.20/conf hdfs@slaves002:/etc/hadoop-0.20/conf

You need to set things up beforehand so these files can be overwritten with hdfs privileges.

Everything under /usr/lib/hadoop-0.20/conf/ is given the same settings as the master.

Edit the configuration files

Configure hosts

sudo vi /etc/hosts
192.168.196.125   masters000.sheeps.me    masters000
192.168.196.126   slaves000.sheeps.me     slaves000
192.168.196.127   slaves001.sheeps.me     slaves001
192.168.196.128   slaves002.sheeps.me     slaves002

Initialization

Set up the cache directory

sudo mkdir -p /var/lib/hadoop-0.20/cache
sudo chown -R hdfs:hadoop /var/lib/hadoop-0.20

sudo chmod 0777 /var/lib/hadoop-0.20/cache

Register the public key

sudo su hdfs
cd
mkdir ./.ssh
echo ssh-rsa ************** >> ./.ssh/authorized_keys
chmod 0600 ./.ssh/authorized_keys

Start the services

Start the DataNode and TaskTracker

sudo service hadoop-0.20-datanode start
sudo service hadoop-0.20-tasktracker start
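
If both came up cleanly, jps (bundled with the Sun JDK) should list DataNode and TaskTracker:

sudo jps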

On to the HDFS master installation

A setup with one HDFS master and three HDFS slaves

Install the NameNode and JobTracker

sudo aptitude -y install hadoop-0.20 hadoop-0.20-namenode hadoop-0.20-jobtracker

Edit the configuration files

Configure core-site.xml

sudo vi /etc/hadoop-0.20/conf/core-site.xml

[sourcecode language="plain"]
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://masters000:8020</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/lib/hadoop-0.20/cache/${user.name}</value>
  </property>

  <!-- OOZIE proxy user setting -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>
</configuration>
[/sourcecode]

Configure hdfs-site.xml

sudo vi /etc/hadoop-0.20/conf/hdfs-site.xml

[sourcecode language="plain"]
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <!-- Immediately exit safemode as soon as one DataNode checks in.
       On a multi-node cluster, these configurations must be removed. -->
  <property>
    <name>dfs.safemode.extension</name>
    <value>0</value>
  </property>
  <property>
    <name>dfs.safemode.min.datanodes</name>
    <value>1</value>
  </property>
  <property>
    <!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
    <name>dfs.name.dir</name>
    <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
  </property>

  <!-- Enable Hue Plugins -->
  <property>
    <name>dfs.namenode.plugins</name>
    <value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
    <description>Comma-separated list of namenode plug-ins to be activated.</description>
  </property>
  <property>
    <name>dfs.datanode.plugins</name>
    <value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
    <description>Comma-separated list of datanode plug-ins to be activated.</description>
  </property>
  <property>
    <name>dfs.thrift.address</name>
    <value>0.0.0.0:10090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.support.append</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.support.broken.append</name>
    <value>true</value>
  </property>
</configuration>
[/sourcecode]

These additions enable WebHDFS and append support.
Tools such as fluentd require them.

Configure mapred-site.xml

sudo vi /etc/hadoop-0.20/conf/mapred-site.xml

[sourcecode language="plain"]
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>masters000:8021</value>
  </property>

  <!-- Enable Hue plugins -->
  <property>
    <name>mapred.jobtracker.plugins</name>
    <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
    <description>Comma-separated list of jobtracker plug-ins to be activated.</description>
  </property>
  <property>
    <name>jobtracker.thrift.address</name>
    <value>0.0.0.0:9290</value>
  </property>
</configuration>
[/sourcecode]

Configure hadoop-env.sh

sudo vi /etc/hadoop-0.20/conf/hadoop-env.sh

[sourcecode language="plain"]
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME. All others are
# optional. When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Extra Java CLASSPATH elements. Optional.
# export HADOOP_CLASSPATH="<extra_entries>:$HADOOP_CLASSPATH"

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options. Empty by default.
# if [ "$HADOOP_OPTS" == "" ]; then export HADOOP_OPTS=-server; else HADOOP_OPTS+=" -server"; fi

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options. Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored. $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts. $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from. Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands. Unset by default. This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
# the users that are going to run the hadoop daemons. Otherwise there is
# the potential for a symlink attack.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes. See 'man nice'.
# export HADOOP_NICENESS=10
[/sourcecode]
It seems to work basically as-is if you just change JAVA_HOME=/usr/lib/jvm/java-6-sun.

Configure masters

sudo vi /etc/hadoop-0.20/conf/masters
masters000

Configure slaves

sudo vi /etc/hadoop-0.20/conf/slaves
slaves000
slaves001
slaves002

Initialization

Set up the cache directory

sudo mkdir -p /var/lib/hadoop-0.20/cache
sudo chown -R hdfs:hadoop /var/lib/hadoop-0.20

sudo chmod 0777 /var/lib/hadoop-0.20/cache

Register the SSH public key

sudo su hdfs
ssh-keygen -t rsa -P "" 
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys 

Set this up so you can log in to each server over SSH without a passphrase.
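
A quick test that key-based login works; this should print the remote hostname without prompting for a passphrase:

ssh slaves000 hostname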

Format the NameNode

sudo -u hdfs hadoop namenode -format

Edit the configuration files

Configure hosts

sudo vi /etc/hosts
127.0.0.1         localhost
127.0.0.1         masters000.sheeps.me    masters000
192.168.196.125   masters000.sheeps.me    masters000
192.168.196.126   slaves000.sheeps.me     slaves000
192.168.196.127   slaves001.sheeps.me     slaves001
192.168.196.128   slaves002.sheeps.me     slaves002

It seems things can go wrong unless the host and domain settings are done properly.

Start the services

Start the NameNode and JobTracker

sudo service hadoop-0.20-namenode start
sudo service hadoop-0.20-jobtracker start
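
If startup succeeded, the web UIs should respond on the 0.20 default ports (50070 for the NameNode, 50030 for the JobTracker):

curl -I http://masters000:50070/
curl -I http://masters000:50030/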

Classpath at startup

/usr/lib/hadoop-0.20/conf:/usr/lib/jvm/java-6-sun/lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u5.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/asm-3.2.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-io-2.1.jar:/usr/lib/hadoop-0.20/lib/commons-lang-2.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u5.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jersey-core-1.8.jar:/usr/lib/hadoop-0.20/lib/jersey-json-1.8.jar:/usr/lib/hadoop-0.20/lib/jersey-server-1.8.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/mysql-connector-java-5.1.22-bin.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar

On to the HDFS slave installation