Trying out SVM-light-TK

SVM-light-TK is an SVM library developed by Alessandro Moschitti that adds tree kernels to SVM-light. The current version is 1.2.1.

It can be downloaded from http://disi.unitn.it/moschitti/TK1.2-software/download.html after filling in a short questionnaire. The official site also provides sample data, so I will use that here.

% cd svm-light-TK-1.2.1
% ./svm_learn -t 5 tk1.2-arg/arg0.train trained-arg0
Scanning examples...done
Reading examples into memory...100..OK. (112 examples read)

Number of examples: 112, linear space size: 21478

estimating ...
Setting default regularization parameter C=1.0000
Optimizing........................................done. (41 iterations)
Optimization finished (3 misclassified, maxdiff=0.00100).
Runtime in cpu-seconds: 0.01
Number of SV: 91 (including 26 at upper bound)
L1 loss: loss=11.55698
Norm of weight vector: |w|=6.88679
Norm of longest example vector: |x|=1.00000
Estimated VCdim of classifier: VCdim<=48.42783
Computing XiAlpha-estimates...done
Runtime for XiAlpha-estimates in cpu-seconds: 0.00
XiAlpha-estimate of the error: error<=23.21% (rho=1.00,depth=0)
XiAlpha-estimate of the recall: recall=>75.00% (rho=1.00,depth=0)
XiAlpha-estimate of the precision: precision=>77.78% (rho=1.00,depth=0)
Number of kernel evaluations: 9711
Writing model file...done

% ./svm_classify tk1.2-arg/arg0.test trained-arg0
Reading model...OK. (92 support vectors read)
Classifying test examples..100..done
Runtime (without IO) in cpu-seconds: 0.01
Accuracy on test set: 83.04% (93 correct, 19 incorrect, 112 total)
Precision/recall on test set: 84.91%/80.36%

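The -t option of svm_learn selects the kernel. 5 seems to be a combination of tree kernels with a kernel over the feature vectors (?), and the details are apparently set through other options. The help text describes it as: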

5: combination of forest and vector sets according to W, V, S, C options
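
As a rough illustration of what a tree kernel measures, here is a minimal Python sketch of the subset-tree kernel of Collins and Duffy, which counts the tree fragments that two parse trees have in common. This is not SVM-light-TK's own code: the nested-tuple tree encoding, the function names, and the `lam` decay parameter are assumptions made for this example.

```python
from itertools import product


def label(node):
    """Label of a node: the string itself for a leaf, else the first tuple item."""
    return node if isinstance(node, str) else node[0]


def production(node):
    """Grammar production at an internal node: (parent label, child labels)."""
    return (node[0], tuple(label(child) for child in node[1:]))


def internal_nodes(tree):
    """All internal (non-leaf) nodes of a tree, in pre-order."""
    if isinstance(tree, str):
        return []
    nodes = [tree]
    for child in tree[1:]:
        nodes.extend(internal_nodes(child))
    return nodes


def delta(n1, n2, lam):
    """Weighted count of common fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Only internal children can extend a shared fragment further down.
        if not isinstance(c1, str) and not isinstance(c2, str):
            score *= 1.0 + delta(c1, c2, lam)
    return score


def tree_kernel(t1, t2, lam=1.0):
    """Sum delta over every pair of internal nodes (hypothetical helper)."""
    return sum(delta(a, b, lam)
               for a, b in product(internal_nodes(t1), internal_nodes(t2)))


# Toy trees: (label, child, child, ...); plain strings are words.
t1 = ("NP", ("D", "a"), ("N", "dog"))
t2 = ("NP", ("D", "a"), ("N", "cat"))
print(tree_kernel(t1, t2), tree_kernel(t1, t1))
```

With `lam=1.0` the value is simply the number of shared fragments; a smaller `lam` down-weights larger fragments. This pairwise version is quadratic in the number of nodes, which is fine for a toy example.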

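For comparison, here is a run with a standard kernel on the feature vectors only. (I meant this as a linear kernel, but in the SVM-light option list -t 0 is linear and -t 1 is polynomial, so this is the polynomial kernel.)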

% ./svm_learn -t 1 tk1.2-arg/arg0.test trained-arg0
...
% ./svm_classify tk1.2-arg/arg0.train trained-arg0
Reading model...OK. (104 support vectors read)
Classifying test examples..100..done
Runtime (without IO) in cpu-seconds: 0.00
Accuracy on test set: 76.99% (87 correct, 26 incorrect, 113 total)
Precision/recall on test set: 70.83%/91.07%

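The results differ noticeably: accuracy is about 6 points higher with the tree kernel (83.04% vs. 76.99%) and precision about 14 points higher (84.91% vs. 70.83%), although recall is higher in the second run (91.07% vs. 80.36%). Note that the second run trains on arg0.test and classifies arg0.train, the reverse of the first run, so the comparison is only rough.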
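
The counts behind the two svm_classify summaries can be recovered from the reported figures, which makes the comparison easier to check. The following is a small Python sketch; the function name and the rounding tolerance are assumptions for this example, and the inputs are the totals and percentages printed in the logs above.

```python
def confusion_candidates(total, correct, precision_pct, recall_pct, tol=0.005):
    """Enumerate (tp, fp, fn, tn) counts consistent with a reported summary.

    tol allows for the two-decimal rounding in the printed percentages.
    """
    incorrect = total - correct
    found = []
    for tp in range(1, correct + 1):
        tn = correct - tp
        for fp in range(incorrect + 1):
            fn = incorrect - fp
            precision = 100.0 * tp / (tp + fp)
            recall = 100.0 * tp / (tp + fn)
            if (abs(precision - precision_pct) <= tol
                    and abs(recall - recall_pct) <= tol):
                found.append((tp, fp, fn, tn))
    return found


# Figures copied from the two svm_classify logs.
tree_run = confusion_candidates(112, 93, 84.91, 80.36)    # -t 5
vector_run = confusion_candidates(113, 87, 70.83, 91.07)  # -t 1

for name, run in (("-t 5", tree_run), ("-t 1", vector_run)):
    for tp, fp, fn, tn in run:
        print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

Read this way, the second run finds more of the positive examples but at the cost of many more false positives, which is where the precision gap comes from.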