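For a data science beginner like me, Kaggle is a great place to practice, and among its starter-level competitions, digit recognition is one of the more representative projects. Last night I built an ANN (artificial neural network) with nolearn, and the recognition rate reached 98%.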

(To the person who reached 100%: how are you that good...)


Here is the code:

import numpy as np
from nolearn.dbn import DBN

# Number of rows at the end of train.csv held out to check the model.
testRest = 3000

# Load the CSV file and convert it to an array.
print("loading txt...")
fil = open('train.csv')
fil.readline()  # skip the header row

txt = np.loadtxt(fil, delimiter=',', dtype=float)

data = txt

# Column 0 is the label; the remaining columns are pixel values, scaled to [0, 1].
digits_label = data[:, 0]
digits_image = data[:, 1:] / 255.0

#digits_image = map(binaryzation, digits_image)

print("file loaded, preparing to train model")

# Build the ANN model:
# 784 inputs, hidden layers of 1000 and 500 units, 10 outputs.

clf = DBN(
    [digits_image.shape[1], 1000,500, 10],
    learn_rates=0.3,
    learn_rate_decays=0.9,
    epochs=10,
    verbose=1,
    )

clf.fit(digits_image[:-testRest],digits_label[:-testRest])

result = clf.predict(digits_image[-testRest:])

# Compare the predictions with the held-out labels.
errors = (result != digits_label[-testRest:]).sum()
print("the err rate : %.2f %%" % (100 * errors / float(testRest)))
print("the number of errors : %d" % errors)

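testTxt = open('test.csv')
testTxt.readline()  # skip the header row

# Scale the test pixels to [0, 1], the same way as the training pixels.
testData = np.loadtxt(testTxt, delimiter=',', dtype=float) / 255.0
result2 = clf.predict(testData)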
# Write the submission file: one ImageId,"Label" row per test image,
# with ImageId starting at 1 and the label written as an integer.
out = open('outputann.csv', 'w')
out.write('"ImageId","Label"\n')
for count, label in enumerate(result2, 1):
    out.write('%d,"%d"\n' % (count, label))
out.close()
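
The hold-out check and the submission writing in the script can also be pulled out into two small helpers, which makes them easy to try on toy data before a long training run. This is only a minimal sketch, written for Python 3 with the standard library: the names `error_rate` and `write_submission` are my own, hypothetical ones (they are not part of nolearn or Kaggle), and I am assuming the submission needs an ImageId column starting at 1 plus an integer Label column. It writes unquoted values, which a CSV parser reads the same as the quoted ones in the script.

```python
import csv
import io


def error_rate(predicted, actual):
    """Return the fraction of positions where predicted and actual differ."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if len(actual) == 0:
        raise ValueError("need at least one sample")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / float(len(actual))


def write_submission(labels, out):
    """Write ImageId,Label rows (ImageId starts at 1) to a text file object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    for image_id, label in enumerate(labels, 1):
        # Labels come back as floats in the script, so cast to int here.
        writer.writerow([image_id, int(label)])


if __name__ == "__main__":
    # Toy values standing in for clf.predict(...) output; not real results.
    predicted = [3.0, 1.0, 4.0, 1.0]
    actual = [3.0, 1.0, 4.0, 7.0]
    print("the err rate : %.2f %%" % (100 * error_rate(predicted, actual)))

    buf = io.StringIO()
    write_submission(predicted, buf)
    print(buf.getvalue())
```

In the script, `error_rate(result, digits_label[-testRest:])` would take the place of the inline calculation, and `write_submission(result2, out)` would replace the write loop, with `out` opened in text mode.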

XiaoGo
