```
git clone http://www.github.com/eabdullin/Word2Vec.Net
```
Word2Vec for the .NET Framework (port of https://code.google.com/p/word2vec/)
# Getting Started
## Using
```csharp
var builder = Word2VecBuilder.Create();

if ((i = ArgPos("-train", args)) > -1)
    builder.WithTrainFile(args[i + 1]);
if ((i = ArgPos("-output", args)) > -1)
    builder.WithOutputFile(args[i + 1]);

// all other parameters will be set to default values
var word2Vec = builder.Build();
word2Vec.TrainModel();

var distance = new Distance(args[i + 1]);
BestWord[] bestwords = distance.Search("some_word");
```
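The snippet above assumes an `ArgPos` helper for locating a command-line flag in `args`, as in the original word2vec C code. A minimal sketch of such a helper (not part of the Word2Vec.Net API; shown here only for completeness):

```csharp
using System;

static class ArgHelper
{
    // Returns the index of flag `arg` in `args`, or -1 if it is absent,
    // mirroring the ArgPos function from the original word2vec C source.
    public static int ArgPos(string arg, string[] args)
    {
        for (int i = 0; i < args.Length; i++)
            if (args[i] == arg)
                return i;
        return -1;
    }
}
```

The caller then reads the flag's value at `args[i + 1]`, so a flag given as the last argument with no value should be treated as an error.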
Or:
```csharp
// more explicit options
string trainfile = "C:/data.txt";
string outputFileName = "C:/output.bin";

var word2Vec = Word2VecBuilder.Create()
    .WithTrainFile(trainfile)       // Use text data to train the model
    .WithOutputFile(outputFileName) // Use to save the resulting word vectors / word clusters
    .WithSize(200)                  // Set the size of word vectors; default is 100
    .WithSaveVocubFile()            // The vocabulary will be saved to <file>
    .WithDebug(2)                   // Set the debug mode (default = 2 = more info during training)
    .WithBinary(1)                  // Save the resulting vectors in binary mode; default is 0 (off)
    .WithCBow(1)                    // Use the continuous bag-of-words model; default is 1 (use 0 for skip-gram)
    .WithAlpha(0.05)                // Set the starting learning rate; default is 0.025 for skip-gram and 0.05 for CBOW
    .WithWindow(7)                  // Set the max skip length between words; default is 5
    .WithSample((float) 1e-3)       // Threshold for word occurrence; words that appear with higher frequency in the training data will be randomly down-sampled; default is 1e-3, useful range is (0, 1e-5)
    .WithHs(0)                      // Use hierarchical softmax; default is 0 (not used)
    .WithNegative(5)                // Number of negative examples; default is 5, common values are 3-10 (0 = not used)
    .WithThreads(5)                 // Use <int> threads (default 12)
    .WithIter(5)                    // Run more training iterations (default 5)
    .WithMinCount(5)                // Discard words that appear less than <int> times; default is 5
    .WithClasses(0)                 // Output word classes rather than word vectors; default number of classes is 0 (vectors are written)
    .Build();

word2Vec.TrainModel();

var distance = new Distance(outputFileName);
BestWord[] bestwords = distance.Search("some_word");
```
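In the original word2vec distance tool, nearest words are ranked by cosine similarity between word vectors. A minimal sketch of that metric (an illustration of the idea, not the library's actual `Distance` implementation):

```csharp
using System;

static class Cosine
{
    // Cosine similarity between two vectors of equal length:
    // dot(a, b) / (|a| * |b|). Values near 1 mean the words'
    // vectors point in nearly the same direction.
    public static double Similarity(double[] a, double[] b)
    {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }
}
```

`Search` would compute this score between the query word's vector and every other vocabulary vector, then return the highest-scoring words.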
## Information (from Google word2vec)
### Tools for computing distributed representations of words
We provide implementations of the Continuous Bag-of-Words (CBOW) and Skip-gram (SG) models, as well as several demo scripts.
Given a text corpus, the word2vec tool learns a vector for every word in the vocabulary using the Continuous Bag-of-Words or Skip-gram neural network architectures. The user should specify the following:

- desired vector dimensionality
- the size of the context window for either the Skip-gram or the Continuous Bag-of-Words model
- the training algorithm: hierarchical softmax and/or negative sampling
- a threshold for down-sampling frequent words
- the number of threads to use
- the format of the output word vector file (text or binary)
Usually, the other hyper-parameters, such as the learning rate, do not need to be tuned for different training sets.
The script demo-word.sh downloads a small (100 MB) text corpus from the web and trains a small word vector model. After the training is finished, the user can interactively explore the similarity of words.
More information about the scripts is provided at https://code.google.com/p/word2vec/.