Amazon SageMaker BlazingText

I am working on a word embedding project using Amazon SageMaker. The BlazingText algorithm in SageMaker produced results faster than the other options, but I don't see any way to get the prediction model or its weights. The output consists only of the vectors file, from which I cannot regenerate the model.
Is there any way to get the model along with the vectors file? I need it to predict new words. Thanks in advance.

What do you mean by "predict new words"? A word embedding model creates embeddings for the words in your vocabulary, not for new words. You can use stemming or hashing to handle out-of-vocabulary words, but the embedding model itself will not do that for you.
– Guy
May 10 at 16:17

I am building a skip-gram based embedding model. A skip-gram model usually outputs the context for a given word. I am doing this for research, and for the evaluation I want to get the context output. I am trying to handle polysemous words, so the word embeddings alone are not enough. What I need is the basic skip-gram model, where I can get the context as output for a given word.
– J.Jeyanthasingam
May 11 at 18:28

1 Answer

You can reproduce results such as most_similar by loading the vectors.txt or vectors.bin file with gensim's KeyedVectors API.

Here is an example:

    from gensim.models import KeyedVectors

    # Either load the text-format vectors file...
    word_vectors = KeyedVectors.load_word2vec_format('vectors.txt', binary=False)

    # ...or load the binary-format vectors file instead.
    word_vectors = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)
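To see roughly what a most_similar lookup does with the vectors file, here is a minimal sketch that uses only the Python standard library. It assumes vectors.txt follows the usual word2vec text layout (an optional "count dimension" header line, then one "word v1 v2 ..." line per word); the function names load_text_vectors, cosine and most_similar are illustrative helpers, not part of SageMaker or gensim. It ranks neighbours by cosine similarity and does not recover the skip-gram context (output) weights.

```python
import math
from typing import Dict, List, Tuple


def load_text_vectors(path: str) -> Dict[str, List[float]]:
    """Parse a word2vec-style text file into a {word: vector} dict.

    Assumed layout: an optional first line "vocab_size dim", followed by
    one line per word of the form "word v1 v2 ... vN".
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            parts = line.rstrip().split(" ")
            if line_number == 0 and len(parts) == 2:
                continue  # treat a two-field first line as the header
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(
    vectors: Dict[str, List[float]], word: str, topn: int = 5
) -> List[Tuple[str, float]]:
    """Return the topn words closest to `word`, ranked by cosine similarity."""
    query = vectors[word]  # raises KeyError for out-of-vocabulary words
    scored = [
        (other, cosine(query, vector))
        for other, vector in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]
```

For example, most_similar(load_text_vectors('vectors.txt'), 'king') would return the nearest neighbours of 'king', provided that word is in the vocabulary. This brute-force scan is fine for small vocabularies; gensim's KeyedVectors does the same ranking much faster on large ones.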
