lda_utils.tm_gensim

Module Contents

Functions

get_model_perplexity(model, eval_corpus)
compute_models_parallel(data, varying_parameters=None, constant_parameters=None, n_max_processes=None)
    Compute several topic models in parallel using the “gensim” package. Use a single or multiple document-term matrices data and optionally a list of varying parameters varying_parameters.
evaluate_topic_models(data, varying_parameters, constant_parameters=None, n_max_processes=None, return_models=False, metric=None, **metric_kwargs)
    Compute several topic models in parallel using the “gensim” package. Calculate the models using a list of varying parameters varying_parameters on a single document-term matrix data.
get_model_perplexity(model, eval_corpus)
class MultiprocModelsWorkerGensim
    fit_model(data, params, return_data=False)
class MultiprocEvaluationWorkerGensim
    fit_model(data, params, return_data=False)
compute_models_parallel(data, varying_parameters=None, constant_parameters=None, n_max_processes=None)

Compute several topic models in parallel using the “gensim” package. Use a single or multiple document-term matrices data and optionally a list of varying parameters varying_parameters. The parameters in the constant_parameters dict are passed to each model calculation. At most n_max_processes processors are used; if None is passed, all available processors are used. data can be either a single document-term matrix (NumPy array/matrix or SciPy sparse matrix) or, when calculating models for multiple corpora, a dict mapping document IDs to document-term matrices (named documents).
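To illustrate the parameter arguments described above, here is a minimal sketch of how the two parameter inputs might be built. It assumes that varying_parameters is a list of dicts (one per model) and that the keys num_topics, passes and alpha are accepted; these are gensim LdaModel keyword arguments, but the text above does not confirm which keys this function supports.

```python
# Assumption: parameter names are passed on to gensim's LdaModel.
# Parameters shared by every model calculation.
constant_parameters = {"passes": 10, "alpha": "auto"}

# One dict per model to compute; only the number of topics varies here.
varying_parameters = [{"num_topics": k} for k in range(10, 60, 10)]

for parameter_set in varying_parameters:
    print(parameter_set)

# The call itself would then look roughly like this, with dtm being a
# NumPy array/matrix or SciPy sparse matrix (not executed in this sketch):
# results = compute_models_parallel(dtm, varying_parameters,
#                                   constant_parameters, n_max_processes=4)
```

Keeping the fixed settings in constant_parameters and only the swept values in varying_parameters keeps each parameter set small and easy to read in the results.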

If data is a dict of named documents, this function returns a dict mapping each document ID to a result list. Otherwise it returns a single result list. A result list is always a list of tuples (parameter_set, model), where parameter_set is a dict of the parameters used for that model.
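Because the return value has two possible shapes, calling code may want to handle both uniformly. The following sketch shows one way to do that; iter_results is a hypothetical helper, not part of this module, and the strings stand in for fitted model objects.

```python
def iter_results(results):
    """Yield (doc_id, parameter_set, model) for either return shape.

    doc_id is None when the results come from a single document-term matrix.
    """
    if isinstance(results, dict):
        # Named documents: document ID -> result list
        for doc_id, result_list in results.items():
            for parameter_set, model in result_list:
                yield doc_id, parameter_set, model
    else:
        # Single matrix: plain result list
        for parameter_set, model in results:
            yield None, parameter_set, model


# Stand-in data mimicking the two documented return shapes.
single = [({"num_topics": 10}, "model_a"), ({"num_topics": 20}, "model_b")]
named = {"corpus_1": single, "corpus_2": [({"num_topics": 10}, "model_c")]}

for doc_id, parameter_set, model in iter_results(named):
    print(doc_id, parameter_set, model)
```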

evaluate_topic_models(data, varying_parameters, constant_parameters=None, n_max_processes=None, return_models=False, metric=None, **metric_kwargs)

Compute several topic models in parallel using the “gensim” package and evaluate each of them. The models are calculated using the list of varying parameters varying_parameters on a single document-term matrix data, which must be a NumPy array/matrix or SciPy sparse matrix. The parameters in the constant_parameters dict are passed to each model calculation. At most n_max_processes processors are used; if None is passed, all available processors are used.

Returns a list of size len(varying_parameters) containing tuples (parameter_set, eval_results), where parameter_set is a dict of the parameters used and eval_results is a dict mapping metric names to metric results.
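A common follow-up step is to pick the parameter set with the best score from the returned list. The sketch below does this on stand-in data; best_parameter_set is a hypothetical helper, and the metric name "perplexity" and the scores are made up for illustration, since the metric names this function returns are not listed here.

```python
def best_parameter_set(eval_results, metric_name, lower_is_better=True):
    """Return the (parameter_set, eval_results) tuple with the best score."""
    pick = min if lower_is_better else max
    return pick(eval_results, key=lambda item: item[1][metric_name])


# Stand-in data in the documented shape: (parameter_set, eval_results) tuples.
stand_in = [
    ({"num_topics": 10}, {"perplexity": 310.2}),
    ({"num_topics": 20}, {"perplexity": 287.5}),
    ({"num_topics": 30}, {"perplexity": 295.1}),
]

best_params, best_scores = best_parameter_set(stand_in, "perplexity")
print(best_params, best_scores)
```

Whether a lower or higher value is better depends on the metric, which is why the direction is left as an explicit argument.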