From zero to hero
Texthero is a Python toolkit for working with text-based datasets quickly and effortlessly. It is simple to learn and designed to be used on top of Pandas. Texthero has the same expressiveness and power as Pandas and is extensively documented. It is modern and conceived for programmers of the 2020s with little or no knowledge of linguistics.
You can think of Texthero as a tool to help you understand and work with text-based datasets. Given a tabular dataset, it's easy to grasp the main concepts. Given a text dataset, it's harder to gain quick insight into the underlying data. With Texthero, preprocessing text data, mapping it into vectors and visualizing the resulting vector space takes just a couple of lines.
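To make the "preprocess, then map to vectors" idea concrete, here is a minimal sketch of those two steps in plain Python. It is not Texthero's API or implementation: the helpers `clean` and `tfidf`, the tiny stopword list and the three-sentence corpus are all assumptions made up for illustration, and the TF-IDF weighting is the plain unsmoothed textbook variant.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, for illustration only; real toolkits ship larger ones.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in", "on"}


def clean(text):
    """Lowercase, drop punctuation and digits, remove stopwords, return tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Map tokenized documents to TF-IDF vectors over a shared, sorted vocabulary."""
    vocab = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for empty documents
        # Relative term frequency times log inverse document frequency (no smoothing).
        vectors.append(
            [(counts[t] / total) * math.log(n_docs / doc_freq[t]) for t in vocab]
        )
    return vocab, vectors


# Made-up sample corpus (an assumption, not data from the Texthero project).
corpus = [
    "The striker scored a goal in the final.",
    "The goalkeeper saved a penalty in the final.",
    "Parliament voted on the new budget.",
]
vocab, vectors = tfidf([clean(doc) for doc in corpus])
```

Each document ends up as one numeric vector with one entry per vocabulary term, which is the kind of representation that clustering or a 2D projection for plotting can then work on.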
Texthero includes tools for:
Text preprocessing: it offers out-of-the-box solutions and is also flexible enough for custom solutions.
Natural Language Processing: keyphrase and keyword extraction, named entity recognition and much more.
Text representation: TF-IDF, term frequency, pre-trained and custom word-embeddings.
Vector space analysis: clustering (K-means, Meanshift, DBSCAN and Hierarchical), topic modelling (LDA and LSI) and interpretation.
Text visualization: keyword visualization, vector space visualization, place localization on maps.
Texthero is free, open source and well documented (and that's what we love most by the way!).
We hope you enjoy working with Texthero as much as we enjoyed developing it.
GitHub repository: https://github.com/jbesomi/texthero?u=1402400261&m=4511736348778241&cu=1968044071