Welcome to fastent’s documentation!¶
Contents:¶
fastent¶
Fastent is a tool designed for creating end to end Custom Named Entity Recognition models. Entities ARE NOT limited to the usual predefiend classes of Person(PER), Location(LOC), Companies/agencies/institutions(ORG) etc etc. Any custom entity that can be described using a list of words can be created.
The package is comprised of several modules that can be used both sperately for their designated tasks (i.e Anotation, contextualization, etc etc.) or in a combined workflow. Most of the modules offer multilingual support, meaning the datasets and text don’t necessarily require English language.
Table of contents¶
Installation¶
This section show the process for installing the package with different methods
From source¶
- lets start by cloning the package
git clone https://github.com/fastent/fastent.git
- Installing all the relevant packages
pip install -r requirements.txt
- Install couchDB
Update the current packages
sudo apt-get update
Adding PPA Repository
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:couchdb/stable
sudo apt-get update
Installing CouchDB
sudo apt-get install couchdb
Ownership changes (recommended to fix the permission)
sudo chown -R couchdb:couchdb /usr/bin/couchdb /etc/couchdb /usr/share/couchdb
Once this is completed we need to fix the permissions
sudo chmod -R 0770 /usr/bin/couchdb /etc/couchdb /usr/share/couchdb
Restarting CouchDB
sudo systemctl restart couchdb
couchDB can now be accessed from http://127.0.0.1:5984/_utils/
- Now you need to install NLTK dependencies.
>>> import nltk
>>> nltk.download()
The minimum installation requires to download the stopwords corpora. (Feel free to add more if you feel so)
From pip¶
Coming Soon
Usage¶
Dataset generation¶
The module includes a possibility to generate a dataset for raw entity words. Example command looks as this if using source
python dataset_pseudo_generator.py -m en_core_web_lg -s cocaine,heroin
If using the package is installed
from fastent import dataset_pseudo_generator
model = dataset_pseudo_generator.spacy_initialize(model_name)
dataset_pseudo_generator.dataset_generate(model,['cocaine', 'heroin'], 100)