Kyrgyz NLP & Open Source

Pioneering language technology for low-resource languages

Introduction

At Artisan Labs, NLP is not just a field of research; it is a movement. Our mission is to bridge the digital language divide by developing open-source tools tailored for the Kyrgyz language and beyond. Our projects include language models, tokenizers, and NLP utilities that researchers and developers can use to advance language technology in low-resource settings.

Our Models

We’re proud to share our innovative models that address unique linguistic challenges. Explore our key projects:

Custom Tokenizers

Tokenization is the first step toward effective NLP. Our open-source tokenizers are specifically designed for the Kyrgyz language:

How to Get Started

All our models and tokenizers are available on Hugging Face. For example, the following code loads KyrgyzBert with the transformers library (PyTorch backend) and predicts a masked word:

import torch
from transformers import BertTokenizerFast, BertForMaskedLM

# Load the tokenizer and the masked-language model from the Hugging Face Hub
model_name = "metinovadilet/KyrgyzBert"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)

# Roughly: "You can find [MASK] things here."
text = "Бул жерден [MASK] нерселерди таба аласыз."
inputs = tokenizer(text, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs).logits

# Locate the [MASK] position (assumes exactly one mask) and decode the top-scoring token
masked_index = (inputs.input_ids == tokenizer.mask_token_id).nonzero(as_tuple=True)[1].item()
print("Prediction:", tokenizer.decode(outputs[0, masked_index].argmax().item()))
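
The example above prints only the single best guess for the masked word. It is often more informative to look at several candidates with their probabilities. The following is a minimal sketch of that idea in plain Python, not part of the KyrgyzBert release: the helper name top_k_with_probs and the toy logits are hypothetical, chosen only for illustration.

```python
import math


def top_k_with_probs(logits, k=5):
    """Return the k highest-scoring (index, probability) pairs for a list of raw logits."""
    peak = max(logits)  # subtract the maximum before exponentiating, for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)  # softmax denominator over the full list
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return [(i, exps[i] / total) for i in ranked]


# Toy logits standing in for one row of model output (hypothetical values)
toy_logits = [0.5, 2.0, -1.0, 1.0]
top = top_k_with_probs(toy_logits, k=2)
print(top)
```

With the model loaded as above, one could call top_k_with_probs(outputs[0, masked_index].tolist(), k=5) and pass each returned index to tokenizer.decode to list the candidate words.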
      

Similarly, explore our tokenizers on Hugging Face to jump-start your Kyrgyz NLP projects.

Join the Community

We’re passionate about collaboration. Whether you’re a researcher, developer, or language enthusiast, we invite you to contribute, share feedback, or even build upon our open-source projects. Visit our Hugging Face profile at metinovadilet for the latest updates and releases.