Agentyk

Model & Data Notices

Last updated: 28 June 2026

Agentyk is built on open-weight AI and on openly licensed data. This page explains, at a high level, the foundation models our hosted Service draws on, the open knowledge-graph data and ontologies that Agentyk Knowledge builds on, and the open-source licences and attributions that apply. It forms part of, and is incorporated by reference into, our Terms of Service.

1. We build on open-weight models

The language, speech, embedding, reranking, and document-OCR / vision models behind AgentykCloud chat and our APIs (including Agentyk Scribe and Agentyk Knowledge) are built on, and fine-tuned and optimised from, open-weight foundation models: models whose trained parameters are published under an open or source-available licence. We run them on our own European infrastructure. We do not build the Service on closed, proprietary models accessed over someone else's API, and we do not use your prompts, inputs, or outputs to train models.

2. Which models we use

We continuously evaluate, combine, fine-tune, and rotate the models we run as the open-weight ecosystem advances. Depending on the product, the tier, the region, and the time, our deployments may draw on models from families such as:

  • Google Gemma
  • Meta Llama
  • Alibaba Qwen
  • Baidu (PaddleOCR, ERNIE)
  • Mistral and Mixtral
  • Microsoft Phi
  • DeepSeek
  • IBM Granite
  • TII Falcon
  • OpenAI open-weight (gpt-oss) models
  • Whisper-family and other open speech-to-text models
  • open document-OCR and vision-language models (for reading scanned and image-only documents), including PP-OCR / PaddleOCR and vision-language releases from the families above
  • open embedding and reranking models (for retrieval and knowledge-graph grounding), including releases from the families above
  • and other open-weight models, including our own fine-tunes of the above

This list is illustrative and non-exhaustive, and the specific models in use change over time. For commercial, security, and operational reasons, we do not disclose which specific model, version, or configuration powers any particular product tier or codename, and a given codename may map to different underlying models over time.

3. How we build and serve our models

We do more than run these models unchanged. To deliver the Service we build derivative and optimised model systems by applying a range of engineering and machine-learning techniques. Depending on the product, the tier, and the time, these may include, among others:

  • Fine-tuning and adaptation: supervised fine-tuning, instruction tuning, parameter-efficient methods (such as LoRA and adapters), continued or domain pre-training, and preference or reinforcement-learning alignment (such as RLHF, RLAIF, or DPO).
  • Distillation: training smaller or faster models to reproduce the behaviour of larger ones.
  • Quantization and pruning: reducing numerical precision or removing redundant parameters to serve models more efficiently.
  • Model merging, ensembling, and mixture-of-experts: merging or combining multiple models, and routing or multi-model setups that run several models and return a selected, consensus, or best answer.
  • Prompt and system-prompt engineering: instructions, templates, and few-shot examples that shape behaviour.
  • Retrieval-augmented generation, knowledge graphs, and search grounding: supplementing a model with retrieved documents, structured knowledge, embeddings, or web and search results.
  • Tool use and agentic harnesses: orchestrating multi-step reasoning, tool or function calls, planning, and verification or self-consistency loops.
  • Content moderation and safety: classifiers, guardrails, and filtering applied before, during, or after generation.
  • Inference and serving optimisation: techniques such as speculative decoding, batching, caching, KV-cache and decoding optimisations, reranking, and constrained or structured decoding.
  • Other related techniques: we continuously adopt new methods as the field advances.

The result is a derivative system that may behave differently from, and perform better than, any single underlying model. This description is general, illustrative, and non-binding: the specific techniques and their combination change over time, we apply them differently across products and tiers, and nothing here commits us to using, or continuing to use, any particular technique.
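
For readers who want a concrete picture of the "consensus" idea mentioned in the list above, the following is a minimal, purely illustrative Python sketch. It is not a description of how Agentyk is implemented: the function name `consensus_answer`, the normalisation rule, and the sample outputs are all assumptions made up for this example. It shows a simple majority vote over answers that several hypothetical models returned for the same question.

```python
from collections import Counter


def consensus_answer(candidates):
    """Return the most common normalised answer and its share of the votes.

    Hypothetical helper for illustration only; a real system might weight
    models differently or compare answers semantically rather than by text.
    """
    if not candidates:
        raise ValueError("need at least one candidate answer")
    # Assumption: trimming whitespace and lower-casing is enough to make
    # equivalent answers compare equal.
    normalised = [c.strip().lower() for c in candidates]
    answer, votes = Counter(normalised).most_common(1)[0]
    return answer, votes / len(normalised)


# Made-up outputs from three hypothetical models answering the same question.
outputs = ["Paris", "paris ", "Lyon"]
answer, share = consensus_answer(outputs)
print(answer, round(share, 2))
```

A selected or best-answer setup would replace the vote with a scoring step; the sketch only illustrates the consensus variant.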

4. Knowledge-graph data & ontologies

Agentyk Knowledge structures documents into a verifiable knowledge graph using openly licensed ontologies and knowledge bases. Each is used under its own licence:

  • YAGO (Creative Commons Attribution-ShareAlike): the general base ontology, taxonomy, and SHACL constraints, adopted version-pinned and unmodified.
  • schema.org (Creative Commons Attribution-ShareAlike 3.0): the general-purpose vocabulary that YAGO builds upon.
  • Wikidata (Creative Commons CC0 / public domain): stable identifiers used to link and disambiguate entities.
  • W3C PROV-O and Dublin Core Terms: provenance and document-metadata vocabularies.

We use these resources under their respective licences, and where a licence requires attribution, this page serves as that attribution. Any terms a customer defines to extend the vocabulary for their own knowledge base are stored in the customer's own database and are the customer's own work and data — not part of, and not distributed by, Agentyk.
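
As an illustration of how the vocabularies listed above can fit together, the following minimal Python sketch builds a few knowledge-graph statements for one entity and prints them as N-Triples. It is a sketch under stated assumptions, not Agentyk Knowledge's actual data model: the `https://example.org/` IRIs, the helper names, and the use of `owl:sameAs` to link to Wikidata are hypothetical choices for this example. The namespace IRIs for schema.org, PROV-O, Dublin Core Terms, and Wikidata entities are the published ones.

```python
SCHEMA = "http://schema.org/"
PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"
OWL = "http://www.w3.org/2002/07/owl#"
WD = "http://www.wikidata.org/entity/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def entity_triples(entity_iri, name, wikidata_id, source_doc_iri, source_title):
    """Describe one entity: its name, Wikidata link, and provenance."""
    return [
        (iri(entity_iri), iri(SCHEMA + "name"), literal(name)),
        # Assumption: owl:sameAs is used here to link to the Wikidata identifier.
        (iri(entity_iri), iri(OWL + "sameAs"), iri(WD + wikidata_id)),
        # PROV-O records which document the statement was derived from.
        (iri(entity_iri), iri(PROV + "wasDerivedFrom"), iri(source_doc_iri)),
        # Dublin Core Terms carries the document's metadata.
        (iri(source_doc_iri), iri(DCT + "title"), literal(source_title)),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical entity and source document; Q42 is Douglas Adams on Wikidata.
triples = entity_triples(
    "https://example.org/entity/douglas-adams",
    "Douglas Adams",
    "Q42",
    "https://example.org/doc/1",
    "Sample contract",
)
ntriples = to_ntriples(triples)
print(ntriples)
```

Keeping the provenance statement alongside each entity is what would let a reader trace a graph fact back to the document it came from.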

5. Licences and attribution

Each foundation model we use is used under its own open-source or source-available licence (for example the Gemma Terms of Use, the Llama Community Licence, and the Apache 2.0 or MIT licences under which several of the families above are released), and the open data and ontologies in section 4 are used under their respective licences (such as Creative Commons Attribution-ShareAlike and CC0). We comply with those licences, including any attribution requirements and use restrictions they impose, and we pass through use restrictions to you via our Acceptable Use Policy. Where a licence requires it, this page and our notices serve as the required attribution (for example, products in this Service that are built with Llama are “Built with Llama”). The respective model names, dataset names, and trademarks belong to their owners; their inclusion here is for attribution and does not imply that those owners endorse, sponsor, or are affiliated with Agentyk or Sylvanity B.V.

6. No warranty from upstream

Open-weight models and open data are provided by their authors without warranty. Output generated through the Service is subject to the disclaimers in our Terms of Service — it may be inaccurate and is not professional advice. You remain responsible for reviewing and verifying output before relying on it.

7. Changes

Because we rotate and upgrade models and update the data and ontologies we build on, we revise this page from time to time; the “Last updated” date above reflects the latest revision. Questions about model or data licensing or attribution: info@sylvanity.eu.