Norwegian Legal Q&A Dataset

7,603 expert-curated question-answer pairs covering Norwegian legislation, case law, and regulatory compliance. Used to fine-tune dobetter-norge-v2.

Type: Dataset
Format: JSONL
Size: 23 MB
Version: v2

7,603 Q&A pairs · 2,847 source docs · 12 legal domains · JSONL format

Tags: Norwegian, Legal, Q&A, Training Data, NLP

Dataset Overview

This dataset contains 7,603 question-answer pairs specifically designed for training language models on Norwegian legal reasoning. Each pair was derived from authoritative legal sources and validated against established legal interpretations.

Dataset Statistics

Metric	Value
Total Q&A Pairs	7,603
Source Documents	2,847
Training Examples (augmented)	31,842
Average Question Length	47 tokens
Average Answer Length	186 tokens
Legal Domains Covered	12
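
The headline figures in the table can be recomputed from the JSONL file with a short Python script. This is a sketch under stated assumptions, not the project's own tooling: lengths are counted as whitespace-separated words, so they will not reproduce the published token averages (the tokenizer behind those is not specified), and the function names `load_records` and `compute_stats` as well as the file name in the usage comment are hypothetical.

```python
import json
from typing import Iterable


def load_records(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines input: one JSON object per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def compute_stats(records: list[dict]) -> dict:
    """Recompute the headline figures from the table for a list of records.

    Assumption: lengths are whitespace-separated word counts, a rough proxy
    for the unspecified tokenizer behind the published token averages.
    """
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    return {
        "qa_pairs": n,
        "source_documents": len({r["source_doc"] for r in records}),
        "legal_domains": len({r["domain"] for r in records}),
        "avg_question_words": sum(len(r["question"].split()) for r in records) / n,
        "avg_answer_words": sum(len(r["answer"].split()) for r in records) / n,
    }


# Usage (the file name is hypothetical):
# with open("norwegian_legal_qa.jsonl", encoding="utf-8") as f:
#     print(compute_stats(load_records(f)))
```

The table also implies roughly 4.19 augmented training examples per original pair (31,842 / 7,603). The augmentation method is not described here, so the script only summarizes the unaugmented pairs.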

Domain Coverage

Contract Law (Avtaleloven)
Employment Law (Arbeidsmiljøloven)
Company Law (Aksjeloven)
Tax Law (Skatteloven)
Data Protection (Personopplysningsloven / GDPR)
Consumer Protection (Forbrukerkjøpsloven)
Criminal Law (Straffeloven)
Administrative Law (Forvaltningsloven)
Environmental Law (Forurensningsloven)
Planning and Building (Plan- og bygningsloven)
Immigration Law (Utlendingsloven)
Intellectual Property (Åndsverkloven)

Data Format

JSON Lines format: one JSON object per line. Each record has the following structure (pretty-printed here for readability):

{
  "id": "qa-0001",
  "question": "Hva er vilkårene for...",
  "answer": "I henhold til § 36...",
  "source_doc": "LOV-2023-06-16-40",
  "domain": "contract_law",
  "difficulty": "intermediate"
}
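
Because each line is a standalone JSON object, records can be parsed and checked one at a time. The Python sketch below mirrors the six fields of the sample record; the `QAPair` class and the `parse_line` and `to_jsonl_line` helpers are hypothetical, and treating all six fields as required is an assumption, since the schema is only shown by example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QAPair:
    """One dataset record; field names mirror the sample record above."""
    id: str
    question: str
    answer: str
    source_doc: str  # source document identifier, e.g. "LOV-2023-06-16-40"
    domain: str      # snake_case domain label, e.g. "contract_law"
    difficulty: str  # e.g. "intermediate"; the full set of levels is not documented


# Assumption: all six fields shown in the sample are required.
REQUIRED_FIELDS = ("id", "question", "answer", "source_doc", "domain", "difficulty")


def parse_line(line: str) -> QAPair:
    """Parse one JSONL line into a QAPair, rejecting records with missing fields."""
    obj = json.loads(line)
    missing = [key for key in REQUIRED_FIELDS if key not in obj]
    if missing:
        raise ValueError(f"record is missing fields: {missing}")
    # Any extra keys are ignored rather than rejected (also an assumption).
    return QAPair(**{key: obj[key] for key in REQUIRED_FIELDS})


def to_jsonl_line(pair: QAPair) -> str:
    """Serialize a QAPair back to a single JSONL line, keeping å, ø and § unescaped."""
    return json.dumps(asdict(pair), ensure_ascii=False)
```

Serializing with `ensure_ascii=False` keeps Norwegian characters and the `§` sign readable in the output file instead of turning them into `\u` escapes; `json.dumps` without `indent` never emits a literal newline, so each record stays on one line.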

BNL Perspective

Building this dataset was the hardest part of the entire dobetter-norge project. Scraping legal text is straightforward; turning it into high-quality Q&A pairs that actually teach a model to reason about law took months of iteration. We ended up with a semi-automated pipeline: extract key provisions, generate candidate questions, then have domain experts validate and refine them. The 7,603 pairs represent roughly 2,400 hours of combined expert review time.
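
The three-stage pipeline described above can be sketched in Python to show how the stages hand data to each other. Everything here is hypothetical and is not the project's actual pipeline: the section-heading regex assumes each `§` heading starts its own line, the templated question stands in for whatever question generator was really used, and the `review` callback is a placeholder for the human expert step.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical layout assumption: every section starts on its own line with "§ <number>".
SECTION_RE = re.compile(r"^§\s*(\d+[a-z]?)", re.MULTILINE)


@dataclass
class Candidate:
    provision: str     # e.g. "§ 36"
    question: str
    answer_draft: str  # provision text, to be rewritten into an answer by an expert


def extract_provisions(statute_text: str) -> list[tuple[str, str]]:
    """Stage 1: split statute text into (section number, section text) pairs."""
    matches = list(SECTION_RE.finditer(statute_text))
    provisions = []
    for i, match in enumerate(matches):
        # A section runs until the next section heading, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(statute_text)
        provisions.append((match.group(1), statute_text[match.start():end].strip()))
    return provisions


def generate_candidates(provisions: list[tuple[str, str]]) -> list[Candidate]:
    """Stage 2: one templated question per provision.

    A placeholder for a real question generator. The template is Norwegian
    for "What does § N say?".
    """
    return [Candidate(f"§ {num}", f"Hva sier § {num}?", text) for num, text in provisions]


def expert_review(
    candidates: list[Candidate],
    review: Callable[[Candidate], Optional[Candidate]],
) -> list[Candidate]:
    """Stage 3: `review` returns a refined candidate, or None to reject it."""
    accepted = []
    for candidate in candidates:
        result = review(candidate)
        if result is not None:
            accepted.append(result)
    return accepted
```

Stage 3 is where the cost concentrates: 2,400 expert hours spread over 7,603 final pairs works out to about 19 minutes of review per pair on average (2,400 × 60 / 7,603 ≈ 18.9).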


Related Resources

Norwegian Legislative Corpus (Dataset)

Source document collection of 2,847 Norwegian laws, regulations, and court decisions. It is the foundation dataset used to generate the training data for dobetter-norge-v2.

Curious enough to click?
Enter CaveauAI.
