Schemas Reference

Built-in schemas for validating database yield values.

Overview

Matchy includes built-in schemas that define the structure of yield values for common database types. When you specify a known schema type during matchy build, yield values are validated against the schema, catching errors early.

Available Schemas

| Name | Metadata Type | Description |
|------|---------------|-------------|
| threatdb | ThreatDB-v1 | Threat intelligence with MISP/STIX-compatible fields |

Using Schemas

CLI

Enable schema validation with --database-type:

# Use the short name - enables ThreatDB schema validation
matchy build --database-type threatdb threats.csv -o threats.mxy

# Custom names skip validation
matchy build --database-type "MyCompany-Intel" data.csv -o custom.mxy

When you use a known schema name like threatdb:

  1. Yield values are validated against the schema during build
  2. The canonical database_type (ThreatDB-v1) is set in metadata
  3. Validation errors stop the build with helpful messages
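
The three steps above amount to a lookup followed by a decision. The sketch below illustrates that decision using only the standard library; `ResolvedType`, `BUILTIN`, and `resolve_database_type` are hypothetical names for illustration and are not part of Matchy. It matches only the short name, since this page does not say whether the CLI also accepts the canonical name.

```rust
/// Outcome of resolving a `--database-type` argument (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct ResolvedType {
    /// Value written to the `database_type` metadata field.
    pub database_type: String,
    /// Whether yield values are checked against a built-in schema.
    pub validate: bool,
}

/// Built-in schemas as (short name, canonical database_type) pairs.
/// Only `threatdb` is listed on this page.
const BUILTIN: &[(&str, &str)] = &[("threatdb", "ThreatDB-v1")];

/// Known short names map to their canonical type and enable validation;
/// any other string is kept verbatim and skips validation.
pub fn resolve_database_type(arg: &str) -> ResolvedType {
    match BUILTIN.iter().find(|(short, _)| *short == arg) {
        Some((_, canonical)) => ResolvedType {
            database_type: (*canonical).to_string(),
            validate: true,
        },
        None => ResolvedType {
            database_type: arg.to_string(),
            validate: false,
        },
    }
}
```

Under these assumptions, `resolve_database_type("threatdb")` yields `ThreatDB-v1` with validation on, while `resolve_database_type("MyCompany-Intel")` keeps the custom name and turns validation off.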

Rust API

Use DatabaseBuilderExt::with_schema() for automatic validation during database building:

use matchy::{DatabaseBuilder, DatabaseBuilderExt, MatchMode, DataValue};
use std::collections::HashMap;

// Create builder with schema validation
let mut builder = DatabaseBuilder::new(MatchMode::CaseInsensitive)
    .with_schema("threatdb")?;

// Entries are validated automatically
let mut data = HashMap::new();
data.insert("threat_level".to_string(), DataValue::String("high".to_string()));
data.insert("category".to_string(), DataValue::String("malware".to_string()));
data.insert("source".to_string(), DataValue::String("abuse.ch".to_string()));

builder.add_entry("1.2.3.4", data)?;  // Validated!

// Invalid data fails immediately
let mut bad_data = HashMap::new();
bad_data.insert("threat_level".to_string(), DataValue::String("extreme".to_string()));
builder.add_entry("2.3.4.5", bad_data)?;
// Error: Validation error: Entry '2.3.4.5': "extreme" is not one of [...]

You can also query schema information directly:

use matchy::schemas::{get_schema_info, is_known_database_type};

// Check if a type has built-in validation
if is_known_database_type("threatdb") {
    let info = get_schema_info("threatdb").unwrap();
    println!("Canonical type: {}", info.database_type); // "ThreatDB-v1"
}

ThreatDB Schema

The ThreatDB schema (threatdb) is designed for threat intelligence databases, with fields compatible with MISP and STIX 2.1 concepts.

Required Fields

| Field | Type | Description |
|-------|------|-------------|
| threat_level | string | Severity: critical, high, medium, low, unknown |
| category | string | Threat type (lowercase): malware, c2, phishing, etc. |
| source | string | Origin feed or organization |

Optional Fields

| Field | Type | Description |
|-------|------|-------------|
| confidence | integer | Score 0-100 (STIX 2.1 compatible) |
| first_seen | string | ISO 8601 datetime |
| last_seen | string | ISO 8601 datetime |
| description | string | Human-readable notes |
| tags | array | List of strings for classification |
| reference | string | URL to external documentation |
| tlp | string | Traffic Light Protocol: CLEAR, GREEN, AMBER, AMBER+STRICT, RED |
| indicator_type | string | What the key represents: ip-src, domain, url, sha256, etc. |
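
To make the field rules concrete, here is a minimal sketch of the kind of check these tables imply. It is not Matchy's validator: `Value`, `validate_entry`, and the error wording are hypothetical stand-ins, and treating every required field as non-empty is an assumption (this page only shows that rule being reported for `source`). It covers the three required fields, the `threat_level` enumeration, and the `confidence` range.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a yield value; this is not Matchy's `DataValue`.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    Int(i64),
    List(Vec<String>),
}

const THREAT_LEVELS: [&str; 5] = ["critical", "high", "medium", "low", "unknown"];
const REQUIRED: [&str; 3] = ["threat_level", "category", "source"];

/// Returns one message per violated rule; an empty vector means the entry passed.
pub fn validate_entry(entry: &HashMap<String, Value>) -> Vec<String> {
    let mut errors = Vec::new();

    // Required fields must be present, be strings, and (by assumption) be non-empty.
    for field in REQUIRED.iter() {
        match entry.get(*field) {
            None => errors.push(format!("/{}: required field is missing", field)),
            Some(Value::Str(s)) if s.is_empty() => {
                errors.push(format!("/{}: must not be empty", field))
            }
            Some(Value::Str(_)) => {}
            Some(_) => errors.push(format!("/{}: expected a string", field)),
        }
    }

    // threat_level must be one of the five documented values.
    if let Some(Value::Str(level)) = entry.get("threat_level") {
        if !THREAT_LEVELS.contains(&level.as_str()) {
            errors.push(format!(
                "/threat_level: {:?} is not one of {:?}",
                level, THREAT_LEVELS
            ));
        }
    }

    // confidence is optional, but when present it must be an integer in 0..=100.
    match entry.get("confidence") {
        None => {}
        Some(Value::Int(n)) if (0..=100).contains(n) => {}
        Some(_) => errors.push("/confidence: expected an integer from 0 to 100".to_string()),
    }

    errors
}
```

Collecting every violation instead of stopping at the first one means a single bad entry can be reported with all of its problems at once.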

Threat Levels

| Value | MISP Equivalent | Use Case |
|-------|-----------------|----------|
| critical | - | Active campaigns, zero-days |
| high | 1 | Known active threats |
| medium | 2 | Suspicious activity |
| low | 3 | Low confidence or historical |
| unknown | 4 | Insufficient data |
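
When exchanging data with MISP, the table above can be encoded as a small enum. This is a sketch only; `ThreatLevel` and its methods are hypothetical helpers and are not provided by Matchy. `critical` has no MISP equivalent in the table, so the mapping returns `None` for it.

```rust
/// The five `threat_level` values accepted by the ThreatDB schema.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ThreatLevel {
    Critical,
    High,
    Medium,
    Low,
    Unknown,
}

impl ThreatLevel {
    /// Parses the lowercase string form used in yield values.
    pub fn parse(s: &str) -> Option<ThreatLevel> {
        match s {
            "critical" => Some(ThreatLevel::Critical),
            "high" => Some(ThreatLevel::High),
            "medium" => Some(ThreatLevel::Medium),
            "low" => Some(ThreatLevel::Low),
            "unknown" => Some(ThreatLevel::Unknown),
            _ => None,
        }
    }

    /// MISP equivalent from the table; `None` where the table shows "-".
    pub fn misp_equivalent(self) -> Option<u8> {
        match self {
            ThreatLevel::Critical => None,
            ThreatLevel::High => Some(1),
            ThreatLevel::Medium => Some(2),
            ThreatLevel::Low => Some(3),
            ThreatLevel::Unknown => Some(4),
        }
    }
}
```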

Common Categories

malware      c2           phishing     botnet       ransomware
spam         scanner      proxy        cryptomining dropper
apt          tor-exit     vpn          bruteforce   exploit
rat          stealer      ddos

TLP (Traffic Light Protocol)

| Value | Sharing |
|-------|---------|
| CLEAR | Unrestricted (formerly WHITE) |
| GREEN | Community-wide |
| AMBER | Limited distribution |
| AMBER+STRICT | Organization only |
| RED | Named recipients only |
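
Because the TLP values run from least to most restrictive, application code can model them as an ordered enum. The sketch below is illustrative only: `Tlp` is a hypothetical type, not part of Matchy, and exact-case matching is an assumption, since this page does not say whether the schema treats `tlp` as case-sensitive.

```rust
/// TLP labels, declared from least to most restrictive so that the
/// derived ordering follows the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tlp {
    Clear,
    Green,
    Amber,
    AmberStrict,
    Red,
}

impl Tlp {
    /// Parses the uppercase string form listed in the table.
    pub fn parse(s: &str) -> Option<Tlp> {
        match s {
            "CLEAR" => Some(Tlp::Clear),
            "GREEN" => Some(Tlp::Green),
            "AMBER" => Some(Tlp::Amber),
            "AMBER+STRICT" => Some(Tlp::AmberStrict),
            "RED" => Some(Tlp::Red),
            _ => None,
        }
    }

    /// Returns the string form stored in the `tlp` field.
    pub fn as_str(self) -> &'static str {
        match self {
            Tlp::Clear => "CLEAR",
            Tlp::Green => "GREEN",
            Tlp::Amber => "AMBER",
            Tlp::AmberStrict => "AMBER+STRICT",
            Tlp::Red => "RED",
        }
    }
}
```

With the derived ordering, `Tlp::Red > Tlp::Amber` holds, which is convenient when a consumer needs to keep the stricter of two labels.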

Example: CSV Input

key,threat_level,category,source,confidence,tags
192.0.2.1,high,c2,abuse.ch,95,"emotet,banking"
*.evil.com,medium,phishing,internal,75,
10.0.0.0/8,low,scanner,honeypot,50,

Example: JSON Input

{
  "192.0.2.1": {
    "threat_level": "high",
    "category": "c2",
    "source": "abuse.ch",
    "confidence": 95,
    "first_seen": "2024-01-15T10:30:00Z",
    "tags": ["emotet", "banking-trojan"],
    "tlp": "AMBER"
  },
  "*.evil.com": {
    "threat_level": "medium",
    "category": "phishing",
    "source": "internal",
    "description": "Phishing campaign targeting employees"
  }
}

Example: Build with Validation

$ matchy build --database-type threatdb -f json threats.json -o threats.mxy
Schema validation: enabled (ThreatDB-v1)
Building database from threats.json
  Added 2 entries
Successfully wrote threats.mxy

Validation Errors

Invalid data produces clear error messages:

$ cat bad.csv
key,threat_level,category,source
192.0.2.1,critical,malware,abuse.ch
10.0.0.1,extreme,badcat,

$ matchy build --database-type threatdb bad.csv -o out.mxy
Schema validation failed for entry "10.0.0.1"

Validation errors:
  - /threat_level: "extreme" is not one of ["critical","high","medium","low","unknown"]
  - /source: string length 0 is less than minLength 1

Use a custom --database-type name if you don't want schema validation.

Validating Existing Databases

The matchy validate command checks schema compliance for databases with a known database_type:

# Validates structure AND schema if database_type is "ThreatDB-v1"
matchy validate threats.mxy

Validation detects the schema from the database_type metadata field.

Custom Schemas (Future)

Currently, only built-in schemas are supported. Custom schema support via --schema <file> may be added in future versions.

For now, use a custom --database-type name to skip schema validation:

# No validation - your own structure
matchy build --database-type "MyCompany-ThreatFeed-v2" data.json -o custom.mxy

Schema API Reference

DatabaseBuilderExt Trait

The DatabaseBuilderExt trait adds schema support to DatabaseBuilder:

use matchy::{DatabaseBuilder, DatabaseBuilderExt, MatchMode};

let builder = DatabaseBuilder::new(MatchMode::CaseInsensitive)
    .with_schema("threatdb")?;

with_schema(schema_name: &str) -> Result<Self, SchemaError>

Configures the builder with automatic schema validation.

  • All entries are validated before insertion
  • Sets database_type metadata automatically
  • Returns error if schema name is unknown

// Valid schema name
let builder = DatabaseBuilder::new(MatchMode::CaseInsensitive)
    .with_schema("threatdb")?;

// Unknown schema - returns SchemaError
let result = DatabaseBuilder::new(MatchMode::CaseInsensitive)
    .with_schema("unknown");
assert!(result.is_err());

Schema Lookup Functions

use matchy::schemas::{
    get_schema_info,
    schema_database_type,
    detect_schema_from_database_type,
    available_schemas,
    is_known_database_type,
};

get_schema_info(name: &str) -> Option<&'static SchemaInfo>

Returns full schema metadata.

let info = get_schema_info("threatdb").unwrap();
println!("{}: {}", info.name, info.description);
// threatdb: Threat intelligence database with MISP/STIX-compatible fields

schema_database_type(name: &str) -> Option<&'static str>

Maps short name to canonical database_type.

assert_eq!(schema_database_type("threatdb"), Some("ThreatDB-v1"));

detect_schema_from_database_type(db_type: &str) -> Option<&'static str>

Maps database_type back to schema name.

assert_eq!(detect_schema_from_database_type("ThreatDB-v1"), Some("threatdb"));

available_schemas() -> impl Iterator<Item = &'static str>

Lists all available schema names.

for name in available_schemas() {
    println!("  - {}", name);
}

is_known_database_type(name: &str) -> bool

Checks if a name is a known schema (short name or database_type).

assert!(is_known_database_type("threatdb"));
assert!(is_known_database_type("ThreatDB-v1"));
assert!(!is_known_database_type("Custom-Type"));

SchemaInfo Struct

pub struct SchemaInfo {
    /// Short name used in CLI (e.g., "threatdb")
    pub name: &'static str,
    /// Database type string set in metadata (e.g., "ThreatDB-v1")
    pub database_type: &'static str,
    /// Human-readable description
    pub description: &'static str,
}
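
To see how the lookup functions relate to one another, it can help to picture them as views over a single static table of `SchemaInfo` records. The following is a standalone re-implementation for illustration only. The `SCHEMAS` table is hypothetical, the function bodies are guesses at behavior consistent with the examples above, and none of it is the crate's actual source. In particular, whether the real `get_schema_info` also accepts a canonical `database_type` is not stated here; this sketch matches short names only.

```rust
/// Mirrors the documented `SchemaInfo` fields.
pub struct SchemaInfo {
    pub name: &'static str,
    pub database_type: &'static str,
    pub description: &'static str,
}

/// Hypothetical registry; the crate's real storage is not shown on this page.
static SCHEMAS: &[SchemaInfo] = &[SchemaInfo {
    name: "threatdb",
    database_type: "ThreatDB-v1",
    description: "Threat intelligence database with MISP/STIX-compatible fields",
}];

/// Full record for a short name.
pub fn get_schema_info(name: &str) -> Option<&'static SchemaInfo> {
    SCHEMAS.iter().find(|s| s.name == name)
}

/// Short name -> canonical database_type.
pub fn schema_database_type(name: &str) -> Option<&'static str> {
    get_schema_info(name).map(|s| s.database_type)
}

/// Canonical database_type -> short name (the reverse direction).
pub fn detect_schema_from_database_type(db_type: &str) -> Option<&'static str> {
    SCHEMAS
        .iter()
        .find(|s| s.database_type == db_type)
        .map(|s| s.name)
}

/// True if `name` matches either column of the table.
pub fn is_known_database_type(name: &str) -> bool {
    SCHEMAS
        .iter()
        .any(|s| s.name == name || s.database_type == name)
}

/// All short names, in table order.
pub fn available_schemas() -> impl Iterator<Item = &'static str> {
    SCHEMAS.iter().map(|s| s.name)
}
```

Keeping one table as the single source of truth means a new built-in schema would only need one new record for all five functions to pick it up.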

See Also