System Prompt Strategy Guide¶

System prompts are crucial for controlling the behavior, style, and quality of your generated datasets. This guide explains how to craft effective system prompts for different stages of the DeepFabric pipeline and provides strategies for different use cases.

Understanding the Pipeline Stages¶

DeepFabric has multiple stages where system prompts can be specified, each serving different purposes:

1. Dataset System Prompt¶

dataset_system_prompt: "Your main prompt here..."

This prompt serves as the system message in your final dataset (the first message in each conversation). It defines the AI assistant's role and behavior for end users. This is what will be seen when your dataset is used for training or inference.

2. Topic Generation System Prompt¶

topic_tree:
  topic_system_prompt: "Prompt for topic exploration..."
  # or for graphs:
topic_graph:
  topic_system_prompt: "Prompt for topic exploration..."

Controls how topics are generated and organized in your tree or graph structure.

3. Data Engine System Prompt¶

data_engine:
  generation_system_prompt: "Prompt for content generation..."

Controls how the actual training data (question-answer pairs) are generated.

System Prompt Strategy by Use Case¶

Educational Content Creation¶

Topic Generation Prompt:

topic_tree:
  topic_system_prompt: "You are a curriculum designer creating comprehensive learning paths. Focus on logical progression, prerequisite relationships, and ensuring complete coverage of essential concepts. Think broadly about what learners need to know and organize topics in a pedagogically sound sequence."

Data Generation Prompt:

data_engine:
  generation_system_prompt: "You are an experienced educator teaching [SUBJECT] to [LEVEL] students. Provide clear, step-by-step explanations with concrete examples. Avoid jargon unless you explain it. Include common misconceptions and how to avoid them. Always provide practical examples that students can relate to."

Technical Documentation¶

Topic Generation Prompt:

topic_tree:
  topic_system_prompt: "You are a technical documentation architect organizing complex software concepts. Focus on logical dependencies, real-world implementation patterns, and comprehensive coverage of both theoretical foundations and practical applications."

Data Generation Prompt:

data_engine:
  generation_system_prompt: "You are a senior software engineer writing comprehensive technical documentation. Provide detailed explanations with working code examples, discuss trade-offs between different approaches, include performance considerations, and demonstrate patterns with real-world scenarios. Always include complete, runnable code snippets that follow best practices."

Customer Service Training¶

Topic Generation Prompt:

topic_tree:
  topic_system_prompt: "You are a customer service training specialist identifying comprehensive scenarios across industries and interaction types. Focus on common customer issues, escalation patterns, cultural considerations, and industry-specific challenges."

Data Generation Prompt:

data_engine:
  generation_system_prompt: "You are an expert customer service trainer creating realistic interaction scenarios. Demonstrate empathetic communication, active listening, problem-solving techniques, and professional responses to difficult situations. Include diverse customer personalities and complex problem-solving scenarios while maintaining a helpful, patient tone."

Research and Academic Content¶

Topic Generation Prompt:

topic_graph:
  topic_system_prompt: "You are a research committee chair organizing academic knowledge domains. Focus on interdisciplinary connections, current research trends, methodological approaches, and theoretical frameworks. Ensure comprehensive coverage of both foundational concepts and cutting-edge developments."

Data Generation Prompt:

data_engine:
  generation_system_prompt: "You are a distinguished researcher and academic writing comprehensive research content. Provide rigorous analysis with proper citations, discuss methodological approaches, present balanced perspectives on controversial topics, and maintain academic writing standards. Include relevant examples from current literature and practical applications of research findings."

Advanced Prompt Engineering Techniques¶

1. Persona Definition¶

Be specific about the expertise level and background:

system_prompt: "You are a senior machine learning engineer with 10+ years in production ML systems at tech companies like Google and Netflix..."

2. Output Format Specification¶

Define the expected structure and style:

dataset_system_prompt: "...Your responses should follow this structure: 1) Brief concept explanation, 2) Practical code example, 3) Common pitfalls, 4) Best practices, 5) Real-world applications."

3. Audience Targeting¶

Specify the target audience clearly:

dataset_system_prompt: "...Tailor your explanations for software engineers with 2-5 years of experience who understand basic programming concepts but are new to machine learning."

4. Quality Standards¶

Set specific quality expectations:

dataset_system_prompt: "...All code examples must be runnable, follow PEP 8 standards, include proper error handling, and demonstrate production-ready patterns."

Common Patterns and Best Practices¶

When to Use the Same Prompt¶

Use identical prompts across stages when:

You want consistent voice and style throughout
The expertise level and approach should be uniform
You're creating content for a single, well-defined audience

Example:

dataset_system_prompt: "You are a Python instructor teaching intermediate developers."

topic_tree:
  topic_system_prompt: "You are a Python instructor teaching intermediate developers."

data_engine:
  generation_system_prompt: "You are a Python instructor teaching intermediate developers."

When to Use Different Prompts¶

Use specialized prompts when:

Different stages require different thinking modes (broad vs. deep)
You want different levels of detail (overview vs. implementation)
Different expertise is needed (architect vs. implementer)

Example:

dataset_system_prompt: "Python education pipeline"

topic_tree:
  topic_system_prompt: "You are a curriculum designer organizing Python concepts for logical learning progression."

data_engine:
  generation_system_prompt: "You are a hands-on Python instructor providing detailed coding examples and debugging help."

Prompt Testing and Iteration¶

Start Simple, Then Specialize¶

Begin with basic, clear prompts
Generate small samples to test behavior
Identify issues or desired improvements
Add specific instructions to address them
Test again and refine

A/B Testing Approach¶

Create multiple prompt variants and compare outputs:

# Version A: General
dataset_system_prompt: "You are a helpful programming assistant."

# Version B: Specific
dataset_system_prompt: "You are a senior Python developer specializing in web frameworks, with expertise in Django, Flask, and FastAPI. You provide production-ready code examples with security best practices."

Quality Metrics to Consider¶

Accuracy: Technical correctness of information
Consistency: Uniform style and approach across examples
Completeness: Coverage of essential details
Practicality: Real-world applicability of examples
Clarity: Accessibility to target audience

Common Pitfalls to Avoid¶

1. Overly Generic Prompts¶

❌ Avoid: "You are a helpful assistant." ✅ Better: "You are a machine learning engineer specializing in recommendation systems..."

2. Conflicting Instructions¶

❌ Avoid: Different stages with contradictory personas or styles ✅ Better: Consistent expertise levels and complementary approaches

3. Missing Context¶

❌ Avoid: Prompts without audience, format, or quality specifications ✅ Better: Clear definitions of who, what, how, and why

4. Too Many Instructions¶

❌ Avoid: Extremely long prompts with dozens of requirements ✅ Better: Focused prompts with 3-5 clear, specific instructions

Example Configurations¶

Complete Configuration Examples¶

See our example configurations for different domains:

example-specialized-prompts.yaml - Different prompts per stage
example-domain-specific-prompts.yaml - Medical domain specialization
example-mixed-prompts.yaml - Combining consistent and custom prompts

Quick Reference Templates¶

For Code Generation:

dataset_system_prompt: "You are a [LANGUAGE] expert with [X] years of experience in [DOMAIN]. Provide working code examples with proper error handling, follow [STANDARDS], and explain complex concepts clearly."

For Educational Content:

dataset_system_prompt: "You are an experienced [SUBJECT] instructor teaching [LEVEL] students. Use clear explanations, practical examples, and address common misconceptions."

For Business/Professional:

dataset_system_prompt: "You are a [ROLE] professional with expertise in [DOMAIN]. Provide practical, actionable guidance based on industry best practices and real-world experience."