
Training Data Module

The training_data module generates synthetic, structured datasets for fine-tuning language models to understand and generate trading strategies. It produces prompt-response pairs suitable for supervised fine-tuning.


Module Structure


prompt_data/ Submodule

  • condition_dicts.py
    Provides mappings of condition types and corresponding indicators.

  • string_options.py
    Contains sets of predefined string values that are sampled at random to build prompts.

  • sp500.csv
    A list of S&P 500 tickers used by the ticker_generator.
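
As a rough illustration of how a ticker list like this can feed a generator, the sketch below reads symbols from CSV text and picks one at random. The column name "Symbol", the sample rows, and the helpers load_tickers and random_ticker are assumptions made for this example; they are not taken from the module's ticker_generator, whose actual interface is not shown here.

```python
import csv
import io
import random

# Hypothetical stand-in for sp500.csv; the real file's columns may differ.
SAMPLE_CSV = "Symbol,Security\nAAPL,Apple Inc.\nMSFT,Microsoft\nNVDA,Nvidia\n"


def load_tickers(csv_file, column="Symbol"):
    """Read ticker symbols from an open CSV file that has a header row."""
    reader = csv.DictReader(csv_file)
    return [row[column].strip() for row in reader if row.get(column)]


def random_ticker(tickers, rng):
    """Pick one ticker using the supplied random.Random instance."""
    return rng.choice(tickers)


tickers = load_tickers(io.StringIO(SAMPLE_CSV))
rng = random.Random(0)  # fixed seed so the pick is repeatable
ticker = random_ticker(tickers, rng)
```

Passing the random.Random instance in explicitly, rather than using the global random state, keeps the choice repeatable when several generators share one seed.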


How It Works

The module is designed to be highly modular. You can:

  • Use each generator individually
  • Combine them via training_data.py to create full training examples
  • Set random seeds for reproducibility
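
The combination of small generators and a fixed seed can be sketched as follows. This is a minimal, self-contained illustration and not the code in training_data.py: the option lists, make_example, and make_dataset are hypothetical names, and the response fields are assumed for the example.

```python
import random

# Hypothetical option pools; the module's real values live in prompt_data/.
TICKERS = ["AAPL", "MSFT", "NVDA"]
POSITION_TYPES = ["long", "short"]
INTERVALS = ["1d", "1h", "15m"]


def make_example(rng):
    """Build one prompt-response pair from independently sampled parts."""
    ticker = rng.choice(TICKERS)
    position = rng.choice(POSITION_TYPES)
    interval = rng.choice(INTERVALS)
    prompt = f"Open a {position} position on {ticker} using {interval} candles."
    response = {"ticker": ticker, "position_type": position, "interval": interval}
    return {"prompt": prompt, "response": response}


def make_dataset(n_examples, seed):
    """Generate n_examples pairs; the same seed yields the same dataset."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n_examples)]


first = make_dataset(3, seed=42)
second = make_dataset(3, seed=42)
```

Because every sampling step draws from the same seeded random.Random instance, first and second are identical, which is the property a seed is meant to provide.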


What does it generate?

Via the training_data.py module, you can generate a dataset of prompt-response pairs. Each call to generate_trading_data() creates a new dataset with the specified number of examples. Every example pairs a natural-language prompt with a response containing information extracted from that prompt. The extraction targets are:

  • Whole Strategy object
  • Ticker
  • Position Type
  • Conditions
  • Stop Loss
  • Take Profit
  • Start Date
  • End Date
  • Interval
  • Period
  • Initial Capital
  • Order Size
  • Trade Commissions

Each of these datasets is saved as a .jsonl file and used to fine-tune the models.
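
For readers unfamiliar with the format, a .jsonl file holds one JSON object per line. The sketch below writes and reads such a file with the standard library only. The record fields ("prompt", "response"), the file name, and the helpers write_jsonl and read_jsonl are assumptions for illustration and may not match the files the module actually writes.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical records; the module's real field names may differ.
examples = [
    {"prompt": "Go long on AAPL with a 5% stop loss.", "response": "AAPL"},
    {"prompt": "Short MSFT on hourly candles.", "response": "MSFT"},
]


def write_jsonl(path, records):
    """Write each record as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_jsonl(path):
    """Read a .jsonl file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ticker.jsonl"
    write_jsonl(path, examples)
    loaded = read_jsonl(path)
    n_lines = len(path.read_text(encoding="utf-8").splitlines())
```

One object per line means a dataset can be streamed or appended to without parsing the whole file, which is why the format is common for fine-tuning data.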