Training Data Module
The `training_data` module is responsible for generating synthetic, structured datasets used to fine-tune language models to understand and generate trading strategies. It produces prompt-response pairs suitable for supervised fine-tuning.
Module Structure
- `training_data.py`: Entry point for generating complete training datasets, combining all of the individual generators.
- `prompt_builder.py`: Constructs natural-language prompts from structured strategy definitions.
- `condition_generator.py`: Randomly generates valid strategy entry/exit conditions.
- `interval_generator.py`: Supplies typical time intervals.
- `stop_loss_take_profit_generator.py`: Generates stop-loss and take-profit configurations.
- `capital_size_commission_generator.py`: Produces initial capital, order size, and commission configurations.
- `strategy_type_generator.py`: Defines whether the strategy is long-only, short-only, or both, using `PositionTypeEnum`.
- `start_end_date_generator.py`: Sets up historical date ranges for backtests.
- `ticker_generator.py`: Randomly selects from a pool of ticker symbols defined in `sp500.csv`.
`prompt_data/` Submodule
- `condition_dicts.py`: Provides mappings of condition types to their corresponding indicators.
- `string_options.py`: Contains sets of predefined string values from which prompt text is randomly selected.
- `sp500.csv`: A list of S&P 500 tickers used by the `ticker_generator` (see the sketch below).
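To give a feel for the individual generators, here is a minimal sketch of what a single generator's interface might look like. The function name `generate_ticker`, the CSV path, and the one-ticker-per-row layout are assumptions made for illustration; only the file names above come from the module.

```python
# Hypothetical sketch of an individual generator's interface; the actual
# function names and signatures in ticker_generator.py may differ.
import csv
import random


def generate_ticker(csv_path: str = "prompt_data/sp500.csv") -> str:
    """Randomly select one ticker symbol from the S&P 500 list."""
    # Assumes each row's first column holds a ticker symbol.
    with open(csv_path, newline="") as f:
        tickers = [row[0] for row in csv.reader(f) if row]
    return random.choice(tickers)


if __name__ == "__main__":
    random.seed(42)           # seeding makes the selection reproducible
    print(generate_ticker())  # e.g. "AAPL"
```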
How It Works
The module is designed to be highly modular. You can:
- Use each generator individually
- Combine them via `training_data.py` to create full training examples (see the sketch after this list)
- Set random seeds for reproducibility
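For instance, a single training example could be assembled by seeding the random number generator and combining the outputs of several generators into one structured strategy, as in the minimal sketch below. The function names, fields, and value pools are illustrative placeholders standing in for the real generator modules, not the actual API of `training_data.py`.

```python
# Hypothetical sketch of combining generator outputs into one training example;
# the real training_data.py may structure this differently.
import random


def generate_strategy(seed: int) -> dict:
    rng = random.Random(seed)  # a fixed seed makes the example reproducible

    # Stand-ins for the individual generator modules listed above.
    ticker = rng.choice(["AAPL", "MSFT", "NVDA"])          # ticker_generator
    position_type = rng.choice(["LONG", "SHORT", "BOTH"])  # strategy_type_generator
    interval = rng.choice(["15m", "1h", "1d"])             # interval_generator
    stop_loss = round(rng.uniform(0.01, 0.10), 2)          # stop_loss_take_profit_generator
    take_profit = round(rng.uniform(0.02, 0.20), 2)

    return {
        "ticker": ticker,
        "position_type": position_type,
        "interval": interval,
        "stop_loss": stop_loss,
        "take_profit": take_profit,
    }


if __name__ == "__main__":
    print(generate_strategy(seed=42))  # same seed -> same strategy every run
```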
What does it generate?
Via the `training_data.py` module, you can generate a dataset of prompt-response pairs. Each call to the `generate_trading_data()` function creates a new dataset with the specified number of examples. Every example pairs a natural-language prompt with a response that extracts the following from the prompt:
- The whole Strategy object
- Ticker
- Position Type
- Conditions
- Stop Loss
- Take Profit
- Start Date
- End Date
- Interval
- Period
- Initial Capital
- Order Size
- Trade Commissions
All of these datasets are saved as `.jsonl` files and used for fine-tuning the models.
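A minimal sketch of producing and saving such a dataset is shown below. The import path, the `num_examples` parameter, and the exact record fields are assumptions; only the `generate_trading_data()` name and the `.jsonl` output format come from this documentation.

```python
# Hypothetical usage sketch; adjust the import path and arguments to match
# the actual generate_trading_data() signature.
import json

from training_data.training_data import generate_trading_data  # assumed import path

examples = generate_trading_data(num_examples=1000)  # parameter name is assumed

with open("trading_strategies_train.jsonl", "w") as f:
    for example in examples:
        # Each line holds one prompt-response pair, e.g.
        # {"prompt": "Backtest a long-only strategy on AAPL ...",
        #  "response": {"ticker": "AAPL", "position_type": "LONG", ...}}
        f.write(json.dumps(example) + "\n")
```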