feat: structured-output Portfolio Manager + 5-tier rating consistency (#434)

Three related changes that take the rating pipeline from heuristic-only
to type-safe at the source.

1) Research Manager prompt now uses the same 5-tier scale (Buy /
   Overweight / Hold / Underweight / Sell) as the Portfolio Manager,
   signal_processing, and the memory log.  The prior 3-tier wording
   (Buy / Sell / Hold) was the only remaining inconsistency in the
   pipeline.

2) Centralise the 5-tier vocabulary and the heuristic prose-rating
   parser into tradingagents/agents/utils/rating.py.  Both the memory
   log and the signal processor now share the same parser instead of
   duplicating regex and word-walker logic.

3) Make structured output a first-class part of the Portfolio Manager's
   primary call.  The PM uses llm.with_structured_output(PortfolioDecision)
   so each provider's native structured-output mode (json_schema for
   OpenAI/xAI, response_schema for Gemini, tool-use for Anthropic,
   function_calling for OpenAI-compatible providers) yields a typed
   Pydantic instance directly.  A render helper turns that instance back
   into the same markdown shape downstream consumers (memory log, CLI
   display, saved reports) already expect, so no other code has to know
   the PM now produces structured output.  Providers without structured
   support fall back gracefully to free-text + the deterministic
   heuristic.

   The previous SignalProcessor had been making a second LLM call to
   re-extract the rating from the PM's prose; that round-trip is now
   eliminated.  SignalProcessor is a thin adapter over parse_rating(),
   makes zero LLM calls, and stays for backwards compatibility with
   process_signal() callers.

Schema (PortfolioDecision) captures rating + executive_summary +
investment_thesis + optional price_target + time_horizon, with field
descriptions doubling as output instructions.  Agent prose remains the
primary artifact; structured output is layered onto the PM only because
it is the one agent whose output has machine-readable downstream
consumers.

15 new tests cover the heuristic parser (markdown-bold edge cases that
had no coverage before), the structured PM happy path, the free-text
fallback path, and that SignalProcessor never invokes the LLM.  Full
suite: 77 tests pass in ~2s without API keys.
This commit is contained in:
Yijia-Xiao
2026-04-25 19:57:26 +00:00
parent 4cbd4b086f
commit 0fda24515f
8 changed files with 399 additions and 87 deletions

View File

@@ -0,0 +1,90 @@
"""Tests for the shared rating heuristic and the SignalProcessor adapter.
The Portfolio Manager produces a typed PortfolioDecision via structured
output and renders it to markdown that always contains a ``**Rating**: X``
header. The deterministic heuristic in ``tradingagents.agents.utils.rating``
is therefore sufficient to extract the rating downstream — no second LLM
call is needed — and SignalProcessor is now a thin adapter that delegates
to it.
"""
import pytest
from tradingagents.agents.utils.rating import RATINGS_5_TIER, parse_rating
from tradingagents.graph.signal_processing import SignalProcessor
# ---------------------------------------------------------------------------
# Heuristic parser
# ---------------------------------------------------------------------------
@pytest.mark.unit
class TestParseRating:
def test_explicit_label_buy(self):
assert parse_rating("Rating: Buy\nReasoning here.") == "Buy"
def test_explicit_label_overweight(self):
assert parse_rating("Rating: Overweight\nDetails.") == "Overweight"
def test_explicit_label_with_markdown_bold_value(self):
# Regression: Rating: **Sell** — markdown around the value.
assert parse_rating("Rating: **Sell**\nExit immediately.") == "Sell"
def test_explicit_label_with_markdown_bold_label(self):
assert parse_rating("**Rating**: Underweight\nTrim exposure.") == "Underweight"
def test_rendered_pm_markdown_shape(self):
# The exact shape produced by render_pm_decision must always parse.
text = (
"**Rating**: Buy\n\n"
"**Executive Summary**: Enter at $189-192, 6% portfolio cap.\n\n"
"**Investment Thesis**: AI capex cycle intact; institutional flows constructive."
)
assert parse_rating(text) == "Buy"
def test_explicit_label_wins_over_prose_with_markdown(self):
text = (
"The buy thesis is weakened by guidance.\n"
"Rating: **Sell**\n"
"Exit before earnings."
)
assert parse_rating(text) == "Sell"
def test_no_rating_returns_default(self):
assert parse_rating("No clear directional signal at this time.") == "Hold"
def test_no_rating_custom_default(self):
assert parse_rating("Plain prose.", default="Underweight") == "Underweight"
def test_all_five_tiers_recognised(self):
for r in RATINGS_5_TIER:
assert parse_rating(f"Rating: {r}") == r
# ---------------------------------------------------------------------------
# SignalProcessor: thin adapter over the heuristic
# ---------------------------------------------------------------------------
@pytest.mark.unit
class TestSignalProcessor:
def test_returns_rating_from_pm_markdown(self):
sp = SignalProcessor()
md = "**Rating**: Overweight\n\n**Executive Summary**: Build gradually."
assert sp.process_signal(md) == "Overweight"
def test_makes_no_llm_calls(self):
"""SignalProcessor must not invoke the LLM it was constructed with —
the rating is parseable from the rendered PM markdown directly."""
from unittest.mock import MagicMock
llm = MagicMock()
sp = SignalProcessor(llm)
sp.process_signal("Rating: Buy\nDetails.")
llm.invoke.assert_not_called()
llm.with_structured_output.assert_not_called()
def test_default_when_no_rating_present(self):
sp = SignalProcessor()
assert sp.process_signal("Plain prose without a recommendation.") == "Hold"