Feedback loop and tuning best practices

This article provides guidance for building an effective labeling and tuning practice. For an overview of how the feedback loop works and why each stage matters, see Why feedback and tuning matter.

1. Start with a clear objective

Before starting a labeling or tuning program, define what you want to improve. A clear objective helps determine which action types should be labeled first, which tuning controls to adjust, and which KPIs should be monitored. Examples include:

Reducing account takeover at login
Preventing fraudulent new account creation
Improving fraud prevention at transaction time
Reducing friction for legitimate users
Improving investigator efficiency

Label the action types you want to improve

The most useful labels are the ones tied directly to the business decision you want to optimize:

If the priority is detecting account takeover earlier, focus on login actions
If the priority is transaction fraud, focus on transaction actions
If the priority is new account opening protection, focus on registration actions

Prioritize labeling confirmed fraud

If your team's resources are limited, begin with confirmed fraud. Labeling these events typically provides the highest value for tuning, especially during the early stages of deployment. As the program matures, expand to include high-confidence suspected fraud, confirmed legitimate outcomes, and broader coverage of deny and challenge decisions.

Measure legitimate friction, not only fraud capture

An effective feedback loop should improve both fraud prevention and customer experience. That means tracking not only what fraud was caught, but also which legitimate users were challenged or denied, whether trusted users are being recognized effectively, and whether investigation teams are reviewing the right cases. Confirmed legitimate labels are especially useful for improving trust recognition and reducing false positives.
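
As a rough illustration of tracking both sides, the sketch below computes a fraud capture rate and a legitimate friction rate from a set of reviewed actions. The record shape, the recommendation names, and the label names are assumptions made for this example and do not reflect a Fraud Prevention export format.

```python
from dataclasses import dataclass


@dataclass
class ReviewedAction:
    recommendation: str  # assumed values: "allow", "challenge", "deny"
    label: str           # assumed values: "confirmed_fraud", "confirmed_legit"


def friction_and_capture(actions):
    """Return the share of fraud that was stopped and the share of legit users who hit friction."""
    stopped = {"challenge", "deny"}
    fraud = [a for a in actions if a.label == "confirmed_fraud"]
    legit = [a for a in actions if a.label == "confirmed_legit"]
    captured = sum(a.recommendation in stopped for a in fraud)
    friction = sum(a.recommendation in stopped for a in legit)
    return {
        "fraud_capture_rate": captured / len(fraud) if fraud else None,
        "legit_friction_rate": friction / len(legit) if legit else None,
    }


sample = [
    ReviewedAction("deny", "confirmed_fraud"),
    ReviewedAction("allow", "confirmed_fraud"),
    ReviewedAction("challenge", "confirmed_legit"),
    ReviewedAction("allow", "confirmed_legit"),
    ReviewedAction("allow", "confirmed_legit"),
    ReviewedAction("allow", "confirmed_legit"),
]
metrics = friction_and_capture(sample)
```

Reporting the two rates side by side makes it harder to improve one at the expense of the other without noticing.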

Recommended practice

Align on your primary fraud use case, the relevant decision points, and the success metrics before expanding the feedback program.

Example
A financial institution wants to reduce account takeover at login. They define:
Primary objective: Reduce account takeover at login
Decision point: Login events
Success metrics: Fraud capture at login, challenge rate, false positives on legitimate users
Based on this, they prioritize labeling login actions (rather than transactions), focus on confirmed fraud and reviewed challenge/deny outcomes at login, and tune thresholds and challenge recommendations specifically for login flows. This focused approach improves early detection without unnecessarily increasing friction in other parts of the user journey.

2. Label outcomes as early as possible, once reliable

The earlier high-confidence feedback is provided, the more value it creates. At the same time, labels should only be submitted once there is enough confidence in the outcome.

Examples of reliable confirmation include:

Manual investigation results
Confirmed chargebacks
Verified findings from another fraud or risk system
Legitimate cases confidently cleared by investigation

Confirm fraud without requiring realized loss

A fraud outcome can be valid even if no monetary loss occurred—for example, fraud may have been prevented before funds were lost, or abuse may have been confirmed through investigation before a transaction completed. This is especially important in prevention-focused flows such as login, new account opening, verification, and early journey enforcement. Base fraud confirmation on confidence in the conclusion, not only on whether a realized loss occurred.

Validate early denies with alternative evidence

When Fraud Prevention denies an action early on, the fraud may never fully unfold, making direct validation more difficult. In these cases, organizations may need to rely on shadow mode or non-blocking evaluation during early rollout, retrospective analyst review, related downstream evidence, or portfolio-level performance analysis.

Recommended practice

Submit labels as soon as they are reliable enough to support tuning and performance analysis, and use a realistic validation framework for each stage of the user journey.

3. Use structured labeling data

To make feedback more actionable, labels should include as much useful structure as possible. Where applicable, include:

The correct label type
The appropriate subject level
The relevant fraud use case
The source of validation

This helps separate different fraud patterns and makes future analysis more meaningful.

Use label types consistently

Fraud Prevention supports multiple label types, each serving a different purpose. Use them intentionally:

Confirmed fraud for high-confidence fraud outcomes
Suspected fraud when fraud is likely but not yet fully confirmed
Confirmed legit when the case has been investigated and cleared
Undetermined when the outcome remains inconclusive

Consistent usage improves the quality and interpretability of the feedback loop. Define a clear internal labeling policy so teams apply labels consistently over time.
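
One way to keep usage consistent is to encode the internal labeling policy as a small function that every team and integration shares. The sketch below is one possible policy, not a product feature: the enum values, the function name, and the inputs (`verdict`, `confirmed`) are hypothetical.

```python
from enum import Enum
from typing import Optional


class LabelType(Enum):
    CONFIRMED_FRAUD = "confirmed_fraud"
    SUSPECTED_FRAUD = "suspected_fraud"
    CONFIRMED_LEGIT = "confirmed_legit"
    UNDETERMINED = "undetermined"


def choose_label(verdict: Optional[str], confirmed: bool) -> LabelType:
    """Map an investigation result to a label type.

    verdict: "fraud", "legit", or None when the review was inconclusive.
    confirmed: True only when the conclusion is high confidence.
    """
    if verdict == "fraud":
        return LabelType.CONFIRMED_FRAUD if confirmed else LabelType.SUSPECTED_FRAUD
    if verdict == "legit" and confirmed:
        return LabelType.CONFIRMED_LEGIT
    # Unconfirmed legit and inconclusive reviews both stay undetermined.
    return LabelType.UNDETERMINED
```

In this sketch an unconfirmed legitimate verdict falls back to Undetermined, because the label types listed above have no "suspected legit" equivalent.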

Choose the right subject level

Fraud Prevention supports labeling several types of subjects, including individual actions, correlated journeys, users, verification sessions, campaigns, and fraud rings. In many cases, the best starting points are:

Action-level labeling when feedback applies to a specific event
Correlation-level labeling when fraud spans multiple linked actions in the same journey or session

Use the narrowest subject level that accurately represents the fraud outcome while preserving the necessary context.

Correlate fraud across the user journey when possible

Fraud often spans more than one event. A single fraud case may involve multiple connected actions, devices, or sessions. Where possible, connect feedback to the broader journey using relevant identifiers such as correlation ID, claimed user ID, etc. When fraud affects multiple related actions, use correlated labeling instead of only labeling the final event.
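
The following sketch shows one way to choose between action-level and correlation-level labeling for a batch of confirmed fraud actions. It assumes each action is available as a dictionary with `action_id` and `correlation_id` keys. Those key names and the output shape are illustrative only.

```python
from collections import defaultdict


def plan_labels(fraud_actions):
    """Pick the narrowest subject level that still covers each fraud case."""
    by_correlation = defaultdict(list)
    plans = []
    for action in fraud_actions:
        correlation_id = action.get("correlation_id")
        if correlation_id is None:
            # No journey context available: label the single action.
            plans.append({"subject": "action", "id": action["action_id"]})
        else:
            by_correlation[correlation_id].append(action["action_id"])
    for correlation_id, action_ids in by_correlation.items():
        if len(action_ids) > 1:
            # Fraud spans several linked actions: label the whole journey.
            plans.append({"subject": "correlation", "id": correlation_id})
        else:
            plans.append({"subject": "action", "id": action_ids[0]})
    return plans


confirmed = [
    {"action_id": "a1", "correlation_id": "c1"},
    {"action_id": "a2", "correlation_id": "c1"},
    {"action_id": "a3", "correlation_id": "c2"},
    {"action_id": "a4", "correlation_id": None},
]
plans = plan_labels(confirmed)
```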

Use reliable validation sources

The source of the label helps explain how the outcome was determined. Common sources include manual review, customer complaints, chargebacks, and other vendors or fraud systems. Tracking the source improves transparency and supports future analysis.

Recommended practice

Structure labels deliberately so they reflect both the outcome and the fraud scenario, and use the source field consistently so the feedback loop reflects how each outcome was validated.

4. Tune detection sensitivity based on observed patterns

Detection sensitivity lets you adjust how much weight each risk factor carries in the risk score calculation. Use it to align the system's scoring with your actual threat landscape.

Adjust sensitivity to business needs

Each factor can be set to one of five levels:

Ignore: Removes the factor from score calculation entirely.
Low: Reduces the factor's contribution to the score.
Default: Uses the system's baseline weighting.
High: Amplifies the factor's contribution, making it more likely to trigger a Challenge or Deny.
Deny: Automatically blocks any action where this factor is detected, regardless of other signals.

Sensitivity adjustments are most effective when:

A specific risk factor generates too many false positives (e.g., VPN usage is common among your legitimate users, so consider setting the VPN factor to Low).
A new attack pattern emerges that requires immediate response (e.g., a surge in bot activity—set the Bot factor to Deny to block all bot-flagged actions until a proper strategy is defined).
Certain factors are not relevant to your environment or business (e.g., corporate users are using virtual machines by policy—set the Virtual machine factor to Ignore).
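
To build intuition for how the five levels interact, the toy model below scores an action from its detected risk factors. It is a conceptual sketch only: the multipliers, base weights, and factor names are invented for illustration and do not describe how Fraud Prevention actually calculates the risk score.

```python
# Hypothetical multipliers; the real weighting is internal to the product.
SENSITIVITY_MULTIPLIER = {"ignore": 0.0, "low": 0.5, "default": 1.0, "high": 2.0}


def score_action(detected_factors, base_weights, sensitivity):
    """Sum weighted factor contributions; a factor set to 'deny' blocks outright."""
    levels = {f: sensitivity.get(f, "default") for f in detected_factors}
    if "deny" in levels.values():
        # Deny applies regardless of any other signal on the action.
        return {"forced_deny": True, "score": None}
    score = sum(base_weights[f] * SENSITIVITY_MULTIPLIER[levels[f]] for f in detected_factors)
    return {"forced_deny": False, "score": score}


base_weights = {"vpn": 30, "virtual_machine": 20, "bot": 50}  # assumed values
sensitivity = {"vpn": "low", "virtual_machine": "ignore", "bot": "deny"}

corporate_login = score_action(["vpn", "virtual_machine"], base_weights, sensitivity)
bot_login = score_action(["vpn", "bot"], base_weights, sensitivity)
```

In this toy setup the corporate login scores 15.0 instead of 50, while any action carrying the bot factor is denied no matter what else is detected.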

Apply sensitivity changes safely

Use Preview mode first: Save sensitivity configurations as a preview policy and evaluate their impact on the Recommendations dashboard before deploying to production. This avoids unintended disruption.
Adjust incrementally: Change one or two factors at a time so you can isolate the effect of each adjustment on your recommendation distribution.
Review after labeling milestones: When a significant batch of labels has been submitted, revisit sensitivity settings to see whether the system's baseline has shifted enough to warrant recalibration.
Combine with rules for layered control: Sensitivity adjusts the score broadly; rules override the recommendation for specific conditions. Use both together—for example, increase sensitivity for public Wi-Fi signals generally, but also create a rule that denies transactions from public Wi-Fi combined with a new device.
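
When you evaluate a preview policy, the question is how the recommendation distribution shifts relative to production. The Recommendations dashboard is the place to review this. The sketch below only illustrates the comparison itself, assuming you have two lists of recommendation outcomes for the same traffic. The function names and outcome values are hypothetical.

```python
from collections import Counter

OUTCOMES = ("allow", "challenge", "deny")  # assumed outcome names


def distribution(recommendations):
    """Return the share of each outcome in a list of recommendations."""
    if not recommendations:
        raise ValueError("need at least one recommendation")
    counts = Counter(recommendations)
    return {o: counts[o] / len(recommendations) for o in OUTCOMES}


def shift(production, preview):
    """Return preview share minus production share for each outcome."""
    before, after = distribution(production), distribution(preview)
    return {o: round(after[o] - before[o], 4) for o in OUTCOMES}


production = ["allow"] * 8 + ["challenge"] + ["deny"]
preview = ["allow"] * 7 + ["challenge"] * 2 + ["deny"]
delta = shift(production, preview)
```

Changing one or two factors at a time keeps a shift like this attributable to a specific adjustment.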

Recommended practice

Adjust sensitivity incrementally, always validate in Preview mode before production, and revisit settings as your label volume grows and fraud patterns shift.

5. Limit rules to business logic

Rules let you define fixed recommendation outcomes when specific conditions are met, overriding the system's ML-based recommendation. Unlike labels and detection sensitivity, rules do not train or refine the recommendation engine—they apply a static override.

For adjusting how the system learns and scores risk over time, labels are a more effective mechanism. Labels refine the ML models so that future recommendations improve organically, whereas rules simply force a fixed outcome for matching conditions.

Choose the right scenario for rules

Rules are best suited for two scenarios:

Mitigating immediate threats: When a new attack vector is identified, a rule can be deployed immediately to block it (e.g., deny all actions from IP ranges confirmed as malicious). Once the threat is contained and enough labels have been submitted, the ML models adapt on their own and the rule can be retired.
Permanent business policies: For strict, well-defined conditions that should always produce the same outcome regardless of risk score—such as allowlisting corporate devices or always challenging transactions above a certain threshold.

Apply rules effectively

Start in Preview mode: Test every new rule in Preview mode first. Preview mode simulates the rule's effect against live traffic without impacting production decisions, so you can evaluate its reach before going live.
Keep rules focused and minimal: Each rule should address a single, well-defined scenario. Overly broad rules can override the system's ML intelligence in ways that increase false positives or reduce fraud capture.
Manage rule priority carefully: Rules are evaluated top to bottom, and only the first match applies. Place the most critical rules (e.g., deny rules for confirmed threats) higher in the priority order, and more permissive rules (e.g., trust rules for known-good devices) lower.
Audit rules regularly: As fraud patterns evolve and your label history grows, rules created during early deployment may become redundant or counterproductive. Review active rules during your periodic tuning reviews and disable or remove rules that the ML models now handle effectively on their own.
Coordinate rules with sensitivity settings: If you increase the sensitivity of a factor to High, you may not need a separate rule that denies actions based on the same factor. Avoid duplicating logic across both controls.
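
The first-match behavior described above can be sketched as a simple loop. This is a conceptual model, not the product's rule engine: the rule structure, the outcome names, and the action fields are hypothetical, and the IP address comes from a range reserved for documentation.

```python
def evaluate_rules(rules, action, ml_recommendation):
    """Return (recommendation, matched_rule_name); only the first matching rule applies."""
    for rule in rules:  # list order stands in for priority, top to bottom
        if rule["condition"](action):
            return rule["outcome"], rule["name"]
    # No rule matched, so the ML-based recommendation stands.
    return ml_recommendation, None


MALICIOUS_IPS = {"203.0.113.7"}

rules = [
    {
        "name": "deny-confirmed-malicious-ip",
        "outcome": "deny",
        "condition": lambda a: a["ip"] in MALICIOUS_IPS,
    },
    {
        "name": "trust-corporate-device",
        "outcome": "trust",
        "condition": lambda a: a["corporate_device"],
    },
]

risky_corporate = {"ip": "203.0.113.7", "corporate_device": True}
unmatched = {"ip": "198.51.100.20", "corporate_device": False}

first = evaluate_rules(rules, risky_corporate, "challenge")
second = evaluate_rules(rules, unmatched, "challenge")
```

Because the deny rule sits above the trust rule, a corporate device on a confirmed malicious IP is denied. Reversing the order would trust it instead, which is why priority needs deliberate management.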

Recommended practice

Reserve rules for immediate threat mitigation and permanent business policies. For long-term improvement of fraud detection accuracy, rely on labels to train the recommendation engine.

6. Distinguish between initial tuning and ongoing optimization

Distinguish between the different stages of tuning: initial tuning and ongoing optimization. Tuning should also be revisited whenever the environment changes in a meaningful way, such as new fraud patterns, performance degradation, new channels or use cases, changes in enforcement strategy, or shifts in business priorities.

Initial tuning

Initial tuning is the early phase of aligning Fraud Prevention to a new environment, channel, or use case. Typical goals include:

Establishing a performance baseline
Understanding the organization's fraud patterns
Reviewing early recommendation quality
Aligning on operational processes
Calibrating detection sensitivity and creating initial rules

This phase may take place during a proof of concept, proof of value, or early production rollout. During initial tuning, lean on rules and sensitivity adjustments for quick wins while you build up the label volume that improves ML accuracy over time.

Ongoing optimization

Ongoing optimization is the continuous improvement phase once a baseline is in place. Typical goals include:

Adapting to new fraud patterns
Improving precision and fraud capture over time
Monitoring performance trends
Revisiting strategy when business conditions change
Retiring rules that are no longer needed as the ML models mature

Recommended practice

Treat initial tuning as a setup phase and ongoing optimization as a continuous operating process. Labeling and tuning are long-term capabilities, not one-time setup tasks.

7. Establish a regular review cadence

Establish a review schedule. A practical approach may include:

Continuous submission where labeling is automated
Weekly operational reviews to assess recently confirmed fraud, review deny and challenge performance, detect changes in fraud patterns, and evaluate whether rules or sensitivity settings need adjustment
Periodic business-level performance reviews to align on tuning decisions, review rule effectiveness, and maintain process consistency

Regular reviews help teams detect changes in fraud patterns, align on tuning decisions, and maintain process consistency.

Recommended practice

Choose a review cadence that your organization can sustain consistently over time.

8. Automate labeling wherever possible

Manual labeling in the Admin Portal is useful during the initial integration phase, but it becomes difficult to scale. As your program matures, move toward integrating feedback directly into fraud workflows through the Send label API.

Examples of systems to integrate with:

Case management systems
Investigation tools
Chargeback workflows
Complaint resolution systems
Other fraud and risk platforms
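
As a sketch of what such an integration might do, the function below turns a closed case from a case management system into a JSON request body that carries the label type, subject level, fraud use case, and validation source. Every field name here is hypothetical, so consult the Send label API reference for the actual request schema, endpoint, and authentication. The example deliberately stops short of sending the request.

```python
import json


def build_label_payload(case: dict) -> str:
    """Map a closed case to a JSON label payload (all field names are assumptions)."""
    required = ("label_type", "subject_level", "subject_id", "source")
    missing = [key for key in required if not case.get(key)]
    if missing:
        # Refuse to submit incomplete labels rather than guessing.
        raise ValueError(f"case is missing fields: {missing}")
    payload = {
        "label": case["label_type"],
        "subject": {"level": case["subject_level"], "id": case["subject_id"]},
        "use_case": case.get("use_case"),
        "source": case["source"],
    }
    return json.dumps(payload, sort_keys=True)


closed_case = {
    "label_type": "confirmed_fraud",
    "subject_level": "action",
    "subject_id": "example-action-id",
    "use_case": "account_takeover",
    "source": "manual_review",
}
body = build_label_payload(closed_case)
```

Validating required fields before submission keeps automated labels as structured as those entered manually.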

Recommended practice

Incorporate API-driven labeling into operational workflows to improve speed, consistency, and scale.

Recommended operating model

No two organizations have the same fraud team, tooling, or operating model. The right approach depends on your organization's maturity. Start with a process your team can sustain, then expand it over time.

Minimum viable model

A good starting model includes:

Defining the primary fraud use case
Labeling confirmed fraud first
Using action-level or correlation-level feedback
Using manual review as the main source
Labeling a manageable sample if full coverage is not feasible
Adjusting detection sensitivity for the most impactful risk factors
Creating a small set of rules for known threat patterns, tested in Preview mode first
Establishing a recurring review cadence
Aligning with Mosaic on early tuning decisions

Mature model

A more advanced model includes:

Automating label submission via the Send label API
Correlating feedback across journeys and sessions
Including confirmed legitimate outcomes
Classifying by fraud scenario and source
Maintaining a curated rule set that complements ML-driven recommendations, with regular audits to retire redundant rules
Coordinating sensitivity settings with rules and label trends
Integrating with existing fraud operations
Revisiting tuning based on KPI movement