Editorial methodology
How we test & score every NSFW AI tool
Every tool in the SpicyTools catalog is evaluated against the same 5-criteria rubric, applied uniformly. The final score is a weighted average out of 5, refreshed quarterly. This page documents the weights, what each criterion measures, and the guardrails that prevent affiliate relationships from moving scores.
The five-criteria rubric
01 — Content quality
Weight: 30%. What the tool actually produces — its core output.
What it evaluates
- Output realism and fidelity (chat coherence, image realism, video motion, voice naturalness)
- Prompt adherence on edge-case concepts, not just showroom demos
- Consistency across regenerations — identical prompt, identical quality
- Style range (photorealistic, anime, artistic, hybrid)
- Depth of character memory (for chat) / resolution ceiling (for image/video)
02 — User experience
Weight: 25%. How painful the tool is to use on day one and day thirty.
What it evaluates
- Onboarding friction — how many steps from signup to first generation
- Latency — text response time, image generation time, video queue depth
- Interface clarity and absence of dark patterns
- Mobile parity — whether the mobile experience matches desktop feature for feature
- Search, filters, and history/gallery management
03 — Privacy
Weight: 20%. What happens to your data in practice — not just what the policy claims.
What it evaluates
- Conversation / generation retention policy (default, configurable, enforceable)
- Whether user content is used to train future models
- Account deletion — and whether it actually purges generated media
- Anonymous payment options (crypto, prepaid cards, gift cards)
- Third-party data sharing disclosure clarity
- External incidents or privacy red flags in the last 24 months
04 — Pricing transparency
Weight: 15%. Whether the advertised price matches checkout — and whether the free tier is honest.
What it evaluates
- Gap between advertised price and checkout total
- Credit / token economics when the platform uses them
- Refund policy clarity and execution
- Annual-vs-monthly honesty (no surprise upfront-only annual)
- Free tier — is it genuinely testable or a frustration funnel
05 — Reliability
Weight: 10%. Whether the tool works when you need it.
What it evaluates
- Observed uptime during testing (weekday peak hours, weekend evenings)
- Generation consistency on identical prompts across days
- Support responsiveness when the tool breaks (ticket → first reply)
- Version stability — breaking changes shipped without notice count against the score
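The weighted average described at the top of this page can be sketched in a few lines. The weights are the ones listed above; the per-criterion scores in the example are hypothetical, for illustration only — not a real review.

```python
# Illustrative sketch of the rubric's weighted average on a 0-5 scale.
# Weights come from the five-criteria rubric above; the example scores
# below are hypothetical, not taken from any actual tool review.

WEIGHTS = {
    "content_quality": 0.30,
    "user_experience": 0.25,
    "privacy": 0.20,
    "pricing_transparency": 0.15,
    "reliability": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Weighted average of per-criterion scores, each out of 5."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("every criterion must be scored exactly once")
    return round(sum(scores[c] * w for c, w in WEIGHTS.items()), 2)

# Hypothetical review: strong output, weaker privacy posture.
example = {
    "content_quality": 4.5,
    "user_experience": 4.0,
    "privacy": 3.5,
    "pricing_transparency": 4.0,
    "reliability": 5.0,
}
print(weighted_score(example))  # a single blended score out of 5
```

Because the weights sum to 100%, a tool that scores 5 on every criterion lands exactly at 5.0, and a weak showing on Content quality drags the total more than the same weakness on Reliability.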
Testing protocol
Every tool is paid for with our own funds. The first evaluation involves a minimum of two hours of hands-on testing against a standardised set of prompts covering short-form and long-form chat (for conversational tools), photorealistic and stylised generation (for image tools), and the edge-case NSFW concepts the platform advertises support for.
Each tool is tested on both the free tier and a paid tier — because free tiers often cap features in ways the marketing doesn't disclose. When a platform has multiple paid tiers, we test the cheapest tier that unlocks the advertised feature set.
Scores are re-evaluated every quarter, or immediately when a significant product update, pricing change, or privacy incident is reported. Every tool page displays the last reviewed date near the score so readers can assess freshness.
Affiliate relationships & conflict of interest
SpicyTools monetises through affiliate links. We're explicit about this because every review platform in this space does it — what separates honest outfits from pay-for-play is whether the affiliate relationship can move the score. Here it cannot:
- Scoring is completed before any affiliate relationship is negotiated.
- Once a tool is listed, commercial terms never retroactively alter its score.
- Affiliate links are flagged on the tool page and in the footer.
- We list tools we do not have an affiliate relationship with whenever they are objectively strong — missing coverage is worse for readers than missing revenue.
Spot a conflict or a questionable score? Tell us via the editorial inbox. Corrections are published with a visible change log on the affected tool page.
What we do not list
We do not review tools that allow, encourage, or fail to guard against non-consensual content or CSAM — regardless of technical quality. We also do not review tools that operate without any discernible age gate. Inclusion signals that a tool at minimum clears this floor; it does not mean we endorse its business model.