Editorial methodology
How we test & score every NSFW AI tool
Every tool in the SpicyTools catalog is evaluated against the same 5-criteria rubric, applied uniformly. The final score is a weighted average out of 5, refreshed quarterly. This page documents the weights, what each criterion measures, and the guardrails that prevent affiliate relationships from moving scores.
The five-criteria rubric
01 — Content quality
Weight: 30%. What the tool actually produces — its core output.
What it evaluates
- Output realism and fidelity (chat coherence, image realism, video motion, voice naturalness)
- Prompt adherence on edge-case concepts, not just showroom demos
- Consistency across regenerations — identical prompt, identical quality
- Style range (photorealistic, anime, artistic, hybrid)
- Depth of character memory (for chat) / resolution ceiling (for image/video)
02 — User experience
Weight: 25%. How painful the tool is to use on day one and day thirty.
What it evaluates
- Onboarding friction — how many steps from signup to first generation
- Latency — text response time, image generation time, video queue depth
- Interface clarity and absence of dark patterns
- Mobile parity — whether the mobile experience matches desktop feature for feature
- Search, filters, and history/gallery management
03 — Privacy
Weight: 20%. What happens to your data in practice — not just what the policy claims.
What it evaluates
- Conversation / generation retention policy (default, configurable, enforceable)
- Whether user content is used to train future models
- Account deletion — and whether it actually purges generated media
- Anonymous payment options (crypto, prepaid cards, gift cards)
- Third-party data sharing disclosure clarity
- External incidents or privacy red flags in the last 24 months
04 — Pricing transparency
Weight: 15%. Whether the advertised price matches checkout — and whether the free tier is honest.
What it evaluates
- Gap between advertised price and checkout total
- Credit / token economics when the platform uses them
- Refund policy clarity and execution
- Annual-vs-monthly honesty (no surprise upfront-only annual)
- Free tier — is it genuinely testable or a frustration funnel
05 — Reliability
Weight: 10%. Whether the tool works when you need it.
What it evaluates
- Observed uptime during testing (weekday peak hours, weekend evenings)
- Generation consistency on identical prompts across days
- Support responsiveness when the tool breaks (ticket → first reply)
- Version stability — breaking changes shipped without notice count against the score
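The weighted average described at the top of this page can be sketched in a few lines. The weights are the ones listed above; the per-criterion scores in the example are hypothetical, for illustration only — not a real review.

```python
# Illustrative sketch of the rubric's weighted average on a 0-5 scale.
# Weights come from the five-criteria rubric above; the example scores
# below are hypothetical, not taken from any actual tool review.

WEIGHTS = {
    "content_quality": 0.30,
    "user_experience": 0.25,
    "privacy": 0.20,
    "pricing_transparency": 0.15,
    "reliability": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Weighted average of per-criterion scores, each out of 5."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("every criterion must be scored exactly once")
    return round(sum(scores[c] * w for c, w in WEIGHTS.items()), 2)

# Hypothetical review: strong output, weaker privacy posture.
example = {
    "content_quality": 4.5,
    "user_experience": 4.0,
    "privacy": 3.5,
    "pricing_transparency": 4.0,
    "reliability": 5.0,
}
print(weighted_score(example))  # a single blended score out of 5
```

Because the weights sum to 100%, a tool that scores 5 on every criterion lands exactly at 5.0, and a weak showing on Content quality drags the total more than the same weakness on Reliability.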
Testing protocol
Every tool is paid for with our own funds. The first evaluation involves a minimum of two hours of hands-on testing against a standardised set of prompts covering short-form and long-form chat (for conversational tools), photorealistic and stylised generation (for image tools), and the edge-case NSFW concepts the platform advertises support for.
Each tool is tested on both the free tier and a paid tier — because free tiers often cap features in ways the marketing doesn't disclose. When a platform has multiple paid tiers, we test the cheapest tier that unlocks the advertised feature set.
Scores are re-evaluated every quarter, or immediately when a significant product update, pricing change, or privacy incident is reported. Every tool page displays the last reviewed date near the score so readers can assess freshness.
Affiliate relationships & conflict of interest
SpicyTools monetises through affiliate links. We're explicit about this because every review platform in this space does it — what separates honest outfits from pay-for-play is whether the affiliate relationship can move the score. Here it cannot:
- Scoring is completed before any affiliate relationship is negotiated.
- Once a tool is listed, commercial terms never retroactively alter its score.
- Affiliate links are flagged on the tool page and in the footer.
- We list tools we do not have an affiliate relationship with whenever they are objectively strong — missing coverage is worse for readers than missing revenue.
Spot a conflict or a questionable score? Tell us via the editorial inbox. Corrections are published with a visible change log on the affected tool page.
What we do not list
We do not review tools that allow, encourage, or fail to guard against non-consensual content or CSAM — regardless of technical quality. We also do not review tools that operate without any discernible age gate. Inclusion signals that a tool at minimum clears this floor; it does not mean we endorse its business model.