
Text Cleaner API

Remove HTML, fix encoding, and normalize raw text in one API call.

Clean and normalize raw text automatically: strip HTML tags and entities, remove markdown syntax, fix encoding issues (smart quotes, em-dashes, broken UTF-8), and collapse whitespace. Returns the cleaned text together with a list of the changes applied and the percentage by which the character count was reduced. Ideal for preprocessing scraped content, user-generated text, or documents before indexing or analysis.

POST /api/trpc/textCleaner.clean
5 tokens / call

How it works

Three steps. No complex setup.

1

Send the raw text

Pass the raw string in the `text` field. Enable or disable operations with boolean flags: `remove_html`, `remove_markdown`, `fix_encoding`, `normalize_whitespace`. All are enabled by default.
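If you build requests from code, a small typed model keeps the flags and their defaults in one place. The sketch below is a hypothetical Python helper, not an official SDK: the class name `CleanInput` and its `to_payload` method are illustrative, and the top-level `json` wrapper is copied from the curl example in the integration section.

```python
from dataclasses import dataclass, asdict


@dataclass
class CleanInput:
    """Hypothetical client-side model of the documented request fields.

    The four operation flags default to True, matching the documented
    behaviour that every operation is enabled unless you turn it off.
    """
    text: str
    remove_html: bool = True
    remove_markdown: bool = True
    fix_encoding: bool = True
    normalize_whitespace: bool = True

    def to_payload(self) -> dict:
        # The curl example wraps the fields in a top-level "json" object,
        # so this sketch does the same.
        return {"json": asdict(self)}


# Keep HTML stripping on, but leave markdown syntax untouched.
payload = CleanInput("**bold** <b>tag</b>", remove_markdown=False).to_payload()
print(payload)
```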

2

Pipeline processing

The service applies the enabled operations in order: it first strips HTML tags and decodes entities (so `&nbsp;` becomes a space), then fixes encoding issues (smart quotes, em-dashes, broken UTF-8), and finally collapses repeated spaces and line breaks.
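To make the order concrete, here is a minimal local approximation in Python. It is not the service's implementation: the tag regex, the cp1252 round trip used to repair mojibake, and the ASCII replacements for smart quotes and dashes are all assumptions. Markdown removal is left out because its position in the order isn't stated.

```python
import html
import re

# Assumed ASCII replacements for typographic punctuation; the service's exact
# substitutions are not documented.
_PUNCT = str.maketrans({
    "\u2018": "'", "\u2019": "'",  # single smart quotes
    "\u201c": '"', "\u201d": '"',  # double smart quotes
    "\u2013": "-", "\u2014": "-",  # en dash, em dash
})


def strip_html(text: str) -> str:
    # Remove anything shaped like a tag, then decode entities such as &nbsp;.
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def _repair_word(match: re.Match) -> str:
    # Undo one common form of mojibake: UTF-8 bytes mis-decoded as cp1252
    # (e.g. "â€™" instead of "’"). Words that don't round-trip stay unchanged.
    word = match.group(0)
    try:
        return word.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return word


def fix_encoding(text: str) -> str:
    # Repair word by word, then flatten smart quotes and dashes to ASCII.
    return re.sub(r"\S+", _repair_word, text).translate(_PUNCT)


def normalize_whitespace(text: str) -> str:
    # \s also matches the non-breaking space that &nbsp; decodes to.
    return re.sub(r"\s+", " ", text).strip()


def clean(text: str) -> str:
    # Same order as described above: HTML, then encoding, then whitespace.
    for step in (strip_html, fix_encoding, normalize_whitespace):
        text = step(text)
    return text


print(clean("<p>It\u00e2\u20ac\u2122s&nbsp;<em>fine</em></p>"))  # It's fine
```

Repair happens word by word, so a word that mixes mojibake with correctly encoded accented letters is left as-is, and a regex is no substitute for a real HTML parser on malformed markup.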

3

Receive clean text with metrics

The response includes the cleaned text, its original and cleaned lengths, the reduction percentage, and the list of transformations applied for full auditability.
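You can recompute the metrics client-side to sanity-check a response. This hypothetical helper assumes lengths are counted in Unicode characters (Python's `len`) and that `reduction_percent` is rounded to the nearest integer; the service may count or round differently.

```python
def cleaning_metrics(original: str, cleaned: str) -> dict:
    """Recompute the length fields of a response (hypothetical helper).

    Python's round() sends exact halves to the nearest even number,
    which may differ from the service's rounding.
    """
    original_length = len(original)
    cleaned_length = len(cleaned)
    if original_length == 0:
        reduction = 0  # avoid dividing by zero on empty input
    else:
        reduction = round(100 * (original_length - cleaned_length) / original_length)
    return {
        "original_length": original_length,
        "cleaned_length": cleaned_length,
        "reduction_percent": reduction,
    }


raw = "<p>Hello&nbsp;<strong>world</strong>!   This is   a   test. </p>"
print(cleaning_metrics(raw, "Hello world! This is a test."))
# {'original_length': 64, 'cleaned_length': 28, 'reduction_percent': 56}
```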

Who is it for?

Web scrapers and crawlers
Content pipelines
NLP preprocessing
Document indexing
Data cleaning workflows

Response example

Real input and output. What you send and what you get back.

// Input

{
  "text": "<p>Hello&nbsp;<strong>world</strong>!   This is   a   test. </p>",
  "remove_html": true,
  "fix_encoding": true,
  "normalize_whitespace": true
}

// Output

{
  "cleaned_text": "Hello world! This is a test.",
  "original_length": 64,
  "cleaned_length": 28,
  "reduction_percent": 56,
  "changes_applied": [
    "removed HTML tags",
    "fixed encoding",
    "normalized whitespace"
  ]
}


Integrate into your project

Copy and paste, replacing YOUR_API_KEY with your real key. Note that the request body wraps the input fields in a top-level `json` object.

curl -X POST https://jsnhengine.com/api/trpc/textCleaner.clean \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"json": {"text":"<p>Hello&nbsp;<strong>world</strong>!   This is   a   test. </p>","remove_html":true,"fix_encoding":true,"normalize_whitespace":true}}'
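The same call from Python, using only the standard library. `build_clean_request` and `ENDPOINT` are illustrative names, not part of an official client; the request mirrors the curl command above (URL, headers, and `json`-wrapped body). The response envelope isn't documented on this page, so inspect the returned JSON before relying on its shape.

```python
import json
import urllib.request

ENDPOINT = "https://jsnhengine.com/api/trpc/textCleaner.clean"


def build_clean_request(text: str, api_key: str, **flags: bool) -> urllib.request.Request:
    """Build the same POST request as the curl command (hypothetical helper).

    `flags` may set remove_html, remove_markdown, fix_encoding or
    normalize_whitespace; omitted flags fall back to the server defaults.
    """
    body = {"json": {"text": text, **flags}}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_clean_request("<p>Hello&nbsp;<b>world</b></p>", "YOUR_API_KEY", fix_encoding=True)
# Sending it performs a real network call and spends tokens:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```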

Why choose JSNH Engine Lab

Low latency

Responses in under 200ms

Secure authentication

API keys with per-plan rate limiting

Usage tracking

Every request logged with metrics

Production ready

Input validation and typed errors


// production ready

Start using Text Cleaner API

Sign up free and get 1,000 tokens to start. No credit card. No complex setup.