unisound-doc-parser

Parse PDF, DOC, DOCX, and image files to Markdown or JSON using UniDoc API with sync or async mode and automatic status polling.

41 downloads
Free
Reviewed

Run from the skill directory:

    python scripts/unidoc_parse.py /path/to/file.pdf \
      --format md \
      --output ./unidoc-output \
      --mode sync


Options
-------
* `--format md|json` (default: `md`)
  - Output format: Markdown or JSON
* `--mode sync|async` (default: `sync`)
  - `sync`: waits for the conversion to complete before returning
  - `async`: polls the conversion status until it completes
* `--func METHOD` (default: `unisound`)
  - Conversion method/algorithm to use
* `--output DIR` (default: `./unidoc-output`)
  - Output directory for converted files
* `--uid UUID` (optional)
  - Custom user ID (auto-generated if not provided)
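
The options above could be declared with `argparse` roughly as follows. This is a minimal sketch rather than the actual contents of `scripts/unidoc_parse.py`: the function names `build_parser` and `resolve_uid` are hypothetical, and only the flag names and defaults are taken from the list above.

```python
import argparse
import uuid


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(prog="unidoc_parse.py")
    parser.add_argument("file", help="Path to the input document")
    parser.add_argument("--format", choices=["md", "json"], default="md",
                        help="Output format: Markdown or JSON")
    parser.add_argument("--mode", choices=["sync", "async"], default="sync",
                        help="Wait for completion (sync) or poll status (async)")
    parser.add_argument("--func", default="unisound",
                        help="Conversion method to use")
    parser.add_argument("--output", default="./unidoc-output",
                        help="Output directory for converted files")
    parser.add_argument("--uid", default=None,
                        help="Custom user ID (generated when omitted)")
    return parser


def resolve_uid(uid):
    """Return the given user ID, or generate one (assumed here to be a UUID4)."""
    return uid if uid else str(uuid.uuid4())


# Parse a sample command line instead of sys.argv so the sketch runs standalone.
args = build_parser().parse_args(["/path/to/file.pdf", "--mode", "async"])
print(args.format, args.mode, args.func, args.output)
```

Restricting `--format` and `--mode` with `choices` makes a mistyped value fail immediately with a usage message instead of reaching the API.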

Output conventions
------------------
* Creates `./unidoc-output/<document_name>/` by default
* Markdown output: `output.md`
* JSON output: `output.json`
* Output filename preserves original document name
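
One way to read these conventions is sketched below. It assumes that `<document_name>` is the input filename without its extension and that the result file is named `output.<format>`. Both are interpretations of the list above, and `output_path` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path


def output_path(input_file: str, fmt: str = "md",
                output_dir: str = "./unidoc-output") -> Path:
    """Build the expected result path for a document (assumed layout)."""
    if fmt not in ("md", "json"):
        raise ValueError(f"unsupported format: {fmt!r}")
    # Assumption: the per-document directory is the filename minus its extension.
    document_name = Path(input_file).stem
    return Path(output_dir) / document_name / f"output.{fmt}"


print(output_path("/path/to/report.pdf"))
print(output_path("/path/to/scan.png", fmt="json"))
```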

Notes
-----
* Requires network connectivity to the UniDoc API (http://unidoc.uat.hivoice.cn)
* Supports multiple file formats: PDF, DOC, DOCX, PNG, JPG, etc.
* Async mode polls every 1 second until completion
* Maximum file size and rate limits depend on the API service configuration
* For large files or batch processing, prefer async mode
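
The one-second polling described for async mode can be sketched as a generic loop. Nothing here reflects the real UniDoc API: the status check is passed in as a callable, the status strings `"done"` and `"failed"` are hypothetical, and the timeout is an added safeguard that the notes above do not mention.

```python
import time
from typing import Callable


def poll_until_done(get_status: Callable[[], str],
                    interval: float = 1.0,
                    timeout: float = 300.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call get_status() every `interval` seconds until a terminal status.

    Raises TimeoutError if no terminal status arrives within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # Hypothetical terminal states; a real service may name them differently.
        if status in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"conversion still {status!r} after {timeout}s")
        sleep(interval)


# Simulated status sequence standing in for real API responses.
states = iter(["pending", "running", "done"])
print(poll_until_done(lambda: next(states), sleep=lambda seconds: None))
```

Passing the status check and the sleep function as parameters keeps the loop testable without network access or real delays.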

Skill Info
----------
* Creator: aaiccee
* Downloads: 41
* Published: Mar 15, 2026
* Updated: Mar 16, 2026