Docling

What is Docling?

  • Docling is a Python library developed by IBM that converts documents (PDFs, for example) into structured data
  • It simplifies downstream document and AI processing by detecting tables and formulas, recovering reading order, and running OCR

Install Docling

# Create a folder for your project and move into it
mkdir pdf-to-markdown && cd pdf-to-markdown

# Create the virtual environment (this path assumes Homebrew's Python 3.12 on Apple Silicon; adjust it to your interpreter)
/opt/homebrew/bin/python3.12 -m venv venv

# Activate it
source venv/bin/activate

# Install Docling
pip install docling
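
To confirm that the install landed in the active virtual environment, a small check like the one below can help. It is only a sketch: `installed_version` is a hypothetical helper name, and it relies on nothing but the standard library's `importlib.metadata`.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report whether docling is visible to the interpreter running this script
version = installed_version("docling")
print(f"docling {version}" if version else "docling is not installed in this environment")
```

If this reports that docling is missing, the virtual environment is probably not activated in the current shell.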

Use Docling

touch convert.py
nano convert.py
  • Paste the following into the file:
from docling.document_converter import DocumentConverter

# 1. Initialize the converter
converter = DocumentConverter()

# 2. Specify your PDF file path (or use a URL!)
source = "your_document.pdf"

print("Converting document... Please wait.")

# 3. Convert the document
result = converter.convert(source)

# 4. Extract the markdown string
markdown_output = result.document.export_to_markdown()

# 5. Save the markdown to a file
output_file = "output.md"
with open(output_file, "w", encoding="utf-8") as f:
    f.write(markdown_output)

print(f"Success! Saved markdown to {output_file}")
  • Put a PDF named your_document.pdf in the same folder, then run the script:
python3 convert.py
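
The script above handles a single file. For a folder of PDFs, the same two calls (`convert` and `export_to_markdown`) can be wrapped in a loop. The following is a rough sketch rather than a recommended layout: `convert_folder` is a hypothetical helper, and the converter is passed in as an argument so the function does not import Docling itself.

```python
from pathlib import Path


def convert_folder(converter, in_dir, out_dir):
    """Convert every *.pdf in in_dir to a .md file in out_dir; return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(Path(in_dir).glob("*.pdf")):
        try:
            result = converter.convert(str(pdf))
        except Exception as exc:  # keep going if a single file fails to convert
            print(f"Skipping {pdf.name}: {exc}")
            continue
        # Same export call as in convert.py, one output file per input PDF
        target = out / (pdf.stem + ".md")
        target.write_text(result.document.export_to_markdown(), encoding="utf-8")
        written.append(target)
    return written
```

With the virtual environment active, it could be called as `convert_folder(DocumentConverter(), "pdfs", "markdown")`, where `pdfs` and `markdown` are placeholder folder names.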

Run Docling in Docker

Step 1: Create a requirements.txt File

streamlit
docling

Step 2: Create a Dockerfile

# Use a pre-bundled data science image that includes all Linux graphics/GL drivers out-of-the-box
FROM jupyter/scipy-notebook:latest

# Switch to root to handle file permissions and setup
USER root
WORKDIR /app

# Copy requirements and install python packages
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy your Streamlit app script (app.py must sit next to the Dockerfile)
COPY app.py .

# Expose port
EXPOSE 8501

# Run streamlit
ENTRYPOINT ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
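
The Dockerfile copies and runs an app.py, but the file itself is not shown here. One possible minimal version is sketched below. It is an assumption about what such an app might look like, not the original script: the page title, the button labels, and the helper `pdf_bytes_to_markdown` are all made up for illustration. The upload is written to a temporary file so that Docling can be given a plain file path, as in convert.py.

```python
import os
import tempfile

try:
    import streamlit as st  # installed via requirements.txt
except ImportError:  # lets the helper below be used without Streamlit
    st = None


def pdf_bytes_to_markdown(converter, data: bytes) -> str:
    """Write uploaded bytes to a temporary PDF, convert it, and return Markdown."""
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(data)
        tmp_path = tmp.name
    try:
        result = converter.convert(tmp_path)
        return result.document.export_to_markdown()
    finally:
        os.remove(tmp_path)  # clean up the temporary file either way


def main() -> None:
    st.title("PDF to Markdown with Docling")
    uploaded = st.file_uploader("Upload a PDF", type=["pdf"])
    if uploaded is None:
        return
    # Imported lazily because loading Docling is slow
    from docling.document_converter import DocumentConverter

    with st.spinner("Converting document... Please wait."):
        markdown = pdf_bytes_to_markdown(DocumentConverter(), uploaded.getvalue())
    st.download_button("Download Markdown", markdown, file_name="output.md", mime="text/markdown")
    st.markdown(markdown)


if st is not None and __name__ == "__main__":
    main()
```

This sketch builds a new `DocumentConverter` for every upload, which keeps the code short but repeats the model loading each time.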

Step 3: Build the Docker Image

docker build --no-cache -t local-docling-app .

Step 4: Run the Container with Automatic Restart

# Stop and remove any previous container with the same name
docker stop docling-web || true
docker rm docling-web || true

# Launch the updated app
docker run -d --name docling-web -p 8501:8501 --restart unless-stopped local-docling-app
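
Because the container starts in the background (`-d`), the app may take a moment before it answers on port 8501. A small polling script can confirm that it is reachable. This is a sketch using only the standard library; `wait_for_http` is a hypothetical helper, and the URL assumes the port mapping from the `docker run` command above.

```python
import time
import urllib.request


def wait_for_http(url="http://localhost:8501", timeout=60.0, interval=1.0):
    """Poll url until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # not reachable yet (URLError is a subclass of OSError)
        time.sleep(interval)
    return False
```

Calling `wait_for_http()` right after `docker run` returns `True` once the page loads, or `False` if nothing answers within the timeout.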