Building Pastebin on IPFS - with FastAPI, Svelte, and IPFS

IPFS stands for InterPlanetary File System. It is similar to the idea behind torrents, but better: a peer-to-peer hypermedia protocol designed to make the web faster, safer, and more open. I'm not going to nerd out about IPFS any further here; just read the IPFS whitepaper.

I stumbled upon IPFS a couple of years ago and found it interesting. Back then, the only way to access IPFS (as far as I could tell) was to spin up your own node. Today we have multiple free IPFS endpoints, and we can use them to interact with the IPFS network.

This article is about storing text data on the IPFS network. It's something I've been working on for the past few days: using the IPFS network to store data for free.

Tools used

  • FastAPI
  • MongoDB
  • Svelte
  • Infura IPFS endpoint

Why use a backend?

It is easy to make GET/POST requests to an IPFS endpoint using JavaScript's fetch API. The problem is that IPFS identifies each file by a content hash, like this one:

QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX

Hashes like this are not easy to remember, so we store an alias to each hash in a database, as sketched below.
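Conceptually, the database is nothing more than a lookup table from a short, human-friendly id to the full IPFS hash. A toy in-memory version (the id x1y2z3 is made up; the real table lives in MongoDB):

# toy version of the alias table
aliases = {"x1y2z3": "QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX"}

print(aliases["x1y2z3"])  # the full hash, recovered from a 6-char short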

FastAPI will orchestrate the whole program flow. We'll build the API endpoints that tie the database and IPFS together.

Building the application

Setup env variables

# .env

MONGO_CON_STRING=mongodb://localhost:27017/
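The app reads this value with os.getenv. Nothing loads the .env file automatically when you run locally, so here is a minimal sketch using python-dotenv (an extra dependency, not in requirements.txt):

# load .env into the environment, then read the connection string
from os import getenv

from dotenv import load_dotenv

load_dotenv()
print(getenv("MONGO_CON_STRING"))  # -> mongodb://localhost:27017/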

Setup mongodb

Let's use Docker to spin up MongoDB. Docker removes the overhead of a local installation and other basic setup.

# pull MongoDB
docker pull mongo

# Start mongo container
docker run -it -v mongodata:/data/db -p 27017:27017 --name ipfs-store -d mongo

-v mongodata:/data/db

-v specifies the volume. Mapping MongoDB's storage is important so data persists even after the container is stopped. Note that mongodata here is a Docker-managed named volume, not a folder in the project directory; if you want the data in a local folder instead, pass an absolute path (e.g. -v $(pwd)/mongodata:/data/db) and make sure the folder exists.

# requirements.txt

aiofiles==0.5.0
fastapi==0.61.1
ipfs-api==0.2.3
pymongo==3.11.0
sqlitedict==1.7.0
uvicorn==0.12.2

Code the database

We'll use pymongo to communicate with our database.

# database/database.py

from os import getenv
from typing import Optional

from pymongo import MongoClient


class DataBase:
    def __init__(self) -> None:
        self.client = MongoClient(getenv("MONGO_CON_STRING"))
        self.db = self.client.pasteit
        self.col = self.db.links

    def set(self, short: str, hash: str) -> str:
        # reuse the existing short if this hash was already uploaded
        short_exists = self.col.find_one({"hash": hash})
        if short_exists is not None:
            return short_exists.get("short")
        data = {"short": short, "hash": hash}
        self.col.insert_one(data)
        return short

    def get(self, short: str) -> Optional[str]:
        # resolve a short id back to its IPFS hash, or None if unknown
        data = self.col.find_one({"short": short})
        if data is not None:
            return data.get("hash")
        return None

    def close(self) -> None:
        self.client.close()

Creating abstractions like this makes the code easier to read. The set and get methods wrap a few pymongo operations to get the job done.

Every database insertion will be of this format,

{
    "short": "<short id>",
    "hash": "<ipfs hash>"
}

You can also use Redis here, since all insertions are simple key-value pairs; I used MongoDB because this application is deployed on Vercel with MongoDB Atlas.
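For comparison, a hedged sketch of the same mapping on Redis with redis-py (not used in this project):

import redis

# same short -> hash mapping, stored as plain Redis keys
r = redis.Redis(decode_responses=True)
r.set("x1y2z3", "QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX")
print(r.get("x1y2z3"))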

The DataBase class is fairly simple. The get method fetches the hash for the short provided, and the set method stores a short: hash pair, first making sure the hash isn't already in the database.
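A quick usage sketch (assumes MongoDB is running and MONGO_CON_STRING is set; the short and hash are the example values from above):

db = DataBase()
short = db.set("x1y2z3", "QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX")
print(db.get(short))  # -> the hash above
db.close()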

Make the IPFS connection

# ipfs/ipfs.py

from os import remove
from uuid import uuid4

import ipfsApi


class IPFS:
    def __init__(self) -> None:
        self.ipfs = ipfsApi.Client("https://ipfs.infura.io", 5001)

    def add(self, text: str) -> str:
        # write the text to a temp file, upload it, then clean up
        filename = f"/tmp/{str(uuid4())}"
        with open(filename, "w") as f:
            f.write(text)
        res = self.ipfs.add(filename)
        remove(filename)
        return res[0].get("Hash")

    def cat(self, hash: str) -> str:
        # fetch the file contents back by hash
        data = self.ipfs.cat(hash)
        return data

Communication with the IPFS endpoint boils down to simple GET/POST requests with a payload, but you need to take care of the encoding. I used a library that has already done the basic work for us.
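For reference, the upload is just a multipart POST to the endpoint's /api/v0/add route (this follows the standard IPFS HTTP API; whether your endpoint requires auth is another matter). A sketch with requests:

import requests

# raw upload straight to the IPFS HTTP API
res = requests.post(
    "https://ipfs.infura.io:5001/api/v0/add",
    files={"file": ("paste.txt", "hello ipfs")},
)
print(res.json()["Hash"])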

We define an add method, which writes the input string to a file and then uploads it to IPFS. The cat method reads the data using the hash.
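Trying the class out (assumes the Infura endpoint is reachable):

ipfs = IPFS()
hash = ipfs.add("hello ipfs")
print(ipfs.cat(hash))  # -> hello ipfs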

Code the server

The server has two endpoints: /api/v1/ to post the text to be uploaded, and /{short} to fetch data using the short URL.

# main.py

from uuid import uuid4

from fastapi import Depends, FastAPI
from pydantic import BaseModel

from database.database import DataBase
from ipfs.ipfs import IPFS

app = FastAPI()


class Data(BaseModel):
    text: str


async def connection() -> dict:
    return {"db": DataBase(), "ipfs": IPFS()}


@app.post("/api/v1/")
async def pasteit(data: Data, con: dict = Depends(connection)) -> dict:
    hash = con["ipfs"].add(data.text)
    short = str(uuid4())[:6]
    short = con["db"].set(short, hash)
    con["db"].close()
    return {"message": short}


@app.get("/{short}")
async def get_paste(short: str, con: dict = Depends(connection)) -> dict:
    hash = con["db"].get(short)
    con["db"].close()  # close before returning on either path
    if hash is not None:
        data = con["ipfs"].cat(hash)
        return {"message": data}
    return {"message": "invalid short"}

Here we assume that all data is successfully uploaded. Then we create a custom identifier for each hash using the first six characters of uuid.uuid4(). We need to perform a collision test on this method of short generation.

# collision_test.py

from uuid import uuid4


def get_id() -> str:
    return str(uuid4())[:6]


def test_n(n: int) -> None:
    outputs = [get_id() for _ in range(n)]
    unique_outputs = set(outputs)
    fraction = 1 - (len(unique_outputs) / len(outputs))
    print(f"Test for {n} shorts, collision: {fraction*100:.2f}")


if __name__ == "__main__":
    test_n(100)
    test_n(1000)
    test_n(10000)
    test_n(100000)
    test_n(1000000)

-> python collision_test.py
Test for 100 shorts, collision: 0.00
Test for 1000 shorts, collision: 0.00
Test for 10000 shorts, collision: 0.05
Test for 100000 shorts, collision: 0.26
Test for 1000000 shorts, collision: 2.93

-> python collision_test.py
Test for 100 shorts, collision: 0.00
Test for 1000 shorts, collision: 0.00
Test for 10000 shorts, collision: 0.01
Test for 100000 shorts, collision: 0.27
Test for 1000000 shorts, collision: 2.92

I guess the test passed, except for n=1,000,000, which got ~30,000 collisions (about 3%). It's safe to assume we're not going to store that many pastes; to be strict, you could regenerate the short whenever it already exists in the database.
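A minimal sketch of that stricter approach, reusing the DataBase class from earlier (unused_short is a hypothetical helper, not in the repo):

from uuid import uuid4


def unused_short(db: DataBase) -> str:
    # keep drawing 6-char ids until one is not in the database
    short = str(uuid4())[:6]
    while db.get(short) is not None:
        short = str(uuid4())[:6]
    return short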

The frontend

# src/App.svelte

<script>
    let data = "";
    let hash = "";
    const upload = () => {
        fetch("http://localhost:8000/api/v1/", {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ text: data }),
        })
            .then((res) => res.json())
            .then((data) => (hash = data.message));
    };
</script>

<textarea id="data" maxlength="200" bind:value={data} />
<button id="upload" on:click={upload}>Upload</button>
<p>{hash}</p>

This code should give you a fair idea of the frontend build. The text limit is set to 200 characters via the textarea's maxlength. If the Svelte dev server runs on a different origin than the API, you'll also need FastAPI's CORSMiddleware.
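You can also exercise the API without the frontend. A quick sketch with requests (assuming the server runs locally on port 8000):

import requests

# upload a paste, then read it back through its short URL
res = requests.post("http://localhost:8000/api/v1/", json={"text": "hello ipfs"})
short = res.json()["message"]

paste = requests.get(f"http://localhost:8000/{short}")
print(paste.json()["message"])  # -> hello ipfs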

What's next for pasteit!?

I'm planning to convert this into a file-sharing service on IPFS. Maybe throw in a little encryption to make people interested!

Demo

Check out the final application at pasteit.vercel.app

GitHub : github.com/amalshaji/pasteit
