Diego Lourenço

Posted on Jan 15, 2021

How I created and published my first Python package on PyPI

#python #beginners #tutorial

Hello there!

In this post (my first one!) I am going to explain how I created a simple Python package and published it on PyPI. I started it as part of my final project for the CS50x course and decided to share the steps I took along the way.
The GitHub repository with all the files discussed is here and the published package is here.

📘 🐍 Skoobpy

Some context is important to begin. Skoob is a social network focused on books that is very popular in Brazil, similar to Goodreads. There you can save books on different bookshelves, such as read, currently reading, desired, and so on.

As a user, I always wanted a way to get this book data and use it for my own purposes. For example, when I am watching for book sales, I have to browse through many pages on the site to see all the books I saved on my desired bookshelf, because the site does not have an API.

In this context, the package exports the data (such as title, author, publisher, and number of pages) of a specific user's desired books to a CSV file.

How it works

skoobpy can be run from the command line followed by a user_id. The data will be stored in a CSV file named books_<user_id>.csv.

$ python skoobpy <user_id>

Or it can be imported into a Python file to use the data in other ways.

import skoobpy
from skoobpy import *
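
As a rough illustration of "using the data in other ways", the sketch below counts books per author. It assumes rows shaped like the ones the package writes to the CSV (title, author, year, pages, publisher, URL). The sample rows and the books_per_author helper are made up for this example and are not part of skoobpy.

```python
from collections import Counter

# Hypothetical rows in the shape [title, author, year, pages, publisher, url].
# In real use these would come from skoobpy instead of being typed by hand.
sample_books = [
    ["Book A", "Author One", 2001, 320, "Publisher X", "https://example.com/a"],
    ["Book B", "Author Two", 2010, 150, "Publisher Y", "https://example.com/b"],
    ["Book C", "Author One", 2015, 410, "Publisher X", "https://example.com/c"],
]


def books_per_author(books):
    """Count how many books each author has in the list."""
    return Counter(book[1] for book in books)


counts = books_per_author(sample_books)
for author, count in counts.most_common():
    print(f"{author}: {count}")
```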

Building the package

Creating a Virtual Environment

To avoid dependency problems later on, I created a virtual environment for the project. For instance, if I use some version of the requests package and a future update changes something, a part of my code that works just fine could simply stop working. Also, if I am collaborating with someone else on the project, it is a great idea to be sure that everyone is working in the same environment.

First, I ran the command to install virtualenv:

$ pip install virtualenv

Inside a folder called skoobpy I ran the command below. This creates a folder called venv.

$ virtualenv venv

Now it is necessary to activate the environment. The command depends on which operating system you are using.

On Windows, if you are using WSL (Windows Subsystem for Linux), run the first command below (and if you are a beginner as I am, read this). If you are not using WSL, run the second one:

$ source ./venv/Scripts/activate
(venv) $

$ \pathto\venv\Scripts\activate
(venv) $

On Linux, run:

$ source ./venv/bin/activate
(venv) $

After this, the prompt will be prefixed with the name of the environment (venv), as shown above. This indicates that venv is currently active and the python executable will only use this environment's packages. To deactivate an environment, simply run deactivate.

(venv) $ deactivate
$
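
Besides looking at the prompt, the active environment can also be checked from inside Python. The snippet below is a minimal sketch of that check; in_virtualenv is a helper name chosen for this example, not something provided by virtualenv itself.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

With venv activated, the printed prefix should point inside the venv folder.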

Finally, I installed all the dependencies needed to build the package: wheel, setuptools, twine and requests, plus pytest to run some tests. I put all these names in a file called requirements.txt so that everything can be installed at once, and then my environment is ready for work.

$ pip install -r requirements.txt
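
To confirm that the install worked, one option is to ask Python which of these packages it can see. This is only a sketch: the list repeats the names mentioned above, and missing_packages is a hypothetical helper written for this example.

```python
from importlib import metadata

# The package names listed in the post's requirements.txt.
REQUIRED = ["wheel", "setuptools", "twine", "requests", "pytest"]


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


print("missing:", missing_packages(REQUIRED))
```

An empty list means every dependency is available in the active environment.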

Looking at the source code

Now that I have presented the idea, I am going to show how I did it. To begin, let's take a look at the directory structure of skoobpy:

skoobpy/
│
├─ skoobpy/
│   ├── __init__.py
│   ├── __main__.py
│   └── skoobpy.py
│
├── tests/
│   └── test_skoobpy.py
│
├── venv/
│
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py

In this section, I will show the details of the code file by file. All the files can be seen in the GitHub repository.

📂 skoobpy/

Besides setup.py, this folder also holds the LICENSE file, for which I chose the MIT license, a common default for open-source projects, and the README.md that documents the package.

The setup.py contains all the information that is important to PyPI. It defines every aspect of the package. Let's see some of them:

name defines the name used to install the package.
packages defines what is going to be included in or excluded from your package. I included only skoobpy to leave out the tests folder.
version shows the current version of your package. A good source to understand the semantics of the version number is this.
description presents a short description of what the package does.
long_description gives room for a fuller description of what the package can do. Here I simply used the content of README.md.
long_description_content_type makes it possible to use a markdown file as the long description.
author and author_email are important if you want to let people contact you about the package.
url points to where to find more information about the package, usually the repository.
install_requires lists the other packages that are mandatory to use this one. It is not necessary to list packages that are part of the standard Python library.
classifiers make it easier to find the package on the PyPI site.

from setuptools import find_packages, setup

# skoobpy/__init__.py only defines __version__, so importing it here is safe
from skoobpy import __version__

with open('README.md', 'r', encoding='utf-8') as file:
    long_description = file.read()

setup(
    name                ='skoobpy',
    packages            =find_packages(include=['skoobpy']),
    version             =__version__,
    description         ="extracts user's desired books from Skoob.com.br",
    long_description    =long_description,
    long_description_content_type='text/markdown',
    author              ='Diego Lourenço',
    author_email        ='diego.lourenco15@gmail.com',
    license             ='MIT',
    url                 ='https://github.com/Diegoslourenco/skoobpy',
    platforms           =['Any'],
    py_modules          =['skoobpy'],
    install_requires    =['requests'],  # skoobpy.py imports requests
    classifiers         =[
        'Development Status :: 3 - Alpha',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
    ],
)

📂 skoobpy/skoobpy/

__init__.py

This file marks the root of the package. It could be left empty, but I put the variable __version__ inside it to track the version in the future.

# __init__.py
__version__ = '0.1.3'
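
One common way for a setup.py to pick up this value without importing the package is to parse the file as text. The sketch below shows the idea; parse_version is a hypothetical helper for illustration and is not taken from the skoobpy repository.

```python
import re


def parse_version(source):
    """Extract the value assigned to __version__ from Python source text."""
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# In setup.py the text would come from reading skoobpy/__init__.py;
# a literal string is used here so the example runs on its own.
example_source = "# __init__.py\n__version__ = '0.1.3'\n"
print(parse_version(example_source))  # prints 0.1.3
```

Keeping the version in a single place means a release only needs one edit.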

__main__.py

Briefly, this is the entry point of the program and is responsible for calling the other parts as needed. There are two imports here.
First, we import argv from sys, because the second command-line argument (argv[1]) is taken as the user_id.
The other import brings in everything from the file skoobpy, which we are going to see in detail soon.

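# __main__.py
from sys import argv

from skoobpy import *

def main():
    # argv[0] is the program name; argv[1] must be the user_id
    if len(argv) < 2:
        raise SystemExit('usage: python skoobpy <user_id>')
    user_id = argv[1]

    books_json = get_all_books(user_id)
    books_desired = filter_desired_books(books_json)
    export_csv(books_desired, user_id)

if __name__ == "__main__":
    main()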
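
Reading argv directly works for a single argument. As an alternative, the standard library's argparse produces a usage message automatically when the user_id is missing. This is a sketch of that option, not what the package does; parse_args is a name chosen for the example.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None means use sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        prog="skoobpy",
        description="Export a Skoob user's desired books to a CSV file.",
    )
    parser.add_argument("user_id", help="numeric id of the Skoob user")
    return parser.parse_args(argv)


# An explicit list is passed here so the example does not depend on
# how the script was started.
args = parse_args(["1380619"])
print(args.user_id)  # prints 1380619
```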

skoobpy.py

This is the file that does all the work. It imports requests to make the requests to the site, json to handle the data returned by the site in a format that is easy to work with, and csv to export what we want.

There are three functions defined here: get_all_books, filter_desired_books and export_csv.

get_all_books composes a URL from the url_base (skoob.com.br) and the user_id number.
Depending on the number of books saved by the user, the listing spans many pages on the site. For this reason, it is necessary to get total_books, the total number of books, first. total_books_url is the final URL to request, asking for all the books in a single page.
Finally, a request to total_books_url is made, the result is parsed as JSON and saved in the variable books_json, and that is what the function returns. Now we have all the book data of a Skoob user.
filter_desired_books receives the JSON data and, to keep only the desired books, checks whether the book field desejado (desired in Portuguese) is equal to 1. If so, it saves the data from the book in a list. A value of zero means that the book is not desired. It returns the list books populated with the desired ones.
export_csv defines the header, the first row of the CSV file. Then, using the header and the books_list, it opens a CSV file named books_{user_id}.csv and writes each element of the list as a row.

# skoobpy.py
import requests
import json
import csv

url_base = 'https://www.skoob.com.br'

def get_all_books(user_id):
    url = f'{url_base}/v1/bookcase/books/{user_id}'
    print(f'request to {url}')

    # first request: only used to find out how many books the user has
    user = requests.get(url)
    total_books = user.json().get('paging').get('total')

    # second request: ask for all the books in a single page
    total_books_url = f'{url}/shelf_id:0/page:1/limit:{total_books}'
    books_json = requests.get(total_books_url).json().get('response')

    return books_json

def filter_desired_books(books_json):
    books = []

    for book in books_json:
        # desejado == 1 marks a book on the desired bookshelf
        if book['desejado'] == 1:
            ed = book['edicao']

            # if there is a subtitle, it is appended to the title
            if ed['subtitulo'] != '':
                book_title = str(ed['titulo']) + ' - ' + str(ed['subtitulo'])
            else:
                book_title = ed['titulo']

            book_url = url_base + ed['url']
            book_data = [book_title, ed['autor'], ed['ano'], ed['paginas'], ed['editora'], book_url]
            books.append(book_data)

    return books


def export_csv(books_list, user_id):

    header = ['Title', 'Author', 'Published Year', 'Pages', 'Publisher', 'Skoob\'s Page']

    # newline='' lets the csv module control the line endings itself
    with open(f'books_{user_id}.csv', 'w', encoding='utf-8', newline='') as csvfile:
        data = csv.writer(csvfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        data.writerow(header)

        for book in books_list:
            data.writerow(book)
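
To see the filtering and export steps without touching the network, the sketch below runs the same logic on two hand-made records and writes the result to a string instead of a file. The sample records and the helper names desired_rows and rows_to_csv_text are assumptions made for this example. Only the field names and the csv.writer settings come from the code above.

```python
import csv
import io

# Hypothetical records shaped like the fields the code above reads.
sample_books = [
    {"desejado": 1, "edicao": {"titulo": "Book A", "subtitulo": "Part One",
                               "autor": "Author One", "ano": 2001, "paginas": 320,
                               "editora": "Publisher X", "url": "/book-a"}},
    {"desejado": 0, "edicao": {"titulo": "Book B", "subtitulo": "",
                               "autor": "Author Two", "ano": 2010, "paginas": 150,
                               "editora": "Publisher Y", "url": "/book-b"}},
]


def desired_rows(books, url_base="https://www.skoob.com.br"):
    """Build one CSV row per book whose 'desejado' flag is 1."""
    rows = []
    for book in books:
        if book["desejado"] != 1:
            continue
        ed = book["edicao"]
        title = ed["titulo"]
        if ed["subtitulo"]:
            title = f"{title} - {ed['subtitulo']}"
        rows.append([title, ed["autor"], ed["ano"], ed["paginas"],
                     ed["editora"], url_base + ed["url"]])
    return rows


def rows_to_csv_text(rows):
    """Write rows with the same csv.writer settings, but into a string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", quotechar="|",
                        quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv_text(desired_rows(sample_books)))
```

Only the first record is flagged as desired, so the output holds a single semicolon-separated row.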

📂 skoobpy/tests/

test_skoobpy.py

There are a couple of tests here to verify that the functions fetch the correct data for a specific user (my own in this case) before it is exported to the CSV file.

# test_skoobpy.py
# Tests for the skoobpy module

# standard import
import csv

# third party import
import pytest

# skoobpy import
from skoobpy import *

@pytest.fixture
def total_books():
    user_id = 1380619
    return get_all_books(user_id)

@pytest.fixture
def total_desired_books():
    user_id = 1380619
    all_books = get_all_books(user_id)
    return filter_desired_books(all_books)

# Tests
def test_total_books(total_books):
    assert len(total_books) == 619

def test_total_desired_books(total_desired_books):
    assert len(total_desired_books) == 466
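
The tests above call the live site, so the expected counts change whenever books are added to or removed from the bookshelves. A possible offline variant is sketched below. It assumes a version of get_all_books that accepts the HTTP function as a parameter (http_get), which is not how the original function is written. FakeResponse and fake_get are stand-ins invented for the example.

```python
class FakeResponse:
    """Stands in for a requests response; only .json() is needed here."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


def fake_get(url):
    """Return canned payloads instead of contacting the site."""
    if "/limit:" in url:
        # Second call: the full listing.
        return FakeResponse({"response": [{"desejado": 1}, {"desejado": 0}]})
    # First call: only the paging information is read.
    return FakeResponse({"paging": {"total": 2}})


def get_all_books(user_id, http_get):
    """Same two-step request as in skoobpy.py, with the HTTP call injected."""
    url = f"https://www.skoob.com.br/v1/bookcase/books/{user_id}"
    total = http_get(url).json().get("paging").get("total")
    listing_url = f"{url}/shelf_id:0/page:1/limit:{total}"
    return http_get(listing_url).json().get("response")


def test_get_all_books_offline():
    books = get_all_books(123, http_get=fake_get)
    assert len(books) == 2
    assert sum(book["desejado"] for book in books) == 1


test_get_all_books_offline()
```

Because the data is fixed, this kind of test gives the same result on every run and does not need a network connection.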

Building the library

Once all the content is ready and everything is working well, it is time to build the package by running:

$ python setup.py sdist bdist_wheel

This will create a new folder dist with two files.

The sdist creates the source distribution (skoobpy-0.1.3.tar.gz).
The bdist_wheel creates the wheel file to install the package (skoobpy-0.1.3-py3-none-any.whl).

skoobpy/
│
└── dist/
    ├── skoobpy-0.1.3-py3-none-any.whl
    └── skoobpy-0.1.3.tar.gz
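
The wheel's file name carries information of its own: package name, version, Python tag, ABI tag and platform tag, separated by dashes. Here py3-none-any means any Python 3, no specific ABI, and any platform. The sketch below splits a name into those parts; parse_wheel_filename is a helper written for this example, and it ignores edge cases that real packaging tools handle.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of components")
    # An optional build tag may sit between the version and the Python tag.
    name, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": name,
        "version": version,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


print(parse_wheel_filename("skoobpy-0.1.3-py3-none-any.whl"))
```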

Checking for errors

The first step is to look inside skoobpy-0.1.3.tar.gz and see if everything is there by running the command below. The new files (such as PKG-INFO and the egg-info folder) are created from the information provided in setup.py.

$ tar tzf ./dist/skoobpy-0.1.3.tar.gz
skoobpy-0.1.3/
skoobpy-0.1.3/PKG-INFO
skoobpy-0.1.3/README.md
skoobpy-0.1.3/setup.cfg
skoobpy-0.1.3/setup.py
skoobpy-0.1.3/skoobpy/
skoobpy-0.1.3/skoobpy/__init__.py
skoobpy-0.1.3/skoobpy/__main__.py
skoobpy-0.1.3/skoobpy/skoobpy.py
skoobpy-0.1.3/skoobpy.egg-info/
skoobpy-0.1.3/skoobpy.egg-info/PKG-INFO
skoobpy-0.1.3/skoobpy.egg-info/SOURCES.txt
skoobpy-0.1.3/skoobpy.egg-info/dependency_links.txt
skoobpy-0.1.3/skoobpy.egg-info/top_level.txt
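
On a system without the tar command, a similar listing can be produced with Python's standard tarfile module. This is a small sketch; list_sdist is a name chosen for the example, and the path is the one from this post.

```python
import os
import tarfile


def list_sdist(path):
    """Return the member names of a .tar.gz archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


# Path taken from the post; the listing is skipped if the file is absent.
sdist_path = "dist/skoobpy-0.1.3.tar.gz"
if os.path.exists(sdist_path):
    for name in list_sdist(sdist_path):
        print(name)
```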

Using twine to check if the distribution will render correctly on PyPI is another way to verify if everything is going as planned.

$ twine check dist/*
Checking dist/skoobpy-0.1.3-py3-none-any.whl: PASSED
Checking dist/skoobpy-0.1.3.tar.gz: PASSED

The final check can be performed by uploading the package to TestPyPI. This confirms that the package shows its information on the site and runs as it should. An account is mandatory, as twine will ask for a username and password. After the upload, it is possible to go to TestPyPI, see the package there, and install it to test.

$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*

Uploading the package

The final step of the journey is to upload it to PyPI. Once more an account is mandatory, and it is not the same as the TestPyPI one: you have to register separately on each site. The final command to run is:

$ twine upload dist/*

After following all the steps, just install the package using pip and use it!

$ pip install skoobpy

Conclusion

To summarise, in this post I showed:

The idea of skoobpy and how to use it
How I prepared a virtual environment
How I built the package
How I performed some tests
Some ways to check if the package is going to show as expected
How to upload the package

After some (much!) research to understand and solve many unexpected and unknown errors, I accomplished the goal. I hope it can be helpful to someone out there.

Thank you for reading!

Diego

CodeNewbie Community 🌱
