Hello there!
In this post (my first one!) I am going to explain how I created a simple Python package and published it on PyPI. I started it as part of my final project for the CS50x course and decided to share the steps I took to get there.
The GitHub repository with all the files discussed is https://github.com/Diegoslourenco/skoobpy, and the package is published on PyPI as skoobpy.
📘 🐍 Skoobpy
Some context is important to begin. Skoob is a book-focused social network, very popular in Brazil and similar to Goodreads. On it you can save books on different bookshelves, such as read, currently reading, desired, and so on.
As a user, I always wanted a way to get this book data and use it for some purpose. For example, if I am watching for book sales, I have to browse through many pages of the site to see all the books on my desired bookshelf, because the site does not offer an API for it.
In this context, the package collects the data of a specific user's desired books, such as title, author, publisher, and number of pages, and saves it to a CSV file.
How it works
From the repository root, skoobpy can be run from the command line followed by a user_id. The data will be stored in a CSV file named books_<user_id>.csv.
$ python skoobpy <user_id>
Or it can be imported into a Python file to use the data in other ways:
from skoobpy.skoobpy import get_all_books, filter_desired_books, export_csv
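Since the functions return plain Python lists, the data can be post-processed directly once imported. A minimal sketch: the two rows below are placeholders in the same shape that filter_desired_books returns ([title, author, year, pages, publisher, Skoob URL]), standing in for a live call such as filter_desired_books(get_all_books(user_id)). The helper short_reads is hypothetical, not part of skoobpy.

```python
# Placeholder rows in the shape filter_desired_books returns:
# [title, author, published year, pages, publisher, Skoob page URL]
desired = [
    ["Book A", "Author A", 2019, 320, "Publisher A", "<Skoob URL A>"],
    ["Book B", "Author B", 2005, 150, "Publisher B", "<Skoob URL B>"],
]


def short_reads(books, max_pages):
    """Return the titles of books with at most max_pages pages, shortest first."""
    # Column 3 is the page count; int() also copes with values that arrive as strings
    short = [row for row in books if int(row[3]) <= max_pages]
    short.sort(key=lambda row: int(row[3]))
    return [row[0] for row in short]


print(short_reads(desired, 200))  # ['Book B']
```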
Building the package
Creating a Virtual Environment
To avoid dependency problems caused by future changes in the packages the project uses, I created a virtual environment. For instance, if I rely on a particular version of the requests package and a future release changes something, a part of my code that works just fine could stop working. Also, when collaborating with someone else on the project, it is a great idea to make sure everyone is working in the same environment.
First, I ran the command to install virtualenv:
$ pip install virtualenv
Inside a folder called skoobpy, I ran the command below, which creates a folder called venv.
$ virtualenv venv
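The standard library also ships a venv module that does the same job without installing anything extra (python -m venv venv on the command line). A minimal sketch of the programmatic form; make_env is a hypothetical helper name:

```python
import venv
from pathlib import Path


def make_env(path="venv", with_pip=True):
    """Create a virtual environment at `path` with the standard-library venv module."""
    # with_pip=True bootstraps pip into the new environment via ensurepip
    venv.EnvBuilder(with_pip=with_pip).create(path)
    # Every venv has a pyvenv.cfg file at its root
    return Path(path) / "pyvenv.cfg"
```

Calling make_env() inside the skoobpy folder creates a venv folder that is activated with the same commands shown next.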
Now the environment has to be activated. The command depends on the operating system you are using.
- On Windows, run the first command below if you are using a Bash-style shell such as Git Bash, or the second one in the Command Prompt. (Inside WSL you are running Linux, so the Linux command below applies.)
$ source ./venv/Scripts/activate
(venv) $
$ \path\to\venv\Scripts\activate
(venv) $
- On Linux (and macOS), run:
$ source ./venv/bin/activate
(venv) $
After this, the prompt is prefixed with the name of the environment, (venv), as shown above. This indicates that venv is active and that the python executable will use only this environment's packages. To deactivate the environment, simply run deactivate.
(venv) $ deactivate
$
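To confirm from inside Python which environment is in use, compare sys.prefix with sys.base_prefix: inside a virtual environment they differ. A small sketch (in_virtualenv is a hypothetical helper name); the real_prefix check covers environments created by very old virtualenv releases:

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and modern virtualenv point sys.prefix at the environment folder,
    # while sys.base_prefix keeps pointing at the base Python installation.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return True
    # Very old virtualenv releases set sys.real_prefix instead
    return hasattr(sys, "real_prefix")


print("virtual environment active:", in_virtualenv())
```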
Finally, I installed all the dependencies the project needs: wheel, setuptools, twine, requests and, to run some tests, pytest. I listed them all in a file called requirements.txt to install everything at once, and then my environment was ready for work.
$ pip install -r requirements.txt
Looking at the source code
Now that I have presented the idea, I am going to show how I did it. To begin, let's take a look at the directory structure of skoobpy:
skoobpy/
│
├─ skoobpy/
│ ├── __init__.py
│ ├── __main__.py
│ └── skoobpy.py
│
├── tests/
│ └── test_skoobpy.py
│
├── venv/
│
├── LICENSE
├── README.md
├── requirements.txt
└── setup.py
In this section, I will show the details of the code file by file. All the files can be seen in the GitHub repository.
📂 skoobpy/
Besides setup.py, this folder holds LICENSE, for which I took the default MIT license for open-source projects, and README.md, which documents the package.
setup.py contains all the information PyPI needs. It defines every aspect of the package; let's see some of them:
- name defines the name that will be used to install the package.
- packages defines what is included in or excluded from the package. I included only skoobpy, to leave out the tests folder.
- version shows the current version of the package. It is worth reading about semantic versioning to understand what each part of a version number means.
- description presents a short description of what the package does.
- long_description gives a fuller description of the package's functionalities. Here I simply used the content of README.md.
- long_description_content_type='text/markdown' tells PyPI to render the long description as Markdown.
- author and author_email are important if you want to let people contact you about the package.
- url shows where to find more information about the package, usually the repository.
- install_requires lists the other packages that must be installed to use this one. It is not necessary to list modules from the Python standard library.
- classifiers make it easier to find the package on the PyPI site.
from setuptools import find_packages, setup
# Reuse the version defined in skoobpy/__init__.py (safe because that file has no dependencies)
from skoobpy import __version__
with open('README.md', 'r', encoding='utf-8') as file:
    long_description = file.read()
setup(
    name='skoobpy',
    packages=find_packages(include=['skoobpy']),
    version=__version__,
    description='extracts user\'s desired books from Skoob.com.br',
    long_description=long_description,
    long_description_content_type='text/markdown',
    author='Diego Lourenço',
    author_email='diego.lourenco15@gmail.com',
    license='MIT',
    url='https://github.com/Diegoslourenco/skoobpy',
    platforms=['Any'],
    # requests is needed at runtime, so pip should install it together with skoobpy
    install_requires=['requests'],
    classifiers=[
        'Development Status :: 3 - Alpha',
        'License :: OSI Approved :: MIT License',
        'Operating System :: OS Independent',
        'Programming Language :: Python',
    ],
)
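setup.py needs the __version__ defined in skoobpy/__init__.py. Importing the package from setup.py only works while __init__.py has no third-party imports (the build would fail if it imported requests before requests is installed), so a common alternative is to read the version from the file with a regular expression. A sketch; read_version is a hypothetical helper you would call as version=read_version() inside setup():

```python
import re
from pathlib import Path


def read_version(init_path="skoobpy/__init__.py"):
    """Extract the __version__ string from a file without importing the package."""
    text = Path(init_path).read_text(encoding="utf-8")
    # Matches lines such as: __version__ = '0.1.3'
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"__version__ not found in {init_path}")
    return match.group(1)
```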
📂 skoobpy/skoobpy/
__init__.py
This file marks the folder as a package. It could be left empty, but I put the variable __version__ inside it to track the version in the future.
# __init__.py
__version__ = '0.1.3'
__main__.py
Briefly, this is the program's entry point, responsible for calling the other functions as needed. There are two imports here.
First, argv is imported from sys, because the second command-line argument (argv[1]) is used as the user_id.
The other import brings in everything defined in skoobpy.py, which we are going to see in detail soon.
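# __main__.py
from sys import argv
from skoobpy import *
def main():
    # argv[0] is the program name and argv[1] the Skoob user id
    if len(argv) < 2:
        raise SystemExit('usage: python skoobpy <user_id>')
    user_id = argv[1]
    books_json = get_all_books(user_id)
    books_desired = filter_desired_books(books_json)
    export_csv(books_desired, user_id)
if __name__ == "__main__":
    main()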
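Reading sys.argv by hand works, but the standard-library argparse module also generates help and usage messages and can validate the input. A sketch, assuming Skoob user ids are always numeric (as in the examples in this post); parse_args is a hypothetical helper:

```python
import argparse


def parse_args(argv=None):
    """Parse the command line of a skoobpy-style CLI; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        prog="skoobpy",
        description="Export a Skoob user's desired books to a CSV file.",
    )
    # type=int rejects anything that is not a whole number with a usage error
    parser.add_argument("user_id", type=int, help="numeric Skoob user id")
    return parser.parse_args(argv)
```

Inside main(), user_id = parse_args().user_id would then take the place of reading argv[1] directly.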
skoobpy.py
This is the file that does all the work. It uses requests to fetch the data from the site as JSON and csv to export what we want.
There are three functions defined here: get_all_books, filter_desired_books and export_csv.
- get_all_books composes a url from url_base (skoob.com.br) and the user_id number. Depending on how many books the user has saved, the list spans many pages on the site, so the function first reads total, the total number of books, and then builds total_books, the final URL that asks for all of them on a single page. Finally, it requests total_books, parses the result as JSON, saves it in books_json and returns it. Now we have all of a user's book data from Skoob.
- filter_desired_books receives the JSON data and keeps only the desired books by checking whether the field desejado ("desired" in Portuguese) is equal to 1; a value of 0 means the book is not desired. For each desired book it joins the subtitle to the title when there is one and saves the book's data in a list. It returns the list books populated with the desired ones.
- export_csv defines in header the first row of the CSV file. Then it opens a CSV file named books_{user_id}.csv and writes the header followed by one row for each element of books_list, using ; as the delimiter.
# skoobpy.py
import requests
import json
import csv
url_base = 'https://www.skoob.com.br'
def get_all_books(user_id):
    url = f'{url_base}/v1/bookcase/books/{user_id}'
    print(f'request to {url}')
    # First request: only used to read how many books the user has saved
    # (timeout, in seconds, is an arbitrary safety value)
    user = requests.get(url, timeout=30)
    user.raise_for_status()
    total = user.json().get('paging').get('total')
    # Second request: a single page whose limit is large enough to hold every book
    total_books = f'{url}/shelf_id:0/page:1/limit:{total}'
    response = requests.get(total_books, timeout=30)
    response.raise_for_status()
    books_json = response.json().get('response')
    return books_json
def filter_desired_books(books_json):
    books = []
    for book in books_json:
        # desejado == 1 marks a book on the "desired" shelf
        if book['desejado'] == 1:
            ed = book['edicao']  # edition data: title, author, year, pages...
            # if there is a subtitle, it must be concatenated to the title
            if ed['subtitulo'] != '':
                book_title = str(ed['titulo']) + ' - ' + str(ed['subtitulo'])
            else:
                book_title = ed['titulo']
            book_url = url_base + ed['url']
            book_data = [book_title, ed['autor'], ed['ano'], ed['paginas'], ed['editora'], book_url]
            books.append(book_data)
    return books
def export_csv(books_list, user_id):
    header = ['Title', 'Author', 'Published Year', 'Pages', 'Publisher', 'Skoob\'s Page']
    # newline='' lets the csv module control line endings
    with open(f'books_{user_id}.csv', 'w', encoding='utf-8', newline='') as csvfile:
        data = csv.writer(csvfile, delimiter=';', quotechar='|', quoting=csv.QUOTE_MINIMAL)
        data.writerow(header)
        for book in books_list:
            data.writerow(book)
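Because export_csv uses ; as the delimiter and | as the quote character, anything reading the file back has to use the same settings. A minimal sketch of reading it with the csv module; read_books_csv is a hypothetical helper, and the round trip below runs in memory with a placeholder row instead of a real export:

```python
import csv
import io


def read_books_csv(fileobj):
    """Read rows written by export_csv, which uses ';' as delimiter and '|' as quote char."""
    reader = csv.DictReader(fileobj, delimiter=";", quotechar="|")
    return list(reader)


# Round trip with the same writer settings as export_csv, in memory instead of a file
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", quotechar="|", quoting=csv.QUOTE_MINIMAL)
writer.writerow(["Title", "Author", "Published Year", "Pages", "Publisher", "Skoob's Page"])
# A title containing ';' gets wrapped in '|' so it stays a single field
writer.writerow(["Title; with semicolon", "Author A", 2019, 320, "Publisher A", "<Skoob URL>"])
buffer.seek(0)
rows = read_books_csv(buffer)
print(rows[0]["Title"])  # Title; with semicolon
```

For the real file, open it with open(f'books_{user_id}.csv', encoding='utf-8', newline='') and pass the file object to read_books_csv.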
📂 skoobpy/tests/
test_skoobpy.py
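There are a couple of tests here to verify that the functions fetch and filter the correct data for a specific user (my own user, in this case) before it is exported to the CSV file. Because they query the live site, the expected numbers only hold while my shelves stay the same.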
# test_skoobpy.py
# Tests for the skoobpy module
# third party import
import pytest
# skoobpy import
from skoobpy.skoobpy import filter_desired_books, get_all_books
@pytest.fixture
def total_books():
    user_id = 1380619
    return get_all_books(user_id)
@pytest.fixture
def total_desired_books(total_books):
    # Reuse the fixture above instead of requesting the site twice
    return filter_desired_books(total_books)
# Tests
def test_total_books(total_books):
    assert len(total_books) == 619
def test_total_desired_books(total_desired_books):
    assert len(total_desired_books) == 466
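Tests that query the live site break whenever the shelves change or the site is unreachable. Checking filter_desired_books against hand-made records keeps that part deterministic. A sketch in pytest style: it copies filter_desired_books from skoobpy.py above so it runs on its own (in the repository you would import it instead), and make_book with its placeholder values is hypothetical, filling only the fields the function reads:

```python
url_base = "https://www.skoob.com.br"


def filter_desired_books(books_json):
    # Copy of the function from skoobpy.py so this sketch runs standalone
    books = []
    for book in books_json:
        if book["desejado"] == 1:
            ed = book["edicao"]
            if ed["subtitulo"] != "":
                book_title = str(ed["titulo"]) + " - " + str(ed["subtitulo"])
            else:
                book_title = ed["titulo"]
            book_url = url_base + ed["url"]
            books.append([book_title, ed["autor"], ed["ano"], ed["paginas"], ed["editora"], book_url])
    return books


def make_book(desired, title, subtitle=""):
    """Build a fake Skoob record with placeholder values for the fields used above."""
    return {
        "desejado": desired,
        "edicao": {"titulo": title, "subtitulo": subtitle, "autor": "Author",
                   "ano": 2020, "paginas": 100, "editora": "Publisher", "url": "/fake-path"},
    }


def test_keeps_only_desired_books():
    books = [make_book(1, "Wanted"), make_book(0, "Not wanted")]
    assert [row[0] for row in filter_desired_books(books)] == ["Wanted"]


def test_subtitle_is_joined_to_title():
    rows = filter_desired_books([make_book(1, "Title", "Subtitle")])
    assert rows[0][0] == "Title - Subtitle"
    assert rows[0][5] == "https://www.skoob.com.br/fake-path"
```

pytest collects both test_ functions automatically, and neither one touches the network.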
Building the library
Once the content is ready and everything is working well, it is time to build the package by running:
python setup.py sdist bdist_wheel
This will create a new folder dist with two files.
- sdist creates the source distribution (skoobpy-0.1.3.tar.gz).
- bdist_wheel creates the wheel file used to install the package (skoobpy-0.1.3-py3-none-any.whl).
skoobpy/
│
└── dist/
├── skoobpy-0.1.3-py3-none-any.whl
└── skoobpy-0.1.3.tar.gz
Checking for errors
The first check is to look inside skoobpy-0.1.3.tar.gz and see if everything is there, by running the command below. Files such as PKG-INFO, setup.cfg and the skoobpy.egg-info folder are generated from the information provided in setup.py.
$ tar tzf ./dist/skoobpy-0.1.3.tar.gz
skoobpy-0.1.3/
skoobpy-0.1.3/PKG-INFO
skoobpy-0.1.3/README.md
skoobpy-0.1.3/setup.cfg
skoobpy-0.1.3/setup.py
skoobpy-0.1.3/skoobpy/
skoobpy-0.1.3/skoobpy/__init__.py
skoobpy-0.1.3/skoobpy/__main__.py
skoobpy-0.1.3/skoobpy/skoobpy.py
skoobpy-0.1.3/skoobpy.egg-info/
skoobpy-0.1.3/skoobpy.egg-info/PKG-INFO
skoobpy-0.1.3/skoobpy.egg-info/SOURCES.txt
skoobpy-0.1.3/skoobpy.egg-info/dependency_links.txt
skoobpy-0.1.3/skoobpy.egg-info/top_level.txt
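The same kind of check works for the wheel, which is a zip archive. A small Python sketch covering both formats with the standard tarfile and zipfile modules; list_dist is a hypothetical helper, not part of any packaging tool:

```python
import tarfile
import zipfile


def list_dist(path):
    """List the files inside a built distribution (.tar.gz sdist or .whl wheel)."""
    path = str(path)
    if path.endswith(".whl"):
        # A wheel is a plain zip archive
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if path.endswith(".tar.gz"):
        with tarfile.open(path, "r:gz") as archive:
            return archive.getnames()
    raise ValueError(f"unsupported distribution file: {path}")
```

For example, printing each name in list_dist("dist/skoobpy-0.1.3-py3-none-any.whl") shows what pip will actually install.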
Using twine to check whether the package description will render correctly on PyPI is another way to verify that everything is going as planned.
$ twine check dist/*
Checking dist/skoobpy-0.1.3-py3-none-any.whl: PASSED
Checking dist/skoobpy-0.1.3.tar.gz: PASSED
The final check can be performed by uploading the package to TestPyPI, a separate instance of the package index meant for testing. This confirms that the package information shows up on the site and that the package installs and runs as it should. An account is required, since twine will ask for credentials (TestPyPI and PyPI now expect an API token instead of your account password). After the upload, you can go to TestPyPI, see the package there, and install it to test.
$ twine upload --repository-url https://test.pypi.org/legacy/ dist/*
Uploading the package
The final step of the journey is to upload the package to PyPI. Again, an account is required, and it is separate from the TestPyPI one, so you have to register on both sites. The final command to run is:
$ twine upload dist/*
With all the steps done, just install the package using pip and use it!
$ pip install skoobpy
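To confirm which version pip actually installed, the standard library's importlib.metadata (Python 3.8+) can look it up without importing the package. A sketch; installed_version is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string after `pip install skoobpy`, or None if it is missing
print(installed_version("skoobpy"))
```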
Conclusion
To summarise, in this post I showed:
- The idea of skoobpy and how to use it
- How I prepared a virtual environment
- How I built the package
- How I performed some tests
- Some ways to check whether the package will show up as expected
- How to upload the package
After some (much!) research to understand and solve many unexpected and unfamiliar errors, I accomplished the goal. I hope it can be helpful to someone out there.
Thank you for reading!
Diego