pickle Module: Serializing and Deserializing Python Objects

Sometimes you need to send complex data over a network, save its state to a file on the local disk or in a database, or cache the result of an expensive operation. In each of these cases, you need to serialize the data.

Python's standard library includes a module called pickle that handles serialization and deserialization of Python objects.

In this article, you'll learn about data serialization and deserialization, the pickle module's key features and how to serialize and deserialize objects, the types of objects that can and cannot be pickled, and how to modify the pickling behavior in a class.

Object Serialization

Serialization is the process of converting data into a format that can be easily stored or transmitted and then reconstructed for later use.

Pickling is the name given to the serialization process in Python, where Python objects are converted into a byte stream. Unpickling, also known as deserializing, is the inverse operation in which byte data is converted back to its original state, reconstructing the Python object hierarchy.

The pickle Module

Pickling and unpickling are Python-specific operations that require the use of the pickle module.

The pickle module includes four functions for performing the pickling and unpickling processes on objects:

pickle.dump(obj, file)
pickle.load(file)
pickle.dumps(obj)
pickle.loads(data)

The pickle.dump() function writes the serialized byte representation of the object to a specified file or file-like object.

  • obj: The object to be serialized.

  • file: The file or file-like object, opened for binary writing, to which the serialized bytes are written.

The pickle.dumps() function returns the serialized byte representation of the object as a bytes object.

  • obj: The object to be serialized.

The pickle.load() function reads the serialized object from the specified file or file-like object and returns the reconstructed object.

  • file: The file or file-like object, opened for binary reading, from which the serialized data is read.

The pickle.loads() function returns the reconstructed object from a serialized bytes object.

  • data: The serialized bytes object to reconstruct.
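As a rough sketch of how the two pairs of functions relate, the snippet below round-trips the same dictionary once through bytes in memory and once through a file-like object. The io.BytesIO buffer is only a stand-in for a real file opened in binary mode, and the sample data is made up for illustration.

```python
import io
import pickle

# Made-up sample data for illustration
data = {"lib": "pickle", "tags": ["serialize", "deserialize"]}

# dumps()/loads() work with a bytes object held in memory
raw = pickle.dumps(data)
from_bytes = pickle.loads(raw)

# dump()/load() work with any binary file-like object;
# BytesIO stands in for a file opened with "wb"/"rb"
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)  # rewind to the start before reading
from_file = pickle.load(buffer)

print(type(raw))
print(from_bytes == data, from_file == data)
```

Both routes should give back a dictionary equal to the original; the only difference is where the bytes live in between.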

How to Pickle and Unpickle Data

Consider the following scenario: pickling the data and saving it to a file, then unpickling the serialized object from that file to reassemble it in its original form.

import pickle

# Sample data
my_data = {
    "lib": "pickle",
    "build": 4.33,
    "version": 2.1,
    "status": "Active"
}

# Serializing
with open("lib_info.pickle", "wb") as file:
    pickle.dump(my_data, file)

# De-serializing
with open("lib_info.pickle", "rb") as file:
    unpickled_data = pickle.load(file)

print(f"Unpickled Data: {unpickled_data}")

The above code serializes the my_data dictionary and the serialized data is written to a file called lib_info.pickle in binary mode (wb).

The serialized data is then deserialized from the lib_info.pickle using the pickle.load() function.

Unpickled Data: {'lib': 'pickle', 'build': 4.33, 'version': 2.1, 'status': 'Active'}

Take a look at another example, in which a class defines several attributes computed from arithmetic operations.

import pickle

class SampleOperation:
    square = 5 ** 2
    addition = 5 + 7
    subtraction = 5 - 7
    division = 14 / 2

# Object created
my_obj = SampleOperation()

# Serializing
pickled_data = pickle.dumps(my_obj)
print(f"Pickled Data: {pickled_data}")

# De-serializing
unpickled_data = pickle.loads(pickled_data)
print(f"Unpickled Data (Division): {unpickled_data.division}")
print(f"Unpickled Data (Square): {unpickled_data.square}")
print(f"Unpickled Data (Addition): {unpickled_data.addition}")
print(f"Unpickled Data (Subtraction): {unpickled_data.subtraction}")

In the above code, an object of the SampleOperation class is created and stored in the my_obj variable.

The object my_obj is serialized using the pickle.dumps() function and the serialized data is stored in the pickled_data variable.

Then, the serialized data (pickled_data) is deserialized using the pickle.loads() function and the attributes of the unpickled object are printed.

Pickled Data: b'\x80\x04\x95#\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x0fSampleOperation\x94\x93\x94)\x81\x94.'
Unpickled Data (Division): 7.0
Unpickled Data (Square): 25
Unpickled Data (Addition): 12
Unpickled Data (Subtraction): -2

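This demonstrates that the deserialization process successfully reconstructed the object. Note that the pickled bytes contain only a reference to the SampleOperation class (the module name __main__ and the class name), not the values of square, addition, and the rest. Those are class attributes, so the unpickled object reads them from the class definition, which must be available when unpickling.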
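To see what actually ends up inside the pickle for an instance, here is a small sketch that contrasts a class attribute with an instance attribute. The Shape class and its attribute names are hypothetical and exist only for this illustration.

```python
import pickle

class Shape:
    """Hypothetical class used only for this illustration."""
    sides = 4  # class attribute: lives on the class, not in the pickle

    def __init__(self, width):
        self.width = width  # instance attribute: stored in the pickle

payload = pickle.dumps(Shape(3))
restored = pickle.loads(payload)

print(b"width" in payload)  # the instance attribute name is in the bytes
print(b"sides" in payload)  # the class attribute name is not

# Changing the class after unpickling also changes what the restored object sees,
# because the value is looked up on the class rather than read from the pickle
Shape.sides = 6
print(restored.width, restored.sides)
```

In other words, the instance's own state travels with the pickle, while anything defined on the class is resolved from whatever class definition is present when the data is loaded.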

What Can be Pickled and Unpickled?

The pickle module can pickle a variety of objects, including None, booleans, integers, floats, strings, bytes, and tuples, lists, sets, and dictionaries of picklable objects, as well as functions and classes defined at the top level of a module.

However, not all types of objects are picklable. File handles, sockets, and database connections, for example, cannot be pickled, and neither can instances of custom classes that hold such objects as attributes, unless the class customizes its pickling (for example, with __getstate__ and __setstate__).

Here's an example of attempting to pickle a database connection.

import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
# Pickling db connection object
pickle.dumps(conn)

When you run this code, you will receive a TypeError stating that the connection object cannot be pickled.

TypeError: cannot pickle 'sqlite3.Connection' object

Similarly, lambda functions cannot be pickled. The pickle module stores a function as a reference to its name rather than storing its code, and a lambda has no name that can be looked up again when unpickling.

import pickle

lambda_obj = lambda x: x ** 2
pickle.dumps(lambda_obj)

The above code attempts to pickle the lambda function object, but it raises an error.

Traceback (most recent call last):
  ...
    pickle.dumps(lambda_obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x000001EB55373E20>: attribute lookup <lambda> on __main__ failed
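One possible way around this, assuming the logic can live at the top level of a module, is to define the function with def so that it has a name pickle can look up. The function name square below is just an illustrative choice.

```python
import pickle

def square(x):
    """Named, module-level equivalent of the lambda above."""
    return x ** 2

# The pickle stores a reference to the function (module and name), not its code
pickled_func = pickle.dumps(square)
restored_func = pickle.loads(pickled_func)

print(restored_func(4))
```

Because only the reference is stored, the function has to be defined (or importable) under the same name in the program that unpickles it.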

Modify the Pickling Behaviour of the Class

Let's say you have a class that contains different attributes and some of them are unpicklable. In that case, you can override the __getstate__ method of the class to choose what you want to pickle during the pickling process.

import pickle

class SampleTask:
    def __init__(self):
        self.first = 2**17
        self.second = "This is a string".upper()
        self.third = lambda x: x**x

obj = SampleTask()
pickle_instance = pickle.dumps(obj)
unpickle = pickle.loads(pickle_instance)
print(unpickle.__dict__)

If you run the above code as is, it raises an error because the lambda function assigned to the third attribute is unpicklable.

Traceback (most recent call last):
  ....
    pickle_instance = pickle.dumps(obj)
AttributeError: Can't pickle local object 'SampleTask.__init__.<locals>.<lambda>'

To handle this kind of situation, define the __getstate__ method on the class. Whatever this method returns is pickled as the state of the instance, so you can leave out the attributes that cannot be pickled.

import pickle

class SampleTask:
    def __init__(self):
        self.first = 2**17
        self.second = "This is a string".upper()
        self.third = lambda x: x**x

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['third']
        return state

obj = SampleTask()
pickle_instance = pickle.dumps(obj)
unpickle = pickle.loads(pickle_instance)
print(unpickle.__dict__)

In the above example, the __getstate__ method is defined, and within this method, a copy of the attributes is made. To exclude the lambda function from the pickling process, the attribute named third is removed and then the attributes are returned.

When you run the above example, you will get a dictionary containing only the attributes that were pickled.

{'first': 131072, 'second': 'THIS IS A STRING'}

If you want the excluded lambda function to be present on the unpickled object as well, you can use the __setstate__ method to restore it when the object's state is rebuilt.

import pickle

class SampleTask:
    def __init__(self):
        self.first = 2**17
        self.second = "This is a string".upper()
        self.third = lambda x: x**x

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['third']
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.third = lambda x: x**x

obj = SampleTask()
pickle_instance = pickle.dumps(obj)
unpickle = pickle.loads(pickle_instance)
print(unpickle.__dict__)

In the above code, the __setstate__ method is called during unpickling and receives the pickled state. It restores the saved attributes and then recreates the lambda function that was excluded from pickling.

When you run the above code, you will see that the dictionary now includes the lambda function object.

{'first': 131072, 'second': 'THIS IS A STRING', 'third': <function SampleTask.__setstate__.<locals>.<lambda> at 0x000001C54EEB67A0>}

Customizing Pickling: Modifying Class Behavior for Database Connections

As you have seen, a variety of objects are unpicklable. Here's an example that shows how you can pickle an object that holds a database connection by modifying the pickling behavior of its class.

# pickling_db_obj.py
import pickle
import sqlite3

class DBConnection:
    def __init__(self, db_name):
        self.db_name = db_name
        self.connection = sqlite3.connect(db_name)
        self.cur = self.connection.cursor()

    # Method for creating db table
    def create_table(self):
        self.connection.execute("CREATE TABLE IF NOT EXISTS users (name TEXT)")
        return self.connection

    # Method for inserting data into db table
    def create_entry(self):
        self.connection.execute("INSERT INTO users (name) VALUES ('Sachin')")
        res = self.connection.execute("SELECT * FROM users")
        result = res.fetchall()
        print(result)
        return self.connection

    # Method for closing db connection
    def close_db_connection(self):
        self.cur.close()
        self.connection.close()

The above code defines a class DBConnection, and the SQLite database connection is initialized within this class.

In addition, the class has three methods: create_table (for creating a database table), create_entry (for inserting data into the table and retrieving it), and close_db_connection (for closing the database connection).

Now exclude the database connection from the pickling process using the __getstate__ method.

# pickling_db_obj.py
...

    def __getstate__(self):
        state = self.__dict__.copy()
        # Exclude the connection and cursor from pickling
        del state['connection']
        del state['cur']
        return state

db_conn = DBConnection(":memory:")
pickle_db_conn = pickle.dumps(db_conn)
unpickle_db_conn = pickle.loads(pickle_db_conn)
print(unpickle_db_conn.__dict__)

The __getstate__ method creates a copy of the object's dictionary, then removes the connection (state['connection']) and cursor (state['cur']) and returns the dictionary (state).

An instance of the DBConnection class is created with the database name ":memory:", which tells SQLite to create the database in memory.

The instance is then pickled, unpickled, and its dictionary is printed.

{'db_name': ':memory:'}

As you can see, the dictionary of the object only contains the database name. The connection and cursor objects have been removed.

The __setstate__ method is now required to restore the object's original state during unpickling, in which the database connection will be reestablished.

# pickling_db_obj.py
...

    ...

    # Restoring the original state of the object
    def __setstate__(self, state):
        self.__dict__.update(state)
        self.connection = sqlite3.connect(self.db_name)
        self.cur = self.connection.cursor()


db_conn = DBConnection(":memory:")
pickle_db_conn = pickle.dumps(db_conn)
unpickle_db_conn = pickle.loads(pickle_db_conn)

unpickle_db_conn.create_table()
unpickle_db_conn.create_entry()
unpickle_db_conn.close_db_connection()

print(unpickle_db_conn.__dict__)

Within the __setstate__ method, the state dictionary is updated and the new database connection and the cursor are created.

To check if the pickling process works, the create_table, create_entry, and close_db_connection methods are called on the unpickled class instance (unpickle_db_conn).

When you run the whole script, you will obtain the following output.

[('Sachin',)]
{'db_name': ':memory:', 'connection': <sqlite3.Connection object at 0x00000240D6F12A40>, 'cur': <sqlite3.Cursor object at 0x00000240D78044C0>}

As you can see, everything went well, and the object's dictionary now has both a connection and a cursor object along with the database name, demonstrating the successful unpickling of the database connection.
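One caveat with ":memory:" is that every new connection to it opens a fresh, empty database, so the reconnected object does not see any rows the original connection held. The sketch below assumes a file-backed database instead, so the data survives the round trip. FileBackedDB, the users.db file name, and the temporary directory are hypothetical choices for this illustration, not part of the example above.

```python
import os
import pickle
import sqlite3
import tempfile

class FileBackedDB:
    """Hypothetical wrapper, similar in spirit to DBConnection above."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.connection = sqlite3.connect(db_path)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state["connection"]  # connections cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Reconnect to the same file, so previously committed rows are visible
        self.connection = sqlite3.connect(self.db_path)

with tempfile.TemporaryDirectory() as tmp_dir:
    db = FileBackedDB(os.path.join(tmp_dir, "users.db"))
    db.connection.execute("CREATE TABLE users (name TEXT)")
    db.connection.execute("INSERT INTO users (name) VALUES (?)", ("Sachin",))
    db.connection.commit()  # persist the row before pickling

    restored = pickle.loads(pickle.dumps(db))
    rows = restored.connection.execute("SELECT name FROM users").fetchall()
    print(rows)

    # Close both connections before the temporary directory is removed
    restored.connection.close()
    db.connection.close()
```

The pickle itself still carries only the path; the data is visible after unpickling because it was committed to the file that the new connection reopens.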

Keep in mind that if the __getstate__ method returns a false value such as None, the __setstate__ method will not be called upon unpickling. Source
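The clearest case of this is a __getstate__ method that returns None. The following sketch uses a hypothetical Silent class with a call counter to check whether __setstate__ ran; it only illustrates the None case and makes no claim about other false values.

```python
import pickle

class Silent:
    """Hypothetical class used to observe whether __setstate__ is called."""
    setstate_calls = 0  # class-level counter, not part of the pickled state

    def __init__(self):
        self.value = 42

    def __getstate__(self):
        return None  # no state is stored in the pickle

    def __setstate__(self, state):
        Silent.setstate_calls += 1
        self.__dict__.update(state)

restored = pickle.loads(pickle.dumps(Silent()))

print(Silent.setstate_calls)  # stays at 0: __setstate__ was never called
print(restored.__dict__)      # empty: neither __init__ nor __setstate__ ran
```

The unpickled object is created without calling __init__, so with no state and no __setstate__ call it ends up with no instance attributes at all.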

While the ability to customize unpickling through methods such as __setstate__ provides flexibility, it also comes with security considerations. Arbitrary code can be executed during unpickling, which is a security risk if the pickled data comes from an untrusted or malicious source.

So, what can you do to reduce the security risk? You can't do much beyond making sure that data from untrusted sources is never unpickled. Before unpickling, validate the authenticity of the pickled data using a cryptographic signature to ensure that it has not been tampered with, and, if possible, inspect the pickled data for malicious content.
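As one way to picture the signature check, here is a minimal sketch that uses the standard hmac module to sign pickled bytes and verify them before loading. The function names sign_pickle and load_signed_pickle and the SECRET_KEY value are hypothetical; a real key would have to be generated and stored securely.

```python
import hashlib
import hmac
import pickle

# Hypothetical key for illustration only; never hard-code a real secret
SECRET_KEY = b"replace-with-a-real-secret"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes

def sign_pickle(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 signature of the payload."""
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload

def load_signed_pickle(blob, key=SECRET_KEY):
    """Verify the signature first; unpickle only if it matches."""
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("Signature mismatch: refusing to unpickle")
    return pickle.loads(payload)

blob = sign_pickle({"status": "Active"})
print(load_signed_pickle(blob))

# Flipping a single byte of the payload makes verification fail
tampered = blob[:-1] + bytes([blob[-1] ^ 1])
try:
    load_signed_pickle(tampered)
except ValueError as error:
    print(error)
```

A signature like this only shows that the data was produced by someone holding the key and was not altered afterwards; it does not make pickles from an untrusted party safe to load.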

Conclusion

The pickle module lets you serialize and deserialize Python objects. You can now convert an object into bytes that can be transmitted over a network or saved to disk for later use.

In this article, you've learned:

  • What object serialization and deserialization are

  • How to pickle and unpickle data using the pickle module

  • Which types of objects can and cannot be pickled

  • How to modify the pickling behavior of a class

  • How to customize a class so that an object holding a database connection can be pickled


That's all for now

Keep Coding✌✌
