
How to read and write Sequential data gracefully in Python


In this blog, I will show you different ways to work with lists in Python. You might be thinking, "I know what a list is, I have used it numerous times." But there is a difference between using a list and using a list efficiently. So, I will show you what I have learned as a Software Engineer.

We can create a list in basically two ways:

  1. using the list() constructor
  2. using square brackets [], either as a literal or as a list comprehension

There is no difference in output between these methods. Let's see!

Create a list using a list comprehension []

def list_comprehension() -> list:
    return [num for num in range(10)]

data = list_comprehension()
print(data)

"""OUTPUT:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
"""

Create a list using a for loop and append

def list_for() -> list:
    data_list = []
    for num in range(10):
        data_list.append(num)
    return data_list

data = list_for()
print(data)

"""OUTPUT:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
"""
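The list() constructor from the numbered list at the top is not shown in the two examples above. As a minimal sketch (the function name list_constructor is my own, chosen to match the other examples), it accepts any iterable and gives the same result:

```python
def list_constructor() -> list:
    # list() consumes any iterable; here it is a range object
    return list(range(10))

data = list_constructor()
print(data)

# Same contents as the comprehension version
print(data == [num for num in range(10)])

"""OUTPUT:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
True
"""
```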

As I said, the output is the same. The difference shows up when we add a huge amount of data. Let's see!

Time taken to build a list with a list comprehension

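import time
from datetime import timedelta
from functools import wraps

def time_process(func):
    """Print how long func takes to run, then return its result."""
    @wraps(func)
    def inner(*args, **kwargs):
        start_time = time.perf_counter()  # monotonic clock, suited to timing
        result = func(*args, **kwargs)
        elapsed_time = time.perf_counter() - start_time
        # timedelta prints as H:MM:SS, for example 0:00:39
        print('Execution time:', timedelta(seconds=int(elapsed_time)))
        return result  # without this, the decorated function returns None
    return inner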

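@time_process
def list_comprehension() -> list:
    """Build a list with a comprehension"""
    # Note: a 1,000,000,000-item list needs many gigabytes of RAM;
    # lower the count to try this on an ordinary machine
    return [num*2 for num in range(1000000000)]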

list_comprehension()

"""OUTPUT:
Execution time: 0:00:39
"""

Time taken to build a list with a for loop

@time_process
def list_for() -> list:
    data = []
    for num in range(1000000000):
        data.append(num*2)
    return data

list_for()

"""OUTPUT:
Execution time: 0:00:52
"""

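You might say this is not that big of a difference. But it is! The comprehension finished about 25% sooner (39 seconds versus 52 seconds), and on large data that adds up. The comprehension is faster because Python appends each item with a dedicated bytecode instruction instead of looking up and calling data.append on every iteration. So when you don't know how much data you will receive from a DB or an API, a comprehension is usually the better way to build a list.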
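If you want to check the comparison yourself without waiting a minute per run, here is a rough sketch using the standard timeit module. The function names and the much smaller N are my own choices for illustration, and your timings will differ from mine depending on the machine and the Python version.

```python
import timeit

N = 100_000  # far smaller than 1,000,000,000, so it finishes in a moment

def with_comprehension() -> list:
    return [num * 2 for num in range(N)]

def with_append() -> list:
    data = []
    for num in range(N):
        data.append(num * 2)
    return data

# repeat() returns one total time per round; the minimum is the least noisy
comprehension_time = min(timeit.repeat(with_comprehension, number=10, repeat=3))
append_time = min(timeit.repeat(with_append, number=10, repeat=3))

print(f"comprehension: {comprehension_time:.4f} s")
print(f"append loop:   {append_time:.4f} s")
```

Both functions build exactly the same list, so any gap in the printed times comes from how the list is built, not from what it contains.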

You might say, what if we have a condition?

If you only need an if, then comprehension has got your back!

Comprehension with an if condition

@time_process
def list_comprehension() -> list:
    return [num for num in range(1000000000) if num%2 == 0]

list_comprehension()

"""OUTPUT:
Execution time: 0:00:35
"""

for loop and append with an if condition

@time_process
def list_for() -> list:
    data = []
    for num in range(1000000000):
        if num%2 == 0:
            data.append(num)
    return data

list_for()

"""OUTPUT:
Execution time: 0:00:42
"""
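The "only an if" case above is a filter, and it goes at the end of the comprehension. An if/else that picks between two values also fits in a comprehension, but it is a conditional expression and goes at the front. A small sketch of both forms (the variable names are mine):

```python
numbers = range(10)

# Filter: the trailing `if` drops items that fail the test
evens = [num for num in numbers if num % 2 == 0]

# Conditional expression: every item is kept, only its value changes
labels = ["even" if num % 2 == 0 else "odd" for num in numbers]

print(evens)
print(labels[:4])

"""OUTPUT:
[0, 2, 4, 6, 8]
['even', 'odd', 'even', 'odd']
"""
```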

That was about writing. What about reading, you might say! To read data efficiently, we can return a generator instead of a list.

Why? You might ask!

@time_process
def read_from_generator():
    """Return a generator that yields the values lazily"""
    generator_data = (num for num in range(1000))
    return generator_data

"""OUTPUT:
AVG time took to read: 0.03886222839 ms
"""

@time_process
def read_from_list():
    """Return a list with every value already built"""
    list_data = list(num for num in range(1000))
    return list_data

"""OUTPUT:
AVG time took to read: 0.04118919373 ms
"""

To read 1000000000 items, the generator took 23 seconds, whereas the list took about 53 seconds. But for the time being, let's ignore the time they took to read the data.

Let me give you an inside secret: if you don't need to re-read the data, always use generators. Why? You might ask!

The reason is that generators tend to be very memory efficient. A generator produces each value only when it is asked for. Let's see with an example.

# iterate over the generator twice (output shown for a 3-item range to keep it short)
from_generator = read_from_generator()

for i in from_generator:
    print(i, end=' ')
print('\nCompleted first read!')

for i in from_generator:
    print(i, end=' ')
print('\nCompleted last read!')

"""OUTPUT:
0 1 2
Completed first read!

Completed last read!
"""

# iterate over the list twice (output shown for a 3-item range to keep it short)
from_list = read_from_list()

for i in from_list:
    print(i, end=' ')
print('\nCompleted first read!')

for i in from_list:
    print(i, end=' ')
print('\nCompleted last read!')

"""OUTPUT:
0 1 2
Completed first read!
0 1 2
Completed last read!
"""

In the code above, the first pass over the data from read_from_generator and the first pass over the data from read_from_list look identical. But when I iterate over both of them a second time, the generator yields nothing.

If you want the data again, you need to call read_from_generator again. You might say, why the hell would I want that? The reason is MEMORY EFFICIENCY!

Unlike lists, generators don't keep all the data in memory at once. Once you read a value, it is gone. So, if you know you will iterate through the data only once, always choose generators.
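Here is a small sketch of both points: a generator is empty after one pass, and its size does not grow with the amount of data. The helper make_generator is a hypothetical name of mine, and I am assuming CPython, where sys.getsizeof reports the size of the container object only, not of the integers it refers to.

```python
import sys

def make_generator(n: int):
    # Hypothetical helper: returns a fresh generator on every call
    return (num for num in range(n))

gen = make_generator(3)
first_pass = list(gen)                # consumes the generator
second_pass = list(gen)               # nothing left to yield
fresh_pass = list(make_generator(3))  # a new generator starts over

print(first_pass, second_pass, fresh_pass)

# The list grows with n, the generator object does not
small_gen = sys.getsizeof(make_generator(10))
large_gen = sys.getsizeof(make_generator(1_000_000))
small_list = sys.getsizeof(list(range(10)))
large_list = sys.getsizeof(list(range(1_000_000)))

print("generator:", small_gen, "->", large_gen, "bytes")
print("list:     ", small_list, "->", large_list, "bytes")
```

The first print shows [0, 1, 2] [] [0, 1, 2]. The exact byte counts depend on your Python version, so I have left them out, but the generator lines should show the same number twice while the list jumps to several megabytes.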

I ran every method we discussed in this blog 25 times, and the table below shows the results.

List ComprehensionWithout List ComprehensionList comprehension and ifWithout List comprehension and ifRead from GeneratorRead from List
0:00:390:00:520:00:350:00:420:00:230:00:56
0:00:390:00:520:00:350:00:420:00:230:00:57
0:00:380:00:520:00:350:00:410:00:230:01:01
0:00:380:00:520:00:340:00:420:00:230:01:01
0:00:400:00:520:00:350:00:420:00:230:00:53
0:00:390:00:520:00:350:00:420:00:230:00:53
0:00:380:00:520:00:350:00:420:00:230:00:52
0:00:390:00:520:00:350:00:420:00:230:00:52
0:00:380:00:520:00:350:00:420:00:230:00:52
0:00:380:00:520:00:350:00:420:00:230:00:51
0:00:380:00:520:00:350:00:420:00:230:00:52
0:00:380:00:520:00:350:00:420:00:230:00:52
0:00:380:00:520:00:350:00:420:00:230:00:52
0:00:380:00:520:00:350:00:420:00:230:00:52
0:00:380:00:520:00:350:00:410:00:230:00:51
0:00:380:00:520:00:350:00:420:00:230:00:53
0:00:380:00:520:00:350:00:420:00:230:00:52
0:00:390:00:520:00:350:00:420:00:230:00:52
0:00:390:00:520:00:350:00:420:00:230:00:52
0:00:380:00:520:00:340:00:410:00:230:00:52
0:00:380:00:520:00:350:00:410:00:230:00:53
0:00:390:00:520:00:350:00:410:00:230:00:52
0:00:380:00:520:00:350:00:420:00:230:00:53
0:00:380:00:520:00:350:00:410:00:230:00:53
0:00:390:00:520:00:350:00:410:00:230:00:52

Conclusion

Use a list comprehension whenever you need to build a list, and a generator whenever you will iterate over the data only once.
