Must Know Data Structures for Data Engineering

Throughout my career in the analytics space, working as a data scientist and data engineer, I have come across a number of data structures and algorithms that have helped me do my job. Below are the ones I have used most extensively when programming in Python.

Marco Susilo
2 min read · Jan 8, 2024

Lists

A list is a mutable collection of data that maintains insertion order. You can think of it as a dynamic array: elements are accessed by index, and the list grows as items are added.

Below is the time complexity of commonly used list operations, where n is the length of the list:

Operation        Average Case
Append           O(1) (amortized)
Pop (end)        O(1)
Pop (not end)    O(n)
Insert           O(n)
Get              O(1)
Set              O(1)
x in s           O(n)
min/max          O(n)
Length           O(1)
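
As a rough illustration of the operations in the table, here is a minimal sketch using a small made-up list. The variable names and values are only for demonstration, and the complexity notes in the comments are the commonly cited average-case costs for Python lists.

```python
# Illustrative values only; the names below are not from the article.
nums = [3, 1, 4]

nums.append(1)        # add at the end: O(1) amortized
nums.insert(0, 9)     # insert at the front shifts every element: O(n)

last = nums.pop()     # remove from the end: O(1)
first = nums.pop(0)   # remove from the front shifts every element: O(n)

nums[1] = 5           # set by index: O(1)
second = nums[1]      # get by index: O(1)

has_four = 4 in nums  # membership test scans the list: O(n)
smallest = min(nums)  # scans the list: O(n)
largest = max(nums)   # scans the list: O(n)
size = len(nums)      # the length is stored on the list: O(1)
```

If you find yourself popping from the front often, `collections.deque` from the standard library is usually a better fit, since it supports O(1) appends and pops at both ends.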

Tuples

Tuples are similar to lists but are immutable, meaning they cannot be changed once created.
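
To make immutability concrete, the sketch below tries to modify a tuple and catches the resulting error. The `point` and `distances` names are hypothetical examples. It also shows one practical consequence: a tuple of hashable values can be used as a dictionary key, which a list cannot.

```python
# Hypothetical example data; not from the article.
point = (3, 4)

error_message = None
try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)

# A tuple of hashable values is itself hashable, so it can be a dict key.
distances = {point: 5.0}

# Tuples can be unpacked into separate variables.
x, y = point
```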

Sets

Sets are mutable data structures that do not contain duplicate values. Furthermore, they are unordered. They are very useful for de-duplicating data and for comparing data, for example to check whether one set is a subset of another.
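
Here is a small sketch of the two uses mentioned above, de-duplication and subset checks, using made-up record IDs. All names and values are assumptions for illustration.

```python
# Made-up IDs for illustration; not from the article.
raw_ids = [101, 102, 101, 103, 102]

# De-duplicate: converting to a set drops repeated values.
unique_ids = set(raw_ids)

# Subset check: have we loaded only IDs that exist in the source?
loaded_ids = {101, 102}
is_subset = loaded_ids.issubset(unique_ids)

# Set difference: which source IDs have not been loaded yet?
missing_ids = unique_ids - loaded_ids

# Sets are unordered, so if the original order matters, one common
# alternative is dict.fromkeys, which keeps first-seen order.
deduped_in_order = list(dict.fromkeys(raw_ids))
```

Because sets are unordered, do not rely on the iteration order of `unique_ids`; use the `dict.fromkeys` variant when the order of the input needs to be preserved.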

Written by Marco Susilo

A Kaggle expert, certified Machine Learning Specialist AWS, passionate about cloud, analytics and technology. Founder of PassionIT (https://www.passionit.tech)