Must Know Data Structures for Data Engineering
Throughout my career in the analytics space, working as a data scientist and data engineer, I have come across a number of data structures and algorithms that have enabled me to do my job. Below are the data structures I have used most extensively when programming in Python.
2 min read · Jan 8, 2024
Lists
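A list is an ordered, mutable collection of items. You can think of it as a dynamic array: CPython stores a list as a resizable array of references, which is why reading or writing an element by its index is fast.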
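Below is the average-case time complexity of commonly used list operations (n is the number of elements):
Operation: Average case
Append (lst.append(x)): O(1) amortized
Pop from the end (lst.pop()): O(1)
Pop from elsewhere (lst.pop(i)): O(n)
Insert (lst.insert(i, x)): O(n)
Get item (lst[i]): O(1)
Set item (lst[i] = x): O(1)
Membership (x in lst): O(n)
min(lst) / max(lst): O(n)
Length (len(lst)): O(1)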
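A short sketch of the operations above, using a small hypothetical list called readings. The comments give the average-case cost of each call in CPython, so you can see where the cheap end-of-list operations differ from the ones that have to shift elements.

```python
# Exercise the common list operations; comments give CPython's average-case cost.
readings = [3, 1, 4]  # hypothetical sample data

readings.append(1)        # O(1) amortized: add to the end -> [3, 1, 4, 1]
readings.insert(0, 9)     # O(n): shifts every element right -> [9, 3, 1, 4, 1]
last = readings.pop()     # O(1): remove from the end -> last == 1
first = readings.pop(0)   # O(n): shifts the rest left -> first == 9
readings[1] = 5           # O(1): set by index -> [3, 5, 4]
value = readings[0]       # O(1): get by index -> 3
found = 4 in readings     # O(n): linear scan -> True
smallest, largest = min(readings), max(readings)  # O(n) each -> 3, 5
size = len(readings)      # O(1): the length is stored, not counted -> 3

print(readings, last, first, value, found, smallest, largest, size)
```

The practical takeaway is to add and remove at the end of a list where possible; inserting or popping near the front forces Python to move every later element.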
Tuples
Tuples are similar to lists but are immutable, meaning they cannot be changed once created.
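A minimal sketch of what immutability means in practice, assuming a hypothetical (latitude, longitude) pair called point. Trying to assign to an element raises a TypeError, and because a tuple of hashable values is itself hashable, it can be used as a dictionary key, which a list cannot.

```python
# A tuple cannot be modified after it is created.
point = (52.52, 13.40)  # hypothetical (latitude, longitude) pair

try:
    point[0] = 0.0  # item assignment is not supported on tuples
    error = None
except TypeError as err:
    error = str(err)  # "'tuple' object does not support item assignment"

lat, lon = point  # tuples unpack naturally into separate variables

# Tuples of hashable values are hashable, so they work as dict keys.
visits = {point: 3}

# A list with the same contents is unhashable and cannot be a key.
try:
    {[52.52, 13.40]: 3}
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(error, lat, lon, visits[(52.52, 13.40)], list_key_ok)
```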
Sets
Sets are mutable collections that contain no duplicate values, and they are unordered. They are very useful for de-duplicating data and for comparing collections, for example to check whether one set is a subset of another.
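A sketch of both use cases, using hypothetical customer IDs and column names. It de-duplicates a list with set(), checks whether the required columns are a subset of a table's columns, and shows the set difference. Since sets are unordered, the output is sorted before display, and dict.fromkeys is shown as one way to de-duplicate while keeping the original order.

```python
# De-duplicate data and compare collections with set operations.
customer_ids = [101, 102, 101, 103, 102]  # hypothetical sample data

unique_ids = set(customer_ids)      # duplicates dropped; iteration order not guaranteed
print(sorted(unique_ids))           # sort for a stable display -> [101, 102, 103]
print(102 in unique_ids)            # average O(1) hash lookup, vs O(n) for a list

# dict keys keep insertion order, so this de-duplicates while preserving order.
ordered_unique = list(dict.fromkeys(customer_ids))  # -> [101, 102, 103]

required_columns = {"id", "name"}            # hypothetical schema check
table_columns = {"id", "name", "email"}

print(required_columns.issubset(table_columns))  # True
print(required_columns <= table_columns)         # same check, operator form
print(table_columns - required_columns)          # {'email'}: present but not required
```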