🤖 GitHub Copilot for Data Engineers?

GitHub Copilot; Polars, the Spark killer?

Aug 12, 2023

🤖Have you tried GitHub Copilot?

GitHub Copilot is a tool that gives developers AI-generated code suggestions in the comfort of their IDE. It not only suggests the next line of code as you type but, depending on the context, can also generate entire methods, boilerplate code, and even unit tests.

But is it any good for Data Engineers? Let’s find out.

#1 Install GitHub Copilot using the VS Code extension.

Context is key: the more context you give, the better the suggestions. For instance, if you are writing PySpark code, having the right imports in place gives Copilot better context.

#2 As you type, Copilot makes suggestions. Press Tab to accept a suggestion.

#3 You might like to explore different options for a solution. Hit Ctrl+Enter as you type to open a window with multiple suggested solutions, then accept the one you like.

#4 Another way to generate code is by outlining your idea using a comment and allowing GitHub Copilot to generate the implementation.
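To make the comment-first workflow concrete, here is a minimal sketch of what it can look like. The comment is the prompt you would type; the function below it is hand-written for illustration and is not captured Copilot output. The name `latest_per_customer` and the fields `customer_id` and `updated_at` are assumptions made up for this example.

```python
from typing import Iterable


# Keep only the most recent record for each customer_id, based on updated_at.
def latest_per_customer(records: Iterable[dict]) -> list[dict]:
    """Illustrative completion for the comment above (not real Copilot output)."""
    latest: dict = {}
    for record in records:
        key = record["customer_id"]
        # ISO-8601 date strings sort chronologically, so plain comparison works.
        if key not in latest or record["updated_at"] > latest[key]["updated_at"]:
            latest[key] = record
    return list(latest.values())


records = [
    {"customer_id": 1, "updated_at": "2023-08-01", "plan": "free"},
    {"customer_id": 2, "updated_at": "2023-08-03", "plan": "pro"},
    {"customer_id": 1, "updated_at": "2023-08-05", "plan": "pro"},
]
deduplicated = latest_per_customer(records)
```

The more specific the comment (which key, which ordering column), the less guessing the tool has to do, and the easier it is to review whatever it suggests.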

Sometimes the suggestions are off by a mile, but most of the time they are good enough to work with and save you time crawling through documentation and web searches.

Having tried it on Data Engineering functions ranging from trivial to complex, I found that GitHub Copilot suggests salvageable code and improves a Data Engineer's day-to-day efficiency. It is definitely worth a try.

Copilot is not free: there is a 30-day trial period, and thereafter a monthly cost of $10.

🐳Data Engineering Tip #1

Want to exclude a few columns from a Spark SQL SELECT clause on a table with hundreds of columns? Use the keyword EXCEPT.
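As a rough sketch of the syntax, the helper below builds such a query as a string. It assumes your SQL engine accepts the `SELECT * EXCEPT (...)` form, which depends on the platform and version you run, so check your engine's documentation first. The helper name `select_all_except`, the table `customers`, and the column names are all hypothetical.

```python
def select_all_except(table: str, excluded: list[str]) -> str:
    """Build a SELECT that returns every column of `table` except `excluded`.

    Assumes the engine supports `SELECT * EXCEPT (col, ...)`.
    """
    columns = ", ".join(excluded)
    return f"SELECT * EXCEPT ({columns}) FROM {table}"


# Hypothetical table and column names, for illustration only.
query = select_all_except("customers", ["ssn", "credit_card_number"])
print(query)
# prints: SELECT * EXCEPT (ssn, credit_card_number) FROM customers

# With an active SparkSession named `spark` (assumed), you would then run:
# df = spark.sql(query)
```

This keeps the query short and means it does not need updating every time a new column is added to the table.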

💡Data Engineering Tip #2

Compare two Spark DataFrames for schema and data equality using the chispa library. It is a great addition for writing unit tests.

pip install chispa

🐻‍❄️Tech News: Polars Company

Polars is a DataFrame library written in Rust and built on Apache Arrow. Its C++-like performance has made it one of the most popular open-source libraries since its inception.

Now, the creators of Polars have just announced the formation of a company to build a Rust-based compute platform to enable data processing at scale.

Is Polars the Spark killer?

Urban Data Engineer's Newsletter
