Welcome to another edition of Somewhat Analytics - The Newsletter!
Put simply, this monthly newsletter serves as a compilation of resources that I find intriguing in the realm of data analytics and analytics engineering. These resources span various formats, including blogs, tutorials, podcasts, social posts, my personal thoughts, and more—why set any limits?
Disclaimer: The majority of resources covered are non-vendor-specific, although I always include a few gems from my exceptional team at paradime.io.
Now, let's dive in!👇
Blog - Finding Your Path by Emily Hawkins (Analytics Engineering, GlossGenius)
Whenever the opportunity to manage others arises, I immediately reject it. I acknowledge the immense value that effective management brings to an organization, but currently, it’s not the right career path for me. I’d rather excel as an individual contributor.
Colleagues might perceive this choice as contentment or disinterest in career progression, but the reality couldn't be further from the truth!
In this blog, Emily reflects on her journey into data engineering management, realizing it wasn't her ideal path. Despite achieving significant success as a manager, the role lacked the fulfillment she sought. The challenges of solving "people problems" didn't bring her the same joy as the daily rush of solving technical issues.
After three years in management, Emily made the decision to return to her roots: analytics engineering.
If you find yourself grappling with the expectation to transition into management, take a moment to explore this blog! Emily shares her experiences and insights, offering valuable perspectives on career paths and personal fulfillment.
Social Post - SQL 101: ListAgg by Madison Schott (Senior Modern Data Stack Engineer, ConvertKit)
Madison is the author of the Learn Analytics Engineering Substack, and she often posts valuable SQL & dbt™ tips and tricks on LinkedIn. In this post, she neatly explains how to use the ListAgg function and why it’s important.
If you often find yourself needing to consolidate values from multiple rows into a single string, check it out!
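Here's a minimal sketch of that idea (my own illustration, not taken from Madison's post). It runs in Python's built-in sqlite3 module, so it uses SQLite's GROUP_CONCAT as a stand-in for ListAgg; in warehouses like Snowflake the equivalent would look something like LISTAGG(tag, ', ') WITHIN GROUP (ORDER BY tag). The customer_tags table, its columns, and the sample rows are all hypothetical.

```python
import sqlite3


def tags_per_customer(rows):
    """Collapse (customer, tag) rows into one comma-separated string per customer."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per (customer, tag) pair.
    conn.execute("CREATE TABLE customer_tags (customer TEXT, tag TEXT)")
    conn.executemany("INSERT INTO customer_tags VALUES (?, ?)", rows)
    # GROUP_CONCAT is SQLite's counterpart to LISTAGG: it joins the values
    # of each group into a single string using the given separator.
    result = conn.execute(
        """
        SELECT customer, GROUP_CONCAT(tag, ', ') AS tags
        FROM customer_tags
        GROUP BY customer
        ORDER BY customer
        """
    ).fetchall()
    conn.close()
    return result


# Made-up sample data, purely for illustration.
sample_rows = [
    ("acme", "newsletter"),
    ("acme", "webinar"),
    ("globex", "newsletter"),
]

for customer, tags in tags_per_customer(sample_rows):
    print(customer, "->", tags)
```

One caveat: without an explicit ordering clause, SQLite doesn't promise the order of values inside each string, which is the reason ListAgg is usually paired with WITHIN GROUP (ORDER BY ...).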
Social Post - I don’t understand the Hype around dbt… What am I missing here? by Matt Martin (Staff Tech Engineer, State Farm)
It’s insightful to see folks outside of the modern data stack echo chamber discuss the value, and lack thereof, of dbt™. They ask the hard-hitting questions that you likely won’t find in various MDS data communities (probably because they get deleted! 🤣).
They pose questions like:
“Isn’t it just a janky way for analysts to build custom ETL pipelines?”
“Can’t I just use DDL or SQL Server Stored Procedures, and export it to Git?”
“I’ve been using source control, pipeline scheduling, and SQL-powered ETL for years… Why do I need dbt?”
Truth is, dbt™ isn’t a solution for every data team on planet Earth (though I do believe it is for most of them!).
You can find many “pro dbt” comments in that thread, but here’s one that I agree with whole-heartedly. 👇
If you’re curious to read other discussions like this, check out this one from r/dataengineering.
Social Post - You’re never really modeling data by Saadat Qadri (Fractional VP Data)
In this post, Saadat emphasizes a fundamental principle: "You're never really modeling 'data'... You're modeling a business."
Transitioning from the mere act of "modeling data" to the more impactful task of "modeling a business" requires discipline and an ongoing process of self-reflection.
If you're like me, you need constant reminders to refrain from just "modeling data." Here's a workshop with Ergest Xheblati that reinforces this principle.
Alright, now let’s check out some of the work we’re doing at Paradime to make dbt™ development better for data teams!
Blog - Stuff we shipped #10 by Kaustav Mitra (CEO, Paradime)
Paradime consistently releases features and updates that enhance dbt™ development for data teams. Last week, we introduced the following:
Auto Complete - Avoid constant typing and copy-pasting during dbt™ development. Auto-complete will predict your SQL & Jinja as you type!
Peek Definition - Eliminate context switching and disruptions during dbt™ development. Highlight any source, model, or macro to generate a preview of the code within the current file.
Query Limit - Reduce cloud data warehouse spend when previewing data. With Query Limit, you can cap the number of returned rows anywhere between 1 and 1,000!
Download CSV - Need a quick .csv of data from one of your models? No problem, you can download it directly from Paradime’s preview panel.
Somewhat Analytics Podcast by Parker Rogers and Kaustav Mitra
When Kaustav and I created the Somewhat Analytics podcast, we had one non-negotiable rule: Only interview guests who genuinely interest us.
Our recent guests are data experts doing fascinating work in the NBA and personal fitness:
MDS-in-a-box w/ Jacob Matson (Product @ Simetric)
Jacob is well known for his expertise in dbt™, Excel, and fun side projects like MDS-in-a-box and nba-monte-carlo.
In this episode, Jacob shares:
MDS-in-a-box - What’s this fast, free, open-source modern data stack you’ve been building?
DuckDB, what’s all the fuss about?
How are you analyzing/predicting outcomes of the 2023-24 NBA season?
What project(s) would you like to build in the future?
Here’s my favorite snippet from the conversation:
Being a data-driven endurance athlete w/ Marco Altini (Founder @ HRV4Training)
We stumbled upon Marco via a simple Google search: “in-depth data analysis of personal physical activity” 😂
Marco has a Ph.D. in applied machine learning, and separate degrees in human science and computer science. He’s also an amateur endurance athlete, entrepreneur, and founder.
In this conversation, you’ll learn about:
What Marco has learned after years of analyzing his physical and mental health
Marco’s data science work at Ōura and HRV4Training
How the HRV4Training app helps athletes better understand their athletic performance, prevent overtraining, and reduce stressors
Blog - Paradime Feature Spotlight: Merge Conflict Resolution by Parker Rogers (Data Advocate, Paradime)
Merge conflicts are an inevitable aspect of dbt™ development, especially when analytics teams are constantly updating dbt™ models to add value to their organizations.
With Paradime’s new merge conflict features, you can resolve them efficiently and reliably.
Check out this quick video tutorial to better understand the features!
dbt™ Project - NBA Data Modeling w/ Paradime
If you’ve got to learn the ins and outs of dbt™ modeling with Paradime, why not use data from one of your hobbies?
In this GitHub repo, I’ve been modeling NBA data from the 2022/23 season. Here’s what I’ve built so far:
🚚 𝐈𝐧𝐠𝐞𝐬𝐭𝐢𝐨𝐧: public NBA API + Python
🗄️ 𝐒𝐭𝐨𝐫𝐚𝐠𝐞: DuckDB (development) & Snowflake (production)
🔄 𝐓𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐚𝐭𝐢𝐨𝐧𝐬: paradime.io (dbt™)
📊 𝐒𝐞𝐫𝐯𝐢𝐧𝐠 (𝐁𝐈): Lightdash
My favorite learning so far? The Memphis Grizzlies had the lowest cost per win in the 2022/23 season! 👇
That concludes this edition of the Somewhat Analytics Newsletter! Catch you next month!
I love that you included that LinkedIn post about "what's the hype about dbt?". I was inspired by this post too and tomorrow's entire newsletter will be dedicated to it!