Why I'm leaving Alteryx behind and implementing Databricks in 2025

Try Databricks For Yourself - I’m glad I did.

Here’s my story, my reasoning, and what I can do to help you get there.

My Past: Alteryx as the Essential "Swiss Army Knife" for Non-Coder Analysts

A decade ago, around 2015, I discovered Alteryx. For me, as someone who rarely coded, it was a game-changer. It was the "Swiss Army knife" I needed to ingest, shape, and aggregate data for my Tableau dashboards. Back then, Tableau had rigid data structure requirements, and a lot of prep work was needed before data could go in and power effective reports for executives. At the time, Alteryx was the perfect tool to bridge that gap. I loved it; compared to any other option, it was easily the best product on the market. It was so good that I made championing Tableau and Alteryx the focus of my career for the next five years.

The Present: A New Era and the Evolution of Databricks

Fast forward to today. We are in a completely different era, driven largely by advancements in large language models, but also by how much easier the Databricks platform has become to adopt. Databricks has done a whole lot to democratize the platform for all knowledge workers.

Truly, in the past year or so, Databricks has evolved from a niche, complex tool for big data into a powerful, accessible platform for everyone. That’s a pretty bold statement, so it’s probably helpful to share a bit about how we got here and what’s changed.


Databricks in 2022: A Finicky "Muscle Car"

Back in the olden times of 2022, and maybe even 2023 or early 2024, Databricks was primarily seen as "managed Spark." It was incredibly powerful for massive data volumes but also hugely intimidating for someone less technical, like me. It required significant tuning and configuration, much like an unreliable muscle car you wouldn't dare drive unless you were mechanically savvy. I wouldn't go anywhere near it at the time.

Databricks in 2025: The Accessible Platform

Today's Databricks is a completely different beast thanks primarily to two key changes and a lot of other smaller details that exist under that hood. To continue torturing my car metaphor, it’s like a Rivian R1T Quad Motor Launch Edition. It can haul massive amounts of data, but it also goes from standing still to 60 miles an hour in 2.5 seconds. It’s easy to drive, versatile, and deeply satisfying.

Ah, we’re still talking about Databricks, right? Yes! Yep.

The First Key Change: Databricks’ Serverless offering

Serverless means you don’t have to provision bundles of hardware called clusters; Databricks does the work of abstracting away all the complexity of managing those clusters, sizing them, turning them off to save money, all of that. I no longer have to worry about cluster sizing, whether a cluster was left on too long, or hemorrhaging cash due to misconfiguration.
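To give a sense of what goes away, here is roughly the kind of configuration classic compute asks you to reason about. The field names follow the Databricks clusters API as I understand it, and the values are purely illustrative, not a recommendation:

```python
# A rough illustration of the knobs "classic" compute makes you think about.
# Field names follow the Databricks clusters API, but treat this as purely
# illustrative -- the point is how much there is to get wrong.
classic_cluster_spec = {
    "cluster_name": "nightly-etl",
    "spark_version": "15.4.x-scala2.12",  # which runtime?
    "node_type_id": "i3.xlarge",          # which instance type, and how big?
    "num_workers": 4,                     # over-provision and you pay for idle cores
    "autotermination_minutes": 30,        # forget this and the meter keeps running
}

# With serverless, none of the above is my problem: I pick serverless compute
# for the notebook, job, or SQL warehouse, and Databricks handles sizing,
# startup, and shutdown behind the scenes.
```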

I’ll be the first to admit that Databricks does charge a bit more for a given action on Serverless than on the much harder-to-manage "classic" compute model, but the benefit for small shops is immense. For my mom-and-pop business, I am willing and able to use Databricks for analytics because the risk of a costly misconfiguration is drastically lower. To date, I’m about $30 into my bill, though they do offer a brief free window for new users like me.

The Second Key Change: Databricks Assistant, an LLM for working with data

The advent of LLMs within Databricks for writing code is the second crucial change, and I cannot emphasize enough how powerful this is for less technical folks like me. It makes Alteryx’s dragging and dropping of tools onto a canvas feel like using an abacus while everyone else has a calculator.

Now, I do have a modestly technical background: I’ve written a little Python, some JavaScript, and a fair bit of basic SQL, so I generally know what to expect from code that I read. However, like the vast majority of people, I’m terrible at actually writing code, simply because I do it so rarely; I might write a couple hundred lines every few years.

So being able to bust out an entirely new pipeline in half an hour or so, one that grabs a given location’s weather data from the National Weather Service and incrementally ingests it into a Delta Live Table, is pretty amazing.
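For flavor, here is a minimal sketch of the shape that pipeline takes. This is not the exact code the Assistant produced for me: the gridpoint URL is a placeholder for your own location, the field names reflect my reading of the NWS forecast response, and the real pipeline adds the incremental bookkeeping that this simple fetch-and-land glosses over.

```python
# Minimal sketch of a Delta Live Tables source that lands NWS forecast data.
# Assumes it runs inside a Databricks DLT pipeline (so `dlt` and `spark` exist);
# the gridpoint below is a placeholder -- look yours up via api.weather.gov/points.
import dlt
import requests
from pyspark.sql import functions as F, types as T

NWS_FORECAST_URL = "https://api.weather.gov/gridpoints/MTR/85,105/forecast"  # placeholder

FORECAST_SCHEMA = T.StructType([
    T.StructField("start_time", T.StringType()),
    T.StructField("end_time", T.StringType()),
    T.StructField("temperature", T.LongType()),
    T.StructField("temperature_unit", T.StringType()),
    T.StructField("precip_probability", T.LongType()),
    T.StructField("short_forecast", T.StringType()),
])

@dlt.table(comment="Forecast periods pulled from the National Weather Service")
def nws_forecast_raw():
    resp = requests.get(NWS_FORECAST_URL, headers={"User-Agent": "mom-and-pop-analytics"})
    resp.raise_for_status()
    periods = resp.json()["properties"]["periods"]
    rows = [
        (
            p["startTime"],
            p["endTime"],
            p["temperature"],
            p["temperatureUnit"],
            (p.get("probabilityOfPrecipitation") or {}).get("value"),
            p["shortForecast"],
        )
        for p in periods
    ]
    # Tag each batch so later, incremental layers can tell loads apart.
    return (
        spark.createDataFrame(rows, schema=FORECAST_SCHEMA)
        .withColumn("ingested_at", F.current_timestamp())
    )
```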

A Radically Lessened Learning Curve

Together, these two innovations have radically lessened the learning curve for adopting Databricks. Sure, if you’re going to do this in an enterprise-grade environment, you’ll probably want considerable help along the way, and that’s a big reason why companies like Indicium exist: we’re great at managing the change and reducing the friction that comes with large data, analytics, and AI migrations. In fact, we offer a free Alteryx-to-Databricks migration assessment for well-qualified candidates. If you’re curious, or you’re already going down this same path, please reach out via the form above, or reach out to me directly at joseph.schafer@indicium.tech, and we’ll have a chat.

But maybe you’re not there yet; maybe you’re a bit skeptical. That’s OK, because Databricks has a free edition, so there’s nothing stopping you from checking it out on a personal account, just to confirm the bold claims I’m making here! In fact, that’s what I did over the summer, because I was wondering how all these advancements might play out for someone like me.

Seriously, go check out the free version here. I did this myself before I started building out my side-hustle project in Databricks. I needed to see this new world, risk-free, for myself, and you can too! If your organization is starting to get serious about a migration, or perhaps you’re not getting the value you expected, reach out to me. We’re here to help get you on your way to the modern data stack!

So what’s next?!

I’m so glad you asked! I’ll share a bit of my own journey over the coming weeks, so you can get more of a flavor of the day-to-day experience of building out a mom-and-pop analytics hub in Databricks.

I have pretty big ambitions. I’m working on building out a weather table from the National Weather Service, because rainfall and temperature seem to have a huge impact on our foot traffic. It’ll also give me the chance to try out some of the more interesting AI features within Databricks around data science workloads.

After that, or perhaps before (my roadmap is fluid), I want to pull in inventory data. Toast’s inventory API isn’t public yet, so I’m waiting to get access, or to clarify what access I might already have. Then I can start to focus on inventory turns and on predicting stockouts based on weather forecasts. Basically, some of the cooler things that Crisp does, but for little mom-and-pops, since Crisp only plays with the biggies.

Adventures in Marketing: The Highlights

A few years ago, in a foray outside my normal sphere of influence, I spent a few months working cross-functionally to build out an elaborate Executive Marketing Dashboard for the Marketing team at Seagate. 

First, some context: in the past, the Marketing department's leadership had driven their monthly meetings through PowerPoint presentations. Much of the normal work for analysts across the department would grind to a halt while data was pulled, groomed, graphed, and compiled into slide decks. Leadership was looking to move to Tableau to stabilize the subject matter, track to consistent goals, and drive the business through an automated set of instruments.

A recent copy of the Executive Marketing Dashboard

I was lucky enough to have some spare time at work around the holiday season of 2016, and while browsing the jobs list for a friend, I noticed an intriguing job description for a Marketing Analyst. The post's technical needs were consistent with my capabilities, but I lacked the marketing experience. That, and I really enjoyed the job I already had. On a lark, I reached out to see if I could share how my team does what it does, in exchange for some hands-on experience in the marketing world. And that's how I found myself as the technical owner of Seagate's Executive Marketing Dashboard.

Know Your Audience, Not Just Your Data!

Whenever you're building a dashboard, it's crucial to understand both your data and your audience. There's a relationship between the two, and it will emerge in a properly designed dashboard.

In my experience, executive management generally needs to stay at a high-level view of the business. In the case of the Executive Marketing Dashboard (EMD), the need to stay high level was amplified by the sheer number of topics being presented within a relatively short forum.

So rather than a drill-into-detail dashboard, this one was meant to smoothly transition management from static PowerPoint presentations into a new world of interactive analytics. The requirements I was given included some strict guidelines: no filters, no drilling. Management wanted it to essentially be a one-pager, with 10 different visualizations based on 10 different data sets, all on the same page. Right off the bat, this meant every viz had to be crafted with attention to dramatic spatial constraints: each one was going to get only about 300 x 400 px worth of room. Fortunately, since filters and parameters take up space, these requirements weren't at odds with one another.

Do not adjust your screen

For better or worse, management tends to skew both towards farsighted folks and towards owners of fancier Windows computers with higher-resolution screens, which tends to mean they use a higher DPI scaling setting. That causes text to scale differently than it does on the machine you designed the dashboard on.

Enter: the Long Form View. Each viz in the main dashboard is reproduced on a longer-form dashboard and thus given ample room to breathe, solving for both Windows' scaled-up fonts and the 50+ crowd that forgot to bring their reading glasses to the meeting.

EMD's Long View, with buffer and goal parameters visible, giving mgmt the flexibility to tell their own story.

Choose your own ending: Buffers

One benefit of presenting the vizzes in two different ways is that I was able to sneak in a bit of clever instrumentation that I call buffers. If you build a set of calculations that find the minimum and maximum values on a chart, and then add a "buffer" constant to them, you can place hidden reference lines that have the subtle effect of re-calibrating the scale of the axis they are built upon.

The buffer implementation, explained

So, if your line chart normally looks as jagged as the Rockies, you can alter a parameter that drives the buffer constant (I guess it's a variable now) to scale out the axis so that the peaks and valleys are no more thrilling than the rolling hills of Ohio. Now, I know this isn't scientifically sound, tinkering with the axis like this, but remember, we're working for Marketing today, not Engineering!
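To make the trick concrete, here is the same arithmetic sketched outside of Tableau. In the actual dashboard it lives in calculated fields (window min/max plus a parameter-driven buffer) feeding hidden reference lines; the Python below, with made-up numbers, just shows why padding the min and max flattens the chart.

```python
# The buffer trick, sketched with matplotlib instead of Tableau calculated
# fields. The data and buffer value are made up; the point is that widening
# the axis range makes the same series look calmer.
import matplotlib.pyplot as plt

revenue = [120, 95, 140, 88, 150, 110]  # hypothetical monthly values
buffer = 200                            # the constant a parameter would drive

fig, (raw, buffered) = plt.subplots(1, 2, figsize=(8, 3))

raw.plot(revenue)
raw.set_title("Default axis: jagged as the Rockies")

buffered.plot(revenue)
# Hidden reference lines at (min - buffer) and (max + buffer) force the axis
# to re-calibrate; setting the y-limits does the same thing directly here.
buffered.set_ylim(min(revenue) - buffer, max(revenue) + buffer)
buffered.set_title("Buffered axis: rolling hills of Ohio")

plt.tight_layout()
plt.show()
```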

Like I said, you gotta know your Audience!

Show a lot of data in a little space

The biggest visualization challenge I had was how to display all the data for the Amazon Marketing vizzes. I had two categories of advertising, AMS and AMG, each with its own vastly different scale, its own goals, spend, revenue, and relationship between revenue and goal. So right off the bat, they needed to be separated.

Green and red in the tooltip

Because there was so much to track, I needed to find ways of keeping revenue-to-goal obvious without being overwhelming. Since the most important question is "did we make goal?", that point is emphasized in redundant ways. With the color scheme implemented in three ways, combined with the check/x symbols, it is crystal clear which quarters met goal.

At that point I still hadn't shown spend relative to goal beyond a pass/fail, so I added goal lines based on a monetized goal. The goals were multiples of spend, so I built a calculation driven by parameters, then drew goalposts using reference lines. That way, viewers could also easily see how well we did relative to goal.
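The arithmetic behind those goalposts is simple; here it is with hypothetical numbers (the multiple was a Tableau parameter and the goalpost a reference line on the revenue axis):

```python
# Hypothetical illustration of the "goal as a multiple of spend" goalpost.
quarterly_spend = 50_000
goal_multiple = 4                               # parameter the presenter can adjust

goal_revenue = quarterly_spend * goal_multiple  # where the reference line is drawn
print(f"Goal line at ${goal_revenue:,}")        # -> Goal line at $200,000
```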

Getting to know Marketing data

I spent the overwhelming majority of my time getting to know - and automating - the data sets involved in this dashboard. The data sets are diverse enough in origin and type that most merit their own blog posts. I'm quite proud of my work on this project because not only did I accomplish the primary goal of building a viz tailored to my audience, but the data sources are automated, and in every case that automation had never been achieved within Seagate before. No one in Marketing, to my knowledge, had automated Google Analytics into Tableau, and no one had ever automated Sysomos and iPerception data into Tableau using their respective APIs. This aspect - blazing a trail, then proving out and sharing these huge quality-of-life improvements with my fellow analysts - has been immensely satisfying. The weeks and months since have been dedicated to training my fellow analysts on how to take these lessons and expand them to all their day-to-day reporting.

A few highlights from that adventure:

The overall goal was a fully automated dashboard pulling nine different metrics from six very different data sources (a rough sketch of the automation pattern follows the list):

  • Click Thru Rate via Google Analytics

  • Time on Site via Google Analytics

  • Sessions via Google Analytics

  • Amazon AMS Spend, Goal, & Revenue via Google Sheets

  • Amazon AMG Spend, Goal, & Revenue via Google Sheets

  • Master Marketing Budget via Excel on a shared network drive

  • Social Media Followers via iPerception

  • Internet Community Brand Sentiment via Sysomos API

  • Email Marketing Engagement and Purchasing via SFDC and Business Objects
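For flavor, here is roughly what "automated into Tableau" meant in practice for the API-backed sources. The endpoint, token, and field names below are hypothetical stand-ins (each real source had its own auth flow and response shape); the pattern is simply to pull from the API on a schedule, flatten the response into a table, and land it where Tableau's scheduled extract refresh can pick it up.

```python
# A heavily simplified sketch of the "pull from an API, land a table, let
# Tableau refresh against it" pattern. The endpoint, token, and field names
# are hypothetical stand-ins -- each real source (Google Analytics, Sysomos,
# iPerception) had its own auth flow and response shape.
import csv
import requests

API_URL = "https://api.example.com/v1/brand-sentiment"   # hypothetical endpoint
API_TOKEN = "REPLACE_ME"                                  # stored securely in practice

def pull_metrics() -> list[dict]:
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={"granularity": "weekly"},
    )
    resp.raise_for_status()
    return resp.json()["results"]                         # assumed response shape

def write_for_tableau(rows: list[dict], path: str = "brand_sentiment.csv") -> None:
    # Tableau's scheduled extract refresh points at this file (or a database
    # table); rerunning this script on a schedule is what made the dashboard
    # "automated" instead of hand-fed each month.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["week", "sentiment_score", "mentions"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    write_for_tableau(pull_metrics())
```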