
Why I'm leaving Alteryx behind and implementing Databricks in 2025

Try Databricks For Yourself - I’m glad I did.

Here’s my story, my reasoning, and what I can do to help you get there.

My Past: Alteryx as the Essential "Swiss Army Knife" for Non-Coder Analysts

A decade ago, around 2015, I discovered Alteryx. For me, as someone who rarely coded, it was a game-changer. It was the "Swiss army knife" I needed to ingest, shape, and aggregate data for my Tableau dashboards. Back then, Tableau had rigid data structure requirements, and a lot of work had to be done on the data before it went in if you wanted to build effective reports for executives. At the time, Alteryx was the perfect tool to bridge that gap. I loved it; it was fantastic, and compared to any other option it was easily the best product on the market. It was so good that I made championing Tableau and Alteryx the focus of my career for the next five years.

The Present: A New Era and the Evolution of Databricks

Fast forward to today. We are in a completely different era, driven largely by advancements in large language models, but also by how much easier the Databricks platform has become to adopt. Databricks has done a great deal to democratize the platform for all knowledge workers.

Truly, in the past year or so, Databricks has evolved from a niche, complex tool for big data into a powerful, accessible platform for everyone. That’s a pretty bold statement, so it’s probably helpful to share a bit about how we got here and what’s changed.


Databricks in 2022: A Finicky "Muscle Car"

Back in the olden times of 2022 (and arguably into 2023 and early 2024), Databricks was primarily seen as "managed Spark." It was incredibly powerful for massive data volumes but also hugely intimidating for someone less technical, like me. It required significant tuning and configuration, much like an unreliable muscle car you wouldn't dare drive unless you were mechanically savvy. I wouldn't go anywhere near it at the time.

Databricks in 2025: The Accessible Platform

Today’s Databricks is a completely different beast, thanks primarily to two key changes and a lot of smaller details under the hood. To continue torturing my car metaphor, it’s like a Rivian R1T Quad Motor Launch Edition. It can haul massive amounts of data, but it also goes from standing still to 60 miles an hour in 2.5 seconds. It’s easy to drive, versatile, and deeply satisfying.

Ah, we’re still talking about Databricks, right? Yes! Yep.

The First Key Change: Databricks’ Serverless offering

Serverless means you don’t have to provision the bundles of hardware called clusters. Databricks does the work of abstracting away the complexity of managing those clusters: sizing them, turning them off to save money, all of it. I no longer have to worry about cluster sizing, whether a cluster was left on too long, or hemorrhaging cash due to a misconfiguration.

I’ll be the first to advise that Databricks does charge a bit more for a given action when using Serverless instead of the much more complex to manage “classic” compute model, but the benefit for small shops is immense. For my mom-and-pop business, I am willing and able to use Databricks for analytics because the risk of a costly misconfiguration is drastically lower. To date, I’m about $30 dollars into my bill, except they do have a brief free window for new users like myself.

The Second Key Change: Databricks Assistant, an LLM for working with data

The advent of LLMs within Databricks for writing code is the second crucial change, and I cannot emphasize enough how powerful this is for less technical folks like myself. Compared to Alteryx, it makes dragging and dropping tools onto a canvas feel like using an abacus when everyone else has a calculator.

Now, I do have a modestly technical background: I’ve written a little Python, some JavaScript, and a fair bit of basic SQL, so I generally know what to expect from code that I read. However, like the vast majority of people, I'm terrible at actually writing code, simply because I do it so rarely; maybe I write a couple hundred lines every few years.

So, to be able to bust out an entirely new pipeline in just half an hour or so, one that grabs a given location’s weather data from the National Weather Service and incrementally ingests that data into a Delta Live Table? That’s pretty amazing.
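To give you a flavor, here’s a minimal sketch of that kind of pipeline, not my exact code. The coordinates and contact details are placeholders, it assumes it runs inside a Databricks Delta Live Tables pipeline (where the dlt module and spark session are provided for you), and the incremental merge into history is left out for brevity:

```python
# A minimal sketch, not my exact pipeline. Assumes it runs inside a Databricks
# Delta Live Tables (DLT) pipeline, where the `dlt` module and `spark` session
# are provided. Coordinates and contact info are placeholders.
import dlt
import requests
from pyspark.sql import Row

LAT, LON = 45.52, -122.68                # placeholder: your shop's location
HEADERS = {"User-Agent": "mom-and-pop-analytics (you@example.com)"}  # NWS asks for a User-Agent

def fetch_hourly_forecast():
    # api.weather.gov resolves a lat/lon to a gridpoint that exposes forecast URLs
    point = requests.get(f"https://api.weather.gov/points/{LAT},{LON}", headers=HEADERS).json()
    forecast_url = point["properties"]["forecastHourly"]
    periods = requests.get(forecast_url, headers=HEADERS).json()["properties"]["periods"]
    return [
        Row(
            start_time=p["startTime"],
            temperature=p["temperature"],
            precip_pct=(p.get("probabilityOfPrecipitation") or {}).get("value"),
            short_forecast=p["shortForecast"],
        )
        for p in periods
    ]

@dlt.table(comment="Hourly NWS forecast for the shop's location")
def weather_hourly_raw():
    # Each pipeline update pulls the latest hourly forecast; the incremental
    # merge into history happens downstream and is omitted here.
    return spark.createDataFrame(fetch_hourly_forecast())
```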

A Radically Lessened Learning Curve

Together, these two innovations have radically lessened the learning curve for adopting Databricks. Sure, if you’re going to do this in an enterprise-grade environment, you’ll probably want considerable help along the way, and that’s a big reason companies like Indicium exist: we’re great at managing the change and reducing the friction that comes with large data, analytics, and AI migrations. In fact, we offer a free Alteryx-to-Databricks migration assessment for well-qualified candidates. If you’re curious, or already going down this same path, please reach out via the form above, or contact me directly at joseph.schafer@indicium.tech and we’ll have a chat.

But maybe you’re not there yet, maybe you’re a bit skeptical.  That’s ok, because Databricks has a free edition, so there’s nothing stopping you from checking it out for free on a personal account, just so you can confirm the bold claims I’m making here!  In fact, that’s what I did over the summer, because I was wondering how all these advancements might play out for someone like me.

Seriously, go check out the free version, here. I did this myself before I started building out my side-hustle project within Databricks. I needed to see this new world, risk-free, for myself, and you can too! If your organization is starting to get serious about a migration, or perhaps you’re not getting the value you expected, reach out to me. We’re here to help get you on your way to the modern data stack!

So what’s next?!

I’m so glad you Asked! Well, I’ll share a bit my own personal journey over the coming weeks, so you can get a bit more of a flavor of what the day-to-day experience is like, building out a mom-and-pop analytics hub in Databricks.

I have pretty big ambitions. I’m working on building out a weather table from the National Weather Service, because rainfall and temperature seem to have a huge impact on our foot traffic. It’ll also give me the chance to try out some of the more interesting AI features within Databricks around data science workloads.

After that, or perhaps before - my roadmap is fluid - I want to pull in inventory data. Toast’s inventory API isn’t public yet, so I’m waiting on access, or clarifying what access I might already have. Then I can start to focus on inventory turns and on predicting stockouts based on weather forecasts. Basically, some of the cooler things that Crisp does, but for little mom-and-pops, since Crisp only plays with the biggies.

Working within the constraints of Business Objects

For most folks at Seagate, the official source of truth is SAP Business Objects.  It offers a rich semantic business layer on top of our EDW and is well-governed by a talented and responsive team within our IT department.

Answering a typical business question usually entails the simultaneous consideration of multiple datasets originating out of Business Objects.  Take, for example, the daily evaluation of our in-quarter shipment performance against prior-quarter performance and two predictive data sets: our quarterly financial plan and the latest demand signal. In my world, I'd need to pull three different data sets: Actuals, Plan, and Demand.  In any well-groomed dataverse, data sets such as these are maintained so that there's substantial dimensional commonality across all three.
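To make the idea of dimensional commonality concrete, here's a toy sketch in Python, with entirely hypothetical column names and values; the real work happens in Business Objects and Alteryx:

```python
# A toy illustration of dimensional commonality, with hypothetical columns and values.
# Each extract shares the same dimensions (fiscal_quarter, product_line), so the three
# measures can be stacked into one tidy table and compared side by side.
import pandas as pd

actuals = pd.DataFrame({"fiscal_quarter": ["FY25Q3"], "product_line": ["Widgets"], "units": [100]})
plan    = pd.DataFrame({"fiscal_quarter": ["FY25Q3"], "product_line": ["Widgets"], "units": [120]})
demand  = pd.DataFrame({"fiscal_quarter": ["FY25Q3"], "product_line": ["Widgets"], "units": [110]})

combined = pd.concat(
    [df.assign(measure=name) for name, df in [("Actuals", actuals), ("Plan", plan), ("Demand", demand)]],
    ignore_index=True,
)

# Pivot so each measure becomes a column, ready for an Actuals-vs-Plan-vs-Demand comparison
comparison = combined.pivot_table(
    index=["fiscal_quarter", "product_line"], columns="measure", values="units"
).reset_index()
print(comparison)
```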

At Seagate, this is possible from within Business Objects, and for particular use cases I do serve users directly from an automated Business Objects report that is scheduled, exported, and emailed to them.

That said, Business Objects and Excel reports are rarely my customers' preferred presentation layer and seldom my preferred final data preparation layer:

To the former point, my users often demand features, such as mobility, interactivity, and embedding, that the Tableau platform offers and our Business Objects environment simply does not.

To the latter point, my team often gets asked variations on the same question, and as a matter of operational efficiency, we'd all prefer to offer the same answer to the same question.  We can do that by reusing the same Business Objects queries and storing their results in ephemeral staging tables that we can all reference simultaneously for any particular business case.

Since we're now in the business of building staging tables, we can also use multiple queries within the same subject matter to work around the row-count limitations imposed by both the Business Objects platform and Excel files.  We refresh historical quarters with big quarterly queries, prior weeks with weekly queries, and in-week results on an hourly schedule that matches the refresh rate of our EDW, which pulls from our transactional platform hourly.
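If it helps to see the tiered-refresh idea stripped of tooling, here's a rough Python sketch with hypothetical file names; in reality this is a set of Alteryx workflows fed by scheduled Business Objects exports:

```python
# A rough sketch of the tiered-refresh idea in plain Python, just to make the logic
# concrete. The real implementation is Alteryx workflows fed by scheduled Business
# Objects exports; the file names and grains below are hypothetical.
import pandas as pd

# Historical quarters: big exports, refreshed only when a quarter closes
history = pd.read_excel("staging/shipments_quarterly.xlsx")

# Prior weeks: refreshed on a weekly schedule
recent = pd.read_excel("staging/shipments_weekly.xlsx")

# In-week results: refreshed hourly, matching the EDW's hourly pull from the
# transactional platform
current = pd.read_excel("staging/shipments_hourly.xlsx")

# Stitch the tiers together. The older tiers rarely change, so only the smallest,
# freshest slice gets re-queried on the hourly schedule.
shipments = pd.concat([history, recent, current], ignore_index=True)
```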

Now, that might seem like a big jump into ominous Shadow IT territory, but consider all the benefits:

  1. We've drastically reduced unnecessary refreshes of stale data, thus diminishing the resource demands coming from our team on an aging BI infrastructure.
  2. We've de-duped queries that hitherto were sometimes prone to entropy, especially when seen only from 20,000 ft.  This also reduced demand on our BI infrastructure.
  3. Our team is standardized on augmented data tables, and the remaining queries are seen by many sets of eyes and from all kinds of angles, ensuring our source of truth's integrity.
  4. Because the staged data is augmented with a touch of metadata and available at a detailed level, we're ready to quickly answer the next business question from within our preferred presentation layer, circumventing Business Objects queries that can take up to an hour to complete.

While I'm at it, I add a bit of metadata to facilitate the refresh process and make some quality-of-life improvements for quicker analysis and a cleaner presentation.

Keeping things fresh: Adding [Latest Snapshot Date]

One of the key additions I make to our team's staging tables is a Time attribute called [Latest Snapshot Date], which I append to each subject-matter dataset while converting it from the .xlsx file that originated out of Business Objects into the .yxdb file that will be used as an input for our more involved data prep workflows.

In most situations, analysts are keen to cite their sources, all the way down to the snapshot timestamp used.  This helps immensely when questions of freshness come up.

Personally, I like to provide a small table that lists each dataset and a timestamp.  This is trivial to produce in a presentation layer if each dataset is equipped with something like a [Latest Snapshot Date] attribute. I derive this attribute in one of two ways:

1) Transactional / Live Data

With a transactional or pseudo-transactional dataset, where snapshots aren't provided, I capture the write time on the .xlsx file written by Business Objects and append it as the [Latest Snapshot Date].
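In my case this happens inside Alteryx (the Directory tool exposes a file's last-write time), but the equivalent logic, sketched in Python with a hypothetical file path, looks like this:

```python
# Equivalent logic in Python (my actual workflow does this in Alteryx); the path is hypothetical.
from datetime import datetime
from pathlib import Path
import pandas as pd

source = Path("exports/shipments_actuals.xlsx")  # the .xlsx written by Business Objects

df = pd.read_excel(source)

# For transactional data with no snapshot column, the file's last-write time is the
# best available proxy for freshness, so stamp every row with it.
df["Latest Snapshot Date"] = datetime.fromtimestamp(source.stat().st_mtime)
```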


2) Snapshot Data

It's a bit different when a dataset has a series of snapshots.  I'll use the Summarize tool to find the maximum value within the snapshot field of the dataset and append that max snapshot as the [Latest Snapshot Date].
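Sketched outside Alteryx, that same max-snapshot logic looks like this in Python; the file path and column name are hypothetical:

```python
# The same idea as Alteryx's Summarize tool, sketched in pandas. "Snapshot Date" is a
# hypothetical name for whatever snapshot field the dataset carries.
import pandas as pd

df = pd.read_excel("exports/demand_signal.xlsx")

# Find the most recent snapshot in the data and stamp every row with it,
# so downstream tools can report freshness without re-scanning the dataset.
df["Latest Snapshot Date"] = df["Snapshot Date"].max()
```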

In some cases, the timestamp data is provided as a string, and in such a situation I'll use the RegEx tool to convert it from a string into the date/time format preferred by Alteryx.  Regular expressions are a topic worthy of their own series of articles.  More on them in a future post!
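Here's a rough Python sketch of that kind of string-to-datetime cleanup, assuming (purely for illustration) that the timestamp arrives as something like "10/01/2024 01:30 PM"; your exports will likely differ:

```python
# A sketch of the string-to-datetime cleanup, assuming the timestamp arrives as
# something like "10/01/2024 01:30 PM"; the pattern will differ for your exports.
import re
from datetime import datetime

raw = "10/01/2024 01:30 PM"

# Pull out the date and time pieces with a regular expression...
match = re.match(r"(\d{2})/(\d{2})/(\d{4})\s+(\d{2}:\d{2}\s[AP]M)", raw)
month, day, year, clock = match.groups()

# ...then rebuild a proper datetime that sorts and filters correctly downstream.
snapshot = datetime.strptime(f"{year}-{month}-{day} {clock}", "%Y-%m-%d %I:%M %p")
print(snapshot)  # 2024-10-01 13:30:00
```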


Questionable fashion sense? Find your company's brand guide.

After all, my blog is called "questionabledecisionmaking"... so I'll confess that for almost a year I wore pants cut two inches too high, because I tend to make questionable decisions, particularly in the fashion realm.

However, I know just enough about color theory to know that I probably shouldn't leave the walk-in closet without a consult from my better half.  To that end, your corporate overlords have probably sunk a vast sum of money into a Brand Guide, and you should absolutely leverage that investment.

As an example, Seagate's Brand Guide is a publicly available .pdf, and it is a fantastically useful grimoire for those of us at Seagate who don't secretly harbor dreams of being a fashion buyer.


The most immediate value to me is in the color palette and hex codes.  

By referencing this graphic from the brand guide, I can make any viz I build look right when embedded in the inevitable PPT prezo or frame.  Further, it distinguishes my work from that of many other analysts in the company, because most people simply don't take the time to recolor their work from the default palettes, or perhaps aren't aware of this resource.
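In Tableau I do this through a custom color palette, but the general pattern applies to any charting tool. Here's a Python sketch of the idea; the hex codes and numbers below are placeholders, not Seagate's actual palette, so pull the real values from your own brand guide:

```python
# The general pattern: define the brand palette once (these hex codes are placeholders,
# not Seagate's actual colors) and reuse it everywhere instead of the tool's defaults.
import matplotlib.pyplot as plt

BRAND_PALETTE = {
    "primary": "#1A9AFC",    # placeholder hex codes -- swap in your brand guide's values
    "secondary": "#6ECF00",
    "neutral": "#4D4D4F",
}

quarters = ["Q1", "Q2", "Q3", "Q4"]
shipments = [120, 135, 150, 160]  # made-up numbers for illustration

plt.bar(quarters, shipments, color=BRAND_PALETTE["primary"])
plt.title("Quarterly Shipments", color=BRAND_PALETTE["neutral"])
plt.show()
```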