My guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at UC Berkeley's computer science department. Our conversation ranges from the origins of distributed computing to modern data infrastructure, how companies can leverage their massive datasets, and the transformation of Databricks through its phases of growth as a business. While technical, it's exactly the kind of conversation I like to have on this show. I hope you enjoy my conversation with Ali Ghodsi.
For the full show notes, transcript, and links to mentioned content check out https://www.joincolossus.com/episodes/4919706/ghodsi-the-past-present-and-future-of-big-data
This episode of Founder's Field Guide is sponsored by Klaviyo. Klaviyo is the ultimate marketing platform for ecommerce.
With targeted segmentation, email automation, SMS marketing, and more, Klaviyo helps you create your ideal customer experience. See why Klaviyo is trusted by more than 50,000 brands, including Living Proof, Solo Stove, and Nomad, to help them grow their business.
For a free trial check out https://www.klaviyo.com/founders.
This episode is also sponsored by Vanta. Vanta has built software that makes it easier to both get and maintain your SOC 2 report, at a fraction of the normal cost. Founder's Field Guide listeners can redeem a $1k off coupon at vanta.com/patrick.
Founder's Field Guide is a property of Colossus Inc. For more episodes of Founder's Field Guide go
Stay up to date on all our podcasts by signing up to Colossus Weekly, our quick dive every Sunday highlighting the top business and investing concepts from our podcasts and the best of what we read that week. Sign up here - https://www.joincolossus.com/newsletter.
Follow Patrick on Twitter at @patrick_oshag
Follow Colossus on Twitter at @JoinColossus
[00:02:48] – [First question] – What is Databricks
[00:03:34] – History of distributed computing
[00:05:35] – Hardware that made this all possible
[00:07:20] – Early challenges in building out these systems
[00:09:43] – What has made networking technology better
[00:10:35] – Doing something in storage vs with memory
[00:11:45] – Origins of Hadoop
[00:12:42] – Use cases of distributed data in 2010 that weren’t possible in 2000
[00:13:35] – Origins of Spark
[00:15:25] – Early Spark and then the transformation into Databricks
[00:16:50] – Early use cases
[00:17:37] – Their relationship to the open-source project
[00:21:07] – What customers need in order to work with Databricks
[00:23:11] – Their customer interaction
[00:26:27] – How they think about making investments
[00:28:24] – Their competitive advantage
[00:30:13] – Other companies moving the needle in building the distributed computing industry
[00:32:10] – Walls that need to be broken down today
[00:34:02] – Best practices for companies when it comes to their data
[00:34:13] – Jeff Lawson Podcast Episode
[00:38:47] – Lessons from being a CEO
[00:39:53] – Working at UC Berkeley’s AMPLab
[00:41:56] – What excites him about the future
[00:43:29] – Kindest thing anyone has done for him