DuckDB’s Innovative Approach
A DuckDB database file does not have to contain any data. It can instead hold only views that reference data stored elsewhere, such as Parquet files in object storage. The database file then contains just the rules for how to process the data, which keeps it small and makes database management and data sharing much easier.
Example of a Robo-taxi Service
To make this concrete, consider a robo-taxi service. Imagine you need to share the large amounts of ride data generated every day with analysts. The data is too large to send via email and too cumbersome to share via links. This is where DuckDB comes in handy.
Creating a Database File
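With DuckDB, you can create a database file and easily share it. Here is a simple example:
import duckdb
# Create (or open) the database file on local disk.
db = duckdb.connect("weird_rides.db")
# Define a view over the remote Parquet files; no ride data is copied into the file.
db.sql("""
CREATE VIEW weird_rides
AS SELECT pickup_at, dropoff_at, trip_distance, total_amount
FROM 's3://robotaxi-inc/daily-ride-data/*.parquet'
WHERE fare_amount > 100 AND trip_distance < 10.0
""")
# Close the connection so the file is fully written before it is shared.
db.close()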
This code creates a file named `weird_rides.db`, which includes the rules on how to process the data but not the actual data.
Sharing and Accessing Data
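Upload the created file to blob storage and share the link. The recipient can start a local DuckDB session and attach the shared database file:
import duckdb
# Start a local, in-memory session.
conn = duckdb.connect()
# Attach the shared database file directly from blob storage, read-only.
conn.sql("""
ATTACH 's3://robotaxi-inc/virtual-datasets/weird_rides.db'
AS rides_db (READ_ONLY)
""")
# Query the view; .show() prints the result when run as a script.
conn.sql("SELECT * FROM rides_db.weird_rides LIMIT 5").show()
The recipient never downloads the full dataset. When the view is queried, DuckDB reads only the parts of the Parquet files it needs, such as the selected columns, which keeps processing efficient.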
Advantages of DuckDB
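The main advantage of this approach is that consumers are decoupled from how the data is stored. The publisher can change the data format, the partitioning strategy, or the underlying schema and update the view definition accordingly, while analysts keep querying the same view. Because the way data is accessed does not change, this offers great flexibility in data management and use.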
DuckDB as a Data Cloud Browser
With DuckDB, a relational dataset can be shared as a hyperlink to a database file. This provides significant benefits for managing and using data in the cloud. Because a view is evaluated when it is queried, new files that match the view's source pattern are picked up the next time someone queries it, without redistributing the database file.
Conclusion
DuckDB shows that a database does not need to store data to be useful: a small file of view definitions can act as a shareable, queryable dataset. We encourage you to try this approach to data management in practice.
Reference: nikolasgoebel, “DuckDB Doesn’t Need Data To Be a Database”