August Homeruns Special Edition ⚾️💥

Japanese for “thank you”, you ask? ありがとう (arigatou). Thank me later!

Some fresh brews ☕✨ in today’s edition from:

  • Microsoft, adding the ability to disable V-Order on Fabric Warehouses,

  • Google, launching Gemini in BigQuery, and

  • Databricks, announcing general availability of looping for Tasks.

Today’s reading time is 5 minutes.

Data Engineering 🛠️📊

Microsoft Fabric

Microsoft has launched the capability to turn V-Order off for Fabric Warehouses. This might help improve ETL performance in staging warehouses, where tables are frequently dropped and recreated, by avoiding the overhead of applying V-Order each time a table is written.

Credit: Microsoft

Why it matters 💡: In staging environments where fast ETL is crucial, turning off V-Order could improve write performance. However, keep in mind:

  • You won’t be able to turn it back on,

  • It applies to the entire warehouse and not to specific tables,

  • Consider separating your staging and consumption warehouses, as V-Order provides cost efficiency and performance in read scenarios, which may be relevant for the latter.
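
For readers who want to try this, here is a minimal Python sketch that builds the T-SQL involved. The statement text reflects our understanding of Microsoft’s documentation at the time of writing and should be verified there before running anything; the helper name and the `acknowledge_irreversible` flag are hypothetical and exist only to make the one-way nature of the change explicit.

```python
# Assumption: these T-SQL strings follow Microsoft's documentation for
# disabling V-Order on a Fabric Warehouse; verify them before use.
CHECK_VORDER_SQL = "SELECT [name], [is_vorder_enabled] FROM sys.databases;"
DISABLE_VORDER_SQL = "ALTER DATABASE CURRENT SET VORDER = OFF;"


def vorder_disable_statement(acknowledge_irreversible: bool) -> str:
    """Return the statement that disables V-Order for the current warehouse.

    Hypothetical guard: the caller must explicitly acknowledge that the
    change is warehouse-wide and cannot be reverted.
    """
    if not acknowledge_irreversible:
        raise ValueError(
            "Disabling V-Order is warehouse-wide and cannot be undone; "
            "pass acknowledge_irreversible=True to proceed."
        )
    return DISABLE_VORDER_SQL


# Print the statements to run against the staging warehouse
# with whichever SQL client you already use.
print(CHECK_VORDER_SQL)
print(vorder_disable_statement(acknowledge_irreversible=True))
```

Running the check query before and after is a cheap way to confirm which warehouse you are connected to, which matters given that the setting cannot be reverted.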

Read more 🔗👉 here.

Google BigQuery

Gemini is now generally available in BigQuery, offering features like assisted data exploration, natural language-based SQL and Python code generation, assisted analytics workflows, and partition and clustering recommendations using AI.

Credit: Google

Why it matters 💡: With this release, Google has reached parity with its competitors (Databricks Assistant, Snowflake Copilot, and Microsoft Copilot), which already offer smart assistants to optimize analytical journeys. While broad-based use cases remain undiscovered as organizations still try to figure out how best to deploy them, AI-powered assistants are encouraging more people to try their hand at self-service analytics using SQL and Python.

Check this feature out 🔗👉 here.

Databricks

Databricks has introduced looping for Tasks in Databricks Workflows. This feature helps reduce the complexity of repetitive processing tasks (for instance, ingesting sales data from multiple stores). Previously, engineers had to repeat the transformation logic for each ingestion; with For Each loops, the same logic is defined once and run over a list of inputs, with iterations processed in parallel. There is also dynamic parameter handling to support varying schemas.

Credit: Databricks

Why it matters 💡: This small yet convenient release allows for schema management through dynamic parameter handling, concurrent execution to speed up processing, and simpler workflow management. Highly recommended to check this one out!
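
To make the idea concrete, here is a plain-Python analogy of a For Each loop. It is not the Databricks API: the store names, schemas, and the `ingest_sales` and `run_for_each` functions are all hypothetical stand-ins, and a thread pool plays the role of the parallel loop. It only illustrates the pattern of one parameterized task run concurrently over a list of inputs.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inputs: one entry per store, each with its own schema.
STORE_SCHEMAS = {
    "store_north": {"sale_id": "int", "amount": "double"},
    "store_south": {"sale_id": "int", "amount": "double", "coupon": "string"},
    "store_east": {"sale_id": "int", "total": "double"},
}


def ingest_sales(store, schema):
    """Stand-in for a single ingestion task, parameterized by store and schema."""
    columns = ", ".join(f"{name} {dtype}" for name, dtype in schema.items())
    return f"ingested {store} with columns ({columns})"


def run_for_each(store_schemas, max_concurrency=2):
    """Run the same task once per input, at most max_concurrency at a time."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(
            pool.map(lambda item: ingest_sales(item[0], item[1]), store_schemas.items())
        )


for line in run_for_each(STORE_SCHEMAS):
    print(line)
```

The transformation logic lives in one place, and adding a fourth store means adding one entry to the input list instead of copying a task.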

Read more 🔗👉 here.

Enjoying reading the latest in the data world? Consider subscribing and supporting!

Feedback? Email us at: [email protected]
