
Excel makes it easy to turn numbers into visuals, but “easy” doesn’t always mean “accurate.” Some chart types do more to hide your data than highlight it. If you want your reports to be readable and professional, stop using these confusing layouts right away.

Stop using pie charts for complex comparisons

Circular data is a recipe for confusion

A pie chart titled Expenses showing seven unsorted segments. Similar slice sizes for Rent & Utilities and Payroll make comparing their values difficult without labels.

We’ve all been there. You have a handful of categories, and you instinctively click that colorful circle in the Insert tab. It looks friendly, it’s classic, and it feels like the right thing to do. However, the pie chart is one of the most misused tools in the spreadsheet toolkit.

The fundamental issue is biological: the human brain is surprisingly bad at judging angles and comparing areas. Statisticians Cleveland and McGill’s study on graphical perception shows that we are much better at comparing the heights of two bars—“a position along a common scale”—than at figuring out whether one slice is slightly larger than another. When values are close, reading a pie chart becomes guesswork because, as they put it, a chart “fails if the decoding process fails.”

The problem worsens as you add categories. Beyond three or four segments, you end up with a cluttered ring of slivers that needs labels, leader lines, and a legend just to decode what you’re looking at. At that point, the visual has simply stopped communicating. Pie charts only work reasonably well when you have very few categories (usually two or three) and the differences are obvious at a glance.

What to use instead: Use a bar chart when comparing categories, especially when values are close or labels are long. Use a column chart when you want to compare a few categories side by side in a simple ranked or time-based order. Their straight baselines make comparison immediate and effortless. If you still need a proportional comparison, use a sorted bar chart and display percentages directly on the bars instead of relying on angles.

Stop using 3D charts for visual depth

Depth makes your data harder to read

A 3D clustered column chart titled Q1 Sales (Actual) showing sales for four regions.

If pie charts are misleading, 3D charts are actively deceptive. They might look impressive, but they introduce perspective distortion that changes how data is perceived.

In a 3D column or bar chart, elements closer to the viewer appear larger than those further away, even when values are identical. Gridlines become harder to read, and exact alignment with axis values becomes unclear. You’re no longer reading data—you’re interpreting an angle. This violates a core principle of data pioneer Edward Tufte: “The number of information-carrying dimensions depicted should not exceed the number of dimensions in the data.” In practice, this means any chart element that adds visual structure beyond the data itself should be treated with caution.

This creates accidental bias. A dip can look less severe, or a spike more dramatic, simply because of how Excel renders perspective.

What to use instead: Use flat 2D charts for all standard comparisons. If you need emphasis, highlight specific data points with color, annotations, or data labels instead of adding perspective effects.


Stop using dual-axis charts for comparisons

A dual-axis Excel line chart titled Website Visits vs. Marketing Spend comparing two datasets with different scales.

Dual-axis charts seem efficient: two datasets, one chart. In reality, they’re one of the easiest ways to imply relationships that don’t actually exist.

The problem isn’t that dual-axis charts can’t show relationships—it’s that they encourage false visual correlation when scales aren’t directly comparable. By placing two different scales on the same graph—say revenue in millions and customer satisfaction out of 10—you can visually engineer correlations simply by adjusting axis ranges. Small changes on one axis can look dramatically significant next to another series.

Even when used honestly, they are difficult to read. The viewer has to constantly check which axis applies to which line, breaking the flow of interpretation. It forces the eye to bounce between the left and right margins, creating unnecessary cognitive load.

What to use instead: Use small multiples (separate, identical charts shown side by side) when you want to compare trends using the same visual scale. Use separate charts when the datasets use different units and should be interpreted independently.

Stop using area charts for overlapping data

Overlapping series hide important values

Area charts can be effective for showing part-to-whole relationships over time, but they break down quickly when multiple overlapping series are used.

The problem is occlusion. Front layers cover what’s behind them, making smaller datasets difficult—or impossible—to see clearly. Even transparency doesn’t fully solve the issue, since overlapping colors create new, unintended shades that distort interpretation.

Stacked area charts aren’t inherently bad—they shift the focus from individual values to the combined total. But if you care about the breakdown, they stop being useful very quickly, often resulting in a chart that is visually appealing but fails to communicate accurate comparisons.

What to use instead: Use line charts when comparing multiple trends over time, especially when individual series matter equally. Only use stacked area charts when the total combined value is more important than the breakdown of each series.



Stop using radar charts for comparisons

Radial metrics distort interpretation

An Excel radar chart comparing three individuals across five metrics. The overlapping colored lines create a complex spider web effect, making it difficult to interpret.

Radar charts look analytical, but they’re one of the hardest chart types to interpret correctly.

They plot values along radial axes, forming a shape that’s visually compelling but mathematically difficult to compare. Humans are poor at consistently judging radial distance, making comparisons between axes unreliable even when the data is accurate.

Worse, the shape itself is misleading. A larger-looking polygon doesn’t necessarily represent higher overall values—it just reflects how values extend across different axes. This encourages false pattern recognition, and when multiple datasets are added, the chart quickly becomes unreadable, with overlapping shapes creating visual noise rather than insight.

What to use instead: Use grouped bar charts when comparing multiple variables across the same categories. Use small multiples when each metric needs to be read independently without geometric distortion.


Avoiding these chart types keeps your data readable, accurate, and easy to compare—the three things that matter most in any spreadsheet report. And ditching swanky 3D blocks doesn’t mean your reports have to be boring—once you have a clean 2D layout, you can modify traditional charts for a more tailored, professional look. For example, you can use a line chart to build a dynamic timeline, or even use graphics as the columns in column charts. The goal is clarity over decoration, so your insights stand out without visual interference.



Data quality has always been an afterthought. Teams spend months instrumenting a feature, building pipelines, and standing up dashboards, and only when a stakeholder flags a suspicious number does anyone ask whether the underlying data is actually correct. By that point, the cost of fixing it has multiplied several times over.

This is not a niche problem. It plays out across engineering organizations of every size, and the consequences range from wasted compute cycles to leadership losing trust in the data team entirely. Most of these failures are preventable if you treat data quality as a first-class concern from day one rather than a cleanup task for later.

How a typical data project unfolds

Before diagnosing the problem, it helps to walk through how most data engineering projects get started. It usually begins with a cross-functional discussion around a new feature being launched and what metrics stakeholders want to track. The data team works with data scientists and analysts to define the key metrics. Engineering figures out what can actually be instrumented and where the constraints are. A data engineer then translates all of this into a logging specification that describes exactly what events to capture, what fields to include, and why each one matters.
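
As a hypothetical illustration, a single entry in such a spec might capture the event name, when it fires, and the rationale for each field (the event and fields below are invented for this example, not taken from any real product):

```python
# Hypothetical logging-spec entry; the event name, fields, and types are
# invented for illustration.
checkout_completed_spec = {
    "event": "checkout_completed",
    "fires_when": "payment is confirmed server side, once per order",
    "fields": {
        "order_id":    {"type": "string", "required": True,  "why": "joins to the orders table"},
        "user_id":     {"type": "string", "required": True,  "why": "per-user funnel metrics"},
        "total_cents": {"type": "long",   "required": True,  "why": "revenue aggregation"},
        "currency":    {"type": "string", "required": True,  "why": "normalization to one currency"},
        "coupon_code": {"type": "string", "required": False, "why": "promotion attribution"},
    },
}
```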

That logging spec becomes the contract everyone references. Downstream consumers rely on it. When it works as intended, the whole system hums along well.

Before data reaches production, there is typically a validation phase in dev and staging environments. Engineers walk through key interaction flows, confirm the right events are firing with the right fields, fix what is broken, and repeat the cycle until everything checks out. It is time-consuming, but it is supposed to be the safety net.

The problem is what happens after that.

The gap between staging and production reality

Once data goes live and the ETL pipelines are running, most teams operate under an implicit assumption that the data contract agreed upon during instrumentation will hold. It rarely does, at least not permanently.

Here is a common scenario. Your pipeline expects an event to fire when a user completes a specific action. Months later, a server-side change alters the timing so that the event now fires at an earlier stage in the flow, with a different value in a key field. No one flags it as a data-impacting change. The pipeline keeps running and the numbers keep flowing into dashboards.

Weeks or months pass before anyone notices the metrics look flat. A data scientist digs in, traces it back, and confirms the root cause. Now the team is looking at a full remediation effort: updating ETL logic, backfilling affected partitions across aggregate tables and reporting layers, and having an uncomfortable conversation with stakeholders about how long the numbers have been off.

The compounding cost of that single missed change includes engineering time on analysis, effort on codebase updates, compute resources for backfills, and most damagingly, eroded trust in the data team. Once stakeholders have been burned by bad numbers a couple of times, they start questioning everything. That loss of confidence is hard to rebuild.

This pattern is especially common in large systems with many independent microservices, each evolving on its own release cycle. There is no single point of failure, just a slow drift between what the pipeline expects and what the data actually contains.

Why validation cannot stop at staging

The core issue is that data validation is treated as a one-time gate rather than an ongoing process. Staging validation is important but it only verifies the state of the system at a single point in time. Production is a moving target.

What is needed is data quality enforcement at every layer of the pipeline, from the point data is produced, through transport, and all the way into the processed tables your consumers depend on. The modern data tooling ecosystem has matured enough to make this practical.

Enforcing quality at the source

The first line of defense is the data contract at the producer level. When a strict schema is enforced at the point of emission with typed fields and defined structure, a breaking change fails immediately rather than silently propagating downstream. Schema registries, commonly used with streaming platforms like Apache Kafka, serialize data against a schema before it is transported and validate it again on deserialization. Forward and backward compatibility checks ensure that schema evolution does not silently break consuming pipelines.

Avro-formatted schemas stored in a schema registry are a widely adopted pattern for exactly this reason. They create an explicit, versioned contract between producers and consumers that is enforced at runtime, not just documented in a spec file that may or may not be read.
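
As a rough sketch of what producer-side enforcement looks like with the confluent-kafka Python client (the topic, fields, and URLs below are placeholders):

```python
# Minimal sketch: records are validated against a registered Avro schema before
# they are ever written to the topic. Topic, fields, and URLs are placeholders.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record",
  "name": "CheckoutCompleted",
  "fields": [
    {"name": "order_id",    "type": "string"},
    {"name": "total_cents", "type": "long"},
    {"name": "currency",    "type": "string", "default": "USD"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(registry, schema_str)  # registers and validates against the registry
producer = Producer({"bootstrap.servers": "localhost:9092"})

record = {"order_id": "A-1001", "total_cents": 4599, "currency": "USD"}
producer.produce(
    topic="checkout_completed",
    # Serialization fails loudly here if the record does not match the schema.
    value=serializer(record, SerializationContext("checkout_completed", MessageField.VALUE)),
)
producer.flush()
```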

Write, audit, publish: A quality gate in the pipeline

At the processing layer, Apache Iceberg has introduced a useful pattern for data quality enforcement called Write-Audit-Publish, or WAP. Iceberg operates on a file metadata model where every write is tracked as a commit. The WAP workflow takes advantage of this to introduce an audit step before data is declared production ready.


[Diagram: the write-audit-publish data quality flow]

In practice, the daily pipeline works like this. Raw data lands in an ingestion layer, typically rolled up from smaller time-window partitions into a full daily partition. The ETL job picks up this data, runs transformations such as normalizations, timezone conversions, and default-value handling, and writes to an Iceberg table. If WAP is enabled on that table, the write is staged with its own commit identifier rather than being immediately committed to the live partition.
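
With Spark, one common way to implement this staging step is the wap.id approach. The sketch below is illustrative, with placeholder catalog, table, and path names:

```python
# Sketch of staging a daily ETL write with Iceberg's write-audit-publish
# support in Spark. Catalog, table, and source paths are placeholders.
import uuid
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("events_daily_etl").getOrCreate()

# One-time setup: opt the target table into WAP.
spark.sql("""
    ALTER TABLE analytics.events_daily
    SET TBLPROPERTIES ('write.wap.enabled' = 'true')
""")

# Tag this run; writes carrying a wap.id are staged instead of published.
wap_id = str(uuid.uuid4())
spark.conf.set("spark.wap.id", wap_id)

transformed = (
    spark.read.parquet("s3://raw-bucket/events/dt=2024-06-01/")  # placeholder path
    # ... normalizations, timezone conversions, default-value handling ...
)

# Creates a staged snapshot tagged with wap_id; the live table still serves
# the previous snapshot to consumers.
transformed.writeTo("analytics.events_daily").overwritePartitions()
```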

At this point, automated data quality checks run against the staged data. These checks fall into two categories. Blocking checks are critical validations such as missing required columns, null values in non-nullable fields, and enum values outside expected ranges. If a blocking check fails, the pipeline halts, the relevant teams are notified, and downstream consumers are informed that the data for that partition is not yet available. Non-blocking checks catch issues that are meaningful but not severe enough to stop the pipeline. They generate alerts for the engineering team to investigate and may trigger targeted backfills for a small number of recent partitions.
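
Continuing the sketch above, the audit step can locate the staged snapshot by its WAP id and express blocking checks as plain aggregate queries. The column names and allowed status values here are invented for illustration:

```python
# Locate the snapshot this run staged (its summary carries our wap.id) and
# run blocking checks against it, not against the live table state.
staged_snapshot = (
    spark.sql("SELECT snapshot_id, summary FROM analytics.events_daily.snapshots")
    .where(f"summary['wap.id'] = '{wap_id}'")
    .first()
)
snap_id = staged_snapshot["snapshot_id"]
staged = spark.read.option("snapshot-id", snap_id).table("analytics.events_daily")

failures = []

# Blocking check: a required field must not contain nulls.
null_ids = staged.where("order_id IS NULL").count()
if null_ids:
    failures.append(f"order_id is null in {null_ids} rows")

# Blocking check: an enum field must stay within its expected range.
bad_status = staged.where("status NOT IN ('created', 'paid', 'refunded')").count()
if bad_status:
    failures.append(f"{bad_status} rows carry an unexpected status value")

if failures:
    # Halt here: the staged snapshot is never published, and teams are alerted.
    raise RuntimeError("Blocking data quality checks failed: " + "; ".join(failures))
```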

Only when all checks pass does the pipeline commit the data to the live table and mark the job as successful. Consumers get data that has been explicitly validated, not just processed.
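
Publishing is then a single metadata operation: cherry-picking the audited snapshot onto the table, for example via Iceberg's cherrypick_snapshot procedure (continuing the sketch above; the catalog name is a placeholder):

```python
# Publish: cherry-pick the audited snapshot so it becomes the table's current
# state. Until this call, consumers have only ever seen previously validated data.
spark.sql(
    f"CALL my_catalog.system.cherrypick_snapshot('analytics.events_daily', {snap_id})"
)
```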

Data quality as engineering practice, not a cleanup project

There is a broader point embedded in all of this. Data quality cannot be something the team circles back to after the pipeline is built. It needs to be designed into the system from the start and treated with the same discipline as any other part of the engineering stack.

With modern code generation tools making it cheaper than ever to stand up a new pipeline, it is tempting to move fast and validate later. But the maintenance burden of an untested pipeline, especially one feeding dashboards used by product, business, and leadership teams, is significant. A pipeline that runs every day and silently produces wrong numbers is worse than one that fails loudly.

The goal is for data engineers to be producers of trustworthy, well-documented data artifacts. That means enforcing contracts at the source, validating at every stage of transport and transformation, and treating quality checks as a permanent part of the pipeline rather than a one-time gate at launch.

When stakeholders ask whether the numbers are right, the answer should not be “we think so.” It should be backed by an auditable, automated process that catches problems before anyone outside the data team ever sees them.

 


