The $2 trillion AI infrastructure problem no one is talking about, and the engineer solving it


The AI infrastructure earnings calls of the past eight quarters have given the public a precise vocabulary for what the build-out costs in capital. Hyperscaler GPU procurement. Power purchase agreements. Real-estate footprints. What they have not given the public is a vocabulary for what it costs to keep the clusters healthy, on a recurring basis, after the capital is spent. That line item, on close inspection, has become one of the largest hidden cost centers in the entire build-out. It is growing faster than the capital line above it.

The visible numbers in the AI infrastructure conversation describe the capital story. Hyperscaler GPU procurement is on track to cross multi-trillion-dollar cumulative spend over the current cycle. Power purchase agreements have moved into the range that historically described heavy industry. Real-estate commitments have followed. The capital narrative has been told in detail across two years of investor updates.

The operational story is less visible. It describes what it costs to keep the clusters healthy. The work is unglamorous and largely manual. GPU node failures have to be detected, triaged, and remediated. Pods have to be rescheduled around degraded hardware. Resource utilization across an accelerator fleet has to be monitored, balanced, and reported on. Each of these tasks is, in current production environments, performed by a class of engineer whose compensation is among the highest in the industry.

The scale of the bill is enormous. Industry analysts who track GPU utilization across hyperscaler fleets have, for several years, reported routine idle rates above thirty percent on production accelerators. The headcount required to keep cluster operations running has scaled linearly with cluster size, even though the explicit goal of every infrastructure team is to make it scale sub-linearly. In aggregate, the operational layer is one of the line items that turns the AI infrastructure thesis from a strong investment story into a structural margin problem.

The work to address it has, until recently, sat inside the bespoke automation tooling of the largest operators, accessible only to the engineers who built it. That is starting to change. Shashidhar Bhat, a software engineer in the big-data infrastructure organization at ByteDance, has spent the past two years producing a body of work that maps directly onto the operational layer the rest of the industry has been describing as a problem.

The pieces, individually, look like ordinary infrastructure components. Custom device plugins for finer-grained accelerator scheduling. Observability tooling built on top of NVIDIA’s Data Center GPU Manager. Autonomous pod rescheduling logic that reacts to hardware degradation without human escalation. Each is the kind of thing that gets shipped quietly inside an internal infrastructure team. Taken together, they describe the operational layer that the industry has been outsourcing to site reliability engineers, ported into software and hardened against production load.
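To make the rescheduling piece concrete, the sketch below shows one way degradation-triggered rescheduling could be expressed. It is an illustration only, not Bhat's implementation or any part of OpenSkill: the `GpuHealth` fields, the temperature threshold, and the least-loaded placement rule are all assumptions introduced here, and a production system would read health signals from a telemetry source such as DCGM and act through the Kubernetes API rather than return a plain dictionary.

```python
from dataclasses import dataclass

# Hypothetical health record for one GPU; field names are illustrative only.
@dataclass
class GpuHealth:
    node: str
    temperature_c: float
    uncorrectable_ecc_errors: int
    xid_error_seen: bool

# Hypothetical view of a running pod and the node it is placed on.
@dataclass
class Pod:
    name: str
    node: str

# Assumed threshold, chosen for illustration rather than taken from any vendor guidance.
MAX_TEMP_C = 90.0


def is_degraded(h: GpuHealth) -> bool:
    """Flag a GPU as degraded if any of the assumed health signals trips."""
    return (
        h.temperature_c > MAX_TEMP_C
        or h.uncorrectable_ecc_errors > 0
        or h.xid_error_seen
    )


def plan_rescheduling(health: list[GpuHealth], pods: list[Pod]) -> dict[str, str]:
    """Map each pod on a degraded node to the least-loaded healthy node."""
    # A node counts as degraded if any one of its GPUs is degraded.
    degraded = {h.node for h in health if is_degraded(h)}
    healthy = sorted({h.node for h in health} - degraded)
    if not healthy:
        return {}  # nowhere to move pods; a real system would escalate here

    # Count the pods already running on each healthy node.
    load = {n: 0 for n in healthy}
    for p in pods:
        if p.node in load:
            load[p.node] += 1

    plan: dict[str, str] = {}
    for p in pods:
        if p.node in degraded:
            # Pick the healthy node with the fewest pods; break ties by name.
            target = min(healthy, key=lambda n: (load[n], n))
            plan[p.name] = target
            load[target] += 1
    return plan


if __name__ == "__main__":
    health = [
        GpuHealth("node-a", 95.0, 0, False),  # over the assumed temperature limit
        GpuHealth("node-b", 60.0, 0, False),
        GpuHealth("node-c", 65.0, 0, False),
    ]
    pods = [Pod("train-0", "node-a"), Pod("train-1", "node-a"), Pod("serve-0", "node-b")]
    print(plan_rescheduling(health, pods))
```

The point of the sketch is the shape of the loop: detection, triage, and placement happen in code, and a human is involved only when no healthy capacity remains.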

The scale at which Bhat’s work runs is part of what makes it credible as a reference architecture. ByteDance, parent of TikTok, operates one of the largest Kubernetes deployments in the world. Its clusters run on hundreds of GPU nodes processing roughly one petabyte of data each month. Bhat’s internal framework, an agent-based automation system called OpenSkill, has reduced GPU idle time by thirty-five percent across that environment, against a baseline that included the usage spikes characteristic of large-scale recommender training and content distribution.

A thirty-five percent figure is, by the operational standards of the field, large. Hyperscaler-class operators have for years been chasing single-digit-percentage improvements in idle rates, on the reasoning that single-digit improvements at hyperscaler volumes pay back in eight figures. A reduction at the scale Bhat reports is the kind of result that, when it appears in production at a peer company, is closely held. The fact that it has been reported at all is part of why the wider operator community has begun paying attention.

The other half of Bhat’s recent work has appeared on the open-source side. He has been a contributor to Kubewharf Katalyst, the resource management framework maintained jointly by ByteDance and the broader Kubernetes community. The Katalyst project is one of the few in the cloud-native ecosystem to address the joint scheduling of CPU and GPU resources under load. The design proposals Bhat has filed against the project have moved the discussion in directions that closely parallel his internal work. The convergence between an engineer’s internal production work and external open-source contributions is the rare kind of pattern the maintainer community recognizes as substantive rather than promotional.

The third leg of the body of work is Carbon-Kube, the open-source Kubernetes scheduler Bhat released this past December alongside an IEEE paper co-authored with Sathwik Rao Sirikonda, also at ByteDance. The scheduler is a distinct project from his internal ByteDance work and addresses the carbon-emissions dimension of cluster operations rather than the headcount dimension. The project ships with a citation file, a published benchmark methodology, and reproducible scripts. The contribution is methodologically rigorous in a way that most internal infrastructure tooling never bothers to be.

The combined picture is what makes the case worth making at the industry level. The AI infrastructure operational layer is a cost center the size of a medium-sized economy. The work to address it has been happening quietly inside the largest companies, accessible only to their internal teams. That is changing, in part because of the work of operators like Bhat, whose contributions span internal production deployments, external open-source maintenance, and research-grade publications under his own name.

The argument that the operational layer is the next major margin frontier in AI infrastructure is, on the strength of the work that has shipped in the past year, hard to dismiss. Cluster operators in the next two to three years will need to decide whether to build their own answer or to adopt one of the open-source ones now becoming available. The composition of that answer will reshape the operational margin of every team running production AI workloads.



Recent Reviews


You’ve built your small business from the ground up. It’s your pride and joy, your financial security, and a potential legacy for your family. But what happens to your business interests after you’re gone? Without proper estate planning, your small business could face a chaotic future, disrupting operations, hurting employees, and jeopardizing your loved ones’ inheritance.

Business estate planning is your secret weapon. It’s not just for the ultra-wealthy with complex trusts and wills. For small business owners, it’s a crucial tool to ensure business continuity and protect your business value. Here’s how you can craft a comprehensive estate plan:

Know Your Business Inside and Out

The first step in your estate planning process is taking a deep dive into your business affairs. Make a list of all your business assets: equipment, inventory, intellectual property, and real estate.

Don’t forget your business debts, such as loans and outstanding payments. This comprehensive list helps you understand what needs protecting and planning for in your estate planning documents.

Chart Your Business’s Future Course

What do you envision for your business after you’re gone? Should it stay in the family? Be sold to a trusted partner? Wind down entirely? This is where business succession planning comes in. It’s about deciding the future of your business in a way that honors your legacy and sets your team up for success.

Here are some questions to consider:

  • Family Business? Do you have a family member who shares your passion and has the skills to lead?
  • Trusted Partner? Is there a key employee you see as the ideal successor?
  • Time for a Change? Are you open to selling the business to ensure a smooth transition?

There’s no right or wrong answer. The key is to have open conversations with your loved ones and key employees to understand their goals and aspirations. This will guide you in crafting a business succession plan that feels right for everyone involved.

Develop a Rock-Solid Business Succession Plan

This plan outlines who will take over your business and how. You might identify a family member, a key employee, or even an outside buyer. The business succession plan should detail the transfer process, including training and timeline.

Here’s how to craft a plan as strong as your business itself:

  • Identify Your Successor: It could be a family member you’ve been mentoring, a trusted key employee, or even an outside buyer.
  • Groom Your Successor: Start by involving them in key decisions to give them opportunities to learn the ropes.
  • Plan for the Unexpected: Have a backup plan in place. Identify another potential leader or outline a buy-out option for the remaining partners.

An experienced estate planning attorney like Keele & Parke can help you draft a legally sound plan that considers state law and tax implications.

Avoid Conflict with Ironclad Buy-Sell Agreements

If you have co-owners, a buy-sell agreement is vital. This agreement dictates what happens to a deceased or incapacitated owner’s share of the business. It prevents conflict among the remaining partners and ensures a smooth ownership transition as part of your overall estate plan.

Wills vs. Trusts: Choosing the Right Tool

A will can designate who inherits your business assets. The problem is that a will has to pass through probate court, which can be a slow and public process.

Here’s where a revocable living trust comes in. Think of it as a private vault that holds your business assets during your lifetime. You can name yourself as trustee, so you’re still in control.

You can also designate a successor trustee to seamlessly take over managing the business if you become disabled or pass away. This avoids probate and keeps things running smoothly for your loved ones and your employees.

Wills are still important for your overall estate plan, especially for personal assets outside the trust. But for your business, a revocable living trust offers flexibility, privacy, and peace of mind.

Minimize Estate Taxes Through Strategic Planning

Nobody wants a big chunk of their hard-earned business value going to the government after they’re gone. That’s where estate taxes come in, and they can be a real burden for your family. But don’t worry, there are smart estate planning strategies you can use to minimize the impact of these taxes.

  • Smart Business Structure: The legal entity you choose for your business can impact your estate taxes. Talk to your estate planning attorney about structuring your business as a limited liability company (LLC) or another entity that might offer tax advantages.
  • Explore Powerful Trusts: There are special types of trusts, like grantor retained annuity trusts (GRATs), that can be used to transfer ownership of your business interests to your heirs while minimizing the taxable value of those assets.

The right strategy for you will depend on your specific situation and goals. That’s why it’s crucial to work with an experienced estate planning attorney and financial advisor. They can help you create a personalized plan that minimizes your estate taxes and protects your legacy.

Don’t Neglect Your Personal Estate Plan

Your business is just one piece of the puzzle. You also need a personal estate plan that includes a will, power of attorney, and healthcare directives. Without it, your loved ones could face a legal mess during tough times. Bills might go unpaid, important decisions could be delayed, and family heirlooms could end up in the wrong hands.

An estate plan ensures your wishes are followed. It names guardians for your minor children, designates beneficiaries for your personal assets (like your home and savings), and appoints someone you trust to make healthcare decisions if you’re unable to. This gives your family peace of mind knowing they’re taken care of, even in your absence.

Life Insurance: A Lifeline for Your Loved Ones

A life insurance policy provides your beneficiaries with a lump sum of cash upon your death. This can be crucial for surviving family members or business partners, especially if they need to buy out another owner’s share through a buy-sell agreement or pay estate taxes.

Regularly Review and Update Your Plan

Life circumstances change, and so should your estate plan. Regularly review your plan, especially after major life events like marriage, the birth of a child, or changes in your business structure.

Seek Professional Guidance for a Comprehensive Plan

Business estate planning involves complex legal and financial considerations. Don’t try to go it alone. Consult with an experienced estate planning attorney specializing in business succession planning and a financial advisor with experience in small business matters. Their expertise can ensure your estate plan is comprehensive, legally sound, and achieves your goals for business continuity and protecting your loved ones.

Final Thoughts

Safeguarding your business is like protecting your family’s future. Take control. Schedule a consultation with an experienced estate planning attorney today. They’ll guide you through the process and ensure your legacy lives on.


