This content originally appeared on DEV Community and was authored by Chen Debra
Live replay: https://www.youtube.com/watch?v=TR1B0jSsHjA
In the wave of data-driven and intelligent transformation, the value of data scheduling platforms is being redefined. The integration of Tianyi Cloud Wing MR and Apache DolphinScheduler is not just a matter of technology selection, but also a deep fusion and innovative exploration from community to enterprise.
Community Co-construction: From Usage to Contribution
The cooperation between the Tianyi Cloud team and the Apache DolphinScheduler community has a long history. Beyond deep usage in production environments, team members actively participate in community building through PR submissions, issue feedback, feature suggestions, and more, helping drive the project's iteration.
Some contribution examples:
- PR #17037: Optimized task execution logic
- PR #17165: Added log retrieval client
This two-way interaction not only makes the platform better fit real business needs but also brings the community authentic feedback from frontline production.
Wing MR + DolphinScheduler: A Stable Foundation for Cloud Big Data
As Tianyi Cloud’s big data computing platform, Wing MR provides users with ready-to-use, secure, reliable, and easy-to-manage public cloud services.
More importantly, Wing MR integrates big data components automatically: when scheduling tasks in DolphinScheduler, users can call Hive, Spark, Flink, and other components directly, without tedious environment configuration.
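As a rough illustration of what this looks like from a developer's point of view, here is a minimal sketch using PyDolphinScheduler, the community's Python API for defining workflows. The workflow name, the Hive data source name `hive_default`, and the table names are all hypothetical, and the exact class names depend on the pydolphinscheduler version installed; on Wing MR the data sources would typically already be provisioned by the platform.

```python
# Minimal sketch (not Wing MR specific): define a workflow with one Hive SQL
# task that runs through an already-registered data source, so no local
# Hive/Spark/Flink client configuration is needed on the submitting machine.
# Assumes a recent pydolphinscheduler release; in older releases the Workflow
# class is named ProcessDefinition.
from pydolphinscheduler.core.workflow import Workflow
from pydolphinscheduler.tasks.sql import Sql

with Workflow(name="wingmr_hive_demo") as workflow:
    # "hive_default" is a hypothetical data source name registered in the
    # DolphinScheduler console; the SQL and table names are illustrative.
    aggregate = Sql(
        name="aggregate_orders",
        datasource_name="hive_default",
        sql="INSERT OVERWRITE TABLE dw.daily_orders "
            "SELECT dt, count(*) FROM ods.orders GROUP BY dt",
    )
    # Register (or update) the workflow definition on the API server.
    workflow.submit()
```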
In addition, the platform comes with a monitoring and operations system optimized based on Apache DolphinScheduler, providing full-link visual management for task execution, helping O&M teams quickly locate issues and optimize performance.
Secondary Development: Making Scheduling Closer to Business
In production environments, the Tianyi Cloud team has made multiple customized optimizations based on DolphinScheduler:
- Standardized Result Sets: Unified node output into CSV format to reduce memory usage and improve disk write efficiency.
- Full-link Lineage Tracking: Visualizes the complete data path from collection and processing to output, supporting dependency analysis and auditing between tasks. Three parsing engines (SQLLineage4J, SQLLineage, and GSP) parse all kinds of SQL scripts into lineage relationships and store them in the metadata center, where downstream modules such as the data catalog and data services consume them for visualization, search, and impact analysis of data assets (a parsing sketch follows this list).
- Integration with Third-party Task Systems: Supports graphical registration to connect third-party task systems (such as Dinky and AWS EMR) and allows direct task submission via OpenAPI, reducing integration development costs (an OpenAPI sketch follows this list).
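To make the lineage idea concrete, the sketch below uses the open-source SQLLineage Python library (one of the three engines named above) to extract source and target tables from a SQL script. The SQL text and table names are made up for illustration; in the platform described above, the extracted relationships would then be written to the metadata center rather than printed.

```python
# Sketch of SQL-to-lineage parsing with the open-source sqllineage library
# (pip install sqllineage). The SQL and table names are illustrative only.
from sqllineage.runner import LineageRunner

sql = """
INSERT OVERWRITE TABLE dw.daily_orders
SELECT o.dt, count(*) AS order_cnt
FROM ods.orders o
JOIN ods.order_items i ON o.order_id = i.order_id
GROUP BY o.dt
"""

runner = LineageRunner(sql)
print("sources:", runner.source_tables())  # upstream tables, e.g. ods.orders, ods.order_items
print("targets:", runner.target_tables())  # downstream tables, e.g. dw.daily_orders
# In practice, relationships like these would be stored in the metadata center
# and consumed by the data catalog and data services for impact analysis.
```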
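And as a rough idea of the OpenAPI-based submission path, the following sketch triggers an existing workflow through DolphinScheduler's REST API using the `requests` library. The host, project code, workflow code, and token are placeholders, and the endpoint and parameter names are assumptions based on the 3.x API; they vary across DolphinScheduler versions, so the API docs of the deployed release should be treated as the source of truth.

```python
# Hedged sketch of triggering an existing workflow via the DolphinScheduler
# REST API. Endpoint and parameter names follow the 3.x API and may differ
# in other versions; host, codes, and token below are placeholders.
import requests

DS_HOST = "http://dolphinscheduler.example.com:12345"  # placeholder API server
PROJECT_CODE = 1234567890123                           # placeholder project code
WORKFLOW_CODE = 9876543210987                          # placeholder workflow (process definition) code
TOKEN = "your-access-token"                            # token created in the Security Center

resp = requests.post(
    f"{DS_HOST}/dolphinscheduler/projects/{PROJECT_CODE}/executors/start-process-instance",
    headers={"token": TOKEN},
    data={
        "processDefinitionCode": WORKFLOW_CODE,
        "failureStrategy": "CONTINUE",
        "warningType": "NONE",
    },
)
resp.raise_for_status()
print(resp.json())
```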
These optimizations significantly improve the flexibility and maintainability of task scheduling and provide a unified scheduling hub for different types of business.
Outlook for the AI Era
Finally, I’d like to share my vision for DolphinScheduler in the era of Agentic AI, where users’ expectations of DolphinScheduler, and even the ways they use it, are changing. For example, at the June Meetup, the community shared its exploration of MCP (Model Context Protocol) support for DolphinScheduler, which means that in the future the “users” of DolphinScheduler may extend from human developers to AI Agents.
Perhaps in the near future, the primary users of DolphinScheduler will be agents, and the scheduling platform will no longer be just a web console for humans but a part of AI workflows. I believe DolphinScheduler has a natural advantage here: while various big data components are building their own MCP Servers, DolphinScheduler already integrates almost all of them, so having it serve as the unified scheduling entry point is more efficient, easier to maintain, and lighter-weight than letting each component develop its own MCP Server.
The vision is ambitious, but how can we bring it to life? I believe that, as suggested by Dai Lidong in issue #17334, introducing large model capabilities is the first step toward this future. As a member of the community, I call on everyone to actively join in exploring this topic, pool our ideas, and work together to turn the vision into reality as soon as possible.
Final Words
The combination of Tianyi Cloud Wing MR and Apache DolphinScheduler is a two-way journey between community and enterprise, and also a new reflection on the role of scheduling platforms in the AI wave. In the future, with the development of Agentic AI, this combination will unlock value in even more scenarios.
If you’re interested in big data scheduling and AI workflows, you’re welcome to join the Apache DolphinScheduler community via our WeChat official account, Slack, or GitHub to explore more possibilities in data scheduling together.