Nuts & Bolts of Tally Filesystem: Embedded Indexing

To simplify the understanding of a software product, imagine it is like a machine which runs due to coordination of multiple moving parts. Sometimes there can be multiple engines powering each part so that the machine can work efficiently and can also be maintained or scaled without impacting its existing functionality. One such engine that runs Tally is Tally filesystem or tally database engine. We would like to discuss one of the nuts & bolts of this tally database engine to peek into what happens when you drill down to one of the business reports in tally.

Wait! wait! did you say Tally filesystem, is tally an application or OS to have its own filesystem. That’s why I said it as a filesystem or a database engine. Let’s not fret on the terminology much, towards the end of this article whatever you want to call it, call it.

What is tally database engine?

What is the difference between the operating system’s filesystem and this filesystem?

The story of the storage before the actual story

Too many STO’s!!! Hey, don’t waste my time! Let’s get in.

Data storage has been there as a computing need from the start of computers as a utility. Still there has always been a need for a new DB or a new filesystem in the industry. What is the need of having variety of databases when data storage is a very generic computing need that has been there from ages.

The operating system’s filesystem provides the functionality that takes care of organization of the drives [C: drive, D: drive], folders in that drive, files inside that folder, blocks of data inside that file. The OS filesystem operates at a high level of data organization, and it does not have the quantum view of what is the nature of data that is getting stored. Windows, the operating system works using NTFS, the file system works with drives, folders, files, blocks.

Does the operating system know what is inside the files? an absolute clean NO, NTFS knows a file’s name is ‘something.txt’ but it does not know that it is a text file, it knows a file name is ‘something.xlsx’ but it does not know that it is a excel file. NTFS knows the size of the file, last updated time, is the file hidden or not? sharing permissions of the file/folder and much more details needed to operate at a file or folder level.

Similarly, the operating system doesn’t understand the data in the files that is in your tally company folder with ‘.900’ and ‘.1800’ extensions.

when you pass a voucher for your business purpose in tally it gets stored in the ‘.1800’ files inside the company folder with its unique company number. One such voucher gets created into records on the disk. Sometimes as a single record, sometimes as multiple records. A record is a basic data block of variable length which stores the data that tally wants to refer at the time of need on the business user’s command, in business terms when you want to see your reports or drill down to a voucher or a master.

If we can imagine OS filesystem as a manager who has an eagle's view of the data, tally filesystem is like a manager having both eagle's view and the ant's view of the data as per product's usage. It can go down the puzzling paths of the ant hill and fetch or store the data that you demanded to refer in the most efficient way possible. Tally filesystem is the master of the tally company folder and its contents, and it works as per the business user demands.

“Tally filesystem knows what is the data that it stores; what is inside the data; when is it needed; how is it needed;”

A record will have below considerations / variations:

The data Access patterns in which record will be accessed.
The lifecycle of the record in memory/RAM.
The chunk of data which will be accessed together.

Tally, having its own proprietary filesystem gives the product the unique freedom of deciding on the design and implementation of various types of records based on the above-mentioned considerations/Variations.

The tally DB is one of the multiple phoenixes of the tally product, that keeps rediscovering itself to keep pace with the raising requirements of the ever-emerging Tally product to simplify how you run your businesses.”

If we understood till here, as they say then ‘remaining part is a cake walk’.

Huddle up your focus again and now let’s see any one such data access pattern and its respective data structure running in your Tally.

Embedded indexing

To keep things in perspective Let's start where we ended: When you pass a voucher, Tally file system stores such an entity like voucher as one record or multiple records based on its type; a voucher will have data in numbers like amount, multiple masters, stock items as line entries; a voucher will be needed sometimes individually for an alteration or sometimes will be needed as a single line entry in a list or collection of vouchers to be displayed on a given day or month or a range of dates in the day book.

In day-to-day operations we will keep creating vouchers, altering vouchers, deleting vouchers, cancelling vouchers. All such operations will need to be updated on the disk.

When we delete vouchers, it will create empty spaces on the disk where data is deleted.

When we alter a voucher, an alteration can led to voucher record becoming smaller or bigger. If the altered voucher is bigger with added entries of stock items or line entries, the location in which the voucher resided may not be enough now and it requires new location to be stored. If the altered voucher is smaller with removed entries of stock items or line entries, the location in which the voucher resided will have empty space created which will unnecessarily increase the file size if not reused properly.

To optimally make use of disk space, vouchers are not necessarily located contiguously in their order of creation, but they are located as per the availability of disk space. As we do more and more such rearrangement of records, there arrangement as per their date of creation keeps becoming more and more haphazard. But then when we want to display the voucher/s of a particular day, or a particular range of days, how do we do it with the highest speed, so that the business user doesn’t need to wait too much to check his daybook.

Let’s go with an analogy to better understand the solution:

Imagine that the records of your data are like houses in a city or a society. When a wedding happens the bride’s/bridegroom’s family makes sure that their relatives, relatives to relatives, friends, relatives of friends attend the wedding. Don’t get me wrong, the intention of the hosting family is to get as many people’s blessings as possible to the newlyweds.

So, in the process of inviting people, the members of the hosting family visit each of their acquaintances houses and give personal invitations. Though the hosting family knows their acquaintances very well in their office spaces, sports clubs, daily commutes, still they may not be knowing where they stay. But most of these acquaintances will be so formed that some of these acquaintances know some of their houses who stay nearby to each other, and the hosting family knows one of their houses. Now what practically happens is one of the members of hosting family will visit the house of a relative which he/she knows, that relative takes the member to another relative’s house, and that another relative takes the member to another relative’s house which he knows and so on till the member is done with all the acquaintances whom he/she wants to invite. Same way it happens with friends too, the member of the hosting family visits the friend’s house which he/she knows, and that friend will take the member to another friend’s house which that guy must be visiting on Saturday’s and that another friend takes you to another friend's house and so on.

Embedded indexing

Now you might be feeling, why I started reading this I should not have in the first place opened this link itself when I was feeling sleepy. I should have opened some other story link which can act as a sedative and take me into sleep.

Here the main point we want to highlight is, if we can identify the links to reach a location then there is no need to move in a contiguous order, instead we can use the links to reach multiple specific locations of interest by using these links. In case of voucher records, intentionally such links based on their date of creation or alteration are formed so that they can be used in that order when required. These links are maintained using a self-balancing binary tree and point to be noted these links are maintained this way on the disk not in cache. This way the number of reads done from the disk are also optimized to the point of what is specifically needed by the business user.

This kind of maintaining the links is called embedded indexing and this has been customized for usage of the specific requirements of data patterns in Tally. This customized indexing technique helps tally to do minimal input output operations from the disk and in turn gives the optimum speed that can be achieved in use cases like the Day Book of Tally.

If you have more questions than what you had before you started to read this, that means you have understood what you have read.

We will come back with yet another insightful nugget of another ‘nut and bolt’/data structure, which runs in your Tally software to give you the most reliable and speedy experience in your favorite business reports.

Till then wish you lots of Cheers, happiness in learning something.