Privacy of Business Data – A Case Study from Tally Solutions

Updated on: September 18, 2023

Background

In August 2017 the Supreme Court of India ruled that the right to privacy is a fundamental right, protected by the Constitution of India [1]. This led to the first draft of the Personal Data Protection Bill in 2018. After several consultations and debates with various stakeholders, the Personal Data Protection Bill 2019 (PDP Bill 2019) was tabled in the Indian Parliament by MeitY in December 2019 [2]. A key clause in the Bill refers to the Privacy by Design policy [3]. This clause has triggered a flurry of discussions about what it means for India’s tech industry to follow Privacy by Design when dealing with personal data and, more recently, when dealing with non-personal data [4, 5, 6].

This case study focuses on privacy of business data. We first explain why Privacy by Design is also very important when dealing with the data of business customers. Then we describe the thought process and the technical considerations that Tally Solutions Pvt Ltd has developed and followed for decades when dealing with the data of their millions of MSME customers. Lastly, we share a few tactical considerations that are useful when executing with Privacy by Design.

Distinction between private and public data, for a business

To ground our discussion on data privacy, let us first understand what private data is in the context of a business. The bulk of the information used or processed by businesses is implicitly public data. For example, the existence of a brand, a product, or a company is public data since it is traded across multiple businesses. Here are some example criteria for identifying data as private.

  • A data point that violates anonymity is private. Some obvious examples are the PAN number, TIN number, and bank account number.
  • Business confidential information such as financials, godown details, inventory, and bills of materials is private.
  • The actual transaction between any two business entities is private, including the fact that 'the two businesses transacted' and not just 'what, and how much'.
  • A master that does not participate in external transactions, and participates only in internal transactions, can potentially be considered private.

Classifying data as private or public is a very important decision that should be made conservatively: when in doubt about a specific data point, assume it to be private.
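The conservative rule can be expressed as an allowlist: a field is public only if it is explicitly known to be public, and everything else defaults to private. The following is a minimal sketch of that idea, not a description of Tally's implementation; the field names and the `classify` helper are hypothetical and chosen only to mirror the examples above.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    PRIVATE = "private"


# Hypothetical field names for data that is implicitly public
# (for example, the existence of a brand, a product, or a company).
KNOWN_PUBLIC = {"brand_name", "product_name", "company_name"}


def classify(field_name: str) -> Sensitivity:
    """Return PUBLIC only for fields explicitly known to be public."""
    if field_name in KNOWN_PUBLIC:
        return Sensitivity.PUBLIC
    # Known-private and unrecognized fields are treated identically:
    # when in doubt, assume private.
    return Sensitivity.PRIVATE
```

With this shape, adding a new field to the product never exposes it by accident: it stays private until someone deliberately adds it to the public set.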

Why is Privacy by Design important if your customers are businesses?

For a business that deals with personal data, one facet of compliance with data privacy laws is ‘getting consent’ to access and/or store the customer data necessary to provide the services. However, the notion of ‘getting consent’ does not apply in the same manner when your customer is a business. Hence, the onus to articulate and practice the concepts of data privacy for business customers lies mostly with the companies that serve them.

Also, for a business, having full control over access to its data (e.g. financial data) is crucial to guarding its competitiveness. So, a company serving business customers should have a transparent and well-defined process for handling their data, so that the business customers are always both aware of and in control of who can access it.

Last but not least, an upfront focus on data privacy is a perceptible competitive advantage. Such attention to data privacy from day one can ensure that the company’s products are all built with data privacy considerations at the core from the ‘design’ stage itself, instead of being bolted on at a later stage. For example, such a focus on privacy by design is arguably among the drivers that give Apple a competitive edge and help achieve the premium positioning of Apple devices in the computing and smartphone market.

The Tally Way of Data Privacy by Design

Tally Solutions Pvt. Ltd. is an Indian multinational company that provides enterprise resource planning software. The software handles accounting, inventory management, tax management, payroll, etc., and is used by nearly 2 million customers [7].

[Figure: Tally Data Privacy]

Tally serves a few million businesses today. Much of the current product's architecture is premises-based, so control of customer data, including for the remote services that the product provides, rests fully with the customer. Very shortly, Tally will provide customers anytime, anywhere access to their data, with third-party services integrated with the application as indicated in the picture above.

For Tally, to support data exchange with such diverse stakeholders, including millions of business customers, it is essential to ensure that the business customers have full and exclusive control over who can and cannot see their data. Tally uses a set of principles when designing a system to handle its customers’ data in a way that fully protects privacy, whether the data is residing at the customer’s premises, traveling to Tally’s backend systems, traveling through Tally’s backend systems, or stored in Tally’s backend systems.

1. Customer data will lie on their devices and on the Tally backend

A business customer may need to store data on their premises, on other devices that they use, or on the Tally backend. Here is how Tally ensures privacy of the data regardless of where it is stored:

    • The product has a built-in access control mechanism that can be used to control who can see and operate on which parts of the data. The level of control is entirely at the customer’s discretion: it can range from no control to whatever level of control the customer wants to set.
    • An optional encryption mechanism is available in the product. It is designed so that only the customer can read the data; nobody without the actual password, not even the application developer, can read it.
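One common way to get the property described in the second bullet is to derive the encryption key from the customer's password, so that the key never exists anywhere the password has not been entered. The source does not say which scheme Tally uses; the sketch below is an assumption for illustration only, using PBKDF2 from Python's standard library. The `derive_key` helper, the iteration count, and the salt size are all illustrative choices.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key from a password using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random per data set and can be stored next to the encrypted
# data; it is not secret. The password itself is never stored.
salt = os.urandom(16)
key = derive_key("customer-chosen password", salt)
```

The derived key would then be passed to an authenticated cipher from a vetted cryptography library, which is not shown here. Because only the salt is stored with the data, a developer or operator holding the stored data cannot recreate the key without the password.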

2. Customer data will move between their devices and the Tally backend

A business customer’s data may need to move among their various devices and also between their devices and the Tally backend. Here is how Tally ensures privacy of the data in transit:

    • As data travels over the internet between the Tally client and the backend systems, it needs to be protected from man-in-the-middle attacks. Data in transit is protected by modern and dynamic protocols that prevent any external party from sniffing the data or interfering with it.
    • The customer payload is additionally encrypted so that it can be deciphered only at the recipient end-point, and not by any of the backend systems it may need to pass through for routing purposes.

3. Customers will integrate with third parties

A business customer’s data may need to travel through the Tally backend to avail services from various third parties as shown in the diagram above. Here is how Tally ensures privacy of the data while it is passing through the Tally backend:

    • No data that can potentially be decrypted and/or deciphered in the backend is ever put on disk. This ensures that neither the software developer nor the operator can open the data in the backend.
    • Only the metadata used for routing and/or correlation of request-response is stored, and not any meaningful content of the customer data (which, in any case, can only be decrypted at the end-point of the recipient).
    • The software may need to log information for purposes such as compliance or troubleshooting.
      • Customer-related data is not logged unless it is deemed implicitly public.
      • Compliance-related logging happens if it is sought by the relevant authorities. In such cases, operational care is taken to ensure that only authorized people can access this log, and that all such accesses are logged.
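The pass-through rules above can be sketched as a router that treats the payload as opaque bytes and records only routing metadata. This is a minimal illustration under stated assumptions, not Tally's actual design: the `Envelope` and `RoutingRecord` types, the `route` function, and the field names are hypothetical, and the payload is assumed to have been encrypted for the recipient before it reaches the backend.

```python
import uuid
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Envelope:
    request_id: str    # correlates a request with its response
    recipient: str     # routing target, e.g. a third-party service
    ciphertext: bytes  # encrypted for the recipient; opaque to the backend


@dataclass(frozen=True)
class RoutingRecord:
    """The only thing the backend persists: no payload field exists."""
    request_id: str
    recipient: str


def route(
    envelope: Envelope,
    deliver: Callable[[str, bytes], None],
    records: list,
) -> None:
    """Forward the opaque payload and keep only routing metadata."""
    # The ciphertext is handed on from memory and never written anywhere.
    deliver(envelope.recipient, envelope.ciphertext)
    records.append(RoutingRecord(envelope.request_id, envelope.recipient))


# Usage with in-memory stand-ins for the network and the metadata store.
delivered = []
records = []
envelope = Envelope(str(uuid.uuid4()), "third-party-service", b"opaque-encrypted-bytes")
route(envelope, lambda to, blob: delivered.append((to, blob)), records)
```

Giving the stored record its own type, with no payload field at all, makes it harder for a later change to start persisting customer content by accident.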

4. Customers will avail data-based services including analytics

For any business customer’s data that is used to enrich the analytics database, Tally follows a set of rules to ensure that any decipherable data it receives is never identifiable and exists only in anonymized and aggregated form.

    • Data can never be pulled by the Tally backend; it can only be pushed by the Tally client running on the customer’s premises.
    • Any data that is private as per the definition above does not travel to Tally's backend system.
    • Anonymization and aggregation are used to mask the identity of the source:
      • The IP address of the source is not logged anywhere in the backend. This makes the flow of information irreversible as far as tracing back to the source is concerned.
      • Before any data is pushed to the Tally backend, the Tally client running on the customer’s premises anonymizes and aggregates the data. Such anonymization can be accomplished, for example, with the following steps:
        • (A) Break one data item into multiple smaller sub-items
        • (B) Apply strict and conservative rules on which sub-items are safe to share. A sub-item that includes private data is not shared. A sub-item that is too distinctive (e.g. a new product category that is being sold by only one vendor in a geographical area) is either not shared or shared only after ‘aggregation’ to a coarser level at which it is no longer unique (e.g. “Caramel Espresso Rose Lassi” is aggregated as “Lassi”, or “Bangalore City” is aggregated as “Karnataka State”).
        • (C) For the sub-items that are deemed safe to share, send them one at a time, to the Tally backend such that the Tally backend is oblivious to the fact that they are all coming from the same source. This way, no one can triangulate the source business even with full access to the data stored in the Tally backend, and sufficient time to examine a large number of combinations of various data pieces stored.
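Steps (A) to (C) can be sketched on the client side as follows. This is a simplified illustration, not Tally's implementation: the field names, the `GENERALIZE` lookup table, and the `send` callback are hypothetical, and a static lookup stands in for whatever uniqueness analysis a real system would need. In keeping with the conservative rule, any field or value that is not explicitly recognized is dropped rather than shared.

```python
import random

# Hypothetical coarsening table: shareable field -> {raw value: coarse value}.
# Fields absent from this table are never shared.
GENERALIZE = {
    "product": {"Caramel Espresso Rose Lassi": "Lassi", "Lassi": "Lassi"},
    "location": {"Bangalore City": "Karnataka State"},
}


def split(record: dict) -> list:
    """Step A: break one record into independent (field, value) sub-items."""
    return list(record.items())


def filter_and_aggregate(sub_items: list) -> list:
    """Step B: keep only shareable sub-items, coarsened to a safe level."""
    safe = []
    for field, value in sub_items:
        coarse = GENERALIZE.get(field, {}).get(value)
        if coarse is None:
            continue  # private, unknown, or too distinctive: do not share
        safe.append((field, coarse))
    return safe


def push_one_at_a_time(sub_items: list, send, rng=random) -> None:
    """Step C: push each sub-item as its own message, in shuffled order,
    with no source, record, or batch identifier attached."""
    items = list(sub_items)
    rng.shuffle(items)
    for field, value in items:
        send({"field": field, "value": value})


# Usage: the PAN number and the unrecognized field never leave the client.
record = {
    "product": "Caramel Espresso Rose Lassi",
    "location": "Bangalore City",
    "pan_number": "ABCDE1234F",
    "notes": "free text",
}
sent = []
push_one_at_a_time(filter_and_aggregate(split(record)), sent.append)
```

A real client would also have to avoid other correlating signals between messages, such as a shared connection or closely spaced timestamps, which this sketch does not address.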

Understanding how the product is used provides deep insights that lead to improvements in product design. Tally uses anonymization techniques while gathering such data so that customer identity cannot be reverse engineered in the backend, thus completely protecting the customer’s privacy.

Success Story: Tally reports on web browser

Many small business customers want to have physical control of their business data, and not worry about the possible risks of it going outside their office. Their data resides on their premises. While customers can access their data through a web browser from anywhere, the system is designed so that no data is stored in the Tally backend system. This allows customers to remotely access the reports that they need, without the privacy concerns of data being stored elsewhere.

Tactical notes on execution with Data Privacy by Design

The Data Privacy by Design (DPbD) principles outlined above should help clarify the technology and engineering decisions and choices involved in building a product or service that deals with customers’ data. However, execution in adherence to DPbD also involves important tactical considerations. The data-aware tech industry has only recently embraced the importance of DPbD, and hence there are naturally few documented precedents for the industry to learn from. To the extent that we have had some experience with executing with DPbD, we share here the tactical pointers that may be useful for others. These pointers are neither objective nor comprehensive. However, we believe the industry as a whole would greatly benefit if more such pointers were shared by other players.

  • The process of engineering a software system that adheres to DPbD often challenges the broadly accepted ‘standard practices’. Generating limited or no logs and eliminating ‘root’ access after commissioning the system are some examples. These implications are usually not obvious to, or appreciated by, all the stakeholders in the workforce from the get-go. Deliberate and coordinated efforts to educate all the stakeholders about the DPbD principles (and hence the need to deviate from the seemingly ‘standard’ practices) greatly help in generating the necessary buy-in across the board.
  • While shortlisting the candidate analytics products that the organization would build, the DPbD principles should take clear precedence over any other business considerations such as market demand, profit, growth potential, etc. This is in contrast with the popular industry practice observed so far, where data privacy is given only secondary attention. It helps to invest upfront in brainstorming sessions that bring the product management team, the data analytics team, and the software engineering team onto the same page about the target subset of analytics products that are within the DPbD boundaries while also promising enough to generate adequate business value.
  • Once the target subset of analytics products is defined, building them includes two subtasks that require designing and developing data-aware algorithms:
    • (A) Building the gates and filters at various stages in the data pipeline so that privacy is fully protected and the data is fully anonymized before it reaches the analytics DB.
    • (B) Building the core analytics algorithms for the target analytics products.

The skillsets required by these two subtasks largely overlap, and hence it is common practice for both to be carried out by the same team. However, the pressure to unblock the development of analytics products (that promise lucrative business impact in the short term) could lead to an inadvertent yet significant compromise of the DPbD principles. To prevent such compromise, it is recommended that the team responsible for algorithm development for subtask A is kept at arm’s length from the team responsible for algorithm development for subtask B. In fact, when possible, the former team should include at least one outsider who 1) has diverse experience in building data-aware algorithms across a variety of contexts, 2) has in the past architected gates and filters for data privacy, and 3) has no incentive to make ‘data availability’ easier in order to expedite the development of the envisioned analytics products.

References
