Consistent file system view with Nextcloud and IBM Spectrum Scale

Post date

September 13, 2019

Author

Jos Poortvliet


At large scale, file storage becomes increasingly complicated. A collaboration between Nextcloud and IBM has made the leading large-scale file storage system and the leading content collaboration platform a perfect match for storing, sharing and working with large volumes of data. Visitors to the Nextcloud Conference, September 14 and 15 in Berlin, will be able to learn more about this integration from some of the engineers behind the effort.

Nextcloud

Nextcloud is a flexible, on-premises File Sync and Share and Collaboration platform. Nextcloud was designed to make content easily accessible to all members of an organization, wherever the content resides and however the member needs to access it. It features an easy, consistent user interface with extensive collaboration capabilities on mobile, web and desktop, and conforms to the highest security and data protection standards. Nextcloud is highly extensible, with apps adding functionality, and offers deep integration with infrastructure such as user management and storage.

IBM Spectrum Scale

IBM Spectrum Scale is a high-performance file system for managing data. It has the distinctive ability to perform analytics in place and offers comprehensive support for data access protocols including POSIX, NFS, SMB, HDFS and S3/Object. It can provide a single namespace for all this data, offering a single point of management with an intuitive graphical user interface. IBM Spectrum Scale offers high scalability, high availability, automated data management and reliability with no single point of failure in large file storage infrastructure.

Nextcloud storage

A Nextcloud installation requires a primary storage and can optionally extend this with external storage. The primary storage holds all the files and metadata of the users, such as home directories, versions, encryption keys, trash bins and more. Any object storage using the S3 or Swift APIs can be used as primary storage, but most users choose some kind of POSIX-compatible file system. IBM Spectrum Scale is a popular choice due to its reliability and scalability.

Besides the primary storage, which also holds various metadata like thumbnails, a Nextcloud installation typically integrates external storage. Through this external storage, Nextcloud can aggregate all the storage pools in an organization and make them accessible to the users via one familiar, easy-to-use interface across platforms and locations. External storage can be any storage that is accessible via SMB, NFS, (S)FTP, S3, Swift, WebDAV, SharePoint or various other protocols.

Challenges

To function effectively, Nextcloud needs to be continuously aware of all changes in the external storage, such as create, rename, write and delete operations. This is needed to keep the metadata in Nextcloud in sync, to manage file versions, activity streams and user notifications, to sync efficiently to offline clients and more. This is easy if Nextcloud has exclusive access to the storage solution, which is a requirement for its primary storage. With external storage, however, this is often not the case. Files can be modified by various business processes and tools, or by the user through another interface such as SMB or NFS. A business application could make files available on an internal FTP drive, for example, or users could modify files through SharePoint. Still, users expect the latest version of each file that is created or modified outside Nextcloud to be available in Nextcloud for immediate access, sharing and syncing. When files are modified through means other than the Nextcloud interface, the Nextcloud internal metadata needs to be updated. Nextcloud has the ability to scan an external storage for changes, but this introduces delays and scaling limitations. At a large scale, even solutions like inotify or SMB notifications are insufficient due to their technical limitations.
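To make the scaling problem concrete, here is a minimal Python sketch of change detection by scanning. It is not how Nextcloud implements its scanner; the function names `snapshot`, `diff` and `demo` are made up for this illustration. The point is that every pass has to visit every file, so the cost grows with the size of the storage rather than with the number of changes.

```python
import os
import tempfile


def snapshot(root):
    """Map every file path under root to its (mtime_ns, size)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return (created, modified, deleted) path sets between two snapshots."""
    created = set(new) - set(old)
    deleted = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return created, modified, deleted


def demo():
    """Change a temporary tree 'behind the back' of the scanner and diff it."""
    with tempfile.TemporaryDirectory() as root:
        keep = os.path.join(root, "keep.txt")
        gone = os.path.join(root, "gone.txt")
        for path in (keep, gone):
            with open(path, "w") as f:
                f.write("v1")
        before = snapshot(root)

        # Changes made outside the scanner, e.g. over SMB or NFS.
        with open(keep, "w") as f:
            f.write("version 2")  # different size, so the change is visible
        os.remove(gone)
        os.makedirs(os.path.join(root, "sub"))
        with open(os.path.join(root, "sub", "new.txt"), "w") as f:
            f.write("hello")

        after = snapshot(root)  # a full walk is needed again
        created, modified, deleted = diff(before, after)
        rel = lambda paths: {os.path.relpath(p, root) for p in paths}
        return rel(created), rel(modified), rel(deleted)
```

Even in this toy version, a change is only noticed at the next full walk, which is where both the delay and the load come from.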

Spectrum Scale Clustered Watch

IBM Spectrum Scale 5.0.3 introduces the Clustered Watch feature to improve the monitoring of activities in a Spectrum Scale file system. By monitoring activities in the file system it is possible to automate responses to file access events. For example, a Spectrum Scale administrator can set up a Watch to log every file CLOSE event into a configurable log file. The log file can then be parsed periodically by an external application to trigger further processing of the file.
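As an illustration of the "parse the log periodically" pattern, here is a small Python sketch. The log format is an assumption made for this example: one JSON object per line with hypothetical `event` and `path` fields, and `CLOSE` as the event name. The real Watch output may look different, so treat this as a sketch of the idea rather than a parser for the actual format.

```python
import json


def closed_files(lines, event_name="CLOSE"):
    """Yield the path of every record whose event matches event_name.

    Assumes one JSON object per line with hypothetical "event" and "path"
    keys. Blank, malformed or incomplete lines are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        if record.get("event") == event_name and "path" in record:
            yield record["path"]


sample_log = [
    '{"event": "OPEN", "path": "/gpfs/fs1/report.pdf"}',
    '{"event": "CLOSE", "path": "/gpfs/fs1/report.pdf"}',
    "not json at all",
    '{"event": "CLOSE", "path": "/gpfs/fs1/data/run1.csv"}',
]
to_process = list(closed_files(sample_log))
```

An external application could run something like this on a schedule and hand each returned path to its own processing step.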

Spectrum Scale Clustered Watch is designed to emulate Linux inotify, but it has some significant advantages that simplify the response to events. IBM Spectrum Scale is a distributed file system that can be mounted on many cluster nodes. Spectrum Scale Clustered Watch gathers the Watch events from all nodes and makes them available in one consolidated place. Furthermore, in contrast to Linux inotify, a Spectrum Scale Watch on a directory monitors the activities not only in that directory but also in all its subdirectories.

The integration solution

IBM, Nextcloud and the University of Augsburg worked on an integration to improve the performance and scalability of IBM Spectrum Scale as external storage for Nextcloud. In late 2018 and early 2019 a proof of concept integration was developed. This proof of concept uses Spectrum Scale Clustered Watch to track all changes in the file system and notify Nextcloud. The result is that the file structure view in Nextcloud is in sync with the state of the file system within less than a second, even on very large external storage deployments. The integrated solution is designed to be very scalable and will work in a setup with a large number of Nextcloud application servers and large Spectrum Scale file systems.

Technical implementation

The integration solution can run on one or more Nextcloud application servers and is designed to use Redis. Nextcloud already uses Redis for caching and file locking. It is a well-tested solution which scales with Nextcloud use, allowing for clustered deployments.

The integration tool receives a Spectrum Scale Watch event every time a file is changed in the Spectrum Scale file system. The tool then uses a queue in the Redis database to notify Nextcloud of the change. A background service in Nextcloud consumes this queue, scanning the affected files and updating the Nextcloud index with the changes. This Nextcloud background service can run in parallel on several application servers, and the load can be distributed over multiple Redis servers to ensure high performance and full scalability.
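The flow described above can be sketched in a few lines of Python. This is not the actual integration code: `ChangeQueue` is an in-memory stand-in for the Redis queue so the example runs without a Redis server, and `on_watch_event`, `run_worker` and the `path` field of the event are hypothetical names chosen for the illustration.

```python
from collections import deque


class ChangeQueue:
    """In-memory stand-in for the Redis queue (assumption for this sketch)."""

    def __init__(self):
        self._items = deque()

    def push(self, path):
        self._items.append(path)

    def pop_batch(self, limit):
        """Remove and return up to `limit` queued paths, oldest first."""
        batch = []
        while self._items and len(batch) < limit:
            batch.append(self._items.popleft())
        return batch


def on_watch_event(queue, event):
    """Integration-tool side: enqueue the changed path of a Watch event."""
    queue.push(event["path"])


def run_worker(queue, rescan, batch_size=100):
    """Background-service side: drain the queue and rescan changed paths.

    Within one batch each distinct path is rescanned only once, since
    repeated events for the same file need a single index update.
    """
    rescanned = 0
    while True:
        batch = queue.pop_batch(batch_size)
        if not batch:
            return rescanned
        for path in dict.fromkeys(batch):  # de-duplicate, keep order
            rescan(path)
            rescanned += 1


queue = ChangeQueue()
for changed in ("/share/a.txt", "/share/b.txt", "/share/a.txt"):
    on_watch_event(queue, {"path": changed})

index_updates = []
count = run_worker(queue, index_updates.append)
```

Because the producer and the worker only share the queue, several workers could drain it in parallel, which is the property that lets the background service run on multiple application servers.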

Availability

This solution is available for mutual customers today but requires at least Spectrum Scale 5.0.3 and Nextcloud 17. Contact Nextcloud for a deployment or proof of concept.

At the Nextcloud Conference, September 14 and 15, 2019 in Berlin, IBM Spectrum Scale specialist Ulf Troppens and Nextcloud file systems engineer Robin Appelman will discuss the integration.

Summary

IBM Spectrum Scale and Nextcloud provide a reliable, scalable and performant solution for highly secure data storage that is suitable for modern organizations and their needs for efficient team collaboration. The flexible design of both solutions enabled the development of an efficient integration technology, improving the scalability and responsiveness of the solution. All files are accessible directly via the Spectrum Scale file system or via Nextcloud without compromises in performance and user experience.

Please contact IBM or Nextcloud for more information.
