Archive - Enterprise Monitoring

Cloud Monitoring Software: Automation and Intelligence Not Optional

The quest to improve the productivity and efficiency of IT organizations is an ongoing one. A number of technologies and processes have been adopted over the decades to make IT operations leaner and more effective. With the arrival and rapid adoption of Virtualization technology and Cloud infrastructures in the past few years, IT organizations worldwide are starting to realize significant economy-of-scale benefits. Reduction in costs for ‘incremental units’ of computing power, the ability to more easily flex up and down as needed, and the lack of restrictions imposed by the traditional models, will all drive a dramatic increase in the consumption of computing and application resources as organizations will be freed up to do more. On the flip side, steps will need to be taken to deal with the resulting increase in the administration burden, else the efficiency gains realized from shared, flexible IT infrastructure will be outstripped by the high cost of managing a more dynamic and complex environment.

Terms like “virtualization sprawl” have been coined to refer to the increase in the number of discrete virtual servers and related application components within the overall IT environment. This is no longer a hypothetical scenario, and organizations are already experiencing administration challenges because of the fundamental IT transformation driven by virtualization and cloud technologies. Consider the case of a leading educational institution in the Northeastern United States. Prior to embarking on an aggressive virtualization initiative, the operations team was responsible for ensuring the performance of approximately 1000 distinct physical servers. By the time the first phase of the server consolidation and virtualization initiative was completed, the team was tracking and managing the performance of over 7000 virtual servers!

As the number of discrete virtual servers, components and resident applications explodes, the performance monitoring and root-cause-analysis demands on IT administrators will multiply exponentially. Manually intensive legacy and point monitoring tools will not be able to keep up, and organizations will face significant challenges in detecting and resolving issues in a timely manner. In one recent case of an organization being overwhelmed, the IT team resorted to forced daily ‘proactive reboots’ of a large number of their servers. The team claimed that this workaround was the only way to keep the infrastructure performing, given the absence of a comprehensive monitoring and management solution to identify real issues and isolate problem sources. The IT team acknowledged that the organization’s users and business operations were being impacted by this daily reset cycle, but viewed this approach as the lesser evil compared to blind, reactive fire-fighting!

Of course, the better approach would be to take a more strategic stance and implement the right systems and processes to assure the performance of the IT infrastructure. Today’s cloud monitoring software solutions have to be capable of automating many of the routine administration tasks. More importantly, these systems need built-in intelligence to infer what is going on in the IT infrastructure and automate decision-making. The increased demands on the IT team will be partially offset by the automation capabilities of the monitoring solution, allowing IT personnel to focus on the deeper and more complex administration tasks. Furthermore, the overall efficiency and utilization of IT resources will be higher with the right capabilities in the IT monitoring software (see http://tiny.cc/cwytn to learn how).

Network Monitoring: Combine ‘Bottom-up’ Analysis with ‘Top-Down’ User Experience Measurements

A recent survey conducted by Network World revealed that most IT managers were unable to measure end-user experience with their traditional network monitoring software tools.  Over 50 percent of the survey respondents identified page response time, server query response time and TCP transaction response time (key measures of end-user experience) as being important, yet were not able to measure these metrics with their existing management tools. The survey highlighted a need for IT and network management software that is able to monitor the performance of IT from a user perspective (e.g. end-user page response time), as well as monitor the performance of the various underlying network, server and application components that make up the layers of infrastructure that enable delivery of services.

Although there are specialist solutions that support end-user experience monitoring, these tools are generally not pre-integrated with management tools that monitor the health of the underlying IT infrastructure. Linking the ‘top-down’ and ‘bottom-up’ views within one IT monitoring system allows the IT team to track service performance and user experience metrics and, when problems are detected, to drill down and analyze the technical performance metrics for the various enabling components (e.g. CPU utilization of the application server). This capability allows rapid and context-specific identification of potential causes of degradation of end-user experience. Unified, correlated status views allow the IT team not only to better assure the real-time user experience, but also to conduct detailed analysis of performance issues and bottlenecks in the underlying IT infrastructure.

Organizations that are in the midst of exploring new network monitoring software solutions can look for the following types of capabilities to get an integrated view of performance. Does the solution monitor metrics, such as response time, for complete multi-step end-user transactions? Ideally, any number of multi-step test transactions should be definable, where these tests can be monitored alongside the other device or server specific tests to generate alarms when thresholds are violated.  As part of scripting a transaction step, the user should be able to select specific frames and links for navigating through a particular path for testing purposes. Additionally, secure pages should be accessible by providing the relevant authentication credentials. As part of the scripting process, when the user clicks through to the next step, the software needs to be capable of performing basic validation to ensure that the transaction being scripted can indeed be executed without application access errors.
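To make the idea of a scripted multi-step transaction with a response-time threshold concrete, here is a minimal Python sketch. It is an illustration only, not the behavior of any particular product: the names Step, TransactionResult and run_transaction are hypothetical, and each step is modeled as a plain callable that returns True when its validation passes (in practice it would fetch a page, follow a link or submit credentials).

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True if the step's validation passed


@dataclass
class TransactionResult:
    total_seconds: float
    failed_step: Optional[str]  # name of the first step that failed validation
    alarm: Optional[str]        # None when the transaction is healthy


def run_transaction(steps: List[Step], threshold_seconds: float,
                    clock: Callable[[], float] = time.perf_counter) -> TransactionResult:
    """Run the steps in order and time the whole transaction end to end."""
    start = clock()
    for step in steps:
        if not step.action():
            # Stop at the first failing step; later steps depend on it.
            elapsed = clock() - start
            return TransactionResult(elapsed, step.name,
                                     f"step '{step.name}' failed validation")
    elapsed = clock() - start
    alarm = None
    if elapsed > threshold_seconds:
        alarm = (f"response time {elapsed:.2f}s exceeded "
                 f"threshold {threshold_seconds:.2f}s")
    return TransactionResult(elapsed, None, alarm)
```

A test such as run_transaction([Step("login", do_login), Step("search", do_search)], threshold_seconds=2.0) could then be scheduled alongside device and server tests, with the returned alarm fed into the same alerting path. The clock parameter exists only so the timing can be replaced with a fake in tests.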

Combining transaction monitoring and infrastructure monitoring in one system, and then taking this one step further by mapping services to the relevant top-down and bottom-up metrics (see example of service monitoring solution at http://tiny.cc/mpqxn ), allows organizations to monitor service performance from both technical and end-user perspectives. As the overall IT infrastructure becomes more dynamic and complex with adoption of new technologies such as virtualization and cloud, the ability to unify and tie infrastructure monitoring with end-user experience monitoring will allow organizations to better assure overall business performance and customer satisfaction.

Business Service Management and Virtualization Monitoring Technology

Given the rapid adoption of shared and virtualized infrastructure in the data center environment, a new approach is needed to ensure the effective performance of the IT infrastructure. Rather than monitoring individual nodes and components in the data center in a piecemeal manner, organizations need to monitor the performance of supported services instead – by correlating all the underlying components of the service. The monitoring approach for applications and services has to account for inter-dependencies and impacts of the shared and virtual infrastructure, and has to account for all the dimensions that can impact a service. Traditional tools that display performance indicators in isolation are no longer adequate in meeting the needs of today’s complex data center.

A monitoring approach centered around BSM (Business Service Management) starts by first looking at the performance and availability of the business services, and then the underlying technology components within the data center. A mapping is created between business services and the underlying infrastructure through the use of Business Service Containers. These are flexible, automated objects which represent business services in an organization. They allow an organization to create logical, business-oriented views of the overall physical and virtualized infrastructure in the data center. Users can define different SLAs for different containers, create fault-tolerant redundant models within a container, and have nested containers with cascading alarms.
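The container model described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Zyrion's implementation: the Device and Container classes and the two-state OK/CRITICAL model are hypothetical, and SLAs are left out. It shows only how nesting, redundancy and cascading alarms might interact.

```python
from dataclasses import dataclass, field
from typing import List, Union

OK, CRITICAL = "OK", "CRITICAL"


@dataclass
class Device:
    name: str
    status: str = OK


@dataclass
class Container:
    name: str
    members: List[Union["Container", Device]] = field(default_factory=list)
    redundant: bool = False  # True: members are interchangeable replicas

    def status(self) -> str:
        # Nested containers are evaluated recursively; devices report directly.
        states = [m.status() if isinstance(m, Container) else m.status
                  for m in self.members]
        if not states:
            return OK
        if self.redundant:
            # A redundant group is down only when every replica is down.
            return CRITICAL if all(s == CRITICAL for s in states) else OK
        # Otherwise any failing member cascades up to this container.
        return CRITICAL if CRITICAL in states else OK
```

For example, an "online-store" container holding a redundant "web-farm" container and a single database device stays OK when one web server fails, but turns CRITICAL as soon as the database does.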

With a BSM solution like Zyrion’s Traverse software system, data center and IT managers have access to real-time or near-real-time information on the availability and performance of business services. The system identifies the affected business services when problems occur in the complex, distributed and virtualized data center environment. Once alerted to a service-impacting problem, users are able to drill down from a BSM dashboard to a device-level view, and then all the way down to the packet flow to isolate the root cause impacting a given business service.

The cost and agility benefits of virtualized and shared infrastructure environments are evident. Some degree of private and public cloud infrastructure will also be part of the mix in a modern data center environment. If this is the situation you are either in or moving towards, BSM solutions will become a must-have capability. Senior managers tend to understand the value of service-oriented IT monitoring. In a Zyrion customer survey, over 80% of our customers were using the BSM features in our product, and in almost all cases, senior managers were using the BSM technology and dashboards on a regular basis. So, this is something that is being driven down from the top and is starting to become part of the corporate culture. Additionally, solutions such as Zyrion Traverse can be evaluated at no cost. A fully functional 30-day trial version can be downloaded at http://www.zyrion.com/download/. Although the system has a myriad of advanced capabilities, the solution can be deployed and made operational within days with minimal support from Zyrion.

Network Monitoring Software: Architecture Considerations

The enterprise IT environment is continuing to experience significant changes. An organization’s network monitoring software solution has to be capable of supporting future requirements, whether it is growth in the volume of monitored components, new custom applications/devices that need to be monitored, or different use models. If you are in the midst of considering an upgrade from your open-source or point monitoring tools, or replacing an inflexible legacy solution, make sure whatever solution you are evaluating is scalable, open and extensible to ensure that it is future-proof.

A key limitation of traditional network management systems is the existence of a centralized database for processing of performance data. Even if the collection of data is managed by distributed components, the solutions invariably require centralization of the data for processing and alert generation. For large infrastructures, this introduces a significant performance bottleneck. Each new device adds many metrics of its own, so the amount of data that needs to be processed multiplies enormously as the environment grows.

Capturing and processing these metrics in a single centralized database will put immense pressure on the overall application, creating a significant bottleneck. A key consideration in a replacement solution is whether it is based on a distributed architecture that does not have centralized database bottlenecks. For example, some solutions will have both distributed collection capability and a distributed database architecture. In these solutions, individual data gathering components will often have small local databases that are able to process tens of thousands of metrics every few minutes to generate alarms as needed, and also store the data locally for multiple years. Monitoring consoles receive notifications as they occur, and are able to retrieve performance data from these separate databases when needed for analysis and reporting. No sophisticated database scaling or specialized database administration expertise is required for these systems.
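The distributed pattern described above can be sketched as follows. This is a toy model rather than any vendor's architecture: the Collector and Console classes, the table layout and the single fixed threshold are all assumptions made for illustration, and an in-memory SQLite database stands in for each collector's small local database.

```python
import sqlite3
from typing import List, Tuple


class Collector:
    """A data-gathering component with its own small local database."""

    def __init__(self, name: str, threshold: float):
        self.name = name
        self.threshold = threshold
        # In-memory for the sketch; a real collector would use a local file.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE metrics (device TEXT, ts INTEGER, value REAL)")

    def record(self, device: str, ts: int, value: float) -> List[str]:
        """Store a sample locally and return any alarms it raises."""
        self.db.execute("INSERT INTO metrics VALUES (?, ?, ?)", (device, ts, value))
        self.db.commit()
        if value > self.threshold:
            return [f"{self.name}: {device} value {value} above {self.threshold}"]
        return []

    def history(self, device: str) -> List[Tuple[int, float]]:
        return self.db.execute(
            "SELECT ts, value FROM metrics WHERE device = ? ORDER BY ts", (device,)
        ).fetchall()


class Console:
    """Receives alarms as they occur; pulls history only when asked."""

    def __init__(self, collectors: List[Collector]):
        self.collectors = {c.name: c for c in collectors}
        self.alarms: List[str] = []

    def notify(self, alarms: List[str]) -> None:
        self.alarms.extend(alarms)

    def report(self, collector: str, device: str) -> List[Tuple[int, float]]:
        # Raw samples stay on the collector; the console queries on demand.
        return self.collectors[collector].history(device)
```

The point of the design is that threshold evaluation and storage both happen next to the data source, so only alarms and occasional report queries cross to the console, and adding a collector adds processing capacity rather than load on a central database.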

A next generation network performance monitoring software system also has to support different points of integration depending on the stage of the service management lifecycle, whether it be configuration of devices and tests, establishing user privileges, capturing performance data from custom applications/systems, initiating actions/notifications in external ticketing systems, or displaying performance data on external portals. In many modern data center environments, the monitoring software has to be capable of accepting performance data feeds from custom applications. This could also include processing syslogs and event logs generated by applications. Certain events generated by the network monitoring system may require initiating an action or process in some external system (e.g. ticketing).

All of these requirements need to be supported via flexible, open APIs and plug-in frameworks within the monitoring system. Make sure your replacement solution exposes a rich set of two-way APIs and open extensibility for integrating with existing systems or technology. The API and external feeds need to provide interface points to either import or export data throughout the IT environment. Ensure that the API supports standard technology, such as Web Services, Java, Perl and C, and allows provisioning and updating users, devices and tests (see solution example).
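As a rough illustration of what two-way integration means in practice, the Python sketch below models one inbound interface point (a custom application pushing a metric) and one outbound interface point (a plug-in that is called when an event fires, for example to open a ticket). Every name here, including MonitorHub, ingest, on and the "threshold_exceeded" event type, is hypothetical and does not correspond to the API of any real monitoring product.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class MonitorHub:
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}
        self.metrics: List[Event] = []

    def on(self, event_type: str, handler: Handler) -> None:
        """Outbound hook: register a plug-in to call when an event fires."""
        self._handlers.setdefault(event_type, []).append(handler)

    def ingest(self, source: str, metric: str, value: float, limit: float) -> None:
        """Inbound feed: accept a sample pushed by a custom application."""
        sample: Event = {"source": source, "metric": metric, "value": value}
        self.metrics.append(sample)
        if value > limit:
            self._emit("threshold_exceeded", sample)

    def _emit(self, event_type: str, event: Event) -> None:
        # Call every plug-in registered for this event type, in order.
        for handler in self._handlers.get(event_type, []):
            handler(event)
```

A ticketing integration would then be a single call such as hub.on("threshold_exceeded", open_ticket), where open_ticket is whatever function talks to the external system. When evaluating a product, the equivalent questions are whether both directions exist and whether they can be reached from the languages your team already uses.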

In summary, comprehensive monitoring functionality in and of itself is not sufficient. You need to make sure that whatever solution you select adheres to the basic architecture tenets of scalability, availability, openness, flexibility and extensibility (see complete list of key requirements). It is a dynamic IT environment around us. Make sure your network monitoring software system can keep up as you move forward.
