Operations and Management Area
Configuration Management BOF (cfgmgmt) Meeting Minutes
46th IETF Washington, DC, USA
November 11, 1999, 3 PM

Chairs:  Randy Bush, Bert Wijnen
Reported by: Steve Moulton

[Contrary to IETF practice, these notes are rather longer and more detailed than ordinarily called for.  There was a great amount of useful information educed at this meeting, and a good effort was made to report all of that information here.]

The purpose of this BOF is to collect community thinking about configuration management requirements, and about which protocols to design and/or expand and standardize in order to meet these requirements.  This input is needed for Area Directors and the IESG to guide them in deciding which work to allow and promote within the IETF.  The decision on exactly which path to take will not be made in this meeting.

The Operations and Management Area Directors asked several different constituencies to get together and try to build a set of requirements.  

A brief backgrounder was provided to establish a common set of concepts (gold service, COPS for signaling/outsourcing, COPS for provisioning).

Two questions underlie this BOF: Why do we need two protocols, COPS & SNMP, to do management?  How do we do configuration management?  Documents have been done to compare COPS/PIB and SNMP/MIB, but due to shortness of time, the evaluations done are not as complete as they might have been.

These documents are:
-	draft-kzm-policy-protcomp-00.txt        
-	draft-schoenw-policy-snmp-00.txt   

These documents were synthesized into
-	draft-ops-mumble-conf-management-00.txt     

[all are available from ftp://www.ietf.org/internet-drafts]

Agenda:

Introduction (Bert Wijnen)

30 minutes - Configuration Management Requirements (Luis Sanchez)
15 minutes - Summarize how COPS/PIB and SNMP/MIB meet those requirements
   Keith McCloghrie, Jon Saperia
30 minutes -  Other requirements
   Qospifmib document (Yasusi Kanada)
   Recommended Next Steps (Jeff Case)
20 Minutes - General Discussion
10 Minutes - Wrapup

The first presentation was Requirements for Configuration Management.  (Please see the slides for the text of the presentation; it was not possible to capture all of the material here).  At the end of the presentation, comments from the attendees were solicited.

The presentation mentioned the issue of expiration time, but not of effective time.  The two need to be differentiated.  Also, it is important to get the entire system to a known state, not just each element.  This can be difficult due to clock skew and network delays.  How do you do a rollback if you are only partially successful in establishing system state?
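
The distinction between effective time and expiration time can be pictured with a minimal Python sketch.  Nothing like this was shown at the meeting; the TimedPolicy class, its field names, and the max_clock_skew parameter are all hypothetical, and the skew handling is only one possible assumption about how a device might guard against an inaccurate clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedPolicy:
    """Hypothetical policy record; names are illustrative only."""
    name: str
    effective_at: float  # seconds since epoch when the policy starts to apply
    expires_at: float    # seconds since epoch when the policy stops applying

    def is_active(self, now: float, max_clock_skew: float = 0.0) -> bool:
        # Shrink the validity window by the assumed skew at both ends, so a
        # device whose clock is off by up to max_clock_skew does not apply
        # the policy outside the intended window.
        start = self.effective_at + max_clock_skew
        end = self.expires_at - max_clock_skew
        return start <= now < end


policy = TimedPolicy("gold-service", effective_at=1000.0, expires_at=2000.0)

# Before the effective time, at it, just before expiry, and at expiry.
states = [policy.is_active(t) for t in (999.0, 1000.0, 1999.0, 2000.0)]

# With an assumed 5 second skew the policy is not yet considered active.
early_with_skew = policy.is_active(1002.0, max_clock_skew=5.0)
```

Keeping the two timestamps separate lets a policy be delivered ahead of time and still start and stop on schedule.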

Bert Wijnen: The big issue is "synchronization across multiple systems".

There is a serious scalability problem when you go from the network to the device in your scalability requirements.  Since all of the intelligence of a policy needs to be explained to each device, you have a serious scalability issue, particularly in light of constant change.

Bert Wijnen:  Scalability is in everything we do.  The issue at hand is that we need to go to a higher level of abstraction.

I agree, but this needs to be in the requirements. Maybe we need to look at second level in the requirements so we don't go to drill-down right away.

The ideas are in there (in the documents), but implicitly.  If the device is there the translation from policy to element takes place.  This is the scalability solution.

We need to lay out clearly defined autodiscovery and discovery and topology policies.  We need to have scalability in protocol design at the beginning.  In particular we need to avoid polling.  The information needs to get out of the network and into the place where the network is managed.  The devices themselves need to provide this information.  

There is a problem when someone comes in with CLI (command line interface) changes and gets devices out of sync with policy.  Point 9 [of the presentation] eliminates the possibility of shared network management.

There are two aspects of configuration you can think of (differentiates configuration and provisioning).  Configuration is relatively stable and doesn't respond to events in the network.  Feedback and responsive-based configuration respond to living network activities.  It is hard to imagine doing both within a single set of requirements.  It is hard to imagine using the same framework for things that change across months and things that change across seconds.

There is a further requirement - the ability to respond dynamically and quickly to events.  There are dynamic events happening, and the amount of delay between an event and the action taken on that event should be bounded.

What happens if a device goes out of whack (device reboot)?  Another requirement:  you need to not only prevent misconfiguration, but be able to figure out when a device fails and needs reloading.

You also need to differentiate between failure and instantaneous failover to redundant units.

Finally, you need a lightweight type of transaction (full rollback might be too heavy).  This transaction need not be bulletproof, but should cover most of the criteria.
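
One way to read "lightweight transaction" is a best-effort apply with undo on the first failure.  The Python sketch below is an illustration under that assumption, not a mechanism proposed at the meeting.  The apply_with_rollback function, the dictionary layout, and the "fail" flag used to simulate a device rejecting a change are all hypothetical.

```python
def apply_with_rollback(devices, new_config):
    """Apply new_config to every device; undo everything on first failure.

    devices maps a device name to a dict with a "config" dict and an
    optional "fail" flag that simulates a device rejecting the change.
    Returns True when every device accepted the change.
    """
    previous = {}
    for name, device in devices.items():
        # Remember the old configuration before touching the device.
        previous[name] = dict(device["config"])
        try:
            if device.get("fail"):
                raise RuntimeError(f"{name} rejected the change")
            device["config"].update(new_config)
        except RuntimeError:
            # Best-effort undo: restore every device touched so far.  A real
            # system would also have to cope with the undo itself failing,
            # which is why this is lightweight rather than bulletproof.
            for touched, old_config in previous.items():
                devices[touched]["config"] = old_config
            return False
    return True


devices = {
    "r1": {"config": {"qos": "best-effort"}},
    "r2": {"config": {"qos": "best-effort"}, "fail": True},
}
succeeded = apply_with_rollback(devices, {"qos": "gold"})
```

In this run r1 accepts the change, r2 rejects it, and r1 is then restored to its earlier configuration.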

Last, you need to understand that configuration comes from the device first.  You try to abstract going up, not going down.  The scaling has to be in terms of the devices, and try to build up from there.

Luis Sanchez:  Please send a list of requirements that are not on current requirement list.

There is a requirement for real time data configuration with real time confirmation.  

There is a requirement to get accounting feedback.

Bert Wijnen: Not sure what accounting is, please clarify.

The problem is that current MIBs are device oriented not flow oriented.  We need to be able to address flows to do accounting.

Thanks to the design team - I agree with the requirements as they stand.  There are significant requirements missing.  There is no specific requirement for performance.  Yes, we have left that to vendors, but we need to make specific performance requirements.

The scope of requirements is somewhat limiting.  It is becoming more difficult to provide business case justifications that are different from other aspects of the total network solution.   Need to be able to manage network devices, servers, and applications with the same set of tools.  

We are missing a preprovisioning requirement.  We need to be able to preconfigure a device for components that are not yet installed on the device.

Would preprovisioning include setting up a video conference before it begins?

Yes.  There is not a time component to it.

We do need to be able to be responsive to events at or near real time.  This includes traffic engineering.  This needs to be able to happen at times of network stress, particularly when the network is at or near congestion collapse.

I did not notice mention of a centralized backup mechanism.

Jon Saperia: This is implicit in (he referred to one of the requirements from the presentation).

One thing not mentioned here is whether the management system is a point system or integrated system, and what kind of management interactions are expected.

Are you saying that you are not doing full scale management, but just configuration management?  

Keith McCloghrie:  There are two configuration levels: high level network configuration and low level device configuration.  We need to tie the two together.

Bert Wijnen: You need to get feedback at the high level.

Yes, but also need to get a reply when I make a request.

Bert Wijnen:  When you take measurements you want to take them at the proper level to verify that the policy is being enforced.

We need a requirement that the configuring device be duplicated so we can load balance the PDP.

Jon Saperia: We certainly covered redundancy, though we did not specify the mechanism.

Is there anything in the requirements that addresses persistence across reboots?

Jon Saperia: No.

Further concerns were raised from the floor about the capability to do bulk configuration at reboot, use of DEN/LDAP (not in scope), problems when policy-to-device translations are ambiguous, and concerns about expiration times in the requirements and the difference between policy expiration due to timeout and due to loss of connection with the PDP.

Please send comments to the mailing list (at the end of this document).

Presentation:

Keith McCloghrie:  How COPS provisioning will meet requirements.
(See the slide for details).

The points of this presentation:

SNMP does not provide sufficient efficiency.  Even with techniques such as OID Compression, you still have too much overhead.  There is an example in the Internet Drafts where OID Compression can reduce a packet size from 200 to 100 bytes.  Using COPS reduces the size from 200 to 50 bytes.
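
The figures quoted above can be restated as relative savings.  The short Python calculation below only reworks the numbers given in the presentation (200, 100 and 50 bytes); the reduction helper is a hypothetical name introduced for illustration.

```python
def reduction(original_bytes, reduced_bytes):
    """Fraction of the original packet size that is saved."""
    return 1 - reduced_bytes / original_bytes


# Figures as quoted in the presentation.
oid_compression_saving = reduction(200, 100)
cops_saving = reduction(200, 50)

# How many times smaller the COPS packet is than the OID-compressed one.
cops_vs_compressed = 100 / 50
```

On these figures, OID compression halves the packet while COPS removes three quarters of it, making the COPS packet half the size of the compressed SNMP one.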

COPS-PR has shared state: both sides know when you lose the connection.  This gives rapid rollover from primary to secondary policy source.

COPS-PR guarantees exclusive access; you will not have the problem where two writes can lead to ill-formed configuration.  If you have to do CLI-based configuration, disable COPS first.

Indexing is simple: you have simple indexes so no lexicographical ordering.
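
For readers unfamiliar with the term, SNMP retrieval walks object identifiers in lexicographic order of their numeric sub-identifiers.  The Python sketch below illustrates that ordering only; it was not part of the presentation.  The OID values are illustrative and get_next is a hypothetical helper, not a call from any SNMP library.

```python
import bisect

# Illustrative OIDs, written as tuples of numeric sub-identifiers.
oids = [
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 2, 10),
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 2, 2),
    (1, 3, 6, 1, 2, 1, 2, 2, 1, 1, 10),
]

# Python compares tuples element by element, which matches lexicographic
# ordering over numeric sub-identifiers.
numeric_order = sorted(oids)

# Sorting the dotted strings instead gives a different (wrong) order,
# because "10" sorts before "2" as text.
string_order = sorted(".".join(map(str, oid)) for oid in oids)


def get_next(sorted_oids, oid):
    """Return the first OID strictly after oid, or None at the end."""
    position = bisect.bisect_right(sorted_oids, oid)
    return sorted_oids[position] if position < len(sorted_oids) else None


successor = get_next(numeric_order, (1, 3, 6, 1, 2, 1, 2, 2, 1, 2))
```

A table keyed by a single small integer avoids this multi-component comparison, which appears to be the simplification the speaker had in mind.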

SNMP SETs are hard.  PIBs are like SMI, but leverage the more powerful mechanisms in the protocol.

Integration: The role of an interface expresses high level policy.  All policies are applied to one role, then the roles are applied to the interfaces.  In device-local configuration, the devices are configured to have roles.  There is less configuration to be downloaded.  This could be done in MIBs, but PIBs would be more efficient.
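
The role indirection described above can be sketched in a few lines of Python.  This is an illustration of the idea only, not the PIB data model; the role names, policy names, interface names, and both helper functions are hypothetical.

```python
# Policies are attached to roles (downloaded from the policy server).
role_policies = {
    "edge": ["mark-dscp-ef", "police-2mbps"],
    "core": ["trust-dscp"],
}

# Roles are attached to interfaces (configured locally on the device).
interface_roles = {"eth0": "edge", "eth1": "edge", "eth2": "core"}


def policies_for(interface):
    """Resolve an interface to its policies through its role."""
    return role_policies[interface_roles[interface]]


def expand_per_interface():
    """Equivalent configuration without roles: one copy per interface."""
    return {
        ifname: list(role_policies[role])
        for ifname, role in interface_roles.items()
    }


# Number of policy entries that must be downloaded in each scheme.
role_entries = sum(len(p) for p in role_policies.values())
per_interface_entries = sum(len(p) for p in expand_per_interface().values())
```

With three interfaces the role-based form needs three downloaded policy entries against five for the per-interface form, and the gap widens as more interfaces share a role.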

The fear of the SNMP community is that this is not integrated.  The operator can get state of the network based on the roles.  This can be done at application level in the policy server.

Bottom line: It is better to have two optimized tools than to have one general purpose tool.  You'll never find a carpenter using a Swiss Army knife to screw screws and hammer nails.

Presentation: 

Jon Saperia: Internet Standard Management Framework - One Framework or Two.
(see slide and handout for details)

The points of this presentation:

COPS/PIBS depend on SNMP.  As networks get larger, we want less rather than more human intervention.  That is why we feel that one framework is better.

COPS isolates fault management and performance data from configuration data.  This is not good.

Reliance on TCP:  This is problematic.   It is important to distinguish between transport reliability and transactional integrity.  When the net is on the verge of network collapse, TCP reduces transport reliability. 

We have concerns about key distribution in both current proposals.

We need to support CLI for some time in the future.  CLI access needs to be cooperative with, not compete with, configuration management. 

One of the ways load sharing is accomplished is communication between devices that are sharing the load.  PDP-PDP communications methods are not well-specified.  

We find restriction to roles unnecessarily limiting, particularly as we may have to have per-interface roles (e.g., suppose that one wants to turn a specific interface off).

There is not much support for edge devices in multiple operational domains.  COPS can't do this.

We suggest that a better approach is to continue development of the Internet Standard Management Framework with the additional requirements presented in this meeting.  What COPS/PIBs does can be done with SNMP off the shelf, but not as efficiently.  However, we can add to SNMP over time.

One cannot apply a policy to an interface unless we know fault status.  We have to feed that information immediately back to the management system.

COPS/PIB is highly optimized.   However, we've made a local optimization at the expense of the larger picture.  We think that is not a good tradeoff. 

In the end, seamlessness and one standard framework is best.

(At this point, there was an exuberant line of people at the microphone to make their points; however, in fairness and in deference to the schedule, commentary was deferred until later in the session.)

Presentation:  

Yasusi Kanada: SNMP-based QoS Programming Interface MIB for Routers.
draft-kanada-diffserv-qospifmib-00.txt

This presentation flowed directly from the slides (insert **slide** reference here).

Points:

The unit of operation in a MIB is too small.

All three approaches (SNMP, COPS, API) present large gaps between policy rules and device level.

Need either a MIB sequencer or a rule-based language to fill the gap.  He prefers a rule-based language.

Presentation:

Jeff Case: Recommended Next Steps.
(See slides for details; there was too much material for the notetaker to take down.)

General Comments:

There is a big difference between provisioning an interface and provisioning a network.  We heard in the Jon Saperia presentation that he does not like the concept of roles.  We heard concern about the number of TCP connections; however, HTTP does use a large number of connections all the time.  Finally, he wants a single protocol.  Are we going to replace the RAP and Diffserv working groups with changes to SNMP?

Point 1: Jon Saperia's presentation had no positive content - it was an attack on COPS.  It did not have a single point on how it will meet the requirements.  2:  There is a serious philosophical difference between static and dynamic management.  Policy-based management starts with the assumption that the network is not broken.  3:  When we say configuration do we mean configuration of SNMP-responsive devices, or are we talking about all devices which can participate in the QoS game, but are not built as SNMPv4 [sic] devices?

In general, management is about populating rules in network elements. There are a number of protocols that can be used for dynamic provisioning, but for static provisioning, one can use SIPP, CORBA, HTTP, CLI.  This is a much bigger puzzle than COPS vs. SNMP.

Point 1: We need protocols that are implementable.  2: COPS does have a minimal security mechanism, but can use IPSEC.  3: SNMP does not have exclusive rights to ASN.1.  4:  Moving from N to P gives us an opportunity to put new conventions in from lessons we have learned.

I disagree with basic premise that two frameworks are not better than one.  There are multiple data stores out there, not just SNMP.  Each data store has different organization and different protocols.  The concept that we can ignore these data stores and have one magic framework is simply not real.  

This is a significantly larger problem than SNMP vs. COPS.  SNMP is capable of a lot of the things COPS can do, it's just a matter of how you go about it.

The infrastructure must be stable.  As an example, during the 80's airplanes were designed that are not inherently stable.  If the control mechanism goes down, the plane becomes a rock.  I am struck by the apparent need for active network management implied by COPS.  Configuration needs to be inherently stable.

Another protocol added to the mix will not complicate life.  However, configuration management has been a resource problem for some time. I would buy anything that makes configuration management easier to do.  However, you are setting your sights too low.  We need a real object interface coming out of network devices.  We can now buy an object interface on a disk drive or smart card.  If we can't get it here, we will get it from some other industry standards body.

We are not ready to roll a new protocol out on the floor.  What occurred today is a very critical overview of the camps.  There is not enough explanation here of which meets what requirements.  We do need something flexible enough to do both static and dynamic configuration.

The last presentation laid out a very good plan on moving ahead with SNMP.  I am struck by the lack of a plan for moving ahead with COPS.  We've started with COPS, we are basically happy with it, we plan to move ahead.  Would like to see COPS work moved ahead.  Our organization is ready to move ahead.

We all like to talk about protocols and protocol features.  We need to solve the data modeling and presentation problem, and need to move from data model to API work.

Closing: