Traffic Monitoring and Management for UCS Session ID- Steve McQuerry, CCIE # 6108, UCS Technical Marketing @smcquerry www.ciscolivevirtual.com
Agenda
  - UCS Networking Overview
  - Network Statistics in UCSM
  - Understanding Collection Policies
  - Hotspot Detection
  - Engineering to Avoid Hotspots
UCS Networking Overview
System Components: High-level Overview
  - Fabric Interconnect: 6120 & 6140, 6248UP
  - Chassis
  - IO Module (FEX): 2104XP, 2204XP, 2208XP
  - Interface cards: Cisco (M81KR; VIC1280), 3rd party
UCS Networking Overview
[Diagram: SAN and LAN uplinks, two Fabric Interconnects, and a compute chassis with two Fabric Extenders serving half-slot and full-slot compute blades through their adapters]
  - Top of Rack Controller (Fabric Interconnect): (10GE ports) + (1 or 2 slots for expandability)
  - Chassis: up to 8 half-width blades or 4 full-width blades
  - Fabric Extender (FEX or I/O Module): host-to-uplink traffic engineering; up to 80Gb; flexible bandwidth allocation (Gen 1)
  - Mezzanine Card Adapter: virtualized adapter for single-OS and hypervisor systems; dual connected
  - Compute Blade
UCS Networking Overview: 1st Generation Hardware
[Diagram: vPC uplink, SAN A and SAN B, port channels to Fabric A and Fabric B, FEX A and FEX B, and three mezzanine adapters each presenting NICs and HBAs]
UCS 2104 (1st Gen FEX) Server to Fabric Pinning
Server slots pinned to uplink (FEX NIF links to the Fabric Interconnect):
  - 1 link:  Uplink: slots 1,2,3,4,5,6,7,8
  - 2 links: Uplink 1: slots 1,3,5,7; Uplink 2: slots 2,4,6,8
  - 4 links: Uplink 1: slots 1,5; Uplink 2: slots 2,6; Uplink 3: slots 3,7; Uplink 4: slots 4,8
UCS Networking Overview: 1st Generation Hardware
[Diagram: Fabric A and Fabric B, IOM A and IOM B, and one mezzanine adapter presenting NICs and HBAs]
UCS Networking Overview: 2nd Generation Hardware
[Diagram: vPC uplink, SAN A and SAN B, port channels to Fabric A and Fabric B, IOM A and IOM B, and three mezzanine adapters each presenting NICs and HBAs]
UCS 2204 (2nd Gen FEX) Server to Fabric Pinning
Server slots pinned to uplink (FEX NIF links to the Fabric Interconnect):
  - 1 link:  Uplink: slots 1,2,3,4,5,6,7,8
  - 2 links: Uplink 1: slots 1,3,5,7; Uplink 2: slots 2,4,6,8
  - 4 links: Uplink 1: slots 1,5; Uplink 2: slots 2,6; Uplink 3: slots 3,7; Uplink 4: slots 4,8
UCS 2204 (2nd Gen FEX) Server to Fabric Pinning
Server slots channeled across all uplinks:
  - 4 links: Uplink 1: slots 1-8; Uplink 2: slots 1-8; Uplink 3: slots 1-8; Uplink 4: slots 1-8
UCS 2208 (2nd Gen FEX) Server to Fabric Pinning
Server slots pinned to uplink (FEX NIF links to the Fabric Interconnect):
  - 1 link:  Uplink: slots 1,2,3,4,5,6,7,8
  - 2 links: Uplink 1: slots 1,3,5,7; Uplink 2: slots 2,4,6,8
  - 4 links: Uplink 1: slots 1,5; Uplink 2: slots 2,6; Uplink 3: slots 3,7; Uplink 4: slots 4,8
UCS 2208 (2nd Gen FEX) Server to Fabric Pinning
Server slots pinned to uplink:
  - 8 links: Uplink 1: slot 1; Uplink 2: slot 2; Uplink 3: slot 3; Uplink 4: slot 4; Uplink 5: slot 5; Uplink 6: slot 6; Uplink 7: slot 7; Uplink 8: slot 8
Server slots channeled across all uplinks:
  - 8 links: Uplinks 1 through 8 each carry slots 1-8
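The pinned (non-channeled) slot-to-uplink assignments listed on the pinning slides follow a simple modulo pattern. The sketch below reproduces those tables for 1, 2, 4, and 8 links. The function names are hypothetical and exist only for illustration; this is not how UCSM itself is implemented, and the 8-link case appears in the slides only for the 2208.

```python
def pinned_uplink(slot, links):
    """Return the uplink (1-based) a server slot is pinned to.

    Assumes the pattern shown on the slides: slots are dealt out to
    uplinks in round-robin order.
    """
    if links not in (1, 2, 4, 8):
        raise ValueError("the slides only show 1, 2, 4, or 8 links")
    if not 1 <= slot <= 8:
        raise ValueError("a chassis has server slots 1 through 8")
    return (slot - 1) % links + 1


def pinning_table(links):
    """Map each uplink number to the list of slots pinned to it."""
    table = {uplink: [] for uplink in range(1, links + 1)}
    for slot in range(1, 9):
        table[pinned_uplink(slot, links)].append(slot)
    return table


if __name__ == "__main__":
    for links in (1, 2, 4, 8):
        print(links, "link(s):", pinning_table(links))
```

With pinning, the failure of one uplink affects only the slots mapped to it, whereas the channeled mode spreads every slot across all uplinks.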
UCS Networking Overview: 2nd Generation Fabric Interconnect
[Diagram: Fabric A and Fabric B, IOM A and IOM B, and one mezzanine adapter presenting NICs and HBAs]
Network Statistics in UCSM
Network Statistics in UCSM
  - Network statistics are collected by UCSM from the NX-OS software in the Fabric Interconnects.
  - These are counters that are available for networking components.
  - Because of NIV technology, the Fabric has visibility to the Cloud (LAN/SAN uplinks), the IOM, and the server NIC.
Network Statistics in UCSM
Access statistics through:
  - LAN or SAN tab (port-group)
  - Devices tab (server ports, network uplink, storage ports, and mezz ports)
  - Server tab (vNIC)
Network Statistics in UCSM: FI to LAN Network Uplink (Cloud)
[Diagram: vPC uplink, SAN A, and port channels to Fabric A and Fabric B]
Network Statistics in UCSM
  - A Port Channel is the aggregate of all interfaces in the channel.
  - Statistics for the channel are the sum of the statistics of the members.
  - Individual member statistics are also visible in the system.
  - Network usage is measured against TX Total bytes and RX Total bytes.
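As a small illustration of the point that channel statistics are the sum of the member statistics, the sketch below adds up per-member counters. The dictionary layout, the port names, and the key names tx_total_bytes and rx_total_bytes are assumptions made for this example and are not UCSM object names.

```python
def channel_totals(members):
    """Sum TX and RX byte counters over all members of a port channel.

    `members` maps a member port name to its own counters.
    """
    return {
        "tx_total_bytes": sum(m["tx_total_bytes"] for m in members.values()),
        "rx_total_bytes": sum(m["rx_total_bytes"] for m in members.values()),
    }


if __name__ == "__main__":
    # Hypothetical counters for a two-member uplink port channel.
    members = {
        "port-1": {"tx_total_bytes": 4_000_000_000, "rx_total_bytes": 1_500_000_000},
        "port-2": {"tx_total_bytes": 3_000_000_000, "rx_total_bytes": 2_500_000_000},
    }
    print(channel_totals(members))
```

Looking at both the channel total and the individual members helps show whether load is spread evenly across the channel.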
Network Statistics in UCSM: FI to Uplink Port Channel Statistics
Network Statistics in UCSM: FI to Uplink Individual Port Statistics
Network Statistics in UCSM: FI to SAN Port Statistics
[Diagram: SAN A and SAN B attached to Fabric A and Fabric B]
Network Statistics in UCSM: FI to SAN Uplink (Cloud) Port Statistics
  - An FC Port Channel is the aggregate of all interfaces in the channel.
  - Statistics for the channel are the sum of the statistics of the members.
  - Individual member statistics are also visible in the system.
  - FC usage is measured against Bytes RX and Bytes TX.
Network Statistics in UCSM: FI to SAN Port Statistics
Network Statistics in UCSM: FI to IOM (Internal LAN)
[Diagram: Fabric A and Fabric B connected to IOM A and IOM B]
UCS Networking Overview: Server to IOM
[Diagram: mezzanine adapter presenting NICs and HBAs]
Network Statistics in UCSM: Server to IOM vNIC Port Statistics
Understanding Collection Policies
Understanding Collection Policies
  - A collection policy consists of a collection interval and a reporting interval.
  - The collection policy is set under the Admin tab in Stats Management -> Collection Policies -> Collection Policy name.
  - A unique policy can be set for Adapters, Chassis, FEX, Port, Server, and Host*.
  - Not all policies involve network components.
  * Host is an unused policy in UCSM at present.
Understanding Collection Policies: Collection Interval
  - The collection interval is how often the system queries a device for statistics.
  - The default collection interval is 60 seconds.
  - The shorter the interval, the more granular the data. We will use 30 seconds.
  - The timing of the collection interval is important because it is used in BW calculations for hotspot detection.
Understanding Collection Policies: Reporting Interval
  - The reporting interval is internal to UCSM and determines how often UCSM stores data from the collection interval.
  - This data is stored in tables, and the last 5 reporting intervals are available for inspection in the system.
  - Reporting interval data is used to calculate the minimum, maximum, and average values shown in the statistics view.
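The relationship between collected samples and reporting-interval records can be pictured with the sketch below. It assumes that each reporting interval summarizes the samples collected during it and that only the last 5 summaries are retained, as the slide states. The class and method names are hypothetical, and UCSM's internal bookkeeping may differ.

```python
from collections import deque
from statistics import mean


class ReportingHistory:
    """Keep min/max/avg summaries for the most recent reporting intervals."""

    def __init__(self, keep=5):
        # A bounded deque silently drops the oldest record once full.
        self.records = deque(maxlen=keep)

    def report(self, samples):
        """Summarize the samples collected during one reporting interval."""
        record = {"min": min(samples), "max": max(samples), "avg": mean(samples)}
        self.records.append(record)
        return record


if __name__ == "__main__":
    history = ReportingHistory()
    # Hypothetical byte deltas: two collection samples per reporting interval.
    for base in range(7):
        history.report([base * 1_000, base * 1_000 + 500])
    print(len(history.records), "records kept")
    print(history.records[-1])
```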
Understanding Collection Policies
  - Select the policy you want to change.
  - Make your selections and press the Save Changes button.
Hotspot Detection
Hotspot Areas
There are three potential hotspot locations for UCS network connectivity.
  1.) Send and Receive between Fabric Interconnect and LAN/SAN
  2.) Send and Receive between FEX and Fabric Interconnect
  3.) Send and Receive between Host and FEX
[Diagram: Fabric Interconnects to LAN/SAN (SAN A, SAN B), FEX to FI (port channels to Fabric A and Fabric B, FEX A and FEX B), and FEX to Host (three mezzanine adapters presenting NICs and HBAs)]
Hotspot Detection: Threshold Policies
  - To identify hotspots, we will use threshold policies in conjunction with collection policies to alert as thresholds are passed.
  - A threshold is evaluated by measuring a statistic against a policy.
  - The policy measures change against a user-defined normal value. It turns the alert on when the statistic is between the user-set low and high thresholds, and turns the alert off when the statistic falls below the user-set low threshold.
[Diagram: three threshold scales, each marked Normal, Low - Up, and High - Up]
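One way to read the on/off behavior described above is as a simple hysteresis: the alert is raised when the statistic reaches an upper value and is cleared only after it falls below a lower value. The sketch below illustrates that reading. The class name and the parameters raise_at and clear_below are hypothetical, and this is not a description of how UCSM evaluates its threshold definitions internally.

```python
class ThresholdAlarm:
    """Raise an alert at `raise_at`; clear it once below `clear_below`."""

    def __init__(self, raise_at, clear_below):
        if clear_below > raise_at:
            raise ValueError("clear_below must not exceed raise_at")
        self.raise_at = raise_at
        self.clear_below = clear_below
        self.active = False

    def update(self, delta_bytes):
        """Feed one collection-interval delta and return the alert state."""
        if not self.active and delta_bytes >= self.raise_at:
            self.active = True
        elif self.active and delta_bytes < self.clear_below:
            self.active = False
        return self.active


if __name__ == "__main__":
    # Example values: raise at the 8 Gbps delta, clear below the 7 Gbps delta
    # (30-second collection interval).
    alarm = ThresholdAlarm(raise_at=30_000_000_000, clear_below=26_250_000_000)
    for delta in (20_000_000_000, 31_000_000_000, 28_000_000_000, 25_000_000_000):
        print(delta, alarm.update(delta))
```

Separating the raise and clear values keeps the alert from flapping when traffic hovers near a single threshold.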
Hotspot Detection: Calculating BW Threshold Limits for an Element
  - The statistic we use to calculate bandwidth is the delta in bytes. This should be measured for both TX and RX.
  - The delta is the number of bytes transferred during one collection interval, for example 30 seconds.
Hotspot Detection: Calculating BW Threshold Limits for an Element
  - For Ethernet, the BW of a single link is 10 Gbps.
  - First, determine the desired threshold, for example 8 Gbps.
  - Next, calculate the expected change in bytes over the collection interval: divide the desired BW by 8 bits per byte to get bytes per second, then multiply by the collection interval in seconds.
  - Example: 8 Gbps over 30 seconds = 30,000,000,000 bytes
      8 Gbps / 8 bits per byte = 1,000,000,000 bytes per second
      1,000,000,000 bytes per second * 30 seconds = 30,000,000,000 bytes
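The arithmetic above can be sketched as a small helper for working out the value to enter in a threshold definition. The function name is hypothetical, and 1 Gbps is taken as 1,000,000,000 bits per second, which matches the worked example.

```python
def expected_delta_bytes(threshold_gbps, interval_seconds):
    """Bytes expected in one collection interval at the given rate."""
    bytes_per_second = threshold_gbps * 1_000_000_000 / 8  # 8 bits per byte
    return round(bytes_per_second * interval_seconds)


if __name__ == "__main__":
    # 8 Gbps threshold with the 30-second collection interval used in this session.
    print(f"{expected_delta_bytes(8, 30):,}")
    # The same threshold with the default 60-second collection interval.
    print(f"{expected_delta_bytes(8, 60):,}")
```

Because the delta scales with the collection interval, any change to the collection policy means the threshold values need to be recalculated.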
Hotspot Detection: Threshold Calculations

Speed (Gbps)   Percentage of BW   Bytes per second   Delta expected over 30-second collection interval (bytes)
10             100%               1,250,000,000      37,500,000,000
9              90%                1,125,000,000      33,750,000,000
8.5            85%                1,062,500,000      31,875,000,000
8              80%                1,000,000,000      30,000,000,000
7.5            75%                937,500,000        28,125,000,000
7              70%                875,000,000        26,250,000,000
6.5            65%                812,500,000        24,375,000,000
6              60%                750,000,000        22,500,000,000
5              50%                625,000,000        18,750,000,000
4              40%                500,000,000        15,000,000,000
3              30%                375,000,000        11,250,000,000
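When reading a byte delta from the statistics view, the table can also be used in reverse. The sketch below converts an observed delta back to a rate and a percentage of link bandwidth. The function names are hypothetical, and the 10 Gbps default assumes a single Ethernet link as in the table.

```python
def observed_gbps(delta_bytes, interval_seconds):
    """Average rate in Gbps implied by a byte delta over one interval."""
    return delta_bytes * 8 / interval_seconds / 1_000_000_000


def utilization_percent(delta_bytes, interval_seconds, link_gbps=10):
    """Observed rate as a percentage of the link bandwidth."""
    return observed_gbps(delta_bytes, interval_seconds) / link_gbps * 100


if __name__ == "__main__":
    delta = 18_750_000_000  # bytes seen during one 30-second collection interval
    print(observed_gbps(delta, 30), "Gbps")
    print(utilization_percent(delta, 30), "% of a 10 Gbps link")
```

Note that the result is an average over the collection interval, so short bursts within the interval are not visible in the delta.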
Hotspot Detection: Threshold Alerts
Hotspot Detection: Threshold Policies Placement
Threshold policies can be configured for the following:
  - Internal LAN: IOM to FI
  - LAN Cloud: FI to upstream Ethernet switches
  - SAN Cloud: FI to upstream SAN switches
  - Server: between the server NIC and the IOM
Hotspot Detection: Threshold Policies, Internal LAN
  - Navigate to Admin -> Stats Management and expand the fabric.
  - Select thr-policy-default and create a threshold class.
  - Choose Ether Tx Stats from the stat class and click Next.
Hotspot Detection: Threshold Policies
  - Click the Add button and select Ether Tx Stats Total Bytes Delta as the property type.
  - Enter 0.0 as the normal value.
  - Select the alarm triggers you want, enter your values, and click OK.
  - Click Finish to return to the policy.
  - Click the Classes tab to see your policy.
Hotspot Detection: Threshold Policies
  - You will need to add another class for RX traffic.
  - Click the + button to the right and repeat the steps from the previous policy, choosing Ether Rx Stats as the stats class this time.
  - Click Save Changes once you have completed the steps.
Hotspot Detection: Threshold Policies
  - For uplinks, you can repeat this process for the LAN cloud.
  - For SAN, use a single stats class, fcstats, and create definitions for RX and TX stats under the same stats class.
Hotspot Detection: Threshold Policies
  - For the vNIC port, you will need to create a threshold policy to be used with a service profile.
  - Go to the appropriate organization level and select Create Threshold Policy.
Hotspot Detection: Threshold Policies
  - Give the policy an applicable name and description and press Next.
  - Choose the vNIC stats class and create a single threshold with the RX bytes delta and the TX bytes delta.
Hotspot Detection: Threshold Policies
  - Apply the threshold policy to the service profile of the servers you want to monitor.
Hotspot Detection: Threshold Policies
  - When a server reaches a threshold, you will receive an alert in UCSM. The alert persists while the threshold is exceeded; once the statistic drops below the defined value, the alert disappears.
  - Alerts show as system faults as defined by the threshold policy.
  - Currently, UCSM does not send traps for this fault.
Engineering to Avoid Hotspots
QoS Architecture
[Diagram: SAN and LAN uplinks, two fabric switches, and a compute chassis with two Fabric Extenders serving half-slot and full-slot compute blades through their adapters]
  - No packet drops within the array.
  - The largest buffers are on the switch and in host memory, so congestion is pushed to the edges.
  - Priority Flow Control (PFC) is used to ensure packet drops are at the vNIC or the switch.
  - All traffic in a CA system belongs to 1 of 6 System Classes.
  - Four are user configurable, while the other two are for FCoE and standard Ethernet.
  - QoS parameters can be configured at a per-System-Class level or a per-vNIC level.
System Buffering/Queuing
User Configuration
Users configure QoS parameters at two levels.
Globally, for each System Class:
  - COS value for packets in this class
  - Drop/No-drop behavior
  - Strict Priority
  - Bandwidth/Weight
Example:

Class Name         FC         Bronze
COS Value          3          0
Drop/No-Drop       No-Drop    Drop
Strict Priority    No         No
Bandwidth/Weight   20%        30%
User Configuration
Users configure QoS parameters at two levels.
For each vNIC (egress properties):
  - Class: System Class for traffic from this vNIC
  - Rate: rate limit (Mbps)
  - Burst: burst size (Kbytes)
Example: Logical Server A

         vnic1   vnic2   vnic3
Class    FC      FC      Bronze
Rate     4000    4000    5000
Burst    300     400     100
User Configuration Example
Global System Class Definitions

Class Name         FC           Gold                     Ethernet BE
COS Value          3            1                        0
Drop/No-Drop       No-Drop      Drop                     Drop
Strict Priority    No           No                       No
Bandwidth/Weight   1 (20%)      3 (60%)                  1 (20%)
Description        FC Traffic   High Priority Ethernet   Best Effort Ethernet

Logical Server A

         vnic1   vnic2   vnic3
Class    FC      FC      Eth. BE
Rate     4000    4000    5000
Burst    300     400     100

Logical Server B

         vnic1   vnic2
Class    Gold    Eth. BE
Rate     600     4000
Burst    100     300
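The percentages shown next to the weights in the global system class example follow from dividing each weight by the sum of all weights. A minimal sketch, with a hypothetical function name:

```python
def weight_percentages(weights):
    """Convert per-class weights into percentage shares of bandwidth."""
    total = sum(weights.values())
    return {name: 100 * weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    # Weights from the example: FC = 1, Gold = 3, Ethernet BE = 1.
    shares = weight_percentages({"FC": 1, "Gold": 3, "Ethernet BE": 1})
    for name, percent in shares.items():
        print(f"{name}: {percent:.0f}%")
```

Because the shares are relative, enabling another class or changing one weight changes the percentage of every class.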
QoS Tools
Priority Flow Control
  - Enables lossless fabrics for each class of service.
  - PAUSE is sent per virtual lane when the buffer limit is exceeded.
[Diagram: transmit queues One through Eight and receive buffers One through Eight joined by an Ethernet link carrying eight virtual lanes, with STOP/PAUSE applied to one lane]
COS-based Bandwidth Management (802.1Qaz Enhanced Transmission Selection)
  - Enables intelligent sharing of bandwidth between traffic classes.
[Figure: offered traffic versus realized traffic utilization on a 10 GE link at times t1, t2, t3]

                  Offered traffic           Realized utilization
                  t1      t2      t3        t1      t2      t3
HPC Traffic       3G/s    3G/s    2G/s      3G/s    3G/s    2G/s
Storage Traffic   3G/s    3G/s    3G/s      3G/s    3G/s    3G/s
LAN Traffic       3G/s    4G/s    6G/s      3G/s    4G/s    5G/s

Among the tools used are aggregate shapers at the vNICs (VIC Adapter), ETS, and policers at the switch for each vNIC.
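The bandwidth-sharing figure on this slide shows classes using what they need while a busy class is capped at what is left on the 10 GE link. The sketch below imitates that behavior with a weighted max-min fair (progressive filling) calculation. This is a simplified stand-in chosen for illustration, not the scheduler implemented in the hardware. The function name is hypothetical, and the equal weights and the per-class offered loads (3/3/3, 3/3/4, and 2/3/6 G/s at t1, t2, and t3) are a reading of the slide figure.

```python
def share_link(capacity, offered, weights=None):
    """Split link capacity among classes by weighted max-min fairness."""
    if weights is None:
        weights = {name: 1 for name in offered}  # assumption: equal weights
    realized = {name: 0.0 for name in offered}
    active = {name for name, load in offered.items() if load > 0}
    remaining = float(capacity)
    while active and remaining > 1e-9:
        total_weight = sum(weights[name] for name in active)
        # Classes whose whole demand fits inside their weighted share.
        fits = {name for name in active
                if offered[name] <= remaining * weights[name] / total_weight + 1e-9}
        if not fits:
            # Everyone left wants more than a fair share: split what remains.
            for name in active:
                realized[name] = remaining * weights[name] / total_weight
            break
        for name in fits:
            realized[name] = float(offered[name])
            remaining -= offered[name]
        active -= fits
    return realized


if __name__ == "__main__":
    samples = {
        "t1": {"HPC": 3, "Storage": 3, "LAN": 3},
        "t2": {"HPC": 3, "Storage": 3, "LAN": 4},
        "t3": {"HPC": 2, "Storage": 3, "LAN": 6},
    }
    for label, offered in samples.items():
        print(label, share_link(10, offered))
```

At t3 the offered load totals 11 G/s, so the LAN class is limited to the 5 G/s left over after the other two classes are served.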
QoS Configuration in UCSM: Enable QoS Classes in UCSM
QoS Configuration in UCSM: Create a QoS Policy
Applying QoS to a Policy: Apply Policy to Adapter
Traffic Engineering
  - vNICs can be pinned to a specific FI when created (with configurable failover to the other switch).
  - Depending on requirements, vNICs can be pinned to one interconnect or distributed evenly.
[Diagram: vNICs in System Class C pinned to one interconnect. Blade-1 and Blade-2 each have a VIC with 3 vNICs (vnic-1, vnic-2, vnic-3) in Class-A, Class-B, and Class-C. 2 Fabric Extenders in chassis (FEX-1, FEX-2), each with 1 link to the FI. 2 FI (FI-1, FI-2), both with 1 connection to each FEX.]
Controlling Pinning in Profile
  - For each server, a fabric interconnect can be chosen to balance the traffic.
Traffic Engineering
  - vNICs can be pinned to a specific FI when created (with configurable failover to the other switch).
  - Depending on requirements, vNICs can be pinned to one interconnect or distributed evenly.
[Diagram: vNICs in System Class C distributed across interconnects. Blade-1 and Blade-2 each have a VIC with vnic-1, vnic-2, and vnic-3 in Class-A, Class-B, and Class-C, connected through FEX-1 and FEX-2 to FI-1 and FI-2.]
Summary
  - UCSM is designed for optimized traffic flow.
  - Stats Management and threshold policies allow you to monitor traffic levels.
  - QoS and traffic engineering tools allow you to manage potential bottlenecks in UCS.