How to check which Ohai plugin hangs in chef client run

  sonic0002        2017-12-24 22:09:59       5,111        0    

Ohai plugins are very important components in chef client which aims to provide resource management automation on a server. The data discovered by Ohai plugins are describing the current state of the server and they will be used to maintain the server in a desired state. Each Ohai plugin discovers a specific pierce of information about the server such as cpu, memory, middleware etc.

However, there are rare occasions(well I am a bit optimistic here) someone may find that the chef client run is hanging and it hangs at Ohai plugin discovery stage. Given every server has its own unique configuration and state at the moment, sometimes it's very difficult to locate the issue quickly without playing around on the server for some time. When the Ohai plugin hangs, have you ever scratched your head to find out which exact Ohai plugin should be blamed? In this post, we would provide a feasible way on how to easily find out the problematic plugin. 

Since Chef is written using Ruby, so we can use ruby to invoke chef-client and start a chef client run. The command is:

C:\opscode\chef\embedded\bin\ruby C:\opscode\chef\embedded\bin\chef-client -c C:\opscode\chef\chef\client.rb --node-name [some-node-name]

This is similar to run the chef-client executable directly with below command except the main process above is the ruby and chef-client is a script it's running.

C:\opscode\chef\embedded\bin\chef-client -c C:\opscode\chef\chef\client.rb --node-name [some-node-name]

The fantastic part to run the chef-client with ruby is it can use some options which can help debug ruby level code execution, since Ohai plugins are written using Ruby, these options will be very helpful for tracing the Ohai plugin code execution. 

Ruby provides something called Tracer which outputs a source level execution trace of a Ruby program. This means it can output what code is being executed along the program execution. This will be extremely helpful for finding the Ohai plugin hang issue. To enable the tracing, just need to add the -r tracer option in ruby command. A precaution is this option will generate quite a lot logs information and ensure to remove it once your debugging is completed.

Hence if changing above command to below, lots of more tracing information will be outputted.

C:\opscode\chef\embedded\bin\ruby -r tracer C:\opscode\chef\embedded\bin\chef-client -c C:\opscode\chef\chef\client.rb --node-name [some-node-name]

You may also add the traditional -l debug --force-logger options to provide some additional debug information.

Next, we will demonstrate a real example of an experience of finding a hanging Ohai plugin on Windows. Initially, the chef client run is hanging every time after discovering some Ohai data and since Chef doesn't provide a clear indication of which plugin is running when the chef client hangs and it's a bit difficult to locate the plugin based on the logs(The log is different on different servers), we have to figure out an alternative method. The method we are using is to add the -r tracer option.

Let's take a look at the output when the chef client run hangs after we added this option.

#0:C:/opscode/chef/embedded/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:54:Kernel:-:       return gem_original_require(path)
#0:C:/opscode/chef/embedded/lib/ruby/site_ruby/2.0.0/rubygems/core_ext/kernel_require.rb:54:Kernel::     def initialize(namespace = nil)
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:26:WmiLite::Wmi:-:       @namespace = namespace.nil? ? 'root/cimv2' : namespace
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:27:WmiLite::Wmi:-:       @connection = nil
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:28:WmiLite::Wmi::     def first_of(wmi_class)
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:39:WmiLite::Wmi:-:       query_result = start_query("select * from #{wmi_class}", wmi_class)
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:62:WmiLite::Wmi:>:     def start_query(wql_query, diagnostic_class_name = nil)
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:63:WmiLite::Wmi:-:       result = nil
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:64:WmiLite::Wmi:-:       connect_to_namespace
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:81:WmiLite::Wmi:>:     def connect_to_namespace
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:82:WmiLite::Wmi:-:       if @connection.nil?
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:83:WmiLite::Wmi:-:         namespace = @namespace.nil? ? 'root/cimv2' : @namespace
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:83:WmiLite::Wmi:-:         namespace = @namespace.nil? ? 'root/cimv2' : @namespace
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:84:WmiLite::Wmi:-:         locator = WIN32OLE.new("WbemScripting.SWbemLocator")
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:85:WmiLite::Wmi:-:         begin
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:86:WmiLite::Wmi:-:           @connection = locator.ConnectServer('.', namespace)
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:91:WmiLite::Wmi::     def raise_if_failed(result)
#0:C:/opscode/chef/embedded/lib/ruby/gems/2.0.0/gems/wmi-lite-1.0.0/lib/wmi-lite/wmi.rb:78:WmiLite::Wmi:-:       result.count

Have you noticed which plugin it is executing when the plugin hangs? Yes, it's the Uptime plugin which is to collect the server uptime information. And the statement it hangs at is the PowerShell query:

 uptime_seconds wmi.first_of('Win32_PerfFormattedData_PerfOS_System')['systemuptime'].to_i

After some research, the actual PowerShell script being executed is :

select * from Win32_PerfFormattedData_PerfOS_System

When manually executing this PowerShell script on the server, bingo, it hangs. We successfully located the Ohai plugin which causes the problem and the fix is pretty easy. Actually the official Ohai project has fixed this issue in its latest code. However, we didn't update to use the latest code and hence we were caught in the awkward shape.

Well, debugging is a tedious and boring job and it would take much effort if we head into the wrong direction or don't find an appropriate method to locate the issue. But once you find it, you will feel relief right away and blame yourself why the hell I cannot find it earlier. No worries, this takes time. 

CHEF-CLIENT  OHAI  UPTIME  HANG  HOW TO 

       

  RELATED


  0 COMMENT


No comment for this article.



  RANDOM FUN

Running unfamiliar software