Running external Ruby code from Vagrant
The Story
Like a lot of Chef users, I’m using Vagrant for testing my cookbooks. I’m also using Berkshelf for providing the Vagrant box with the cookbooks it needs.
Until recently, I was happy using the ChefDK-provided Berkshelf (v4.0.1). I stopped being happy when running berks started consuming CPU for ~5 minutes and then failing whenever my Berksfile contained multiple sources (the Chef Supermarket and my private Chef server).
While troubleshooting it I learned that there’s an issue with the native dependency graph solver, and that I wouldn’t be able to fix it in less than a week.
I also noticed that the latest version of the Berkshelf gem (v4.1.1) had no such issues (unless I’m mistaken, it’s because it switched to a pure-Ruby graph solver).
The next logical step was migrating to the new version of Berkshelf.
Attempting to upgrade Berkshelf in the ChefDK
I first tried staying inside the ChefDK by upgrading its version of Berkshelf.
This made me learn several interesting things:
- The /usr/bin/berks file (actually /opt/chefdk/bin/berks) loads specific versions of gems. This means that even if I installed the new version of Berkshelf correctly, I’d have to modify this entry point, and it wouldn’t be trivial.
- The ChefDK Ruby environment is configured to install new gems into the user’s home directory (using GEM_HOME). I’m not sure why (something related to developing gems?).
- The only way I could execute the new Berkshelf gem “properly” inside the ChefDK was using a Gemfile and something like chef exec bundle exec berks, which was really annoying.
Eventually I decided that the comfort of working inside the ChefDK isn’t worth the effort, as taking a clean Ruby 2 environment (e.g. using RVM or Bundler) and installing the Berkshelf Gem inside was effortless.
This worked well for non-Vagrant usage (e.g. calling it from Jenkins), but I still had quite a lot of work ahead of me.
Running Ruby in Vagrant
My second issue was with running any Ruby code from inside Vagrant.
As any Vagrant-Berkshelf veteran knows, the workflow goes something like this:
- User runs some command requiring provisioning, like vagrant up
- Vagrant calls the vagrant-berkshelf methods pretty early in the Vagrant workflow (after Vagrant::Action::Builtin::ConfigValidate)
- vagrant-berkshelf runs berks install to locate all relevant cookbooks and generate the Berksfile.lock
- vagrant-berkshelf calls berks vendor to make a directory containing all cookbooks that the VM needs, which will be accessed by the Chef client on the VM
- And so forth
This workflow heavily depends on Vagrant executing Berkshelf, which works with ChefDK’s Berkshelf because its entry point is “environment-variable proof”:
#!/opt/chefdk/embedded/bin/ruby
#--APP_BUNDLER_BINSTUB_FORMAT_VERSION=1--
ENV["GEM_HOME"] = ENV["GEM_PATH"] = nil unless ENV["APPBUNDLER_ALLOW_RVM"] == "true"
#...
Compare this to the “normal” entry point generated by Gems:
#!/usr/bin/ruby2.0
#
# This file was generated by RubyGems.
#
# The application 'berkshelf' is installed as part of a gem, and
# this file is here to facilitate running it.
#
require 'rubygems'
version = ">= 0"
if ARGV.first
  str = ARGV.first
  str = str.dup.force_encoding("BINARY") if str.respond_to? :force_encoding
  if str =~ /\A_(.*)_\z/
    version = $1
    ARGV.shift
  end
end
gem 'berkshelf', version
load Gem.bin_path('berkshelf', 'berks', version)
The environment negation (deleting GEM_HOME and GEM_PATH) is (IMO) related to the Vagrant use-case.
Fact is, Vagrant is polluting the environment of subprocesses with Vagrant-specific Ruby-related variables.
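To make the effect of that ENV line concrete, here is a minimal sketch of starting a child process with the two variables removed. The helper names (gem_free_env, run_without_gem_vars) are made up for this post and are not part of ChefDK or Vagrant; the only thing the sketch relies on is that Ruby's process-spawning methods unset a variable whose value is nil in the env hash.

```ruby
require "open3"

# Variables that the ChefDK binstub clears before doing anything else.
GEM_VARS = %w[GEM_HOME GEM_PATH].freeze

# Hypothetical helper (not from ChefDK or Vagrant): build an env hash for a
# child process. A nil value tells Process.spawn and friends to unset that
# variable in the child instead of inheriting it from the parent.
def gem_free_env(extra = {})
  GEM_VARS.each_with_object({}) { |name, env| env[name] = nil }.merge(extra)
end

# Hypothetical helper: run a command without the parent's gem variables and
# return its stdout and its exit status.
def run_without_gem_vars(*cmd)
  Open3.capture2(gem_free_env, *cmd)
end
```

Something like run_without_gem_vars("berks", "install") would then start berks without inheriting GEM_HOME or GEM_PATH, which is roughly what the ChefDK entry point does for itself.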
Vagrant, Bundler and external processes
Vagrant uses Bundler as a way of managing its Ruby dependencies (both internal and plugins), so Vagrant suffers from the same issue that Bundler has - it assumes that subprocesses are supposed to run inside its own Ruby environment. To do so, it modifies its own Ruby-related environment variables, such as GEM_PATH (where to look for gems) and GEM_HOME (where gems should be installed).
For cases where that’s not true, Bundler offers a method called Bundler.with_clean_env. This should yield (execute a given code block) with the “original” environment (the one Bundler had when it started), so any processes spawned from that block should be free of the Bundler contamination.
Vagrant tries to utilize this method, but it doesn’t work as expected.
with_clean_env internals
Let’s drill down a bit:
# https://github.com/bundler/bundler/blob/5131fcd/lib/bundler.rb#L211
def with_clean_env
  with_original_env do
    ENV['MANPATH'] = ENV['BUNDLE_ORIG_MANPATH']
    ENV.delete_if { |k,_| k[0,7] == 'BUNDLE_' }
    if ENV.has_key? 'RUBYOPT'
      ENV['RUBYOPT'] = ENV['RUBYOPT'].sub '-rbundler/setup', ''
      ENV['RUBYOPT'] = ENV['RUBYOPT'].sub "-I#{File.expand_path('..', __FILE__)}", ''
    end
    yield
  end
end

# https://github.com/bundler/bundler/blob/5131fcd/lib/bundler.rb#L203
def with_original_env
  bundled_env = ENV.to_hash
  ENV.replace(ORIGINAL_ENV)
  yield
ensure
  ENV.replace(bundled_env.to_hash)
end

# https://github.com/bundler/bundler/blob/5131fcd/lib/bundler.rb#L16
module Bundler
  ORIGINAL_ENV = environment_preserver.restore
  ENV.replace(environment_preserver.backup)
  #...
So, when the Bundler module is loaded, it creates a backup of the current environment variables. This backup (plus some modifications) is used whenever with_clean_env is called. How can it break?
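Before getting to the answer, the backup-and-restore pattern itself can be imitated in a few lines. This is a stripped-down sketch, not Bundler's code: EnvBackup and DEMO_POLLUTION are hypothetical names, and the only point is to show that the backup is taken once at load time and that the block sees exactly that snapshot.

```ruby
# Hypothetical stand-in for the pattern quoted above; not Bundler's code.
module EnvBackup
  # Captured once, at the moment this module is first loaded.
  ORIGINAL_ENV = ENV.to_hash.freeze

  # Run the block with the environment as it was at load time, then put the
  # current environment back, even if the block raises.
  def self.with_original_env
    current = ENV.to_hash
    ENV.replace(ORIGINAL_ENV)
    yield
  ensure
    ENV.replace(current)
  end
end

# Simulate a modification made after the backup was taken.
ENV["DEMO_POLLUTION"] = "set after load"

# Inside the block the variable is gone; afterwards it is back.
inside = EnvBackup.with_original_env { ENV.key?("DEMO_POLLUTION") }
after  = ENV.key?("DEMO_POLLUTION")
```

The weak spot is visible in the constant: the snapshot is only as clean as the environment was when the module got loaded.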
By adding debug prints inside the Bundler gem, I deduced the following facts:
- Bundler is invoked twice

  First, the entry point is pre-rubygems.rb, as evident from the Vagrant launcher:

  // Line 187
  cmd.Args[0] = "ruby"
  cmd.Args[1] = filepath.Join(gemPath, "lib", "vagrant", "pre-rubygems.rb")
  //...
  if err := cmd.Start(); err != nil {
  // ...

  Note these bits at lib/vagrant/pre-rubygems.rb:

  # Line 19
  require_relative "bundler"

  # Line 30
  if ENV["VAGRANT_EXECUTABLE"]
    Kernel.exec("ruby", ENV["VAGRANT_EXECUTABLE"], *ARGV)
  else
    Kernel.exec("vagrant", *ARGV)
  end

  And finally, this in bin/vagrant:

  # Line 69
  require "bundler"

  As you can see, the pre-rubygems.rb file is invoked first, loads Bundler, and then execs the Vagrant entry point, which loads its own Bundler. So the Bundler gem is loaded twice, and the second instance “saves” the environment already modified by the first instance, meaning with_clean_env is useless.

- Vagrant works around this

  The Vagrant devs tried to solve this issue by backing up the environment variables before any modification, like so:

  // https://github.com/mitchellh/vagrant-installers/blob/c5eb9bb/substrate/launcher/main.go
  // Line 18
  const envPrefix = "VAGRANT_OLD_ENV"

  // https://github.com/mitchellh/vagrant-installers/blob/c5eb9bb/substrate/launcher/main.go
  // Line 150
  for _, value := range os.Environ() {
      idx := strings.IndexRune(value, '=')
      key := fmt.Sprintf("%s_%s", envPrefix, value[:idx])
      newEnv[key] = value[idx+1:]
  }

  And then allow restoring from it:

  # https://github.com/mitchellh/vagrant/blob/27157b5/lib/vagrant.rb
  # Line 236
  def self.original_env
    {}.tap do |h|
      ENV.each do |k,v|
        if k.start_with?("VAGRANT_OLD_ENV")
          key = k.sub(/^VAGRANT_OLD_ENV_/, "")
          h[key] = v
        end
      end
    end
  end

  This method works (sort of).
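The prefix trick is easier to follow when both halves sit side by side. Below is a rough Ruby re-creation that works on plain hashes instead of the real ENV. The backup half is really done by the Go launcher, so backup_env is purely illustrative, and the paths are invented sample values.

```ruby
PREFIX = "VAGRANT_OLD_ENV_"

# Illustrative Ruby version of the launcher's backup step: copy every
# variable under a prefixed key while keeping the originals in place.
def backup_env(env)
  env.each_with_object(env.dup) do |(key, value), result|
    result["#{PREFIX}#{key}"] = value
  end
end

# Recover the pre-modification environment from the prefixed copies only.
def restore_env(env)
  env.each_with_object({}) do |(key, value), result|
    result[key.delete_prefix(PREFIX)] = value if key.start_with?(PREFIX)
  end
end

# Sample values, made up for the example.
launched = backup_env("PATH" => "/usr/bin", "HOME" => "/home/me")
launched["GEM_PATH"] = "/opt/vagrant/embedded/gems" # added after the backup
restored = restore_env(launched)
```

Here restored contains only PATH and HOME: GEM_PATH was added after the backup, has no prefixed copy, and therefore never makes it into the restored hash.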
with_original_env is done wrong
Both the Bundler backup environment and the Vagrant backup environment are handled in Vagrant::Util::Env.with_original_env:
def self.with_original_env
  original_env = ENV.to_hash
  ENV.replace(::Bundler::ORIGINAL_ENV) if defined?(::Bundler::ORIGINAL_ENV)
  ENV.update(Vagrant.original_env)
  yield
ensure
  ENV.replace(original_env.to_hash)
end
Now, notice the two issues here:
- In the normal Vagrant flow (working via the Vagrant launcher), the Bundler::ORIGINAL_ENV hash is useless because of the double invocation of Bundler.
- Because we’re only using update with the “proper” environment backup, values won’t be deleted, only replaced:

  good={'a'=>1}
  bad={'a'=>2,'b'=>3}
  bad.update(good)
  bad # => {"a"=>1, "b"=>3}

  So values that didn’t exist in the backup and do exist in the current environment (e.g. GEM_PATH) will stay.
The Solution
This is the relevant PR
First, I modified Vagrant::Util::Env.with_original_env.
I made the assumption that if we’re going through the Vagrant launcher, we only need to restore its environment.
If not, we’ll restore the Bundler environment, if one exists.
The result looks like this:
proxy_env = Vagrant.original_env
if Vagrant.original_env.any?
  ENV.replace(proxy_env)
elsif defined?(::Bundler::ORIGINAL_ENV)
  ENV.replace(::Bundler::ORIGINAL_ENV)
end
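The same decision can be tried out away from Vagrant. In this sketch plain hashes stand in for ENV, Vagrant.original_env and Bundler::ORIGINAL_ENV; pick_original_env and all the sample values are hypothetical, so treat it as an illustration of the selection order rather than the code from the PR.

```ruby
# Hypothetical helper: decide which backup to restore from.
# current         - stand-in for ENV
# launcher_backup - stand-in for Vagrant.original_env (empty without launcher)
# bundler_backup  - stand-in for Bundler::ORIGINAL_ENV (nil if not defined)
def pick_original_env(current, launcher_backup, bundler_backup = nil)
  if launcher_backup.any?
    launcher_backup.dup # launcher was used: trust only its backup
  elsif bundler_backup
    bundler_backup.dup  # no launcher: fall back to Bundler's backup
  else
    current.dup         # nothing to restore from
  end
end

# Sample values, made up for the example.
current  = { "PATH" => "/usr/bin", "GEM_PATH" => "/polluted" }
launcher = { "PATH" => "/usr/bin" }
bundler  = { "PATH" => "/usr/bin", "GEM_PATH" => "/polluted" } # saved too late

via_launcher = pick_original_env(current, launcher, bundler)
via_bundler  = pick_original_env(current, {}, bundler)
```

With a launcher backup present, via_launcher has no GEM_PATH at all, because the whole environment is replaced rather than updated. Without one, via_bundler is whatever Bundler saved, for better or worse.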
After that, I had to locate the code in charge of spawning new processes and make sure that it’s using the right logic.
The interesting method is Vagrant::Util::Subprocess#execute in lib/vagrant/util/subprocess.rb.
It’s very long, but you can save yourself the reading by taking my word for it: the only thing it does to protect the subprocess from the Bundler modifications is calling jailbreak, which is defined in the same file.
The introduction for this method is best quoted directly from the file:
This is, quite possibly, the saddest function in all of Vagrant.
The method itself does plenty with the environment, mainly dealing with environment-related corner cases. Our interesting part is this:
env.replace(::Bundler::ORIGINAL_ENV) if defined?(::Bundler::ORIGINAL_ENV)
env.merge!(Vagrant.original_env)
Rather than repeating the logic from with_original_env, I removed it from jailbreak, took process.start from execute, and wrapped it in with_original_env, like so:
Vagrant::Util::Env.with_original_env do
  process.start
end
I might have misunderstood jailbreak a bit, but hopefully it’ll work OK.
And there you have it.