random stuff and then a bit of gentoo stuff


Thursday, November 10 2011

how to update multiple VMs using a binpkg builder?

So I had some spare time this week, after migrating my infrastructure to a virtual machine (VM) host, aka old desktop hardware reused. One of the first recurring problems I faced was keeping the 5 Gentoo guests up to date. I could log into each guest over ssh and update it manually, repeating those steps 5 times. That was getting cumbersome pretty fast.

I should mention that all the guests have mounted an NFS share where all the binpkgs are stored. So essentially, every time a new package is emerged, the guest stores the generated binary package on the NFS share. Of course that requires the guests to have the same configuration (USE flags, CHOST, virtual CPU etc).

Combine that with the fact that I have an "internal" VM guest that I use for internal testing (hence it is hooked up to my desktop over distcc). Because it is meant for various testing purposes, it has a lot of resources allocated compared to a guest that simply runs public services such as Apache (this webpage).

So the situation is that I have a fast VM that, by using distcc, can generate binpkgs faster than any of the other VMs. But how do you generate binpkgs for packages that are not installed locally but on some other VM? And how do you know which packages to generate?

Well, you need to generate a list of the packages that need to be built, spanning all the VMs but without duplicates. So let's throw ssh (or rather my old friend paramiko) at the problem and get a list of all new packages for each VM by parsing the emerge -NDpavu1 world output. Sounds easy?
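To make the parsing step a bit more concrete, here is a minimal sketch of pulling package atoms out of emerge's pretend output. It is not the code from the script: the function name, the regular expression and the sample text are my own assumptions, and the exact layout of the emerge output differs between Portage versions.

```python
import re

# Matches lines such as:
#   [ebuild     U  ] dev-libs/openssl-1.0.0e [1.0.0d] USE="zlib -bindist"
# Assumption: the atom is the first token after the closing bracket.
_EBUILD_LINE = re.compile(r"^\[ebuild[^\]]*\]\s+(?P<atom>\S+)")


def parse_pretend_output(text):
    """Return the '=category/package-version' atoms found in the output."""
    atoms = []
    for line in text.splitlines():
        match = _EBUILD_LINE.match(line.strip())
        if match:
            atoms.append("=" + match.group("atom"))
    return atoms


# Made-up sample output, only here to show the shape of the input.
SAMPLE = """\
These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild     U  ] dev-libs/openssl-1.0.0e [1.0.0d] USE="zlib -bindist"
[ebuild  N     ] app-misc/screen-4.0.3-r4  USE="pam -debug"
[binary   R    ] sys-apps/sed-4.2.1

Total: 3 packages
"""

print(parse_pretend_output(SAMPLE))
```

Lines starting with [binary are skipped on purpose in this sketch, on the assumption that a package that already exists as a binpkg does not need to be built again.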

Yes, well, kinda. The main problem with relying on the emerge output (which this script does heavily) is that whenever the output format changes I am in a sub-optimal position. On the other hand, I avoid having Python code on each VM that interfaces directly with Portage (which is also written in Python). Another little hitch (kinda expected) was that it can take a lot of time to run emerge -NDpu1 world on a VM, add the packages found to a list, and then repeat those two steps for each VM in turn.

Since I have done my fair share of reading about the whole GIL "issue" in Python, I knew that threads were not the way to go and that processes would be the correct approach. Just for fun I tried threads anyway, and it turned out to be much slower than starting one ssh (client) process per VM, because of the GIL. Sharing data across processes can be troublesome (need for locking etc). But since I do not care in which order the packages get inserted into the queue, and the queue itself is multi-process safe, it is not a problem here. Furthermore, I do not pop from the queue while it is being populated, so it is all a matter of each process putting packages on the queue in random order and then removing the duplicates once all processes are done (the main process waits until all ssh processes have stopped).
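The one-process-per-VM idea can be sketched roughly like this. This is not the script itself: collect_packages, the fetch callback (which stands in for the paramiko call) and the None sentinel are assumptions of mine, and I explicitly ask for the fork start method, which assumes a Unix host. Unlike the description above, the sketch reads from the queue while the workers are still writing, because the multiprocessing documentation warns that joining a process before its queued items have been consumed can block.

```python
import multiprocessing as mp

_DONE = None  # sentinel a worker puts on the queue when it has finished


def _worker(host, fetch, queue):
    """Put every package reported for `host` on the queue, then the sentinel."""
    try:
        for atom in fetch(host):
            queue.put(atom)
    finally:
        queue.put(_DONE)


def collect_packages(hosts, fetch):
    """Run fetch(host) in one process per host and return the unique atoms."""
    ctx = mp.get_context("fork")  # assumption: Unix host, fork is available
    queue = ctx.Queue()
    workers = [ctx.Process(target=_worker, args=(host, fetch, queue))
               for host in hosts]
    for proc in workers:
        proc.start()

    # Read until every worker has sent its sentinel, then join.
    found, pending = [], len(workers)
    while pending:
        item = queue.get()
        if item is _DONE:
            pending -= 1
        else:
            found.append(item)
    for proc in workers:
        proc.join()

    # Arrival order is arbitrary, so deduplicate and sort at the end.
    return sorted(set(found))
```

Called as collect_packages(["web", "mail"], fetch), where fetch(host) returns the atoms parsed from that host's emerge output, it hands back one sorted list without duplicates.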

Now we should have a nice clean list of packages that need to be emerged on the fast VM. But you can't really build packages without building their dependencies, right? Correct, and we want an easy way to remove those dependencies afterwards. Cue the -o (--onlydeps) argument to emerge, aka build all dependencies (not already on the system) for this package. While figuring out whether there are any dependencies, they (if any) are added to the depcleanList for later use, when we want to remove the leftover cruft.

The next step is to generate the binaries themselves, which is now easy since we know all the dependencies are in place. Cue the -B (--buildpkgonly) argument to emerge, whose description is as follows: "Creates binary packages for all ebuilds processed without actually merging the packages. This comes with the caveat that all build-time dependencies must already be emerged on the system." In other words: generate binaries without installing the ebuilds on the local machine. Pretty awesome, right?
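The two emerge steps on the builder can be sketched as a small planning function. Again this is only an illustration and not the script: plan_build and the deps_of callback are hypothetical names, --oneshot is my own addition to keep the dependencies out of the world file, and the function only builds the command lists without running anything.

```python
def onlydeps_command(atom):
    """Command that installs only the dependencies of `atom` (emerge -o)."""
    # --oneshot is an addition of this sketch, not taken from the post.
    return ["emerge", "--oneshot", "--onlydeps", atom]


def buildpkgonly_command(atom):
    """Command that builds a binpkg for `atom` without merging it (emerge -B)."""
    return ["emerge", "--buildpkgonly", atom]


def plan_build(targets, deps_of):
    """Return (commands, depclean_list) for building `targets` as binpkgs.

    `deps_of(atom)` is a hypothetical callback that returns the
    dependencies of `atom` which are not yet installed on the builder.
    """
    commands, depclean = [], []
    for atom in targets:
        for dep in deps_of(atom):
            # Remember each dependency once so it can be removed afterwards.
            if dep not in depclean and dep not in targets:
                depclean.append(dep)
        commands.append(onlydeps_command(atom))
        commands.append(buildpkgonly_command(atom))
    return commands, depclean
```

Each command list could then be handed to subprocess.run on the builder, and the returned depclean list is what gets cleaned up at the very end.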

Now all we need to do is emerge all the freshly generated binaries on the other VMs and remove the dependencies on the builder. There are a few problems with the current code. For one, paramiko doesn't support the new ssh ECDSA public keys, so for now I just auto-accept the public key every time I connect, which opens up for a man-in-the-middle attack. While it isn't that hard to get paramiko to read the ECDSA keys (I have that running in a local branch), the problem starts when you need to verify the locally stored key against the key the server currently uses (something to look into when I have more time). Another problem is that not all leftover cruft gets removed: configuration files, init scripts and the like are not removed by Portage when you remove a package from the system. So expect a lot of leftover files...

Otherwise the code should work as expected (do not expect it to work!), and do not expect anything but a quick hack. I might clean up the code and make it pretty sometime in the future, but I just needed something simple that did the job. You can find the code in the git repository under gentoo-updater.

Wednesday, August 10 2011

tinderbox frontend getting close and latest features

For the last few weeks I have been working on user watchlists and "notifications" (internally described as user messages) for the tinderbox frontend, besides fixing/battling jQuery, CSS and HTML and fixing random stuff (such as a diff view).

The idea is that a user (if registered) can add ebuilds/packages or even whole categories to his/her watchlist and get notifications, such as when a new ebuild is added to the buildhost tree, a new build log is ready for review, or an ebuild gets chucked out of the tree. For now these "messages" are only shown on the profile page, but later it is planned to do mail notification or even RSS (voice your opinion), or to integrate it in the planned REST module (which still only exists in my head).
Generally speaking, the main features as of now are:

  • Search through ebuilds to watch their logs, diff the ebuilds etc
  • Search through logs to review them or download the logs/environment files (not done yet)
  • tinderbox information
    • watch the queues for each tinderbox
    • current make.conf
  • user profiles
    • add stuff to your watch list and get notifications
    • admins can register new users (not done yet)

The only extra feature you get by being registered for now is the notifications, but later on it is not hard to imagine the ability to alter the queue on the tinderboxes, or third-party submission of logs through REST, etc.
When all the above features are done and I have done some cleaning, it will be deployed on a public server (within a week, I think) and I'll try to inform people about the project (I am not cool enough to be on the gentoo-planet).

Monday, April 11 2011

GSOC project dropped

...something that was overdue... It turns out that the GSoC project was dropped because the student became too busy. The tinderbox project as a whole might be better off, because the frontend code isn't in a "proper" state where it would be beneficial to start working on advanced features, mainly because the basic infrastructure and features aren't done just yet. Personally I would rather focus on getting the basic stuff done, and done properly, instead of trying to piggyback a GSoC proposal on the project.

Wednesday, April 6 2011

the tinderboxes and the database

From time to time I have gotten questions about what exactly it is that I do and how it relates to Zorry's work, which tends to be more widely known. That could be because he is a dev and has sought (I assume) ideas and input from other Gentoo devs from time to time. To be fair, there hasn't been much motivation for me to put things out in the open, and little need to do so, and even the few things I have shared have been wrongfully shot down because the whole basic idea was misunderstood (these things happen when you don't talk face to face, or better yet have a freaking whiteboard at hand).

Let's take a step back and remember my last post. First of all, Zorry had the idea and was even working on it before I joined (on and off for half a year). So I had little influence on the "basic" idea and the goals of this part of the project, and to be fair I am the wrong person to ask about those matters. What I will try to do is explain how the tinderboxes interface with the database, though not in detail, because I haven't read all the tinderbox "client" code, tried to understand it, or even worked on it. Back in the day when I joined, the database was a mess, so I suggested redoing it from scratch to suit Zorry's needs. At that time the frontend was planned, but the SQL database needed sorting out first, so the tinderbox frontend was postponed. In either case the frontend shouldn't dictate the layout of the database, since the database was meant to suit Zorry's needs and the information he wanted to store, just done properly.

the database and the local tinderboxes

First of all, the database itself works as a central store of information. The clients, or tinderboxes, can add, remove and update data in the database, such as submitting "build logs" or discovering new packages in the Portage tree and adding them to the database. Besides that, each tinderbox has a build queue, to which it adds packages and from which it removes them when they are done building (failed or completed...). The last piece of the puzzle is a bit of information about each tinderbox and its setup, such as the profiles in use. Sounds easy enough?

Well, I left out some details. The database actually stores much more information than I described above, because it models the Portage tree. Therefore it also contains information about which USE flags a certain package has and which USE flags were enabled/disabled during a build, which keywords a certain package has and which IUSEs, and whether the package has restrictions (such as fetch restriction) and which ones. To give an example, you could ask the database: give me all packages (or ebuilds, for that matter) in the database that have the USE flag X. Or: give me all packages that have fetch restrictions, IUSEs (enabled, disabled or neither), etc.
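To give a feel for that kind of question, here is a tiny sketch using sqlite3. The schema is purely hypothetical and heavily simplified: the table names, column names and sample rows are assumptions made for this example and say nothing about the real layout of the project's database.

```python
import sqlite3

# Hypothetical schema, invented for illustration only.
SCHEMA = """
CREATE TABLE ebuilds (
    id INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    package TEXT NOT NULL,
    version TEXT NOT NULL
);
CREATE TABLE ebuild_useflags (
    ebuild_id INTEGER NOT NULL REFERENCES ebuilds(id),
    flag TEXT NOT NULL
);
CREATE TABLE ebuild_restrictions (
    ebuild_id INTEGER NOT NULL REFERENCES ebuilds(id),
    restriction TEXT NOT NULL
);
"""

_NAME = "e.category || '/' || e.package || '-' || e.version"


def ebuilds_with_useflag(conn, flag):
    """Return 'category/package-version' for every ebuild that has `flag`."""
    rows = conn.execute(
        "SELECT " + _NAME + " FROM ebuilds AS e "
        "JOIN ebuild_useflags AS u ON u.ebuild_id = e.id "
        "WHERE u.flag = ? ORDER BY 1",
        (flag,),
    )
    return [row[0] for row in rows]


def ebuilds_with_restriction(conn, restriction):
    """Return every ebuild that carries `restriction` (for example 'fetch')."""
    rows = conn.execute(
        "SELECT " + _NAME + " FROM ebuilds AS e "
        "JOIN ebuild_restrictions AS r ON r.ebuild_id = e.id "
        "WHERE r.restriction = ? ORDER BY 1",
        (restriction,),
    )
    return [row[0] for row in rows]


# Made-up sample rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO ebuilds VALUES (?, ?, ?, ?)", [
    (1, "www-servers", "apache", "2.2.21"),
    (2, "dev-java", "sun-jdk", "1.6.0.29"),
])
conn.executemany("INSERT INTO ebuild_useflags VALUES (?, ?)",
                 [(1, "ssl"), (1, "threads"), (2, "X")])
conn.executemany("INSERT INTO ebuild_restrictions VALUES (?, ?)",
                 [(2, "fetch")])

print(ebuilds_with_useflag(conn, "X"))
print(ebuilds_with_restriction(conn, "fetch"))
```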

So the tinderboxes' role (besides their main purpose: to build stuff!) is to:

1 update the database with new packages, and update old packages/ebuilds if they have changed (USE flags removed, added etc).
2 use the database to store its queue (so it won't disappear on crashes; that also means that the queues are available in the frontend).
3 keep the queues up to date by removing packages that have been built.
4 remove old packages from the database.
5 submit logs to the database. That is roughly speaking, since the database itself does not contain the logs themselves but other kinds of information about how a package build went.
6 Zorry, tell me if I have forgotten something XD
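Points 2 and 3 in the list above (a queue that lives in the database) can be sketched in a few lines. As before this is an illustration under assumptions: the build_queue table and the function names are hypothetical and not taken from the tinderbox code, and sqlite3 is only used to keep the example self-contained.

```python
import sqlite3


def open_queue(path):
    """Open (and create if needed) a hypothetical build_queue table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS build_queue ("
        " tinderbox TEXT NOT NULL,"
        " atom TEXT NOT NULL,"
        " UNIQUE (tinderbox, atom))"
    )
    return conn


def enqueue(conn, tinderbox, atom):
    """Add a package to a tinderbox queue; duplicates are ignored."""
    with conn:  # commits on success
        conn.execute("INSERT OR IGNORE INTO build_queue VALUES (?, ?)",
                     (tinderbox, atom))


def mark_built(conn, tinderbox, atom):
    """Remove a package from the queue once it has been built (or failed)."""
    with conn:
        conn.execute(
            "DELETE FROM build_queue WHERE tinderbox = ? AND atom = ?",
            (tinderbox, atom),
        )


def pending(conn, tinderbox):
    """Return the queued atoms for one tinderbox in insertion order."""
    rows = conn.execute(
        "SELECT atom FROM build_queue WHERE tinderbox = ? ORDER BY rowid",
        (tinderbox,),
    )
    return [row[0] for row in rows]
```

Because every change is committed right away, a tinderbox that crashes can reopen the same database and carry on from pending(), and a frontend can read the very same table.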

Monday, April 4 2011

The tinderbox project and my involvement.

My main motivation for spending my spare time on this project was at first to 1: play more with Django and 2: play more with databases (yes, I like playing with databases late at night).
