
Google Open Source Blog
News about Google's Open Source projects and programs.

Google Code-in 2016: another record breaking year

Mon, 01/16/2017 - 22:18
Today we celebrate the closing of the 7th annual Google Code-in (GCI) which, like last year, was bigger and better than ever. Mentors from each of the 17 organizations are busy reviewing the last of the work submitted by student participants.

Each organization will pick two Grand Prize Winners who will receive a trip to Google’s Northern California headquarters this summer where they will meet Google engineers, see exciting demos and presentations and enjoy a day of adventure in San Francisco. You can learn about the experiences of the 2015 Grand Prize Winners in our short series of wrap-up blog posts. We’ll announce the new Grand Prize Winners and the Finalists here on January 30.

We would like to congratulate all of the new and returning students who participated this year. We’re thrilled with the turnout: over the last seven weeks, 1,374* students from 62 countries completed 6,397* tasks in the contest.

And a HUGE thanks to the people who are the heart of our program: the mentors and organization administrators. These volunteers spend countless hours creating and reviewing hundreds of tasks. They teach the young students who participate in GCI about the many facets of open source development, from community standards and communicating across time zones to version control and testing. We couldn’t run this program without you!

By Josh Simmons, Open Source Programs Office

* These numbers will increase over the coming days as mentors review the final work submitted by students.

Introducing Draco: compression for 3D graphics

Fri, 01/13/2017 - 18:00
3D graphics are a fundamental part of many applications, including gaming, design and data visualization. As graphics processors and creation tools continue to improve, larger and more complex 3D models will become commonplace and help fuel new applications in immersive virtual reality (VR) and augmented reality (AR). This increased model complexity forces storage and bandwidth requirements to keep pace with the explosion of 3D data.

The Chrome Media team has created Draco, an open source compression library to improve the storage and transmission of 3D graphics. Draco can be used to compress meshes and point-cloud data. It also supports compressing points, connectivity information, texture coordinates, color information, normals and any other generic attributes associated with geometry.
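For a sense of the encoding workflow, here is a minimal sketch that drives Draco's command-line encoder from Python. The draco_encoder tool and its -qp/-qn quantization flags come from the project's README; treat the exact flag names, and the file names used here, as assumptions.

import subprocess

def compress_mesh(src, dst, position_bits=14, normal_bits=7):
    # Quantization bits trade precision for compression; the defaults here
    # mirror the settings quoted in the benchmark footnote below.
    subprocess.run([
        'draco_encoder',
        '-i', src,                  # input mesh (e.g. OBJ)
        '-o', dst,                  # compressed output (.drc)
        '-qp', str(position_bits),  # position quantization bits
        '-qn', str(normal_bits),    # normal quantization bits
    ], check=True)

compress_mesh('model.obj', 'model.drc')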

With Draco, applications using 3D graphics can be significantly smaller without compromising visual fidelity. For users this means apps can now be downloaded faster, 3D graphics in the browser can load quicker, and VR and AR scenes can be transmitted with a fraction of the bandwidth, rendered quickly and still look fantastic.


Sample Draco compression ratios and encode/decode performance*
Transmitting 3D graphics for web-based applications is significantly faster using Draco’s JavaScript decoder, which can be tied to a 3D web viewer. The following video shows how efficient transmitting and decoding 3D objects in the browser can be - even over poor network connections.



Video and audio compression have shaped the internet over the past 10 years with streaming video and music on demand. With the emergence of VR and AR on the web and on mobile, and the increasing proliferation of sensors like LIDAR, we will soon be swimming in a sea of geometric data. Compression technologies like Draco will play a critical role in ensuring these experiences are fast and accessible to anyone with an internet connection. More exciting developments are in store for Draco, including support for creating multiple levels of detail from a single model to further improve the speed of loading meshes.

We look forward to seeing what people do with Draco now that it's open source. Check out the code on GitHub and let us know what you think. Also available is a JavaScript decoder with examples on how to incorporate Draco into the three.js 3D viewer.

By Jamieson Brettle and Frank Galligan, Chrome Media Team

* Specifications: Tests were run with textures and positions quantized at 14-bit precision and normal vectors at 7-bit precision, on a single core of a 2013 MacBook Pro. JavaScript decoding was measured using Chrome 54 on Mac OS X.

JanusGraph connects the past and future of Titan

Thu, 01/12/2017 - 19:04
We are thrilled to collaborate with a group of individuals and companies, including Expero, GRAKN.AI, Hortonworks and IBM, in launching a new project — JanusGraph — under The Linux Foundation to advance the state-of-the-art in distributed graph computation.



JanusGraph is a fork of the popular open source project Titan, originally released in 2012 by Aurelius, and subsequently acquired by DataStax. Titan has been widely adopted for large-scale distributed graph computation and many users have contributed to its ongoing development, which has slowed down as of late: there have been no Titan releases since the 1.0 release in September 2015, and the repository has seen no updates since June 2016.

This new project will reinvigorate development of the distributed graph system to add new functionality, improve performance and scalability, and maintain a variety of storage backends.

The name "Janus" comes from the Roman god who looks simultaneously into the past, toward the Titans (divine beings from Greek mythology), and into the future.

All are welcome to participate in the JanusGraph project, whether by contributing features or bug fixes, filing feature requests and bug reports, improving the documentation, or helping shape the product roadmap through use cases.

Get involved by taking a look at our website and browsing the code on GitHub.

We look forward to hearing from you!

By Misha Brukman, Google Cloud Platform

Apache Beam graduates to a top-level project

Tue, 01/10/2017 - 18:09
Please join me in extending a hearty digital “Huzzah!” to the Apache Beam community: as announced today, Apache Beam is an official graduate of the Apache Incubator and is now a full-fledged, top-level Apache project. This achievement is a direct reflection of the hard work the community has invested in transforming Beam into an open, professional and community-driven project.

11 months ago, Google and a number of partners donated a giant pile of code to the Apache Software Foundation, thus forming the incubating Beam project. The bulk of this code came from the Google Cloud Dataflow SDK: the libraries that developers used to write streaming and batch pipelines that ran on any supported execution engine. At the time, the main supported engine was Google’s Cloud Dataflow service (with support for Apache Spark and Apache Flink in development); as of today there are five officially supported runners. Though there were many motivations behind the creation of Apache Beam, the one at the heart of everything was a desire to build an open and thriving community and ecosystem around this powerful model for data processing that so many of us at Google spent years refining. But taking a project with over a decade of engineering momentum behind it from within a single company and opening it to the world is no small feat. That’s why I feel today’s announcement is so meaningful.
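To make the model concrete, here is a minimal sketch of a word-count pipeline written against Beam's Python SDK (illustrative only; runner configuration is omitted, and the SDK was still maturing at the time of this post):

import apache_beam as beam

# Count words from an in-memory collection; the same pipeline code can
# target any supported runner (Cloud Dataflow, Spark, Flink, ...).
with beam.Pipeline() as pipeline:
    (pipeline
     | beam.Create(['open source', 'open community'])
     | beam.FlatMap(lambda line: line.split())
     | beam.combiners.Count.PerElement()
     | beam.Map(print))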

With that context in mind, let’s look at some statistics squirreled away in the graduation maturity model assessment:

  • Out of the ~22 large modules in the codebase, at least 10 modules have been developed from scratch by the community, with little to no contribution from Google.
  • Since September, no single organization has had more than ~50% of the unique contributors per month.
  • The majority of new committers added during incubation came from outside Google.

And for good measure, here’s a quote from the Vice President of the Apache Incubator, lifted from the public Apache incubator general discussions list where Beam’s graduation was first proposed:

“In my day job as well as part of my work at Apache, I have been very impressed at the way that Google really understands how to work with open source communities like Apache. The Apache Beam project is a great example of this and is a great example of how to build a community." -- Ted Dunning, Vice President of Apache Incubator

The point I’m trying to make here is this: while Google’s commitment to Apache Beam remains as strong as it always has been, everyone involved (both within Google and without) has done an excellent job of building an open source project that’s truly open in the best sense of the word.

This is what makes open source software amazing: people coming together to build great, practical systems for everyone to use because the work is exciting, useful and relevant. This is the core reason I was so excited about us creating Apache Beam in the first place, the reason I’m proud to have played some small part in that journey, and the reason I’m so grateful for all the work the community has invested in making the project a reality.

Naturally, graduation is only one milestone in the lifetime of the project, and we have many more ahead of us, but becoming a top-level project is an indication that Apache Beam now has a development community that is ready for prime time.

That means we’re ready to continue pushing forward the state of the art in stream and batch processing. We’re ready to bring the promise of portability to programmatic data processing, much in the way SQL has done so for declarative data analysis. We’re ready to build the things that never would have gotten built had this project stayed confined within the walls of Google. And last but perhaps not least, we’re ready to recoup the vast quantities of text space previously consumed by the mandatory “(incubating)” moniker accompanying all of our initial mentions of Apache Beam!

But seriously, whatever your motivation, please consider joining us along the way. We have an exciting road ahead.

By Tyler Akidau, Apache Beam PMC and Staff Software Engineer at Google

Google Summer of Code 2016 blog post round-up

Tue, 01/10/2017 - 07:26
We’re publishing guest posts from Google Summer of Code (GSoC) students, mentors and organizations every week and more are coming. Many have already written GSoC wrap-up posts on their own blogs, so we’ve rounded them up for you to explore.


“Static types in Python, oh my(py)!” by Tim Abbott, org admin for Zulip
“We posted mypy annotations as one of our project ideas for Google Summer of Code (GSoC). We found an incredible student, Eklavya Sharma, for the project. Eklavya did the vast majority of the hard work of annotating Zulip. Amazingly, he also found the time during the summer to migrate Zulip to use virtualenvs and then upgrade Zulip to Python 3!”


“A road from Google Summer of Code student to organization administrator” by Araz Abishov, org admin for HISP
“Google has created unprecedented opportunity both for young developers and open source communities, which I think everyone should take advantage of. GSoC is more than just a three months internship, and I hope that this post will be a good example of how it can change anyone’s life.”


“Summer of Code 2016: Wrapping it up” by Martin Braun, org admin for GNU Radio
“This summer was a great summer in terms of student participation. All three students will be presenting their work (either in person, or via poster) at this year’s GNU Radio Conference in Boulder, Colorado.”


“2016 Google Summer of Code Wrap-Up” by Ed Cable, org admin for Mifos Initiative
“Each year GSoC continues to unite and grow our community in different ways. Once again, we received incredibly valuable contributions to our Mifos X web and mobile clients this summer; most importantly we have cultivated numerous passionate contributors that will be a part of our community long into the future.”


“Road to GSoC 2016” by Minh Chu, student who worked on Neverland for KDE
“I was nervous about choosing a project. So many projects and requirements! After many hours, I finally decided to write a proposal for KDE’s Neverland Theme Builder and was accepted.”


“Git Rev News” by Christian Couder, mentor for Git
“Such performance improvements as well as the code consolidations around the sequencer are of course very nice. It is interesting and satisfying to see that they are the result of building on top of previous work over the years by GSoC students, mentors and reviewers.”

“Elasticsearch Lua II” by Dhaval Kapil, student who worked with LabLua
“My GSoC project this year was entitled ‘Improve elasticsearch-lua tests and builds’ and was a continuation of the work that I had done last year. Apart from adding a test suite for elasticsearch-lua and making it robust, I also decided to work on the documentation of the code."

“Google Summer of Code 2016 Conclusion” by Amine Khaldi, org admin for ReactOS
“Students stumble upon many of the same difficulties ReactOS' own senior developers encountered during their early days, including that ever painful but necessary step to using a proper debugger instead of relying on printf statements in the code.”


“My Journey in Open Source / How to Get Started Contributing” by Nelson Liu, student who worked on scikit-learn for PSF
“The best way to get started is to simply jump in! There are a myriad of ways to contribute to an open source project. Obviously, writing code to fix bugs, add new features, or enhance existing ones are useful. However, you don't have to write code to help out!”

“Google Summer of Code 2016 Student Projects” by Pankaj Nathani, org admin for BuildmLearn
“Many open source projects like ours really benefit from this initiative of Google. Not only do we get large number of university students interested to work on our projects during summer; we also gain new long term contributors and project maintainers."

“Lasp and the Google Summer of Code” by Borja o’Cook, student who worked on Lasp for BEAM Community
“All in all, it's been an amazing experience. I've received a lot of support from my mentors and teammates; the Lasp team is full of incredible people.”


“GSoC 2016 Students in TEAMMATES” by Damith C. Rajapakse, org admin for TEAMMATES
“We had our biggest batch of students (7 students) in GSoC 2016, selected from 93 proposals, and representing 4 countries and 4 universities, working on TEAMMATES (an online feedback management system for education) and related sub projects.”


“User-friendly encryption now in Drupal 8!” by Colan Schwartz, mentor for Drupal
“There were several students interested in the topic, and wrote proposals to match. Talha Paracha's excellent proposal was accepted, and he began in earnest. With Adam Bergstein (nerdstein) and I mentoring him, Talha successfully worked through all phases of the project.”


“GSoC with Shogun” by Sanuj Sharma, student who worked on Shogun
“This was an excellent learning experience for me and I got to work with people from different countries (UK, Russia, Singapore, Germany) and cultures. I highly recommend students to participate in Google Summer of Code by looking for projects that interest them because having open source experience is highly beneficial, especially for programmers.”


We have wrap-up posts coming out every week so stay tuned for more. If you’re interested in participating in Google Summer of Code 2017, you can find details here.

By Josh Simmons, Open Source Programs Office

Open source down under: Linux.conf.au 2017

Sun, 01/08/2017 - 22:53
It’s a new year and open source enthusiasts from around the globe are preparing to gather at the edge of the world for Linux.conf.au 2017. Among those preparing are Googlers, including some of us from the Open Source Programs Office.

This year Linux.conf.au is returning to Hobart, the riverside capital of Tasmania, home of Australia’s famous Tasmanian devils, running for five days from January 16 to 20.

Tuz, a Tasmanian devil sporting a penguin beak, is the Linux.conf.au mascot. (Artwork by Tania Walker, licensed under CC BY-SA.)

The conference, which began in 1999 and is community organized, is well equipped to explore the theme, "the Future of Open Source," which is reflected in the program schedule and miniconfs.

You’ll find Googlers speaking (listed below) as well as participating in the hallway track. Don’t miss our Birds of a Feather session if you’re a student, educator, project maintainer, or otherwise interested in talking about outreach and student programs like Google Summer of Code and Google Code-in.

Monday, January 16th
12:20pm The Sound of Silencing by Julien Goodwin
1:20pm   An Open Programming Environment Inspired by Programming Games by Josh Deprez

Tuesday, January 17th
All day    Community Leadership Summit X at LCA

Wednesday, January 18th
2:15pm   Community Building Beyond the Black Stump by Josh Simmons

Thursday, January 19th
4:35pm   Using Python for creating hardware to record FOSS conferences! by Tim Ansell

Friday, January 20th
1:20pm   Linux meets Kubernetes by Vishnu Kannan

Not able to make it to the conference? Keynotes and sessions will be livestreamed, and you can always find the session recordings online after the event.

We’ll see you there!

By Josh Simmons, Open Source Programs Office

Google Summer of Code 2016 wrap-up: Oppia

Fri, 01/06/2017 - 18:00
Google Summer of Code (GSoC) is an annual program that encourages university students to become open source contributors. This guest post is part of a series of blog posts from the open source projects and organizations that participated in GSoC 2016.

The Oppia project makes it easy for anyone to create lightweight, interactive online lessons that simulate personal tutoring. These activities, called “explorations,” can be shared with others around the world as standalone tutorials (such as Programming with Carla and Quadratic Equations), or embedded in websites to supplement an existing course (such as “Take Your Medicine” on edX and Computational Thinking for Educators).

2016 was Oppia’s first year participating in GSoC and it was a blast! More students flocked to our ideas page than we had expected, and our Gitter channel was full of people saying hello and looking for starter projects. Over the course of the summer, with the help of two capable and enthusiastic students, we were able to bring the following new features to the Oppia codebase:

A new creator dashboard -- Avijit Gupta


An important principle of Oppia is that lessons can be easily improved over time -- it’s hard to figure out all the possible ways a student can go wrong at the outset, but it’s much easier to respond appropriately to a new misconception that arises.

Each creator on Oppia has a “creator dashboard” which allows them to see the lessons they’ve created, as well as the feedback they’ve received from learners. Avijit completed a full revamp of this page, updating its design (for both desktop and mobile) and finding ways to display all the necessary information in an intuitive way so that creators can easily improve their lessons while getting feedback on their teaching.

The new creator dashboard.
In addition, Avijit added functionality allowing creators to view student misconceptions that were not well-addressed, to make it easier for them to improve the feedback for those answers. He has continued to help out with the Oppia open source project as a maintainer and reviewer, even after GSoC, and is mentoring other contributors who are working on further improvements to the creator dashboard. You can read more about the project in his GSoC writeup!

Speed improvements -- Vishal Gupta


In order to improve the accessibility of lessons for students with poor internet connectivity, Vishal’s project aimed to make Oppia speedier and less bandwidth-intensive. He started by implementing a performance testing framework to benchmark his efforts, and also integrated it with our continuous integration system in order to protect against performance regressions. He then turned his efforts to caching as many static resources as possible, implementing a cache slug system that causes new files to be downloaded only after a new release is made.
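The cache-slug idea works roughly as follows (an illustrative sketch with hypothetical names, not Oppia's actual code): every static asset URL embeds a release identifier, so browsers can cache files indefinitely and only re-fetch after a release changes the slug.

# Hypothetical release identifier, bumped on every deployment.
RELEASE_SLUG = 'release-2016-08-01'

def asset_url(path):
    # URLs under /build/<slug>/ never change content, so they can be
    # served with a far-future cache lifetime.
    return '/build/%s/%s' % (RELEASE_SLUG, path)

print(asset_url('css/oppia.css'))  # -> /build/release-2016-08-01/css/oppia.css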

In addition, Vishal removed JavaScript code that was inlined in the main templates, and refactored it out into an external script which could then be cached for better performance. You can read more about this project in his post on the Oppia blog.

We’d like to extend our grateful thanks not only to Avijit and Vishal, but also to our many willing and enthusiastic mentors, and to Google for supporting our open source work with GSoC.

Join us in helping improve educational opportunities for students around the world. If you’d like to subscribe to news and updates about Oppia’s participation in GSoC, you can sign up to the oppia-gsoc-announce mailing list -- or, if you’re already feeling enthusiastic, you can start helping out with the project right away!

By Ben Henning and Sean Lip, Organization Administrators for Oppia

Grumpy: Go running Python!

Wed, 01/04/2017 - 18:00
Google runs millions of lines of Python code. The front-end server that drives youtube.com and YouTube’s APIs is primarily written in Python, and it serves millions of requests per second! YouTube’s front-end runs on CPython 2.7, so we’ve put a ton of work into improving the runtime and adapting our application to work optimally within it. These efforts have borne a lot of fruit over the years, but we always run up against the same issue: it's very difficult to make concurrent workloads perform well on CPython.

To solve this problem, we investigated a number of other Python runtimes. Each had trade-offs and none solved the concurrency problem without introducing other issues.
So we asked ourselves a crazy question: What if we were to implement an alternative runtime optimized for real-time serving? Once we started going down the rabbit hole, Go seemed like an obvious choice of platform since its operational characteristics align well with our use case (e.g. lightweight threads). We wanted first class language interoperability and Go’s powerful runtime type reflection system made this straightforward. Python in Go felt very natural, and so Grumpy was born.

Grumpy is an experimental Python runtime for Go. It translates Python code into Go programs, and those transpiled programs run seamlessly within the Go runtime. We needed to support a large existing Python codebase, so it was important to have a high degree of compatibility with CPython (quirks and all). The goal is for Grumpy to be a drop-in replacement runtime for any pure-Python project.

Two design choices we made had big consequences. First, we decided to forgo support for C extension modules. This means that Grumpy cannot leverage the wealth of existing Python C extensions but it gave us a lot of flexibility to design an API and object representation that scales for parallel workloads. In particular, Grumpy has no global interpreter lock, and it leverages Go’s garbage collection for object lifetime management instead of reference counting. We think Grumpy has the potential to scale more gracefully than CPython for many real world workloads. Results from Grumpy’s synthetic Fibonacci benchmark demonstrate some of this potential; a sketch of the kind of workload involved appears below.
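As a rough illustration, here is a sketch of the kind of CPU-bound, multi-threaded program such a benchmark exercises (a hypothetical example, not the actual benchmark code): under CPython the threads serialize on the GIL, while Grumpy can run them in parallel on Go's scheduler.

import threading

def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

# Four CPU-bound threads: GIL-bound under CPython, parallel under Grumpy.
threads = [threading.Thread(target=fib, args=(30,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()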



Second, Grumpy is not an interpreter. Grumpy programs are compiled and linked just like any other Go program. The downside is less development and deployment flexibility, but it offers several advantages. For one, it creates optimization opportunities at compile time via static program analysis. But the biggest advantage is that interoperability with Go code becomes very powerful and straightforward: Grumpy programs can import Go packages just like Python modules! For example, the Python snippet below uses Go’s standard net/http package to start a simple server:

from __go__.net.http import ListenAndServe, RedirectHandler

# Redirect every request to the Grumpy repository with an HTTP 303.
handler = RedirectHandler('http://github.com/google/grumpy', 303)
# Serve on localhost port 8080; this call blocks forever.
ListenAndServe('127.0.0.1:8080', handler)

We’re excited about the prospects for Grumpy. Although it’s still alpha software, most of the language constructs and many core built-in types work like you’d expect. There are still holes to fill — many built-in types are missing methods and attributes, built-in functions are absent and the standard library is virtually empty. If you find things that you wish were working, file an issue so we know what to prioritize. Or better yet, submit a pull request.

Stay Grumpy!

By Dylan Trotter, YouTube Engineering

Rails Girls Summer of Code: Changing the face of tech

Wed, 01/04/2017 - 16:50
This is a guest post from Laura Gaetano who organizes Rails Girls Summer of Code, a global fellowship program inspired by Google Summer of Code.

Have you seen that picture of Margaret Hamilton, the NASA engineer who worked on the computer systems for the Apollo 11 launch? She’s standing next to the human-sized pile of listings of the Apollo Guidance Computer source code that she worked on. Do you know about Ada Lovelace, often cited as the very first computer programmer?

From World War II until the 1980s, women engineers and women computer operators were fairly common. There was a steady rise in women entering STEM fields, and young girls had role models and strong women to look up to. We're well acquainted with the drop in female engineering graduates worldwide after this time period, and the subsequent drop in the percentage of women entering the world of tech. We're here to help change that, and reverse the trend.

Rails Girls Summer of Code (RGSoC) aims to bring more diversity into the world of tech — specifically, into the world of open source software, where women make up a mere 11% of the community. The global program offers 3-month scholarships to teams of women to allow them to work full-time on an open source project of their choice – aided by local coaches and guided by the project maintainer (or a core contributor). The scholarships are funded through the support of the community as well as our sponsors, via a crowdfunding campaign.

Local vs. Global
We all cherish our local community and understand how strong of a support network it can be, especially for newcomers. The Rails Girls chapters worldwide emphasize that need: most coaches and organisers are local, and many alums go on to create their own study groups, or become coaches or organisers themselves. RGSoC also relies strongly on a global network of user groups — both Rails Girls chapters and similar organisations such as PyLadies or DjangoGirls.

Thanks to our connections with these different groups, we are able to reach people in remote or unlikely locations, and build the most diverse group of applicants possible. This is very important to us. Since the beginning, the program has provided the opportunity for women of different experiences, backgrounds, locales and age groups to come together and be part of the same global initiative.

Our Structure
Last year, we received over 90 team applications. When applying, each two-person team chooses from a list of pre-selected projects. These projects are maintained by people we either personally know, or who have reached out to us prior to the application period. We look for projects with patient, open-minded contributors who are active in their community, and projects that provide a lot of learning opportunities for applicants.

Project maintainers (also called mentors) are in touch with students in order to adapt the roadmap throughout the summer to the students' needs and check up on their progress. On a daily basis, students spend the majority of their time with coaches. The coaches help, support, and teach the students throughout the summer. Each team is also appointed a supervisor, who supports students on the organisational side of things. They are the glue that keeps the whole team together, and a way for the core RGSoC team to keep track of how every team is doing.

Our Stats
Our program started in 2013 with 18 teams, 10 of which were sponsored and 8 of which were volunteer teams. The following year, 16 teams participated with 10 sponsored spots. The real breakthrough came in 2015 when we were able to fund 16 sponsored teams, a substantial increase from the previous years. Not only did this enable us to have more impact — with a potential 12 more women entering the tech world and STEM workforce than the previous years — but it also shows the community’s trust in the program.

In 2016, the Ruby community awarded us with a Ruby Hero Award, and we managed to collect enough money to sponsor 16 teams from five continents with another 4 teams joining as volunteers. This year was also the first time we had teams based in Uganda, Egypt, Singapore and the Czech Republic.
Our stats from 2016. (Image: Laura Gaetano/RGSoC)

In 2015, we contacted our alums from 2013 and 2014 to find out what they were doing after the program. The responses were impressive: out of 64 graduates, over 90% are now working in the tech field. A fair number of graduates have even founded their own startup. Not only have some of these women found their calling, but we might have made a small difference in the open source community, and we are on the right track to really shake things up.

Where do we go from here
On the first of July last year, we kicked off our program with over 130 people participating — including coaches, supervisors, designers, helpdesk coaches and project mentors. We were incredibly excited to have 20 teams in 16 cities and 11 different countries, spanning time zones, from UTC+10 to UTC-7.
Our 2016 sponsored and volunteer teams! (Image: Ana Sofia Pinho/RGSoC)

We’ve seen in the past just how much of an impact we’ve had on our participants’ lives, and we hope that this trend will continue. We hope that some of our previous editions' teams graduated with the skills and confidence to become NASA engineers, web developers, or anything else they want to be. Hopefully someday they will become a young woman’s role model, and realise the important role they played in changing the future of engineering and of open source software.

By Laura Gaetano, Organizer of Rails Girls Summer of Code

Taking the pulse of Google Code-in 2016

Fri, 12/23/2016 - 18:00
Today is the official midpoint of this year’s Google Code-in contest and we are delighted to announce this is our most popular year ever! 930 teenagers from 60 countries have completed 3,503 tasks with 17 open source organizations. The number of students successfully completing tasks has almost met the total number of students from the 2015 contest already.

Tasks that the students have completed include:
  • writing test suites
  • improving mobile UI 
  • writing documentation and creating videos to help new users 
  • working on internationalization efforts
  • finding and fixing bugs in the organizations’ software
Participants from all over the world
In total, over 2,800 students from 87 countries have registered for the contest and we look forward to seeing great work from these (and more!) students over the next few weeks. 2016 has also seen a huge increase in student participation in places such as Indonesia, Vietnam and the Philippines.

Google Code-in participants by country
Please welcome two new countries to the GCI family: Mauritius and Moldova! Mauritius made a very strong debut in the contest and currently has 13 registered students who have completed 31 tasks.
The top five countries with the most completed tasks are:
  1. India: 982
  2. United States: 801
  3. Singapore: 202
  4. Vietnam: 119
  5. Canada: 117
Students, there is still plenty of time to get started with Google Code-in. New tasks are being added daily to the contest site — there are over 1,500 tasks available for students to choose from right now! If you don’t see something that interests you today, check back again every couple of days for new tasks.

The last day to register for the contest and claim a task is Friday, January 13, 2017 with all work being due on Monday, January 16, 2017 at 9:00 am PT.

Good luck to all of the students participating this year in Google Code-in!

By Stephanie Taylor, Google Code-in Program Manager

All numbers reported as of 8:00 PM Pacific Time, December 22, 2016.

Google Summer of Code 2016 wrap-up: Public Lab

Wed, 12/21/2016 - 18:00
This post is part of our series of guest posts from students, mentors and organization administrators who participated in Google Summer of Code 2016.


How we made this our best Google Summer of Code ever

This was our fourth year doing Google Summer of Code (GSoC), and it was our best year ever by a wide margin! We had five hard-working students who contributed over 17,000 new lines of (very useful) code to our high-priority projects.

Students voluntarily started coding early and hit the ground running, with full development environments and a working knowledge of the GitHub Flow-style pull request process. They communicated with one another and provided peer support. They wrote tests. Hundreds of them! They blogged about their work as they went, and chatted with other community members about how to design features.

All of that was amazing, and it was made better by the fact that we were accepting pull requests with new code twice weekly. Tuesdays and Fridays, I went through new submissions, provided feedback, and pulled new code into our master branch, usually publishing it to our production site once a week.

I don't know how other projects do things, but this was very new for us, and it's revolutionized how we work together. In past years, students would work on their forks, slowly building up features. Then in a mad dash at the end, we’d try to merge them into trunk, with lots of conflicts and many hours (weeks!) of work on the part of project maintainers.

What made this year so good?

Many things aligned to make this summer great, and basically none of them are our ideas. I'm sure plenty of you are cringing at how we used to do things, but I also don't think that it's that unusual for projects not "born" in the fast-paced world of modern code collaboration.

We used ideas and learned from Nicolas Bevacqua, author of JavaScript Application Design and of the woofmark and horsey libraries which I've contributed to. We've also learned a great deal from the Hoodie community, particularly Gregor Martynus, who we ran into at a BostonJS meetup. Lastly, we learned from SpinachCon, organized by Shauna Gordon McKeon and Deb Nicholson, where people refine their install process by actually going through the process while sitting next to each other.

Broadly, our strategies were:

  • Good documentation for newcomers (duh)
  • Short and sweet install process that you've tried yourself (thanks, SpinachCon!)
  • Predictable, regular merge schedule
  • Thorough test suite, and requiring tests with each pull request
  • Modularity, insisting that projects be broken into small, independently testable parts and merged as they’re written

Installation and pull requests

Most of the above sound kind of obvious or trivial, but we saw a lot of changes when we put it all together. Having a really fast install process, and guidance on getting it running in a completely consistent environment like the virtualized Cloud9 service, meant that many students were able to get the code running the same day they found the project. We aimed for an install time of 15 minutes max, and supplied a video of this for one of our codebases.

We also asked students to make a small change (even just add a space to a file) and walk through the GitHub Flow pull request (PR) submission process. We had clear step-by-step guidance for this, and we took it as a good sign when students were able to read through it and do this.

Importantly, we really tried to make each step welcoming, not demanding or dismissive, of folks who weren’t familiar with this process. This ultimately meant that all five students already knew the PR process when they began coding.

Twice-weekly merge schedule

We were concerned that, in past years, students only tried merging a few times and typically towards the end of the summer. This meant really big conflicts (with each other, often) and frustration.

This year we decided that, even though we’re a tiny organization with just one staff coder, we’d try merging on Tuesday and Friday mornings, and we mostly succeeded. Any code that wasn’t clearly presented, squashed into clean commits, passing tests, and accompanied by new tests was still reviewed: I left friendly comments and requests so it could be merged the following week.

At first I felt bad rejecting PRs, but we had such great students that they got used to the strictness. They got really good at separating out features, demonstrating their features through clear tests, and some began submitting more than two PRs per week - always rebasing on top of the latest master to ensure a linear commit history. Sweet!

Wrap-up and next steps

The last thing we did was to ask each student, essentially as their documentation, to write a series of new issues which clearly described the problem and/or desired behavior, leave suggestions and links to specific lines of code or example code, and mark them with the special “help-wanted” tag which was so helpful to them when they first started out. We asked each to also make one extra-welcoming “first-timers-only” issue which walks a new contributor through every step of making a commit and even provides suggested code to be inserted.

This final requirement was key. While I personally made each of the initial set of “help-wanted” and “first-timers-only” issues before GSoC, now five students were offloading their unfinished to-dos as very readable and inviting issues for others. The effect was immediate, in part because these special tags are syndicated on some sites. Newcomers began picking them up within hours and our students were very helpful in guiding them through their first contributions to open source.

I want to thank everyone who made this past summer so great, from our champion mentors and community members, to our stellar students, to all our inspirations in this new process, to the dozen or so new contributors we’ve attracted since the end of August.

By Jeff Warren, Organization Administrator for PublicLab.org

Google Summer of Code 2016 wrap-up: CSE@TU Wien

Wed, 12/14/2016 - 18:00
Every year over a thousand university students work with more than a hundred open source organizations as part of the Google Summer of Code (GSoC). This post is part of a series of guest posts from students, mentors and organization administrators reflecting on GSoC 2016.

CSE@TU Wien is a loose interest group at the Technische Universität Wien (TU Wien) focused on developing, providing and utilizing free and open source software for research. We’re an umbrella organization for several open source projects and we participate in Google Summer of Code (GSoC) to ensure that future generations continue building open source software for scientific computing.
We’ve participated in GSoC most years since 2011, and in 2016 we had ten successful projects. The thematic areas are -- befitting an engineering-focused university -- very diverse. Let’s take a look at the projects and what students accomplished:
Carbon Footprint for Google Maps is a browser extension that calculates CO2 emissions that users would incur by driving on routes suggested by popular mapping services and displays this information alongside time and distance. The aim is to raise awareness of the environmental impact of driving cars.
Kolya Opahle brilliantly re-factored the extension, making it much more modular. This enabled expansion to include other map services and porting to other browsers, with browser-specific implementations reduced to a minimum. Building for specific browsers was made easy through a Gradle build script. He took on the Firefox port himself, which turned out to be more challenging than expected due to incompatibilities between the extension APIs of Firefox and Chrome. Overcoming this challenge required ingenuity. 
Prateek Gupta completely re-designed and reimplemented the extension’s user interface, optimizing the storage of user options and allowing localization. He added support for more mapping services and calculations of additional greenhouse gases. He added new features to give the user more information about greenhouse gas emissions, including: 
  • a page with air quality index using an API from the World Air Quality Index
  • a page with tips to reduce emissions; a calculator to compute CO2 absorption by trees
  • another calculator for the benefits of walking and cycling instead of driving
Chirag Arora ported the extension to the Safari web browser. Like the port to Firefox, this proved challenging due to discrepancies between the Chrome and Safari extension APIs. Chirag also implemented several new features, including: 
  • more unit systems in the options page
  • automatic configuration of fuel price based on location and the Global Petrol Prices API
  • approximate calculation of CO2 emissions for public transportation
The Colibri project focuses on smart building energy management. Intelligent control strategies are becoming more and more important for efficiently operating residential and commercial buildings, as buildings are responsible for a significant amount of global energy consumption.
Georg Faustmann implemented a connector for Open Automated Demand Response (OpenADR) networks. OpenADR information and signals can now be processed and stored in the Colibri data store. One challenge for this student was comprehensive handling of the OpenADR specification. Based on the specification, Georg identified a set of relevant use cases which were finally realized in this Colibri component.
Josef Wechselauer worked on a connector for gateways based on the OASIS Open Building Information Exchange (OBIX) standard. This connector links physical devices and data from building automation systems to Colibri. Josef was very enthusiastic and he implemented the connector with an additional graphical user interface for browsing through available OBIX objects. The system test with real hardware was challenging, but he solved all of the problems.
Pratyush Talreja implemented a connector that enables the integration of MATLAB Simulink simulations. More precisely, the connector links to the MATLAB environment and can read and write data over interfaces provided by the simulation. Pratyush had some initial troubles with the system design and the role of the connector in the overall system. However, he tackled those challenges and succeeded in the end.
Mind the Word is a browser extension that helps users learn a new language. It randomly translates a few words per sentence on websites as the user browses. Since the user sees the translated words in context, they can infer the meaning and thus gradually learn new vocabulary with minimal effort. The extension uses the Google, Microsoft and Yandex translation APIs.
Ankit Muchhala re-factored and modernized the code base to ES6 using JSPM, fixing critical bugs in the process and setting up a test environment in Karma and Jasmine. After that, he redesigned the user interface, making extensive use of Bootstrap 3 along with AngularJS. He also implemented various features to make the extension more usable, such as: 
  • dispersed word translation
  • (automatic) blacklisting and easy whitelisting of words and websites
  • and the ability to backup and restore the user's configurations
Rohan Katyal ported the extension to Firefox and implemented several new features, including: 
  • speech of translated words
  • generation of quizzes with the translated words
  • search for visual hints, similar words and usage examples, and more. 
R/sdcMicro is the state-of-the-art R package for data anonymization and is used by national and international institutions. Data privacy has become a hot topic in research and requires serious effort to ensure that individuals cannot be identified.
Probhonjon Baruah improved the code quality of sdcMicro. He wrote unit tests that should help other contributors keep the package consistent and free of bugs. The main challenge for the student was understanding the object-oriented implementation of sdcMicro that goes beyond typical R packages. The student learned that standardized tests are too general to be useful, and that more problem-oriented and specific tests are more effective.
Classilist is an open source visualization dashboard for probabilistic classification data.
Medha Katehara of LNMIIT India developed Classilist, an interactive system for visualizing the performance of probabilistic classifiers. Additionally, she developed plugins to pull classification data from machine learning frameworks such as RapidMiner, WEKA and R.
In conclusion, we are -- again -- very happy with Google Summer of Code. Students advanced themselves and our research software, a clear win-win. Our large team of experienced mentors performed well and we’re grateful for their continued dedication and the support of our university. We hope to participate again in 2017!
By Josef Weinbub and Florian Rudolf, Organization Administrators for TU Wien, Austria

Open sourcing the Embedding Projector: a tool for visualizing high dimensional data

Mon, 12/12/2016 - 18:00
Originally posted on the Google Research Blog

Recent advances in machine learning (ML) have shown impressive results, with applications ranging from image recognition, language translation, medical diagnosis and more. With the widespread adoption of ML systems, it is increasingly important for research scientists to be able to explore how the data is being interpreted by the models. However, one of the main challenges in exploring this data is that it often has hundreds or even thousands of dimensions, requiring special tools to investigate the space.

To enable a more intuitive exploration process, we are open-sourcing the Embedding Projector, a web application for interactive visualization and analysis of high-dimensional data recently shown as an A.I. Experiment, as part of TensorFlow. We are also releasing a standalone version at projector.tensorflow.org, where users can visualize their high-dimensional data without the need to install and run TensorFlow.


Exploring Embeddings

The data needed to train machine learning systems comes in a form that computers don't immediately understand. To translate the things we understand naturally (e.g. words, sounds, or videos) to a form that the algorithms can process, we use embeddings, a mathematical vector representation that captures different facets (dimensions) of the data. For example, in this language embedding, similar words are mapped to points that are close to each other.
With the Embedding Projector, you can navigate through views of data in either a 2D or a 3D mode, zooming, rotating, and panning using natural click-and-drag gestures. Below is a figure showing the nearest points to the embedding for the word “important” after training a TensorFlow model using the word2vec tutorial. Clicking on any point (which represents the learned embedding for a given word) in this visualization brings up a list of nearest points and distances, which shows which words the algorithm has learned to be semantically related. This type of interaction represents an important way in which one can explore how an algorithm is performing.
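To try the standalone version with your own data, one simple route is to export vectors and labels as tab-separated files and load them at projector.tensorflow.org. Below is a minimal sketch using only the standard library; the random vectors stand in for learned embeddings.

import random

words = ['important', 'significant', 'apple', 'orange']
dim = 200
with open('vectors.tsv', 'w') as vecs, open('metadata.tsv', 'w') as meta:
    for word in words:
        vector = [random.uniform(-1, 1) for _ in range(dim)]
        vecs.write('\t'.join('%.5f' % v for v in vector) + '\n')  # one row per point
        meta.write(word + '\n')  # matching label per row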


Methods of Dimensionality Reduction

The Embedding Projector offers three commonly used methods of data dimensionality reduction, which allow easier visualization of complex data: PCA, t-SNE and custom linear projections. PCA is often effective at exploring the internal structure of the embeddings, revealing the most influential dimensions in the data. t-SNE, on the other hand, is useful for exploring local neighborhoods and finding clusters, allowing developers to make sure that an embedding preserves the meaning in the data (e.g. in the MNIST dataset, seeing that the same digits are clustered together). Finally, custom linear projections can help discover meaningful "directions" in data sets - such as the distinction between a formal and casual tone in a language generation model - which would allow the design of more adaptable ML systems.
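Outside the projector, the first two reductions can be reproduced offline with scikit-learn (a third-party library, shown purely for comparison; the parameters are illustrative):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

embeddings = np.random.rand(1000, 128)  # stand-in for learned embeddings
pca_2d = PCA(n_components=2).fit_transform(embeddings)
tsne_2d = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)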

A custom linear projection of the 100 nearest points of "See attachments." onto the "yes" - "yeah" vector (“yes” is right, “yeah” is left) of a corpus of 35k frequently used phrases in emails.

The Embedding Projector website includes a few datasets to play with. We’ve also made it easy for users to publish and share their embeddings with others (just click on the “Publish” button on the left pane). It is our hope that the Embedding Projector will be a useful tool to help the research community explore and refine their ML applications, as well as enable anyone to better understand how ML algorithms interpret data. If you'd like to get the full details on the Embedding Projector, you can read the paper here. Have fun exploring the world of embeddings!
By Daniel Smilkov and the Big Picture group

Google Summer of Code 2016 wrap-up: AOSSIE

Fri, 12/09/2016 - 18:00
We’re sharing guest posts from students, mentors and organization administrators who participated in Google Summer of Code (GSoC) 2016. This is the seventh post in the series.


AOSSIE (Australian Open Source Software Innovation and Education) is an organization created by the leaders of four research-oriented open source projects at the Australian National University. This was our first year in Google Summer of Code, but one of our projects had already participated three times as part of another organization.

We had 6 students and they surpassed our expectations. It was a great experience to mentor these students and provide them the opportunity to get involved in our cutting-edge research. We expect that their projects will lead to several publications and will be the starting point for long term collaborations.

Here are some highlights of their contributions:

Extempore is a programming language and runtime environment that supports live programming.

Joseph Penington adapted some C++ fluid dynamics code to show how live programming could be used to improve the workflow of scientific simulation. Joseph's project builds a series of increasingly complex fluid solvers in Extempore, allowing the programmer to make interesting and non-trivial changes to the simulation at runtime, including switching the way the fluids are solved in the middle of a simulation.

PriMedLink is software for matching similar patients in a way that preserves privacy (i.e. only using masked or encoded values of records without compromising privacy and confidentiality of patients) for health informatics applications such as clinical trials, advanced treatments and personalized patient care. The initial version of PPSPM software included masking and matching techniques for string, categorical and numerical (integer, floating point and modulus) data.

Mathu Mounasamy developed a module for PPSPM for masking and matching textual data which commonly occur in patient records (such as clinical notes and medical reports containing text data). The TextMM module developed by Mathu extends the functionality of PPSPM by allowing advanced privacy-preserving matching of similar patients based on various features containing textual data, thereby improving the quality and scope of PPSPM.

Rogas is a platform which integrates a collection of graph analysis tools and algorithms into a unified framework in order to support network analysis tasks.

Mojtaba Rezvani added the local community search (also known as local community detection) capability to Rogas. He has implemented several state-of-the-art algorithms proposed for local community detection, such as: k-core, k-truss, k-edge-connected, γ-quasi, and k-cliques. He has also designed a new algorithm for local community detection, which can efficiently identify local communities in large-scale networks.

Yan Xiao redesigned the GUI of Rogas in order to improve usability. He also implemented several visualization techniques to support the graph primitives of Rogas, including cluster, rank and path finding. These developments support dynamic network analysis at different scales so as to predict trends and patterns.

Skeptik is a Scala-based framework for proof theory and automated reasoning.

Ezequiel Postan generalized a challenging proof compression algorithm (the Split algorithm) from propositional logic to first-order logic and implemented it. This enables Skeptik to execute this algorithm not only on proofs output by SAT- and SMT-solvers but also on proofs output by resolution-based automated theorem provers. Ezequiel also implemented parsers for the TPTP and TSTP formats for theorem proving problems and proofs, and implemented a random proof generator to allow comprehensive experimental evaluation of the algorithms.

Daniyar Itegulov implemented a theorem prover for classical first-order logic using Skeptik's data structures and based on a novel logical calculus recently proposed by his mentor. This new calculus, called Conflict Resolution, is inspired by the propositional conflict-driven clause learning procedure used by SAT- and SMT-solvers and generalizes it to first-order logic. Daniyar also went further, conceiving and developing a concurrent proof search strategy for this calculus using Akka actors.

By Bruno Paleo, Organization Administrator for AOSSIE

Open-sourcing DeepMind Lab

Wed, 12/07/2016 - 18:00
Originally posted on DeepMind Blog

DeepMind's scientific mission is to push the boundaries of AI, developing systems that can learn to solve any complex problem without needing to be taught how. To achieve this, we work from the premise that AI needs to be general. Agents should operate across a wide range of tasks and be able to automatically adapt to changing circumstances. That is, they should not be pre-programmed, but rather, able to learn automatically from their raw inputs and reward signals from the environment. There are two parts to this research program: (1)  designing ever-more intelligent agents capable of more-and-more sophisticated cognitive skills, and (2) building increasingly complex environments where agents can be trained and evaluated.

The development of innovative agents goes hand in hand with the careful design and implementation of rationally selected, flexible and well-maintained environments. To that end, we at DeepMind have invested considerable effort toward building rich simulated environments to serve as  “laboratories” for AI research. Now we are open-sourcing our flagship platform,  DeepMind Lab, so the broader research community can make use of it.

DeepMind Lab is a fully 3D game-like platform tailored for agent-based AI research. It is observed from a first-person viewpoint, through the eyes of the simulated agent. Scenes are rendered with rich science fiction-style visuals. The available actions allow agents to look around and move in 3D. The agent’s “body” is a floating orb. It levitates and moves by activating thrusters opposite its desired direction of movement, and it has a camera that moves around the main sphere as a ball-in-socket joint tracking the rotational look actions. Example tasks include collecting fruit, navigating in mazes, traversing dangerous passages while avoiding falling off cliffs, bouncing through space using launch pads to move between platforms, playing laser tag, and quickly learning and remembering random procedurally generated environments. An illustration of how agents in DeepMind Lab perceive and interact with the world can be seen below:

At each moment in time, agents observe the world as an image, in pixels, rendered from their own first-person perspective. They also may receive a reward (or punishment!) signal. The agent can activate its thrusters to move in 3D and can also rotate its viewpoint along both horizontal and vertical axes.
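In code, that observation-and-action loop looks roughly like this through the repository's Python API (the level, observation and method names follow the repo's documentation at release; treat the details as assumptions):

import numpy as np
import deepmind_lab

env = deepmind_lab.Lab('seekavoid_arena_01', ['RGB_INTERLACED'],
                       config={'width': '84', 'height': '84'})
env.reset()
total_reward = 0
for _ in range(100):
    if not env.is_running():
        env.reset()  # episode ended; start a new one
    action = np.zeros((7,), dtype=np.intc)  # 7-component action; zeros = no-op
    total_reward += env.step(action, num_steps=1)  # step returns the reward
    rgb = env.observations()['RGB_INTERLACED']  # first-person pixels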

Artificial general intelligence research in DeepMind Lab emphasizes navigation, memory, 3D vision from a first-person viewpoint, motor control, planning, strategy, time, and fully autonomous agents that must learn for themselves what tasks to perform by exploring their environment. All these factors make learning difficult. Each is considered a frontier research question in its own right. Putting them all together in one platform, as we have, represents a significant new challenge for the field.


DeepMind Lab is highly customisable and extendable. New levels can be authored with off-the-shelf editor tools. In addition, DeepMind Lab includes an interface for programmatic level-creation. Levels can be customised with gameplay logic, item pickups, custom observations, level restarts, reward schemes, in-game messages and more. The interface can be used to create levels in which novel map layouts are generated on the fly while an agent trains. These features are useful in, for example, testing how an agent copes with unfamiliar environments. Users will be able to add custom levels to the platform via GitHub. The assets will be hosted on GitHub alongside all the code, maps and level scripts. Our hope is that the community will help us shape and develop the platform going forward.



DeepMind Lab has been used internally at DeepMind for some time (example). We believe it has already had a significant impact on our thinking concerning numerous aspects of intelligence, both natural and artificial. However, our efforts so far have barely scratched the surface of what is possible in DeepMind Lab. There are still opportunities for significant contributions in a number of mostly untouched research domains now available through DeepMind Lab, such as navigation, memory and exploration.

As well as facilitating agent evaluation, there are compelling reasons to think that it may be fundamentally easier to develop intelligence in a 3D world, observed from a first-person viewpoint, like DeepMind Lab. After all, the only known examples of general-purpose intelligence in the natural world arose from a combination of evolution, development, and learning, grounded in physics and the sensory apparatus of animals. It is possible that a large fraction of animal and human intelligence is a direct consequence of the richness of our environment, and unlikely to arise without it. Consider the alternative: if you or I had grown up in a world that looked like Space Invaders or Pac-Man, it doesn’t seem likely we would have achieved much general intelligence!

Read the full paper here.

Access DeepMind's GitHub repository here.

By Charlie Beattie, Joel Leibo, Stig Petersen and Shane Legg, DeepMind Team


Categories: Open Source

Why I contribute to Chromium

Mon, 12/05/2016 - 18:00
This is a guest post by Yoav Weiss, who was recently recognized through the Google Open Source Peer Bonus Program for his work on the Chromium project. We invited Yoav to write about his work on our blog.

I was recently recognized by Google for my contributions to Chromium and wanted to write a few words on why I contribute to the project, other rendering engines and the web platform in general. I also wanted to share how it helped me evolve as a developer and why more people should contribute to the web platform for their own benefit.
The web platform

I've written before about why I think the web platform is an extremely important asset for humanity and why we should make sure it'll thrive for years to come. It enables the distribution of knowledge to the corners of the earth and has fundamentally changed our world. Yet, compared to the number of users (billions!) and web developers (millions), there are only a few hundred engineers working on maintaining and improving the platform itself.

That means that there are many aspects of the platform that are not as well maintained as they should be. We're at real risk of a "tragedy of the commons" scenario, where, despite its usage and utility, the platform will collapse under its own weight because maintaining it is nobody's exclusive problem.
How I got started

Personally, I had been working on web performance for well over a decade before I decided to get more involved and lend my hand in building the platform. For a large part of my professional life, browsers were black boxes. They were given to us by the browser gods and that's what we had to work with for the next few years. Their undocumented bugs and quirks became gospel, passed from senior engineers to their juniors.

Then at some point, that situation changed. Slowly but surely, open source browsers started picking up market share. They were no longer black boxes; we could actually see what happens on the inside!

I first got involved by joining the responsive images discussions and the Responsive Images Community Group. Then I saw a tweet from RICG's chair calling for a prototype of the current proposal to prove its feasibility and value. And I jumped in.

I created a prototype using Chromium and WebKit, demoed it to anyone who was interested, worked on the proposals and argued for the viability of their approach on the various mailing lists. Eventually, we were able to get some browser folks on board, improve the proposals and their fit with the rest of the platform, and I started working on an implementation.

The amount of work this required was larger than I expected. Eventually I managed to ship the feature in Blink and Chromium, and complete large parts of the implementation in WebKit as well. WOOT!
Success! Now what?

After that project was done, I started looking into what I should do next. I was determined to continue working on browsers and find a gig that would let me do that. So I searched for an employer with a vested interest in the web and in making it faster, who would be happy to let me work on the platform's client - the web browser.

I found such an employer in Akamai, where I have been working as a Principal Architect ever since. As part of my job I'm working on our performance optimization features as well as performance-related browser features, making sure they make it into browsers in a timely fashion.
Why you should contribute, too

Now, chances are that if you're reading this, you're also relying on the web platform for your job in one way or another. That means it may also make sense for your organization to contribute to the web platform. Let's explore the reasons:
1. Make sure work is done on features you care about

If you're like me, you love the web platform and the reach it provides you, but you're not necessarily happy with all of it. The web is great, but not perfect. Since browsers and web standards are no longer black boxes, you can help change that.

You can work on standards and browsers to change them to include your use-case. That's immense power at your fingertips: put in the work and the platform evolves for all the billions of users out there.

And unlike with yesteryear's browser changes, you don't have to wait years before new features can be used in production. With today's browser update rates and progressive enhancement, you'll probably be able to use changes in production within a few months.
2. Gain expertise that can help you do your job better

Knowing browser internals better can also give you superpowers in other parts of your job. Whenever questions about browser behavior arise, you can take a peek into the source code and have concrete answers rather than speculation.

Keeping track of standards discussions gives you visibility into new browser APIs that are coming along, so that you can opt to use those rather than settle for sub-optimal alternatives that are currently available.
3. Grow as an engineer

Working on browsers teaches you a lot about how things work under the surface and enables you to understand the internals of modern browsers, which are extremely complex machines. Further, this work allows you to get code reviews from the world's leading experts on these subjects. What better way to grow than to interact with the experts?
4. It's a fun and welcoming community

Contributing to the web platform has been a great experience for me. Working with the Chromium project, in particular, is always great fun. The project is backed by Google, but there are many external contributors, and the majority of the work and decisions happen in the open. The people I've worked with are super friendly and happy to help. All in all, it's really fun!
Join us

The web needs more people working on it, and working on the web platform can be extremely beneficial to you, your career and your business.

If you're interested in getting started with web standards, the Discourse instance of the Web Platform Incubator Community Group (or WICG for short) is where it's at (disclaimer: I'm co-chairing that group). For getting started with Chromium development, this is the post for you.

And most importantly, don't be afraid to ask the community. People on blink-dev and IRC are super friendly and will be happy to point you in the right direction.

So come on over and join the good cause. We'll be happy to have you!

By Yoav Weiss, Chromium contributor
Categories: Open Source

Announcing OSS-Fuzz: Continuous fuzzing for open source software

Thu, 12/01/2016 - 18:00
We are happy to announce OSS-Fuzz, a new Beta program developed over the past few years with the Core Infrastructure Initiative community. This program will provide continuous fuzzing for select core open source software.

Open source software is the backbone of the many apps, sites, services, and networked things that make up “the internet.” It is important that the open source foundation be stable, secure, and reliable, as cracks and weaknesses impact all who build on it.

Recent security stories confirm that errors like buffer overflow and use-after-free can have serious, widespread consequences when they occur in critical open source software. These errors are not only serious, but notoriously difficult to find via routine code audits, even for experienced developers. That's where fuzz testing comes in. By generating random inputs to a given program, fuzzing triggers these errors and helps uncover them quickly and thoroughly.
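
As a toy illustration of the principle, the sketch below feeds randomly generated inputs to a deliberately buggy parser until one triggers a failure. It is only a sketch: real fuzzers such as AFL and libFuzzer are coverage-guided and mutation-based rather than purely random, and the ToyFuzzer class and its parse target are invented for this example:

import java.util.Random;

// A toy illustration of fuzzing: generate random inputs, feed them to the
// target, and report any input that triggers a failure. Real fuzzers are
// coverage-guided and are paired with Sanitizers to surface memory errors.
public class ToyFuzzer {
  // Deliberately buggy "parser": fails on inputs starting with "FUZZ".
  static void parse(byte[] input) {
    if (input.length >= 4 && input[0] == 'F' && input[1] == 'U'
        && input[2] == 'Z' && input[3] == 'Z') {
      throw new IllegalStateException("parser bug triggered");
    }
  }

  public static void main(String[] args) {
    byte[] alphabet = {'F', 'U', 'Z', 'A'};  // tiny dictionary so the bug
    Random rng = new Random();               // is found in seconds
    for (long attempts = 1; ; attempts++) {
      byte[] input = new byte[1 + rng.nextInt(8)];
      for (int i = 0; i < input.length; i++) {
        input[i] = alphabet[rng.nextInt(alphabet.length)];
      }
      try {
        parse(input);
      } catch (RuntimeException e) {
        System.out.println("Failing input found after " + attempts
            + " attempts: " + new String(input));
        return;  // a real fuzzer would save the input as a reproducer
      }
    }
  }
}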

In recent years, several efficient general purpose fuzzing engines have been implemented (e.g. AFL and libFuzzer), and we use them to fuzz various components of the Chrome browser. These fuzzers, when combined with Sanitizers, can help find security vulnerabilities (e.g. buffer overflows, use-after-free, bad casts, integer overflows, etc), stability bugs (e.g. null dereferences, memory leaks, out-of-memory, assertion failures, etc) and sometimes even logical bugs.

OSS-Fuzz’s goal is to make common software infrastructure more secure and stable by combining modern fuzzing techniques with scalable distributed execution. OSS-Fuzz combines various fuzzing engines (initially, libFuzzer) with Sanitizers (initially, AddressSanitizer) and provides a massive distributed execution environment powered by ClusterFuzz.
Early successes

Our initial trials with OSS-Fuzz have had good results. An example is the FreeType library, which is used on over a billion devices to display text (and which might even be rendering the characters you are reading now). It is important for FreeType to be stable and secure in an age when fonts are loaded over the Internet. Werner Lemberg, one of the FreeType developers, was an early adopter of OSS-Fuzz. Recently the FreeType fuzzer found a new heap buffer overflow only a few hours after the source change:

ERROR: AddressSanitizer: heap-buffer-overflow on address 0x615000000ffa
READ of size 2 at 0x615000000ffa thread T0
SCARINESS: 24 (2-byte-read-heap-buffer-overflow-far-from-bounds)
   #0 0x885e06 in tt_face_vary_cvt src/truetype/ttgxvar.c:1556:31

OSS-Fuzz automatically notified the maintainer, who fixed the bug; then OSS-Fuzz automatically confirmed the fix. All in one day! You can see the full list of fixed and disclosed bugs found by OSS-Fuzz so far.
Contributions and feedback are welcome

OSS-Fuzz has already found 150 bugs in several widely used open source projects (and churns through ~4 trillion test cases a week). With your help, we can make fuzzing a standard part of open source development, and work with the broader community of developers and security testers to ensure that bugs in critical open source applications, libraries, and APIs are discovered and fixed. We believe that this approach to automated security testing will result in real improvements to the security and stability of open source software.

OSS-Fuzz is launching in Beta right now, and will be accepting suggestions for candidate open source projects. In order for a project to be accepted to OSS-Fuzz, it needs to have a large user base and/or be critical to global IT infrastructure, a general heuristic that we are intentionally leaving open to interpretation at this early stage. See more details and instructions on how to apply here.

Once a project is signed up for OSS-Fuzz, it is automatically subject to the 90-day disclosure deadline for newly reported bugs in our tracker (see details here). This matches industry best practices and improves end-user security and stability by getting patches to users faster.

Help us ensure this program truly serves the open source community and the internet that relies on this critical software: contribute and leave your feedback on GitHub.

By Mike Aizatsky, Kostya Serebryany (Software Engineers, Dynamic Tools); Oliver Chang, Abhishek Arya (Security Engineers, Google Chrome); and Meredith Whittaker (Open Research Lead).
Categories: Open Source

Docker + Dataflow = happier workflows

Wed, 11/30/2016 - 18:00
When I first saw the Google Cloud Dataflow monitoring UI -- with its visual flow execution graph that updates as your job runs, and convenient links to the log messages -- the idea came to me. What if I could take that UI, and use it for something it was never built for? Could it be connected with open source projects aimed at promoting reproducible scientific analysis, like Common Workflow Language (CWL) or Workflow Definition Language (WDL)?
Screenshot of a Dockerflow workflow for DNA sequence analysis.
In scientific computing, it’s really common to submit jobs to a local high-performance computing (HPC) cluster. There are tools to do that in the cloud, like Elasticluster and Starcluster. They replicate the local way of doing things, which means they require a bunch of infrastructure setup and management that the university IT department would otherwise do. Even after you’re set up, you still have to ssh into the cluster to do anything. And then there are a million different choices for workflow managers, each unsatisfactory in its own special way.

By day, I’m a product manager. I hadn’t done any serious coding in a few years. But I figured it shouldn’t be that hard to create a proof-of-concept, just to show that the Apache Beam API that Dataflow implements can be used for running scientific workflows. Now, Dataflow was created for a different purpose, namely, to support scalable data-parallel processing, like transforming giant data sets, or computing summary statistics, or indexing web pages. To use Dataflow for scientific workflows would require wrapping up shell steps that launch VMs, run some code, and shuttle data back and forth from an object store. It should be easy, right?

It wasn’t so bad. Over the weekend, I downloaded the Dataflow SDK, ran the wordcount examples, and started modifying. I had a “Hello, world” proof-of-concept in a day.

To really run scientific workflows would require more, of course. Varying VM shapes, a way to pass parameters from one step to the next, graph definition, scattering and gathering, retries. So I shifted into prototyping mode.

I created a new GitHub project called Dockerflow. With Dockerflow, workflows can be defined in YAML files. They can also be written in pretty compact Java code. You can run a batch of workflows at once by providing a CSV file with one row per workflow to define the parameters.

Dataflow and Docker complement each other nicely:

  • Dataflow provides a fully managed service with a nice monitoring interface, retries, graph optimization and other niceties.
  • Docker provides portability of the tools themselves, and there's a large library of packaged tools already available as Docker images.

While Dockerflow supports a simple YAML workflow definition, a similar approach could be taken to implement a runner for one of the open standards like CWL or WDL.

To get a sense of working with Dockerflow, here’s “Hello, World” written in YAML:

defn:
  name: HelloWorkflow
steps:
- defn:
    name: Hello
    inputParameters:
      name: message
      defaultValue: Hello, World!
    docker:
      imageName: ubuntu
      cmd: echo $message

And here’s the same example written in Java:

public class HelloWorkflow implements WorkflowDefn {
  @Override
  public Workflow createWorkflow(String[] args) throws IOException {
    // Define a single task that echoes a message inside an ubuntu container.
    Task hello =
        TaskBuilder.named("Hello")
            .input("message", "Hello, World!")
            .docker("ubuntu")
            .script("echo $message")
            .build();
    // Assemble the one-step workflow, passing along the command-line args.
    return TaskBuilder.named("HelloWorkflow").steps(hello).args(args).build();
  }
}

Dockerflow is just a prototype at this stage, though it can run real workflows and includes many nice features, like dry runs, resuming failed runs from mid-workflow, and, of course, the nice UI. It uses Cloud Dataflow in a way that was never intended -- to run scientific batch workflows rather than large-scale data-parallel workloads. I wish I'd written it in Python rather than Java, but the Dataflow Python SDK wasn't quite as mature when I started.

Which is all to say, it’s been a great 20% project, and the future really depends on whether it solves a problem people have, and if others are interested in improving on it. We welcome your contributions and comments! How do you run and monitor scientific workflows today?

By Jonathan Bingham, Google Genomics and Verily Life Sciences
Categories: Open Source

Google Summer of Code 2016 wrap-up: STE||AR

Tue, 11/29/2016 - 18:00
This is part of a series of guest posts from students, mentors and organization administrators who participated in Google Summer of Code (GSoC) 2016. GSoC is an annual program which pairs university students with mentors to work on open source software.


This summer the STE||AR Group was proud to mentor four students through Google Summer of Code. These students worked on a variety of projects which helped improve our software, HPX. This library is a distributed C++ runtime system which supports a standards-compliant API and helps users scale their applications across thousands of machines.

The improvements to the code base will help our team and users of HPX around the world. A summary of our students’ projects:

Parsa Amini – HPX Debugger

Developing a better distributed debugging tool is essential to increasing the programmability of HPX. Parsa's project, Scimitar, aims to facilitate the debugging process for HPX programmers by extending the features of GDB, an existing debugger. The project then complements it with new commands for easier switching between localities across clusters, HPX thread debugging, awareness of internal HPX data structures, and semi-automated preparation for distributed debugging sessions. Additional functionality, such as locating an object and viewing the queue information on each core, is provided using the API exposed by HPX itself. His work can be found on GitHub.

Aalekh Nigam – Implement a Map/Reduce Framework

This project aimed to expose a Map/Reduce programming model over HPX. During the summer, Aalekh was able to develop a single-node implementation of HPXflow (a map/reduce programming model) and laid the groundwork for a future multi-node version with database support. Although the initial task was limited to implementing the Map/Reduce model, he was also able to implement an improved dataflow model.
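
For readers new to the model itself: map/reduce expresses a computation as a per-element mapping followed by a per-key reduction. A minimal sketch of the general model in plain Java (this illustrates the concept only, not HPXflow's API):

import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// The classic map/reduce example: map each word to a (word, 1) pair, then
// reduce by key with addition. This shows the general model, not HPXflow.
public class WordCount {
  public static void main(String[] args) {
    String text = "to be or not to be";
    Map<String, Long> counts = Arrays.stream(text.split("\\s+"))
        .collect(Collectors.groupingBy(Function.identity(),
                                       Collectors.counting()));
    System.out.println(counts);  // e.g. {not=1, be=2, or=1, to=2}
  }
}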

Minh-Khanh Do - Working on Parallel Algorithms for HPX::Vector

Minh-Khanh's task was to take the parallel algorithms and add the functionality required to work on the segmented hpx::vector. Working with his mentor John Biddiscombe, he implemented the segmented_fill algorithm, which was successfully merged into the main codebase. Additionally, Minh-Khanh implemented the segmented_scan algorithm, which includes both inclusive_scan and exclusive_scan. These changes are included in a pull request and have been merged. Using the segmented scan algorithm, it is possible to perform tasks such as evaluating polynomials and to implement other algorithms such as quicksort.
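
For context, a scan computes running prefix sums: an inclusive scan includes each element in its own output position, while an exclusive scan shifts the sums by one position and starts from an initial value. A minimal sequential sketch in plain Java follows; HPX performs the same computation in parallel, in C++, across the segments of an hpx::vector:

import java.util.Arrays;

// Illustrates inclusive vs. exclusive scan (prefix sums) sequentially.
public class ScanDemo {
  // inclusive_scan: out[i] = in[0] + ... + in[i]
  static int[] inclusiveScan(int[] in) {
    int[] out = new int[in.length];
    int sum = 0;
    for (int i = 0; i < in.length; i++) {
      sum += in[i];
      out[i] = sum;
    }
    return out;
  }

  // exclusive_scan: out[i] = init + in[0] + ... + in[i-1]
  static int[] exclusiveScan(int[] in, int init) {
    int[] out = new int[in.length];
    int sum = init;
    for (int i = 0; i < in.length; i++) {
      out[i] = sum;
      sum += in[i];
    }
    return out;
  }

  public static void main(String[] args) {
    int[] data = {3, 1, 4, 1, 5};
    System.out.println(Arrays.toString(inclusiveScan(data)));     // [3, 4, 8, 9, 14]
    System.out.println(Arrays.toString(exclusiveScan(data, 0)));  // [0, 3, 4, 8, 9]
  }
}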

Satyaki Upadhyay - Plugin Mechanism for thread schedulers in HPX

In HPX, schedulers are statically linked and must be built at compile time. Satyaki's project involved converting this statically linked scheme into a plugin system which allows arbitrary schedulers to be dynamically loaded. These changes bring several benefits: they provide a layer of abstraction, follow the open/closed principle of software design, and allow developers to write their own custom schedulers while conforming to a uniform API. The project proceeded in two steps. The first involved creating plugin modules for the schedulers and registering them with HPX. The second was to implement the loading and subsequent use of the chosen scheduler.
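
HPX itself is C++ and loads scheduler plugins from shared libraries, but the same open/closed plugin idea can be sketched in miniature with Java's java.util.ServiceLoader, which discovers implementations on the classpath at run time. The Scheduler interface below is a hypothetical stand-in invented for this sketch, not HPX's scheduler API:

import java.util.ServiceLoader;

// A miniature analogue of the plugin mechanism described above. Scheduler
// implementations are discovered at run time via entries in
// META-INF/services/Scheduler, so new schedulers can be added to the
// classpath without recompiling this registry.
interface Scheduler {
  String name();
  void schedule(Runnable task);
}

public class SchedulerRegistry {
  public static void main(String[] args) {
    String wanted = args.length > 0 ? args[0] : "fifo";
    for (Scheduler s : ServiceLoader.load(Scheduler.class)) {
      if (s.name().equals(wanted)) {
        s.schedule(() -> System.out.println("task running on " + s.name()));
        return;
      }
    }
    System.err.println("No scheduler named '" + wanted + "' was found");
  }
}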

We would like to thank our students and mentors for the time that they have contributed to HPX this summer. In addition, we would like to thank Google for giving the STE||AR Group the opportunity to work with developers around the globe, and for giving students the chance to interact with vibrant open source projects worldwide.

By Adrian Serio, Organization Administrator for The STE||AR Group
Categories: Open Source

It’s that time again: Google Code-in starts today!

Mon, 11/28/2016 - 21:18
Today marks the start of the 7th year of Google Code-in (GCI), our pre-university contest introducing students to open source development. GCI takes place entirely online and is open to students between the ages of 13 and 17 around the globe.
The concept is simple: complete bite-sized tasks (at your own pace) created by 17 participating open source organizations on topic areas you find interesting:

  • Coding
  • Documentation/Training
  • Outreach/Research
  • Quality Assurance
  • User Interface

Tasks take an average of 3-5 hours to complete and include the guidance of a mentor to help along the way. Complete one task? Get a digital certificate. Three tasks? Get a sweet Google t-shirt. Finalists get a hoodie. Grand Prize winners get a trip to Google headquarters in California.

Over the last 6 years, 3213 students from 99 countries have successfully completed tasks in GCI. Intrigued? Learn more about GCI by checking out our rules and FAQs. And please visit our contest site and read the Getting Started Guide.

Teachers, if you are interested in getting your students involved in Google Code-in, you can find resources here to help you get started.

By Mary Radomile, Open Source Programs Office
Categories: Open Source