Google Open Source Blog
News about Google's Open Source projects and programs.

Stories from Google Code-in: OpenMRS and SCoRe

Mon, 08/22/2016 - 18:00
Google Code-in is our annual contest that gives students age 13 to 17 experience in computer science through contributions to open source projects. This blog post is the third installment in our series reflecting on the experiences of Google Code-in 2015 grand prize winners. Be sure to check out the first and second posts in the series, too.

In this post we look at the stories of three more Google Code-in (GCI) grand prize winners. Our grand prize winners come from a pool of 980 students from 65 countries who, all told, completed 4,776 tasks for 14 open source projects.

We were lucky enough to host many of these extraordinary young coders at Google HQ for a few days this summer. Over that time, we learned more about where they came from, what they gained by participating in GCI and what they plan to do as new members of the open source community.

Google Code-in 2015 Grand Prize Winners explore the SF Bay Area in this immersive Google Street View display with fellow open source program managers Stephanie Taylor and Cat Allman, who run GCI.

Our first story today is that of Břetislav Hájek from the Czech Republic, who chose to work with the OpenMRS project because he sees their work as important. OpenMRS is an open source medical record system that improves healthcare delivery in resource-constrained regions.

Břetislav got into computer science through web development, so he started by working on tasks related to HTML and CSS. This gave him confidence to take on more challenging tasks. His favorite task was creating a web application for searching through patients. While he didn’t find it hard, he learned a lot and was proud to have made something useful. Reflecting on Google Code-in, Břetislav said: “That's the thing I like about GCI. I always treat tasks as opportunities to learn something new. And the learning is more entertaining since I work on real problems.”

IRC communication proved to be an important part of Břetislav’s success. Other students were there and tried to help each other out as best they could, and there were always mentors available to help guide them. He enjoyed the friendly environment. The community motivated him to work harder and try new things. In the end, Břetislav was glad to have participated and is motivated to continue his work.

Next we have Vicente Bermudez from Uruguay who discovered Google Code-in through a story in the local news celebrating a Uruguayan grand prize winner from a previous year. Like Břetislav, Vicente chose to work on the OpenMRS project because the cause spoke to him.

He got into programming through his love of video games and his desire to create his own. He hadn’t heard about programming before but initial research piqued his interest. Following his curiosity, he learned Java and expanded his knowledge from there. Conveniently, much of OpenMRS is built with Java!

The task-based structure worked well for Vicente. He was unsure of some tasks, recognizing that he didn’t know much about what they required. For instance, he hesitated to take on one that involved creating a Windows Phone app because he had never created a mobile app. But he persisted and, five days later, he had completed it and learned a lot about mobile development.

It surprised Vicente how much he learned in such a short time span. He had this to say: “During the contest I gained knowledge in a variety of fields such as programming, testing, video editing, and graphic design. The mentors encouraged us to think about quality instead of quantity, and I learned a lot from that.”

Vicente loved his Google Code-in experience and plans to continue contributing to open source projects, especially OpenMRS.

The last student story we’ll share today is that of Anesu Mafuvadze, a student from the US who worked with the Sustainable Computing Research Group (SCoRe). His introduction to computer science came through robotics in one of his high school classes which used a language similar to C++.

Anesu was thrilled by the experience of bringing the robots to life with code. He described his introduction this way: “The more I programmed the more captivated I became; I loved how easily I could convert my wildest ideas into fully functioning programs; I loved the thrill of working in an environment that demands minute precision; above all, I loved creating programs that other people found useful.”

Online documentation and YouTube tutorials fueled Anesu’s education for several years as he picked up multiple languages and began participating in programming contests. But he knew something was missing: he lacked real world coding experience and had never collaborated with others. As a result, he didn’t pay much attention to the readability of his code, wasn’t aware of version control, didn’t write extensive tests and had never built something for the common good.

Enter Google Code-in. Working with mentors helped Anesu deliver quality and building open source software required him to learn collaboration tools and value readability. The contest also gave him an opportunity to build on skills that he hadn’t developed, such as web development. Anesu says the experience made him a better programmer and that the introduction to open source has motivated him to use his skills on projects that benefit society.

Thank you to Břetislav, Vicente and Anesu for their hard work contributing to open source projects and for sharing their stories with us. We have one more blog post coming with more student stories so stay tuned!

By Josh Simmons, Open Source Programs Office
Categories: Open Source

Opening up Science Journal

Fri, 08/19/2016 - 17:56
Science Journal is an app that turns your Android phone into a mobile science tool, allowing you to use the sensors in your phone to explore the world around you. The Making & Science team launched Science Journal a few months ago at Bay Area Maker Faire 2016 and have been excited to see different projects people have done with it all over the world!

Today we are happy to announce that we are releasing Science Journal 1.1 on the Google Play Store and also publishing the core source for the app. Open source software and hardware have been hugely beneficial to the science education ecosystem. By open sourcing, we’ll be able to improve the app faster and also to provide the community with an example of a modern Android app built with Material Design principles.

One important feature in Science Journal is the ability to connect to external devices over Bluetooth LE. We have open source firmware which runs on several Arduino microcontrollers already. In the near future, we will provide alternate ways to get your sensor data into Science Journal: stay tuned (or follow along with our commits)!

We believe that anyone can be a scientist anywhere. Science doesn’t just happen in the classroom or lab. Tools like Science Journal let you see how the world works with just your phone and now you can explore how Science Journal itself works, too. Give it a try and let us know what you think!

By Justin Koh, Software Engineer
Categories: Open Source

From Google Summer of Code to Game of Thrones on the Back of a JavaScript Dragon (Part 2)

Wed, 08/17/2016 - 20:08
This guest post is a part of a short series about Guy Yachdav, Tatyana Goldberg and Christian Dallago and the journey that was inspired by their participation as Google Summer of Code mentors for the BioJS project. Don’t miss the first post in the series. Heads up, this post contains spoilers for Game of Thrones seasons 5 and 6!

We built on the Google Summer of Code (GSoC) philosophy and the lessons we learned from participating in 2014 by starting a JavaScript Technology class at the Technical University of Munich (TUM).
We began with two dozen students who worked on expanding the BioJS visualization library. Our class became popular quickly and the number of applicants doubled each semester (nearly 180 applicants for 40 seats in the 2016 summer term).
In 2016 our team grew to include Christian Dallago, who had joined as a GSoC mentor. Together we decided to break with tradition of our course’s previous semesters. Instead of focusing on data visualization, we wanted to introduce students to data science with JavaScript. To get our students fully engaged, we decided the project would center on data from the hit TV show, Game of Thrones.
Our aim was to create an online portal for Game of Thrones fans which would:
  1. Provide the most comprehensive, structured and open data set about the Game of Thrones world accessible via API.
  2. Present an interactive map based on JavaScript.
  3. Listen to what people are saying on Twitter about each of the show’s characters.
  4. Use machine learning algorithms to predict the likelihood of each character’s death.
Our plan worked — the students were engaged. It was a beautiful sight to see: GitHub repos humming with activity as each dev team delved deeper into their projects. As a project manager, you know you’ve got something good when issues are being opened and closed at 4:00 AM!
The results were mind blowing. In 50 days of programming, 36 students opened over 1,200 issues and pull requests, pushed 3,300 commits, released four apps to NPM, and, of course, produced one absolutely amazing website.
The website gathers data on 2,028 characters. Our map shows 240 landmarks and the paths traveled by 28 characters. Our Twitter sentiment analysis tool analyzed over 3 million tweets. And we launched the first ever machine learning-based algorithm that predicts the likelihood of dying for the 1,451 characters in the show that are still alive.
Visualization of Twitter sentiment analysis data for Jon Snow during season 5 of Game of Thrones. The X axis shows the timeline and the Y axis shows the number of positive (green) and negative (red) tweets. Each tweet is analyzed by an algorithm using a neural network to determine whether the tweet’s writer has a positive, negative or neutral attitude toward the character.
Since launch, the site’s popularity has skyrocketed. Following our press release, we were covered by over 1,500 media outlets, most notably Time, The Guardian, Rolling Stone, Daily Mail, BBC, Reuters, The Telegraph, CNET and many more. HowStuffWorks, The Vulture and others produced videos about the site and Chris Hardwick’s Comedy Central show did a segment about us. We've also given countless interviews to TV, radio and newspapers.
Google Analytics for the website. The left chart shows the number of visitors to the website during the first week after launch, reaching over 73K visitors on April 25th. The right chart shows the number of visitors at a given time point during the same week.
The most exciting part of the project was predicting the likelihood that any given character would die using machine learning. Machine learning algorithms find rules and patterns in the data that humans cannot easily detect. Once the rules and patterns are identified, we apply them to make inferences or predictions from novel, previously unseen data sets.
Warning: The next paragraphs contain spoilers for seasons 5 and 6 of Game of Thrones!
In order to predict the likelihood of a character’s death, we collected information about all of the characters that appeared in books 1 to 5 and analyzed over 30 features, including age, gender, marital status and others. Then we used a support vector machine (SVM) to statistically compare the features of characters, both dead and alive, to predict who would get the axe next. Our prediction was correct for 74% of all cases and surprised us by placing a number of characters thought to be relatively safe in grave danger.
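To make the SVM step concrete, here is a minimal sketch of a linear support vector machine trained by sub-gradient descent on the hinge loss. It is only an illustration: the post does not say which kernel, library or training method the team used, and the class name `LinearSvmSketch`, the two features and the toy data are all invented for this example.

```ruby
# Minimal linear SVM trained by sub-gradient descent on the hinge loss.
# Labels: +1 = predicted to die, -1 = predicted to survive.
# LinearSvmSketch is a hypothetical name, not the project's code.
class LinearSvmSketch
  def initialize(n_features, learning_rate: 0.1, regularization: 0.01)
    @weights = Array.new(n_features, 0.0)
    @bias = 0.0
    @learning_rate = learning_rate
    @regularization = regularization
  end

  # Decision value: w . x + b
  def score(features)
    @weights.zip(features).sum { |w, x| w * x } + @bias
  end

  def predict(features)
    score(features) >= 0 ? 1 : -1
  end

  def train(samples, labels, epochs: 50)
    epochs.times do
      samples.zip(labels).each do |features, label|
        # Check the margin with the current weights before updating them.
        violated = label * score(features) < 1
        # The regularization term shrinks the weights on every step.
        @weights.map! { |w| w * (1 - @learning_rate * @regularization) }
        next unless violated
        # Inside the margin or misclassified: step toward the sample's side.
        @weights = @weights.zip(features).map { |w, x| w + @learning_rate * label * x }
        @bias += @learning_rate * label
      end
    end
    self
  end
end

# Invented toy data: two made-up, already-scaled features per character.
samples = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
labels  = [1, 1, 1, -1, -1, -1]

model = LinearSvmSketch.new(2).train(samples, labels)
puts model.predict([3.0, 2.0])   # a character resembling the "dead" group
puts model.predict([-3.0, -2.0]) # a character resembling the "alive" group
```

A real model would use the 30-plus features mentioned above, hold out some characters to measure accuracy, and probably rely on an established SVM library rather than a hand-written training loop.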
According to our predictions, Jon Snow, who was seemingly betrayed and murdered by fellow members of the Night’s Watch at the end of season 5, had only an 11% chance of dying. Indeed, Jon rose from the dead in the second episode of season 6! We also predicted that the rulers of Dorne, Doran and Trystane Martell, had a high likelihood of death and, as predicted, they were taken out in the first episode of the new season.
Of course, as is always the case with predictions, there were also misses. We didn’t expect Roose Bolton to be killed off nor did we see Hodor’s departure coming.
This experience was an amazing ride for our team and it all started with Google Summer of Code! In the next post we’ll share what followed and where we see ourselves heading in the future.
By Guy Yachdav, Tatyana Goldberg and Christian Dallago, BioJS
Categories: Open Source

From Google Summer of Code to Game of Thrones on the Back of a JavaScript Dragon (Part 1)

Wed, 08/17/2016 - 20:08
This guest post is a part of a short series about Tatyana Goldberg and Guy Yachdav, instructors at Technical University of Munich, and the journey that was inspired by their participation as Google Summer of Code mentors for the BioJS project.

Hello there! We are from the BioJavaScript (BioJS) project which first joined Google Summer of Code (GSoC) in 2014. Our experience in the program set us on a grand open source adventure that we’ll be sharing with you in a series of blog posts. We hope you enjoy our story and, more importantly, hope it inspires you to pursue your own open source adventure.
Tatyana Goldberg and Guy Yachdav, GSoC mentors and open source enthusiasts. Photo taken at the MorpheusCup competition Luxembourg, May 2016.
We came together around the BioJS community, an open source project for creating beautiful and interactive open source visualizations of biological data on the web. BioJS visualizations are made up of components which have a modular design. This modular design enables several things: they can be used by non-programmers, they can be combined to make more complex visualizations, and they can be easily integrated into existing web applications. Despite being a young community, BioJS already has traction in industry and academia.
In early 2014 we decided to apply for GSoC and we were fortunate to have our application accepted on our first try. The experience was extremely positive — the five students we accepted delivered great software and they had a big impact on the BioJS community:
  • The number of mailing list subscribers doubled in less than a month.
  • All five of our accepted students from 2014 became core developers.
  • Students were invited to six international conferences to share their work.
  • Students helped organize the first BioJS conference held July 2015.
  • Most importantly, the students have independently designed BioJS version 2.0 which positioned BioJS as the leading open source visualization library for biological data. 
You can see three examples of the work GSoC students did on BioJS below:

MSAViewer is a visualization and analysis tool for multiple sequence alignments and was developed by Sebastian Wilzbach. Proteome Viewer is a multilevel visualization of proteomes in the UniProt database and was developed by Jose Villaveces. Genetic Variation Viewer is a visualization of the number and type of mutations at each position in a biological sequence and was developed by Saket Choudhary.
We learned a lot in the first year we participated in Google Summer of Code. Here are some of the takeaways that are especially relevant to mentors and organizations that are considering joining the program:
  1. GSoC is a great source of dedicated and enthusiastic young developers.
  2. Mentors need to carefully manage students, listen to them and let them lead initiatives when it makes sense.
  3. Org admins should leverage success in GSoC beyond the program.
  4. Orgs need to find the most motivated students and make sure their projects are feasible.
  5. People want to share in your success, so participation in GSoC can start a positive feedback loop attracting new contributors and users.
  6. Most importantly: the ideas behind GSoC - the love for open source and coding - are contagious and spread easily to larger audiences, especially to students and other people who work in academia. Just try it! 
Our positive experience spurred us to seek out and conquer new challenges. Stay tuned for our next post where we explain how GSoC inspired us to create a popular new class and how we applied data science to Game of Thrones.
By Tatyana Goldberg and Guy Yachdav, BioJS and TU Munich
Categories: Open Source

A Google Santa Tracker update from Santa's Elves

Wed, 08/17/2016 - 18:00

Originally posted on the Google Developers Blog

By Sam Thorogood, Developer Programs Engineer


Today, we're announcing that the open source version of Google's Santa Tracker has been updated with the Android and web experiences that ran in December 2015. We extended, enhanced and upgraded our code, and you can see how we used our developer products - including Firebase and Polymer - to build a fun, educational and engaging experience.


To get started, you can check out the code on GitHub at google/santa-tracker-web and google/santa-tracker-android. Both repositories include instructions so you can build your own version.
Santa Tracker isn’t just about watching Santa’s progress as he delivers presents on December 24. Visitors can also have fun with the winter-inspired experiences, games and educational content by exploring Santa's Village while Santa prepares for his big journey throughout the holidays.
Below is a summary of what we’ve released as open source.
Android app
  • The Santa Tracker Android app is a single APK, supporting all devices, such as phones, tablets and TVs, running Ice Cream Sandwich (4.0) and up. The source code for the app can be found here.
  • Santa Tracker leverages Firebase features, including the Remote Config API, App Invites to invite your friends to play along, and Firebase Analytics to help our elves better understand users of the app.
  • Santa’s Village is a launcher for videos, games and the tracker that responds well to multiple devices such as phones and tablets. There's even an alternative launcher based on the Leanback user interface for Android TVs.
  • Games on Santa Tracker Android are built using many technologies such as JBox2D (gumball game), Android view hierarchy (memory match game) and OpenGL with a special rendering engine (jetpack game). We've also included a holiday-themed variation of Pie Noon, a fun game that works on Android TV, your phone, and inside Google Cardboard's VR.
Android Wear

  • The custom watch faces on Android Wear provide a personalized touch. Having Santa or one of his friendly elves tell the time brings a smile to all. Building custom watch faces is a lot of fun but providing a performant, battery friendly watch face requires certain considerations. The watch face source code can be found here.
  • Santa Tracker uses notifications to let users know when Santa has started his journey. The notifications are further enhanced to provide a great experience on wearables using custom backgrounds and actions that deep link into the app.
On the web

  • Santa Tracker is mobile-first: this year's experience was built for the mobile web, including a brand new, interactive yet fully responsive village with three breakpoints, touch gesture support and support for the Web App Manifest.
  • To help us develop Santa at scale, we've upgraded to Polymer 1.0+. Santa Tracker's use of Polymer demonstrates how easy it is to package code into reusable components. Every house in Santa's Village is a custom element, only loaded when needed, minimizing the startup cost of Santa Tracker.
  • Many of the amazing new games (like Present Bounce) were built with the latest JavaScript standards (ES6) and are compiled to support older browsers via the Google Closure Compiler.
  • Santa Tracker's interactive and fun experience is enhanced using the Web Animations API, a standardized JavaScript API for unifying animated content.
  • We simplified the Chromecast support this year, focusing on a great screensaver that would count down to the big event on December 24th - and occasionally autoplay some of the great video content from around Santa's Village.
We hope that this update inspires you to make your own magical experiences based on all the interesting and exciting components that came together to make Santa Tracker!
Categories: Open Source

Which languages convey the most information in the least space? Introducing the Unimorph dataset.

Mon, 08/08/2016 - 18:00
Several years ago a science journalist asked me which languages could pack the most information into a 140-character Tweet. Because Twitter defines a character roughly as a single Unicode code point, this turns out to be an easy question to answer. Chinese almost certainly rates as the most “compact” language from that point of view because a single Chinese character represents a whole morpheme (in linguistic terminology, a minimal unit of meaning) whereas an English letter only represents a part of a morpheme. The Chinese equivalent of I don’t eat meat, which in English takes 16 characters including spaces, is 我不吃肉, which takes just four.

But this question relates to a broader question that as a linguist I have often been asked: which languages are the most “efficient” at conveying information? Or, which languages can convey the same information in the smallest amount of space? Untethered by the idiosyncrasies of Twitter, this question becomes quite difficult to answer. What do you mean by “space”? Number of characters? Number of bytes? Number of syllables? Each of these has its own problems. And perhaps more crucially, what do you mean by “information”? The Shannon notion of information does not straightforwardly apply here.
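The choice of unit matters even for the short example above. The sketch below counts the same two sentences in code points, UTF-8 bytes and UTF-16 code units. The helper name `space_measures` is made up for this illustration, a straight apostrophe stands in for the curly one, and Twitter's own counting rules are not reproduced here.

```ruby
# Compare three candidate measures of "space" for the two example sentences.
english = "I don't eat meat"
chinese = "我不吃肉"

# space_measures is a hypothetical helper, not part of any library.
def space_measures(text)
  {
    # String#length counts characters, which for a UTF-8 string are code points.
    code_points: text.length,
    # Number of bytes in the UTF-8 encoding.
    utf8_bytes: text.bytesize,
    # Number of 16-bit code units in the UTF-16 encoding.
    utf16_units: text.encode("UTF-16LE").bytesize / 2
  }
end

p space_measures(english) # 16 code points, 16 UTF-8 bytes, 16 UTF-16 units
p space_measures(chinese) # 4 code points, 12 UTF-8 bytes, 4 UTF-16 units
```

Measured in code points, the Chinese sentence is four times shorter; measured in UTF-8 bytes, the gap shrinks to 12 versus 16, which is one reason “space” has to be defined before languages can be compared.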

A group of us at Google set out to answer this question, or at least to provide the form that an answer would have to take. We had the resources and experience needed to annotate data in multiple languages, and we were able to divert some of those resources to this task. The results were published in a paper presented at the 2014 International Conference on Language Resources and Evaluation in Reykjavík, Iceland.

We are now releasing the data on GitHub. The data consist of 85 sentences typical of the kinds of sentences generated by Google Now, translated into eight typologically diverse languages: English, French, Italian, German, Russian, Arabic, Korean, Chinese, which include some highly inflected and uninflected languages, and various types of morphology including inflectional and agglutinative. The data were annotated by one to three annotators depending on the language, with morphological information, counts of the marked features and other information. The main data file is in HTML, color coded by language, which makes it easy to browse but also easy to extract into other formats.

Since the basic information conveyed by each sentence can be assumed to be the same across languages, the main focus of the research was on the additional information that each language marks, and cannot avoid marking. For example, the English sentence:

Use my location for the search results and other services.
has the French translation:

Utilisez ma position pour les résultats de recherche et d'autres services.
The verb ending -ez in Utilisez marks “addressee respect”, a bit of information that is missing from the English original. One could have used a different ending on the French verb, but that would not avoid marking this bit of information: it would be choosing to mark lack of respect, or familiarity with the addressee.

In the paper we tried various ways of measuring the differing information content of the languages relative to various definitions of “space”. Considering all the factors together, we concluded that the languages that conveyed the most information in a given amount of space were highly inflected languages like Russian, with uninflected languages like Chinese actually being the “least efficient” at conveying information.

We don’t expect this to be the final answer, which is why we are releasing the data as open source in the hopes that others will find it useful and maybe can even extend it to more sentences or a wider variety of languages. Ultimately though, any answer to the question of which languages convey the most information in the smallest amount of space must seriously address what is meant by “information”, and must pay heed to the famous maxim by the Russian linguist Roman Jakobson (1959) that “languages differ essentially in what they must convey and not in what they may convey.”

By Richard Sproat, Research Scientist
Categories: Open Source

Making Rubyists more comfortable on Google Cloud Platform

Fri, 08/05/2016 - 18:00
One of the many open source efforts at Google is the Google Cloud Platform (GCP) native libraries for our most popular languages. One of these libraries is the gcloud-ruby project on GitHub which is released as the gcloud gem on rubygems.org. There are several gems for accessing Google Cloud Platform resources from Ruby but this gem is different. It is hand coded by Rubyists for Rubyists and that has some distinct advantages.

Many of us have had experience working with libraries that are clearly ported from another language. I usually talk about them as Ruby with a Java accent or Python with a Perl accent. Generally they work just fine but you can run into some low level friction — sometimes things just don’t feel right. Native gems written by members of the community solve this problem. In the case of gcloud-ruby there are some really concrete examples.

First, gcloud-ruby uses syntax that is similar to other popular Ruby libraries. For example, the syntax for specifying a table schema in BigQuery (Google Cloud Platform's very large scale data warehouse) looks like this:

table = dataset.create_table "baby_names" do |schema|
  schema.string "name"
  schema.string "sex"
  schema.integer "number"
end

Creating the same table in the popular Ruby on Rails framework looks like this:

create_table "baby_names" do |schema|
  schema.string "name"
  schema.string "sex"
  schema.integer "number"
end

The two are nearly identical. That makes getting up to speed on BigQuery easier and quicker than it would be if the Ruby library didn't use patterns that are already known to the majority of Rubyists. 
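For readers new to this pattern, the following is a rough sketch of how a block-based schema DSL of this shape can be built in plain Ruby. `SchemaSketch` and `create_table_sketch` are hypothetical names invented for illustration; they are not the gcloud-ruby or Rails implementations.

```ruby
# Hypothetical stand-in for a schema object; not the gcloud-ruby or Rails class.
class SchemaSketch
  attr_reader :fields

  def initialize
    @fields = []
  end

  # Each type method records a [name, type] pair.
  def string(name)
    @fields << [name, :string]
  end

  def integer(name)
    @fields << [name, :integer]
  end
end

# Hypothetical helper that mimics the shape of create_table: it builds a
# schema object, yields it to the caller's block, and returns the result.
def create_table_sketch(table_name)
  schema = SchemaSketch.new
  yield schema if block_given?
  { name: table_name, fields: schema.fields }
end

table = create_table_sketch "baby_names" do |schema|
  schema.string "name"
  schema.string "sex"
  schema.integer "number"
end

p table
```

The key idea is that the method hands a builder object to the block, so the caller describes the columns declaratively while the library decides what to do with them.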
Another way the gcloud-ruby library meets the community where it is: it embraces the community's fondness for doing things several different ways. In Ruby there are often several correct ways to do a given task.
The gcloud-ruby library is no exception. There are a few different ways to authenticate and create the objects you use to interact with the API. Ruby also has many common methods that have aliases. In the standard library, for example, Enumerable#map and Enumerable#collect run the same code path. In gcloud-ruby the vision API uses aliases. Google Cloud Vision provides a single endpoint: annotate. gcloud-ruby has an annotate method but also aliases this method as mark and detect if those make more sense to you (detect is the method that makes the most sense to my brain so that's the one I use). Providing a few aliases makes it more likely that the first thing you try will work. This speeds up development time and makes learning the library easier.
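As a small illustration of method aliases, the sketch below first uses the standard library pair map and collect, then defines aliases on a made-up class. `VisionSketch` is hypothetical and only borrows the method names mentioned above; it does not call Google Cloud Vision or reproduce the gcloud-ruby API.

```ruby
# Enumerable#collect is an alias for Enumerable#map, so both give the same result.
doubled_with_map = (1..3).map { |n| n * 2 }
doubled_with_collect = (1..3).collect { |n| n * 2 }

# Hypothetical class for illustration; it does not talk to any real service.
class VisionSketch
  def annotate(image_name)
    "annotated #{image_name}"
  end

  # Both aliases dispatch to the same method body as annotate.
  alias_method :mark, :annotate
  alias_method :detect, :annotate
end

vision = VisionSketch.new
p doubled_with_map == doubled_with_collect
p vision.detect("cat.jpg") == vision.annotate("cat.jpg")
```

Because an alias points at the same method body, there is no behavior to keep in sync; the only cost is a larger public surface to document.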
The last way the gcloud-ruby gem makes Rubyists feel at home is by having comprehensive tests, a common value and popular discussion topic for the Ruby community. gcloud-ruby uses minitest-spec for testing, a popular choice that most Rubyists can easily read. When I was learning the storage API I looked at the tests for storage to learn how to use the library. There is outstanding documentation as well for those who prefer learning that way but I'm so used to looking at tests that I really appreciated that gcloud-ruby has well written and easily accessible tests.
Above are three examples of how hand-coded libraries from within the community can improve the user experience when learning to use tools. Of course, doing all the development on GitHub in the open also helps. Users can easily see what bugs people have run into and what features are next up in the production queue. And if a user has a feature request (like the previously mentioned Cloud Vision support) they can create a GitHub issue.
If you’re a Rubyist, give gcloud-ruby a shot and let us know what you think!
By Aja Hammerly, Developer Advocate
Categories: Open Source

Stories from Google Code-in: KDE, MetaBrainz and Haiku

Mon, 08/01/2016 - 18:00
Google Code-in is our annual contest that gives students age 13 to 17 experience in computer science through contributions to open source projects. This blog post is the second installment in our series reflecting on the experiences of Google Code-in 2015 grand prize winners. Be sure to check out the first post in the series.

This week we profile three more grand prize winners from Google Code-in 2015. These students came from all around the world to celebrate with us in June after successfully completing 692 tasks that resulted in significant contributions to the participating open source projects.

Google Code-in 2015 Grand Prize Winners and Mentors were treated to a cruise around San Francisco Bay.
Students were paired with mentors who guided them as they learned both new technologies and how to collaborate on real-world projects. While most students had some programming experience, many were new to open source. In the end, they learned new skills, connected with open source communities and many will continue to contribute to open source projects.
We’re proud of all of the participants and grateful to the mentors who helped them. We invited the contest winners to write about their experience and many took us up on the offer. Here are their stories:
First up today is Imran Tatriev, a student from Kazakhstan who decided to work on the KDE project because he loved their philosophy and had experience with C++ and Qt. He was a finalist in Google Code-in 2014 when he worked with the OpenMRS project.
Imran’s work on KDE included contributing to projects such as KDevelop, Marble and GCompris. His biggest challenge was working on the KDevelop IDE’s debugger where he was tasked with highlighting crashed threads. Highlighting the crashed thread was trivial; finding the thread that had crashed was not. It took him five days to solve that problem and he credits his mentor with helping him to work through it.
In the end, Imran learned a lot about regular expressions, the architecture of large software projects, C++ and unit testing. What did he like most about his Google Code-in experience? Imran writes: “The most valuable moments were meeting wonderful and smart people.” He plans to continue working with KDE and apply for Google Summer of Code.
Next is Caroline Gschwend, a student from the US who worked on the MetaBrainz project. Both of her parents are computer scientists and she credits them with spurring her interest.
A homeschool student with a unique approach to education, Caroline loves to learn and voraciously consumes free online resources. She had this to say: “I think that free, online learning is an amazing benefit to our society. With access to a computer and the internet, anyone, anywhere, can learn anything.”
Caroline discovered Google Code-in through her mother who had, in turn, discovered the contest through Google for Education. Caroline dug in and decided it was right up her alley. She loved that it embraced beginners with open arms and introduced new people to open source. Ultimately, she decided to work with MetaBrainz because, as a classically trained violinist, MusicBrainz piqued her interest. Their projects are primarily written in Perl and Python and, while Caroline was fluent in Java, it was too interesting to pass up.
As with most students, Caroline found collaboration to be a big part of the learning curve -- from GitHub to Git and IRC. Her mentors and other community contributors on IRC helped Caroline through the process and, looking back, she found that collaboration to be her favorite part of the whole experience. She loved that the mentors helped her to produce professional quality work rather than focusing on quantity.
Google Code-in gave Caroline a chance to learn about collaboration, Inkscape, icon design, web development and more. She has continued her work in open source and plans to apply for Google Summer of Code.
The last student we’re highlighting today is Vale Tolpegin, a student from the US who worked on the Haiku project, an open source operating system for personal computers. He also participated in Google Code-in 2014 but didn’t feel his skills were sharp enough to attack the more challenging tasks, like the ones he tackled this time around for Haiku.
Vale took on a wide range of tasks from documentation to application development, his favorite being the creation of the Haiku Hardware Repository. The repository is a Django website that lets people search and share hardware tests to determine if a given machine will work with Haiku.
He ran into a sticky issue early on, spending nearly a week finding a race condition within an application maintained by Haiku. Vale found it frustrating, but his mentors helped him see it through to the end. That wasn’t the only big challenge he ran into and, ultimately, bested: he spent another week debugging a Remote Desktop Application, software which had a very large code base.
Despite the two time-consuming challenges, Vale managed to accomplish a lot more during the contest, including building a graph plotter and fixing bugs in the Haiku package manager. Vale had this to say:
“After finishing GCI, I have continued to work with Haiku and the experiences I have gained will continue to have an impact on me for years to come. Participating in GCI has truly been a life-changing experience!”
Thank you to Imran, Caroline and Vale for their contributions to open source and for sharing their Google Code-in experiences with us. Stay tuned, we’ve got two more posts coming in this series!
By Josh Simmons, Open Source Programs Office
Categories: Open Source

Stories from Google Code-in: FOSSASIA and Haiku

Fri, 07/29/2016 - 21:43
Google Code-in is our annual contest to help pre-university students gain real-world computer science experience by taking on tasks of varying difficulty levels with the help of volunteer mentors. These tasks are created by open source projects so while learning, the students are contributing to the software many of us use on a daily basis.

The finalists and winners for our 2015/2016 season were announced in February and, in June, the grand prize winners joined us for four days of learning and celebration. Students and their guardians came from all around the world. One of my favorite things, as one of the Googler hosts, was seeing the light bulbs go on above parents’ heads as they came to understand open source and why it’s so important. These parents and guardians were even more proud of the students as they learned how much their teenager has contributed to the world through participating in Google Code-in.

We’ve invited contest winners and organizations to write about their experience and will be sharing their stories in a series of blog posts. This marks the first post in the series.

Google Code-in 2015 Grand Prize Winners and Mentors
Let’s start with Jason Wong, a student from the US who worked with FOSSASIA. FOSSASIA supports open source developers in Asia through events and coding programs.
Jason got into computer science during middle school at a summer camp where he built a website describing the differences between Linux, OS X, and Windows.  He dove deeper into web development by learning PHP and JavaScript through YouTube videos. He enjoyed being able to build more complex and dynamic websites. Like many new developers, Jason became very confident but did not concern himself with important aspects of programming like testing.
He learned about Google Code-in when Stephanie Taylor, fellow open source program manager who manages the GCI program here at Google, gave a talk at his school. Jason dove right in, picking FOSSASIA as the project he would contribute to.
FOSSASIA offered Jason a chance to learn a lot about development and open source. He worked on their event pages, integrated Loklak and added an RSS section to their website, gaining experience with version control, Docker, Pharo and Node.js in the process. Most importantly, Jason learned about collaboration. He had this to say:
“Collaboration is so important in the open source community as it allows everyone to come together to help the world. Google Code-in has persuaded me to contribute to open source in the future.”
Next up we have Hannah Pan, another US student. She chose to work on Haiku, an open source operating system built for personal computers, because it uses C/C++, which she was already confident with.
Hannah got into computer science through a high school AP course and discovered Google Code-in through this blog (woohoo!). She decided to participate even though it had already been underway for two weeks. Aiming just to make the top 10 in order to have a chance at being a finalist (and earn a hoodie), Hannah finished as a grand prize winner! 
The learning curve was steep: *nix commands, build tools and GitHub all presented new challenges. She was surprised how much code she had to sift through sometimes just to isolate the cause of minor bugs.
Like all of the participants, Hannah found her mentors to be crucial in providing both technical guidance and moral support. She explained, “I was amazed at my mentors’ expertise, dedication, modesty, and high standards. They taught me to strive for excellence rather than settle for mediocrity.”
Among other things, Hannah added localization support to the Tipster app, fixed extractDebugInfo, and even wrote a how-to article relating to the work. Reflecting on her experience, Hannah wrote:
“On the technical side, not only have I learned a lot, but I have realized how much more I have yet to learn. In addition, it has taught me some important life skills that no doubt will benefit me in my future endeavors. I’d like to thank my mentors and other students who inspired me and pushed me to do my best.”
Thank you to Jason and Hannah both for contributing to open source and sharing their Google Code-in experiences with us. Stay tuned as we continue this series in our next blog post!
By Josh Simmons, Open Source Programs Office
Categories: Open Source

Omnitone: Spatial audio on the web

Mon, 07/25/2016 - 18:04

Spatial audio is a key element for an immersive virtual reality (VR) experience. By bringing spatial audio to the web, the browser can be transformed into a complete VR media player with incredible reach and engagement. That’s why the Chrome WebAudio team has created and is releasing the Omnitone project, an open source spatial audio renderer with cross-browser support.

Our challenge was to introduce the audio spatialization technique called ambisonics so the user can hear the full-sphere surround sound on the browser. In order to achieve this, we implemented the ambisonic decoding with binaural rendering using web technology. There are several paths for introducing a new feature into the web platform, but we chose to use only the Web Audio API. In doing so, we can reach a larger audience with this cross-browser technology, and we can also avoid the lengthy standardization process for introducing a new Web Audio component. This is possible because the Web Audio API provides all the necessary building blocks for this audio spatialization technique.



Omnitone Audio Processing Diagram
The AmbiX format recording, which is the target of the Omnitone decoder, contains 4 channels of audio that are encoded using ambisonics, which can then be decoded into an arbitrary speaker setup. Instead of an actual speaker array, Omnitone uses 8 virtual speakers based on head-related transfer function (HRTF) convolution to render the final audio stream binaurally. This binaurally-rendered audio can convey a sense of space when it is heard through headphones.

The beauty of this mechanism lies in the sound-field rotation applied to the incoming spatial audio stream. The orientation sensor of a VR headset or a smartphone can be linked to Omnitone’s decoder to seamlessly rotate the entire sound field. The rest of the spatialization process will be handled automatically by Omnitone. A live demo can be found at the project landing page.
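
To make the rotation step concrete, here is a minimal Python sketch of a yaw (head-turn) rotation applied to one first-order ambisonic sample frame. It is not Omnitone's code, which is built on the Web Audio API. The function name `rotate_yaw`, the ACN channel order (W, Y, Z, X) commonly used with AmbiX, and the sign convention for the angle are all assumptions made for illustration.

```python
import math


def rotate_yaw(frame, yaw_radians):
    """Rotate one first-order ambisonic frame about the vertical axis.

    `frame` is a 4-tuple in the assumed ACN channel order: (W, Y, Z, X).
    The sign convention for `yaw_radians` is an assumption; flip the sign
    of the angle if your sensor reports rotation the other way round.
    """
    w, y, z, x = frame
    c = math.cos(yaw_radians)
    s = math.sin(yaw_radians)
    # Only the two horizontal components mix; this is a plain 2-D rotation.
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    # W (omnidirectional) and Z (height) are unaffected by a yaw rotation.
    return (w, y_rot, z, x_rot)


frame = (1.0, 0.0, 0.5, 1.0)
print(rotate_yaw(frame, math.pi / 2))
```

In this sketch a head turn only mixes the two horizontal channels, which is why a renderer can rotate the whole sound field cheaply before the binaural stage.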

Throughout the project, we worked closely with the Google VR team for their VR audio expertise. Not only was their knowledge of spatial audio a tremendous help for the project, but the collaboration also ensured identical audio spatialization across all of Google’s VR applications, both on the web and Android (e.g. Google VR SDK, YouTube Android app). The Spatial Media Specification and HRTF sets are great examples of the Google VR team’s efforts, and Omnitone is built on top of this specification and these HRTF sets.

With emerging web-based VR projects like WebVR, Omnitone’s audio spatialization can play a critical role in a more immersive VR experience on the web. Web-based VR applications will also benefit from high-quality streaming spatial audio, as the Chrome Media team has recently added FOA compression to the open source audio codec Opus. More exciting things like VR view integration, higher-order ambisonics and mobile web support will also be coming soon to Omnitone.

We look forward to seeing what people do with Omnitone now that it's open source. Feel free to reach out to us or leave a comment with your thoughts and feedback on the issue tracker on GitHub.

By Hongchan Choi and Raymond Toy, Chrome Team

Due to the incomplete implementation of multichannel audio decoding on various browsers, Omnitone does not support mobile web at the time of writing.
Categories: Open Source

Kubernetes 1.3 is here!

Thu, 07/21/2016 - 18:00
With all of the excitement being generated around the Kubernetes 1.3 release and the first anniversary of Kubernetes 1.0 (#k8sbday), now is a great time to point out some of the features that enterprise users should be taking note of.

If you’re not familiar with Kubernetes, let me get you up to speed.

Kubernetes is an open-source container automation framework that builds upon 15 years of experience of running production workloads at Google. Once you declare a desired state, Kubernetes works to drive your system toward that state. As a developer this means less time handling trivial tasks that a computer can automate and more time focusing on developing applications that provide value to users.
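
The "declare a desired state" idea can be illustrated with a toy reconciliation loop. The sketch below is plain Python and does not use any Kubernetes API; the names `reconcile` and `apply_actions` and the `replica-N` naming are hypothetical and exist only for this example.

```python
import itertools

# Hypothetical source of unique replica names for this toy example.
_ids = itertools.count()


def reconcile(desired_count, running):
    """Compare desired and observed state; return the actions that close the gap."""
    missing = desired_count - len(running)
    if missing > 0:
        return [("start", None)] * missing
    surplus = -missing
    return [("stop", name) for name in running[:surplus]]


def apply_actions(actions, running):
    """Apply the actions to a copy of the observed state and return the new state."""
    result = list(running)
    for verb, name in actions:
        if verb == "start":
            result.append(f"replica-{next(_ids)}")
        else:
            result.remove(name)
    return result


running = apply_actions(reconcile(3, []), [])
print(running)                 # three replicas are now running
running.pop()                  # simulate one replica crashing
print(reconcile(3, running))   # the loop asks for one replacement
```

The point of the design is that the caller never says "start a replica"; it only states how many should exist, and the loop works out the difference each time it runs.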

Additionally, Kubernetes aims to be a framework that you can operate at planetary scale, run anywhere, and never outgrow.

With the release of Kubernetes 1.3, Kubernetes is closer than ever to meeting those goals; the 1.3 release adds exciting features such as:
Aside from features, the coolest part about working with Kubernetes is hearing user stories. I’ll soon be publishing an interview with Joseph Jacks, co-founder of Kismatic, the enterprise Kubernetes company, on the Kubernetes blog.
Joseph is very active in the Kubernetes community and has extensive experience with Kubernetes in production. In the interview I ask him why he bet his business on Kubernetes, what could be better, and how he sees Kubernetes growing in the near future.
Kubernetes has many, many features to offer that I didn’t get to cover in this short write-up. If you know anyone that needs to ramp up on Kubernetes, the easiest way is the free course I created with Kelsey Hightower, Scalable Microservices with Kubernetes. The course covers the basic features of Kubernetes. If you want an overview of what’s new in Kubernetes 1.3, feel free to look at the “What’s new in Kubernetes 1.3” video or slides.
Finally for a more in-depth look at the 1.3 release, make sure to check out: 5 days of Kubernetes 1.3 blog series.
Want to learn more about container orchestration and cloud native platforms? Here’s some recommended reading to follow up with:
By Carter Morgan, Developer Programs Engineer
Categories: Open Source

Announcing an Open Source ADC board for BeagleBone

Wed, 07/20/2016 - 18:00
Cross posted on the Google Research Blog
Working with electronics, we often find ourselves soldering up a half-baked electronic circuit to detect some sort of signal. For example, last year we wanted to measure the strength of a carrier. We started with traditional analog circuits: amplifier, filter, envelope detector, threshold. You can see some of our prototypes in the image below; they get pretty messy.

While there's a certain satisfaction in taming a signal using the physical properties of capacitors, coils of wire and transistors, it's usually easier to digitize the signal with an Analog to Digital Converter (ADC) and manage it with Digital Signal Processing (DSP) instead of electronic parts. Tweaking software doesn't require a soldering iron, and lets us modify signals in ways that would require impossible analog circuits.

There are several standard solutions for digitizing a signal: connect a laptop to an oscilloscope or Data Acquisition System (DAQ) via USB or Ethernet, or use the onboard ADCs of a maker board like an Arduino. The former are sensitive and accurate, but also big and power hungry. The latter are cheap and tiny, but slower, and have enough RAM for only milliseconds' worth of high-speed sample data.

That led us to investigate single board computers like the BeagleBone and Raspberry Pi, which are small and cheap like an Arduino, but have specs like a smartphone.  And crucially, the BeagleBone's system-on-a-chip (SoC) combines a beefy ARMv7 CPU with two smaller Programmable Realtime Units (PRUs) that have access to all 512MB of system RAM.  This lets us dedicate the PRUs to the time-sensitive and repetitive task of reading each sample out of an external ADC, while the main CPU lets us use the data with the GNU/Linux tools we're used to.

The result is an open source BeagleBone cape we've named PRUDAQ.  It's built around the Analog Devices AD9201 ADC, which samples two inputs simultaneously at up to 20 megasamples per second, per channel.  Simultaneous sampling and high sample rates make it useful for software-defined radio (SDR) and scientific applications where a built-in ADC isn't quite up to the task.  
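
A quick back-of-the-envelope calculation shows why access to the full system RAM matters at these sample rates. The sketch below assumes each sample is stored in 2 bytes and treats 512MB as 512 * 1024 * 1024 bytes; both are assumptions for illustration, and in practice only part of the RAM would be available for sample data.

```python
def data_rate_bytes_per_second(sample_rate_hz, channels, bytes_per_sample):
    """Raw data rate produced by sampling all channels simultaneously."""
    return sample_rate_hz * channels * bytes_per_sample


def seconds_until_full(buffer_bytes, rate_bytes_per_second):
    """How long a buffer of the given size lasts at the given data rate."""
    return buffer_bytes / rate_bytes_per_second


# 20 megasamples per second, two channels, 2 bytes per sample (assumed).
rate = data_rate_bytes_per_second(20_000_000, 2, 2)
ram = 512 * 1024 * 1024  # assumed reading of "512MB"

print(rate)                                     # 80000000
print(round(seconds_until_full(ram, rate), 2))  # 6.71
```

Under these assumptions the board could hold several seconds of full-rate data from both channels, compared with the milliseconds available on a small microcontroller board.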

Our open source electrical design and sample code are available on GitHub, and GroupGets has boards ready to ship for $79. We were also fortunate to have help from Google intern Kumar Abhishek, who added PRUDAQ support to BeagleLogic, his Google Summer of Code project; it performs much better than our sample code.

We started PRUDAQ for our own needs, but quickly realized that others might also find it useful. We're excited to get your feedback through the email list.  Tell us what can be done with inexpensive fast ADCs paired with inexpensive fast CPUs!
Posted by Jason Holt, Software Engineer
Categories: Open Source

Lessons from Professors' Open Source Software Experience (POSSE) 2016

Wed, 07/06/2016 - 17:06

From Google Summer of Code to Google Code-in, the Open Source Programs Office does a lot to get students involved with open source. In order to learn more about supporting open source in academia, I attended the NSF-funded Professors' Open Source Software Experience (POSSE) in Philadelphia. It was a great opportunity for us to better understand the challenges instructors face in weaving open source into their curriculum and hear solutions on how to bridge the gap.

Almost 30 university professors and community college lecturers attended the 3-day workshop. During the workshop, attendees worked in small groups getting hands-on experience incorporating humanitarian free and open source software (HFOSS) into their teaching. Professors were able to talk, mingle and share best practices throughout the event.

The POSSE workshop is led by Heidi Ellis, Professor, Department of Computer Science and Information Technology at Western New England University, and Greg Hislop, Professor of Software Engineering and Senior Associate Dean for Academic Affairs at Drexel University. Heidi and Greg took over running POSSE five years after Red Hat began the program as an outreach effort to the higher education community. Red Hat continues as a collaborator in the effort. Around 40 university and community college professors participate in the program every year with over 100 individuals attending the workshop in the last four years.

Here are some of the challenges professors shared:
  • Very little guidance on how to bring FOSS into the classroom. No standard curriculum / syllabus available to reference. 
  • Time investment required to change the curriculum.
  • Will not be rewarded for teaching FOSS courses.
  • Will not get funds to travel for workshops/conferences unless it’s to present a paper at a conference.
  • Many administrations aren’t aware that adding open source is beneficial for students since more and more companies use open source and expect their new hires to be familiar with it.

The next POSSE will be Nov 17-19. Faculty who are interested in attending POSSE, please click here to apply.
We also discussed a number of open source programs that are currently working to engage students with open source software development:

Thanks to Heidi, Greg and the FOSS2Serve team for organizing POSSE 2016! We look forward to taking what we’ve learned and using it to better support FOSS education in academia.

By Feiran Helen Hu, Open Source Programs Office

Categories: Open Source

GitHub on BigQuery: Analyze all the code

Wed, 06/29/2016 - 22:35
Posted by Felipe Hoffa, Google Developer Advocate

Google, in collaboration with GitHub, is releasing an incredible new open dataset on Google BigQuery. So far you've been able to monitor and analyze GitHub's pulse since 2011 (thanks GitHub Archive project!) and today we're adding the perfect complement to this. What could you do if you had access to analyze all the open source software in the world, with just one SQL command?

The Google BigQuery Public Datasets program now offers a full snapshot of the content of more than 2.8 million open source GitHub repositories in BigQuery. Thanks to our new collaboration with GitHub, you'll have access to analyze the source code of almost 2 billion files with a simple (or complex) SQL query. This will open the doors to all kinds of new insights and advances that we're just beginning to envision.

For example, let's say you're the author of a popular open source library. Now you'll be able to find every open source project on GitHub that's using it. Even more, you'll be able to guide the future of your project by analyzing how it's being used, and improve your APIs based on what your users are actually doing with it.
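
As a rough sketch of what such a search could look like, the Python function below only builds the text of a standard-SQL query; it does not contact BigQuery. The table names (`bigquery-public-data.github_repos.files` and `bigquery-public-data.github_repos.contents`), the join on `id`, the `@needle` query parameter and the function name `build_usage_query` are assumptions for illustration and should be checked against the dataset's published schema before use.

```python
def build_usage_query(limit=100):
    """Return standard-SQL text listing files whose content mentions a string.

    The search string is left as the named parameter @needle so that it can
    be supplied safely by whichever BigQuery client runs the query.
    Table and column names are assumptions about the public dataset.
    """
    return (
        "SELECT f.repo_name, f.path\n"
        "FROM `bigquery-public-data.github_repos.files` AS f\n"
        "JOIN `bigquery-public-data.github_repos.contents` AS c\n"
        "  ON f.id = c.id\n"
        "WHERE STRPOS(c.content, @needle) > 0\n"
        f"LIMIT {int(limit)}"
    )


print(build_usage_query(limit=10))
```

A library author might bind `@needle` to an import line for their package and then group the results by repository to see where the library is used.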

On the security side, we've seen how the most popular open source projects benefit from having multiple eyes and hands working on them. This visibility helps projects get hardened and buggy code cleaned up. What if you could search for errors with similar patterns in every other open source project? Would you notify their authors and send them pull requests? Well, now you can.

Some concepts to keep in mind while working with BigQuery and the GitHub contents dataset:
To learn more, read GitHub's announcement and try some sample queries. Share your queries and findings in our reddit.com/r/bigquery and Hacker News posts. The ideas are endless, and I'll start collecting tips and links to other articles on this post on Medium.

Stay curious!
Categories: Open Source

More statistics from Google Summer of Code 2016

Wed, 06/29/2016 - 17:41
Google Summer of Code (GSoC) 2016 is officially at its halfway point. Mentors and students have just completed their midterm evaluations and it’s time for our second stats post. This time we take a closer look at our participating students.

First, we’d like to highlight the universities with the most student participants. Congratulations are due to the International Institute of Information Technology - Hyderabad for claiming the top spot for the third consecutive year!

Country | School | 2016 Accepted Students | 2015 Accepted Students | 12 Year Total
India | International Institute of Information Technology - Hyderabad | 50 | 62 | 252
Sri Lanka | University of Moratuwa | 29 | 44 | 320
Romania | University POLITEHNICA of Bucharest | 24 | 14 | 155
India | Birla Institute of Technology and Science Pilani, Goa Campus | 22 | 15 | 110
India | Birla Institute of Technology and Science, Pilani Campus | 22 | 18 | 116
India | Indian Institute of Technology, Bombay | 18 | 13 | 75
India | Indian Institute of Technology, Kharagpur | 15 | 8 | 92
India | Indian Institute of Technology, Roorkee | 15 | 8 | 57
India | Indraprastha Institute of Information Technology Delhi | 15 | 7 | 27
India | Amrita School of Engineering, Amrita University, Amritapuri Campus | 13 | 5 | 33
India | Indian Institute of Technology, Guwahati | 13 | 5 | 38
Cameroon | University of Buea | 12 | 10 | 26
India | Delhi Technological University | 12 | 9 | 60
India | Indian Institute of Technology BHU Varanasi | 12 | 12 | 37
Germany | TU Munich | 11 | 7 | 45

Next, we are proud to announce that 2016 marks the largest number of female GSoC participants to date: 12% of accepted students are female, up 2.2% from 2015. This is good progress, but we are certain we can do better in the future to diversify our program. The Google Open Source team will continue our outreach to many organizations, for example, Grace Hopper and Black Girls Code, to increase this number even more in 2017. If you have any suggestions of organizations we should work with, please let us know in the comments.

Finally, each year we like to look at the majors of students. As expected, the most common area of study for our participants is Computer Science (approximately 78%), but this year we have a wide variety of studies including Linguistics, Law, Music Technology and Psychology.  The majority of our students this year are undergraduates (67%), followed by Masters (23%) and then PhD students (9%).



Although reviewing GSoC statistics each year is great fun, we want to stress that being “first place” is not the point of the program. Our goal is to get more and more students involved in creating free and open source software. We hope Google Summer of Code encourages contributions to projects that have the potential to make a difference worldwide. Congratulations to the students from all over the globe and keep up the good work!

By Mary Radomile, Open Source Programs Office
Categories: Open Source

Google Summer of Code 2016 statistics: Part one

Tue, 05/24/2016 - 21:23
We share statistics from Google Summer of Code (GSoC) every year, and now that 2016 is chugging along we’ve got some exciting numbers to share! 1,206 students from all over the globe are currently in the community bonding period, a time when participants learn more about the organization they will be contributing to before coding officially begins on May 23. This includes becoming familiar with the community practices and processes, setting up a development environment, or contributing small (or large) patches and bug fixes.

We’ll start our statistics reporting this year with the total number of students participating from each country:

Country | Accepted Students | Country | Accepted Students | Country | Accepted Students
Albania | 1 | Greece | 10 | Romania | 31
Algeria | 1 | Guatemala | 1 | Russian Federation | 52
Argentina | 3 | Hong Kong | 2 | Serbia | 2
Armenia | 3 | Hungary | 7 | Singapore | 7
Australia | 6 | India | 454 | Slovak Republic | 3
Austria | 19 | Ireland | 3 | Slovenia | 4
Belarus | 5 | Israel | 2 | South Africa | 2
Belgium | 5 | Italy | 23 | South Korea | 6
Bosnia-Herzegovina | 1 | Japan | 12 | Spain | 33
Brazil | 21 | Kazakhstan | 2 | Sri Lanka | 54
Bulgaria | 2 | Kenya | 3 | Sweden | 5
Cambodia | 1 | Latvia | 3 | Switzerland | 2
Cameroon | 16 | Lithuania | 1 | Taiwan | 7
Canada | 23 | Luxembourg | 1 | Thailand | 1
China | 34 | Macedonia | 1 | Turkey | 12
Croatia | 2 | Mexico | 2 | Ukraine | 13
Czech Republic | 6 | Netherlands | 9 | United Kingdom | 18
Denmark | 2 | New Zealand | 2 | United States | 118
Egypt | 10 | Pakistan | 4 | Uruguay | 1
Estonia | 1 | Paraguay | 1 | Venezuela | 1
Finland | 3 | Philippines | 2 | Vietnam | 4
France | 19 | Poland | 28 | |
Germany | 66 | Portugal | 7 | |

We’d like to welcome a new country to the GSoC family. 2016 brings us one student from Albania!

In our upcoming statistics posts, we will delve deeper into the numbers by looking at  universities with the most accepted students, gender numbers, mentor countries and more. If you have additional statistics that you would like us to share, please leave a comment below and we will consider including them in an upcoming post.

By Mary Radomile, Open Source Programs

Correction: A previous version of this blog post erroneously reported the total number of students as 1,202 and the number of students from Cameroon as 1. This has been updated to reflect the actual totals as 1,206 and 16 respectively.
Categories: Open Source

Coding has begun for Google Summer of Code 2016

Mon, 05/23/2016 - 22:23
2016 Google Summer of Code

Today marks the start of coding for the 12th annual Google Summer of Code. With the community bonding period complete, about 1,200 students now begin 12 weeks of writing code for 178 different open source organizations.

We are excited to see the contributions this year’s students will make to the open source community. 

For more information on important dates for the program please visit our timeline. Stay tuned as we will highlight some of the new mentoring organizations over the next few months.

Have a great summer and happy coding!

By Josh Simmons, Open Source Programs Office
Categories: Open Source

Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source

Fri, 05/13/2016 - 20:08
Originally posted on the Google Research Blog

By Slav Petrov, Senior Staff Research Scientist

At Google, we spend a lot of time thinking about how computer systems can read and understand human language in order to process it in intelligent ways. Today, we are excited to share the fruits of our research with the broader community by releasing SyntaxNet, an open-source neural network framework implemented in TensorFlow that provides a foundation for Natural Language Understanding (NLU) systems. Our release includes all the code needed to train new SyntaxNet models on your own data, as well as Parsey McParseface, an English parser that we have trained for you and that you can use to analyze English text.

Parsey McParseface is built on powerful machine learning algorithms that learn to analyze the linguistic structure of language, and that can explain the functional role of each word in a given sentence. Because Parsey McParseface is the most accurate such model in the world, we hope that it will be useful to developers and researchers interested in automatic extraction of information, translation, and other core applications of NLU.

How does SyntaxNet work?

SyntaxNet is a framework for what’s known in academic circles as a syntactic parser, which is a key first component in many NLU systems. Given a sentence as input, it tags each word with a part-of-speech (POS) tag that describes the word's syntactic function, and it determines the syntactic relationships between words in the sentence, represented in the dependency parse tree. These syntactic relationships are directly related to the underlying meaning of the sentence in question. To take a very simple example, consider the following dependency tree for Alice saw Bob:


This structure encodes that Alice and Bob are nouns and saw is a verb. The main verb saw is the root of the sentence and Alice is the subject (nsubj) of saw, while Bob is its direct object (dobj). As expected, Parsey McParseface analyzes this sentence correctly, but also understands the following more complex example:


This structure again encodes the fact that Alice and Bob are the subject and object respectively of saw and, in addition, that Alice is modified by a relative clause with the verb reading, that saw is modified by the temporal modifier yesterday, and so on. The grammatical relationships encoded in dependency structures allow us to easily recover the answers to various questions, for example whom did Alice see?, who saw Bob?, what had Alice been reading about? or when did Alice see Bob?.
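
To show how such questions can be read off a dependency structure, here is a small Python sketch that encodes the simple parse of Alice saw Bob by hand and looks up dependents by relation. It does not call SyntaxNet; the `Token` type, the `dependents` helper and the tag strings `NOUN` and `VERB` are hypothetical names chosen for this example.

```python
from typing import NamedTuple, Optional


class Token(NamedTuple):
    word: str
    pos: str              # part-of-speech tag (tag names assumed)
    head: Optional[int]   # index of the governing word, None for the root
    relation: str         # dependency label linking this word to its head


# Hand-written parse of "Alice saw Bob", following the description above.
PARSE = [
    Token("Alice", "NOUN", 1, "nsubj"),
    Token("saw", "VERB", None, "root"),
    Token("Bob", "NOUN", 1, "dobj"),
]


def dependents(parse, head_word, relation):
    """Return the words attached to `head_word` by the given relation."""
    return [
        tok.word
        for tok in parse
        if tok.head is not None
        and parse[tok.head].word == head_word
        and tok.relation == relation
    ]


print(dependents(PARSE, "saw", "nsubj"))  # ['Alice']  (who saw Bob?)
print(dependents(PARSE, "saw", "dobj"))   # ['Bob']    (whom did Alice see?)
```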

Why is Parsing So Hard For Computers to Get Right?

One of the main problems that makes parsing so challenging is that human languages show remarkable levels of ambiguity. It is not uncommon for moderate length sentences - say 20 or 30 words in length - to have hundreds, thousands, or even tens of thousands of possible syntactic structures. A natural language parser must somehow search through all of these alternatives, and find the most plausible structure given the context. As a very simple example, the sentence Alice drove down the street in her car has at least two possible dependency parses:


The first corresponds to the (correct) interpretation where Alice is driving in her car; the second corresponds to the (absurd, but possible) interpretation where the street is located in her car. The ambiguity arises because the preposition in can either modify drove or street; this example is an instance of what is called prepositional phrase attachment ambiguity.

Humans do a remarkable job of dealing with ambiguity, almost to the point where the problem is unnoticeable; the challenge is for computers to do the same. Multiple ambiguities such as these in longer sentences conspire to give a combinatorial explosion in the number of possible structures for a sentence. Usually the vast majority of these structures are wildly implausible, but are nevertheless possible and must be somehow discarded by a parser.

SyntaxNet applies neural networks to the ambiguity problem. An input sentence is processed from left to right, with dependencies between words being incrementally added as each word in the sentence is considered. At each point in processing many decisions may be possible—due to ambiguity—and a neural network gives scores for competing decisions based on their plausibility. For this reason, it is very important to use beam search in the model. Instead of simply taking the first-best decision at each point, multiple partial hypotheses are kept at each step, with hypotheses only being discarded when there are several other higher-ranked hypotheses under consideration. An example of a left-to-right sequence of decisions that produces a simple parse is shown below for the sentence I booked a ticket to Google.
Furthermore, as described in our paper, it is critical to tightly integrate learning and search in order to achieve the highest prediction accuracy. Parsey McParseface and other SyntaxNet models are some of the most complex networks that we have trained with the TensorFlow framework at Google. Given some data from the Google-supported Universal Treebanks project, you can train a parsing model on your own machine.
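
The difference between taking the first-best decision and keeping several hypotheses can be seen in a generic beam search sketch. This is not SyntaxNet's implementation: the scores are a made-up lookup table standing in for a neural network, and the names `beam_search`, `SCORES` and `score` are hypothetical.

```python
def beam_search(steps, score, beam_width):
    """Keep the `beam_width` best partial hypotheses after every step.

    `steps` is a list of option lists, one per decision point.
    `score(hypothesis, choice)` returns the score of adding `choice`.
    Returns the best (hypothesis, total_score) pair found.
    """
    beam = [((), 0)]
    for options in steps:
        candidates = [
            (hyp + (choice,), total + score(hyp, choice))
            for hyp, total in beam
            for choice in options
        ]
        # Highest total first; everything beyond the beam width is discarded.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = candidates[:beam_width]
    return beam[0]


# Toy scores: "A" looks best at first, but "B" leads to a better overall result.
SCORES = {
    ((), "A"): 6,
    ((), "B"): 4,
    (("A",), "x"): 2,
    (("A",), "y"): 1,
    (("B",), "x"): 9,
    (("B",), "y"): 0,
}


def score(hypothesis, choice):
    return SCORES[(hypothesis, choice)]


steps = [["A", "B"], ["x", "y"]]
print(beam_search(steps, score, beam_width=1))  # (('A', 'x'), 8)
print(beam_search(steps, score, beam_width=2))  # (('B', 'x'), 13)
```

With a beam width of 1 the search commits to the locally best first decision and cannot recover; with a width of 2 the initially lower-ranked hypothesis survives long enough to win.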

So How Accurate is Parsey McParseface?

On a standard benchmark consisting of randomly drawn English newswire sentences (the 20-year-old Penn Treebank), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, beating our own previous state-of-the-art results, which were already better than any previous approach. While there are no explicit studies in the literature about human performance, we know from our in-house annotation projects that linguists trained for this task agree in 96-97% of the cases. This suggests that we are approaching human performance, but only on well-formed text. Sentences drawn from the web are a lot harder to analyze, as we learned from the Google WebTreebank (released in 2011). Parsey McParseface achieves just over 90% parse accuracy on this dataset.

While the accuracy is not perfect, it’s certainly high enough to be useful in many applications. The major source of errors at this point is examples such as the prepositional phrase attachment ambiguity described above, which require real-world knowledge (e.g. that a street is not likely to be located in a car) and deep contextual reasoning. Machine learning (and in particular, neural networks) has made significant progress in resolving these ambiguities. But our work is still cut out for us: we would like to develop methods that can learn world knowledge and enable equal understanding of natural language across all languages and contexts.

To get started, see the SyntaxNet code and download the Parsey McParseface parser model. Happy parsing from the main developers, Chris Alberti, David Weiss, Daniel Andor, Michael Collins & Slav Petrov.
Categories: Open Source

Googlers on the road: OSCON 2016 in Austin

Mon, 05/09/2016 - 18:17
Developers and open source enthusiasts converge on Austin, Texas, in just under two weeks for O’Reilly Media’s annual open source conference, OSCON, and the Community Leadership Summit (CLS) that precedes it. CLS runs May 14-15 at the Austin Convention Center, followed by OSCON from May 16-19.

OSCON 2014 program chairs including Googler Sarah Novotny.
Photo licensed by O'Reilly Media under CC-BY-NC 2.0.
This year we have 10 Googlers hosting sessions covering topics including web development, machine learning, devops, astronomy and open source. A list of all of the talks hosted by Googlers alongside related events can be found below.
If you’re a student, educator, mentor, past or present participant in Google Summer of Code or Google Code-in, or just interested in learning more about the two programs, make sure to join us Monday evening for our Birds of a Feather session.

Have questions about Kubernetes, Google Summer of Code, open source at Google or just want to meet some Googlers? Stop by booth #307 in the Expo Hall.


Thursday, May 12th - GDG Austin
7:00pm   Google Developers Group Austin Meetup


Sunday, May 15th - Community Leadership Summit
10:00am  Occupational Hazard by Josh Simmons


Monday, May 16th
9:00am   Kubernetes: From scratch to production in 2 days by Brian Dorsey and Jeff Mendoza
7:00pm   Google Summer of Code and Google Code-in Birds of a Feather


Tuesday, May 17th
9:00am   Kubernetes: From scratch to production in 2 days by Brian Dorsey and Jeff Mendoza
9:00am   Diving into machine learning through TensorFlow by Julia Ferraioli, Amy Unruh and Eli Bixby


Wednesday, May 18th
1:50pm   Open source lessons from the TODO Group by Chris DiBona, Chris Aniszczyk, Nithya Ruff, Jeff McAffer and Benjamin VanEvery
5:10pm   Scalable bidirectional communication over the Web by Wenbo Zhu


Thursday, May 19th
11:00am  Kubernetes hackathon at OSCON Contribute hosted by Brian Dorsey, Nikhil Jindal, Janet Kuo, Jeff Mendoza, John Mulhausen, Sarah Novotny, Terrence Ryan and Chao Xu
2:40pm   Blocks in containers: Lessons learned from containerizing Minecraft by Julia Ferraioli
5:10pm   PANOPTES: Open source planet discovery by Jennifer Tong and Wilfred Gee
5:10pm   Stop writing JavaScript frameworks by Joseph Gregorio


Haven’t registered for OSCON yet? You can knock 25% off the cost of registration by using discount code Google25, or attend parts of the event including our Birds of a Feather session for free by using discount code OSCON16XPO.

See you at OSCON!
By Josh Simmons, Open Source Programs Office
Categories: Open Source