Software Development News: .NET, Java, PHP, Ruby, Agile, Databases, SOA, JavaScript, Open Source

Methods & Tools

Open Source

From Google Summer of Code to Game of Thrones on the Back of a JavaScript Dragon (Part 2)

Google Open Source Blog - Wed, 08/17/2016 - 20:08
This guest post is a part of a short series about Guy Yachdav, Tatyana Goldberg and Christian Dallago and the journey that was inspired by their participation as Google Summer of Code mentors for the BioJS project. Don’t miss the first post in the series. Heads up, this post contains spoilers for Game of Thrones seasons 5 and 6!

We built on the Google Summer of Code (GSoC) philosophy and the lessons we learned from participating in 2014 by starting a JavaScript Technology class at the Technical University of Munich (TUM).
We began with two dozen students who worked on expanding the BioJS visualization library. Our class became popular quickly and the number of applicants doubled each semester (nearly 180 applicants for 40 seats in the 2016 summer term).
In 2016 our team grew to include Christian Dallago, who had joined as a GSoC mentor. Together we decided to break with the tradition of the course's previous semesters. Instead of focusing on data visualization, we wanted to introduce students to data science with JavaScript. To get our students fully engaged, we decided the project would center on data from the hit TV show Game of Thrones.
Our aim was to create an online portal for Game of Thrones fans which would:
  1. Provide the most comprehensive, structured and open data set about the Game of Thrones world accessible via API.
  2. Present an interactive map based on JavaScript.
  3. Listen to what people are saying on Twitter about each of the show’s characters.
  4. Use machine learning algorithms to predict the likelihood of each character’s death.
Our plan worked — the students were engaged. It was a beautiful sight to see: GitHub repos humming with activity as each dev team delved deeper into their projects. As a project manager, you know you’ve got something good when issues are being opened and closed at 4:00 AM!
The results were mind-blowing. In 50 days of programming, 36 students opened over 1,200 issues and pull requests, pushed 3,300 commits, released four apps to NPM, and, of course, produced one absolutely amazing website.
The website amasses data from 2,028 characters. Our map shows 240 landmarks and the paths traveled by 28 characters. Our Twitter sentiment analysis tool analyzed over 3 million tweets. And we launched the first ever machine learning-based prediction algorithm that predicts the likelihood of dying for the 1,451 characters in the show that are still alive.
Visualization of Twitter sentiment analysis data for Jon Snow during season 5 of Game of Thrones. The X axis shows the timeline and the Y axis shows the number of positive (green) and negative (red) tweets. Each tweet is analyzed by an algorithm using a neural network to determine whether the tweet’s writer has a positive, negative or neutral attitude toward the character.
Since launch, the site’s popularity has skyrocketed. Following our press release, we were covered by over 1,500 media outlets, most notably Time, The Guardian, Rolling Stone, Daily Mail, BBC, Reuters, The Telegraph, CNET and many more. HowStuffWorks, The Vulture and others produced videos about the site and Chris Hardwick’s Comedy Central show did a segment about us. We've also given countless interviews to TV, radio and newspapers.
Google Analytics for the website. Left chart shows the number of visitors to the website during the first week after launch, reaching over 73K visitors on April 25th. Right chart shows the number of visitors at a given time point during the same week.
The most exciting part of the project was predicting the likelihood that any given character would die using machine learning. Machine learning algorithms find rules and patterns in the data, things that humans cannot obviously and simply detect. Once the rules and patterns are identified, we apply machine learning to make inferences or predictions from novel, previously unseen, data sets.
Warning: The next paragraphs contain spoilers for seasons 5 and 6 of Game of Thrones!
In order to predict the likelihood of a character’s death, we collected information about all of the characters that appeared in books 1 to 5 and analyzed over 30 features, including age, gender, marital status and others. Then we used a support vector machine (SVM) to statistically compare the features of characters, both dead and alive, to predict who would get the axe next. Our prediction was correct for 74% of all cases and surprised us by placing a number of characters thought to be relatively safe in grave danger.
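To make the idea of a support vector machine concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the hinge loss. It is not the course's actual model: the two feature names and the six data points are invented purely for illustration, and the real project worked with more than 30 features per character.

```python
# Toy linear SVM trained by sub-gradient descent on the hinge loss.
# All feature names and data points below are invented for illustration;
# they are NOT the project's real data set or feature list.

def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Return (weights, bias) for labels in {-1, +1}."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or misclassified: step toward the sample.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Safely classified: only apply the regularization shrinkage.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (predicted dead) or -1 (predicted alive)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, already-scaled features: (age_score, battles_score).
characters = [(2.0, 1.0), (1.5, 2.0), (2.5, 1.5),
              (-1.0, -1.5), (-2.0, -1.0), (-1.5, -2.5)]
is_dead = [1, 1, 1, -1, -1, -1]

weights, bias = train_linear_svm(characters, is_dead)
print(predict(weights, bias, (3.0, 3.0)))    # a point deep on the "dead" side
print(predict(weights, bias, (-3.0, -3.0)))  # a point deep on the "alive" side
```

A real pipeline would also hold out test data to measure accuracy, which is how a figure such as the 74% above would be obtained.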
According to our predictions, Jon Snow, who was seemingly betrayed and murdered by fellow members of the Night’s Watch at the end of season 5, had only an 11% chance of dying. Indeed, Jon rose from the dead in the second episode of season 6! We also predicted that the rulers of Dorne, Doran and Trystane Martell, had a high likelihood of death and, as predicted, they were taken out in the first episode of the new season.
Of course, as is always the case with predictions, there were also misses. We didn’t expect Roose Bolton to be killed off nor did we see Hodor’s departure coming.
This experience was an amazing ride for our team and it all started with Google Summer of Code! In the next post we’ll share what followed and where we see ourselves heading in the future.
By Guy Yachdav, Tatyana Goldberg and Christian Dallago, BioJS
Categories: Open Source

From Google Summer of Code to Game of Thrones on the Back of a JavaScript Dragon (Part 1)

Google Open Source Blog - Wed, 08/17/2016 - 20:08
This guest post is a part of a short series about Tatyana Goldberg and Guy Yachdav, instructors at Technical University of Munich, and the journey that was inspired by their participation as Google Summer of Code mentors for the BioJS project.

Hello there! We are from the BioJavaScript (BioJS) project which first joined Google Summer of Code (GSoC) in 2014. Our experience in the program set us on a grand open source adventure that we’ll be sharing with you in a series of blog posts. We hope you enjoy our story and, more importantly, hope it inspires you to pursue your own open source adventure.
Tatyana Goldberg and Guy Yachdav, GSoC mentors and open source enthusiasts. Photo taken at the MorpheusCup competition Luxembourg, May 2016.
We came together around the BioJS community, an open source project for creating beautiful and interactive open source visualizations of biological data on the web. BioJS visualizations are made up of components which have a modular design. This modular design enables several things: they can be used by non-programmers, they can be combined to make more complex visualizations, and they can be easily integrated into existing web applications. Despite being a young community, BioJS already has traction in industry and academia.
In early 2014 we decided to apply for GSoC and we were fortunate to have our application accepted on our first try. The experience was extremely positive — the five students we accepted delivered great software and they had a big impact on the BioJS community:
  • The number of mailing list subscribers doubled in less than a month.
  • All five of our accepted students from 2014 became core developers.
  • Students were invited to six international conferences to share their work.
  • Students helped organize the first BioJS conference, held in July 2015.
  • Most importantly, the students have independently designed BioJS version 2.0 which positioned BioJS as the leading open source visualization library for biological data. 
You can see three examples of the work GSoC students did on BioJS below:

  • MSAViewer is a visualization and analysis tool for multiple sequence alignments and was developed by Sebastian Wilzbach.
  • Proteome Viewer is a multilevel visualization of proteomes in the UniProt database and was developed by Jose Villaveces.
  • Genetic Variation Viewer is a visualization of the number and type of mutations at each position in a biological sequence and was developed by Saket Choudhary.
We learned a lot in the first year we participated in Google Summer of Code. Here are some of the takeaways that are especially relevant to mentors and organizations that are considering joining the program:
  1. GSoC is a great source of dedicated and enthusiastic young developers.
  2. Mentors need to carefully manage students, listen to them and let them lead initiatives when it makes sense.
  3. Org admins should leverage success in GSoC beyond the program.
  4. Orgs need to find the most motivated students and make sure their projects are feasible.
  5. People want to share in your success, so participation in GSoC can start a positive feedback loop attracting new contributors and users.
  6. Most importantly: the ideas behind GSoC - the love for open source and coding - are contagious and spread easily to larger audiences, especially to students and other people who work in academia. Just try it! 
Our positive experience spurred us to seek out and conquer new challenges. Stay tuned for our next post where we explain how GSoC inspired us to create a popular new class and how we applied data science to Game of Thrones.
By Tatyana Goldberg and Guy Yachdav, BioJS and TU Munich
Categories: Open Source

A Google Santa Tracker update from Santa's Elves

Google Open Source Blog - Wed, 08/17/2016 - 18:00

Originally posted on the Google Developers Blog

By Sam Thorogood, Developer Programs Engineer


Today, we're announcing that the open source version of Google's Santa Tracker has been updated with the Android and web experiences that ran in December 2015. We extended, enhanced and upgraded our code, and you can see how we used our developer products - including Firebase and Polymer - to build a fun, educational and engaging experience.


To get started, you can check out the code on GitHub at google/santa-tracker-web and google/santa-tracker-android. Both repositories include instructions so you can build your own version.
Santa Tracker isn’t just about watching Santa’s progress as he delivers presents on December 24. Visitors can also have fun with the winter-inspired experiences, games and educational content by exploring Santa's Village while Santa prepares for his big journey throughout the holidays.
Below is a summary of what we’ve released as open source.
Android app
  • The Santa Tracker Android app is a single APK, supporting all devices, such as phones, tablets and TVs, running Ice Cream Sandwich (4.0) and up. The source code for the app can be found here.
  • Santa Tracker leverages Firebase features, including Remote Config API, App Invites to invite your friends to play along, and Firebase Analytics to help our elves better understand users of the app.
  • Santa’s Village is a launcher for videos, games and the tracker that responds well to multiple devices such as phones and tablets. There's even an alternative launcher based on the Leanback user interface for Android TVs.


  • Games on Santa Tracker Android are built using many technologies such as JBox2D (gumball game), the Android view hierarchy (memory match game) and OpenGL with a special rendering engine (jetpack game). We've also included a holiday-themed variation of Pie Noon, a fun game that works on Android TV, your phone, and inside Google Cardboard's VR.
Android Wear

  • The custom watch faces on Android Wear provide a personalized touch. Having Santa or one of his friendly elves tell the time brings a smile to all. Building custom watch faces is a lot of fun but providing a performant, battery friendly watch face requires certain considerations. The watch face source code can be found here.
  • Santa Tracker uses notifications to let users know when Santa has started his journey. The notifications are further enhanced to provide a great experience on wearables using custom backgrounds and actions that deep link into the app.
On the web

  • Santa Tracker is mobile-first: this year's experience was built for the mobile web, including an amazing, brand new village that is interactive yet fully responsive, with three breakpoints, touch gesture support and support for the Web App Manifest.
  • To help us develop Santa at scale, we've upgraded to Polymer 1.0+. Santa Tracker's use of Polymer demonstrates how easy it is to package code into reusable components. Every house in Santa's Village is a custom element, only loaded when needed, minimizing the startup cost of Santa Tracker.


  • Many of the amazing new games (like Present Bounce) were built with the latest JavaScript standards (ES6) and are compiled to support older browsers via the Google Closure Compiler.
  • Santa Tracker's interactive and fun experience is enhanced using the Web Animations API, a standardized JavaScript API for unifying animated content.
  • We simplified the Chromecast support this year, focusing on a great screensaver that would count down to the big event on December 24th and occasionally autoplay some of the great video content from around Santa's Village.
We hope that this update inspires you to make your own magical experiences based on all the interesting and exciting components that came together to make Santa Tracker!
Categories: Open Source

Eclipse Newsletter - Eclipse Che: A New Eclipse IDE

Eclipse News - Wed, 08/17/2016 - 16:07
Read this month's newsletter to learn all about Eclipse Che!
Categories: Open Source

Eclipse Java Development Tools

Date Created: Wed, 2016-08-17 06:14
Date Updated: Wed, 2016-08-17 09:15
Submitted by: Mickael Istria

The Java Development Tools (JDT) project contributes a set of plug-ins that add the capabilities of a full-featured Java IDE to the Eclipse platform.

Categories: Open Source

Contributor Agreement Update

Eclipse News - Mon, 08/15/2016 - 17:50
Adjacent to the IP process overhaul, next week the Eclipse Foundation will be updating contributor agreements.
Categories: Open Source

Projects of the Week, August 15, 2016

SourceForge.net: Front page news - Mon, 08/15/2016 - 05:05

Here are the featured projects for the week, which appear on the front page of SourceForge.net:

ReactOS

ReactOS is an open source effort to develop a quality operating system that is compatible with applications and drivers written for the Microsoft Windows NT family of operating systems (NT4, 2000, XP, 2003).
[ Download ReactOS ]


Super Audio CD Decoder

Super Audio CD Decoder input plugin for foobar2000. Decoder is capable of playing back Super Audio CD ISO images, DSDIFF and DSF files. Direct DSD playback for compatible devices.
[ Download Super Audio CD Decoder ]


Seer

This is a quick look tool for Windows. (Linux will be supported in the future.) Acts just like the one in OS X , but Seer is more powerful and faster. Share with your friends please. Share with your friends please. Share with your friends please. Sorry I said it three times ᕕ(ᐛ)ᕗ Thanks. Minimum supported : Windows Vista.
[ Download Seer ]


Asuswrt-Merlin

Asuswrt-Merlin is a third party firmware for select Asus wireless routers. Based on the Asuswrt firmware developed by Asus, it brings tweaks, new features and other improvements to the original firmware, while retaining its performance and ease of use. Note that only downloads are hosted on SF.net – the complete source code can be found on https://github.com/RMerl/asuswrt-merlin .
[ Download Asuswrt-Merlin ]


berryboot

Berryboot is a simple operating system installer and boot selection screen for ARM devices such as the Raspberry Pi and Cubieboard. It allows you to put multiple Linux distributions on a single SD card.
[ Download berryboot ]


DisplayCAL

DisplayCAL (formerly known as dispcalGUI) is a graphical user interface for the display calibration and profiling tools of Argyll CMS, an open source color management system. Calibrate and characterize your display devices using one of the many supported measurement instruments, with support for multi-display setups and a variety of available settings like customizable whitepoint, luminance, tone response curve as well as the option to create accurate look-up-table ICC profiles as well as some proprietary 3D LUT formats. Check the accuracy of profiles and 3D LUTs via measurements.
[ Download DisplayCAL ]


CMU Sphinx

CMUSphinx is a speaker-independent large vocabulary continuous speech recognizer released under BSD style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition systems.
[ Download CMU Sphinx ]


Uniform Server

The Uniform Server is a lightweight server solution for running a web server under the Windows OS. Less than 24MB! Modular design, includes the latest versions of Apache2, Perl5, PHP (switch between PHP53, PHP54, PHP55 or PHP56), MySQL5 or MariaDB5, phpMyAdmin or Adminer4. Run from either hard drive or USB memory stick… NO INSTALLATION REQUIRED! NO REGISTRY DUST! Just UNPACK and FIRE UP!
[ Download Uniform Server ]


uGet – Download Manager

uGet, the Best Download Manager for Linux. uGet is an Open Source download manager application for GNU/Linux developed with GTK+, which also comes packaged as a portable Windows app. uGet uses very few resources while at the same time packs an unparalleled powerful feature set. These features include a Queue, Pause/Resume, Multi-Connection (with adaptive segment management), Mirrors (multi-source), Multi-Protocol, Advanced Categorization, Clipboard Monitor, Batch Downloads, Individualized Category Default Settings, Speed Limiting, Total Active Downloads Control, and so much more! For the full Features list go to http://ugetdm.com/features – Quick Links – Blog: http://ugetdm.com/blog Support Forum: http://ugetdm.com/forum Tutorials: http://ugetdm.com/tutorials RSS Feeds: http://ugetdm.com/rss Gallery: http://ugetdm.com/gallery Frequently Asked Questions (FAQs): http://ugetdm.com/faqs
[ Download uGet – Download Manager ]

Categories: Open Source

Handling the Banes of Open Source Management

SourceForge.net: Front page news - Fri, 08/12/2016 - 05:54

We cannot solve our problems with the same thinking we used when we created them.

~Albert Einstein

Ah, the creation of an open source project. Where once all you had was an idea and a desire to satisfy a personal need, you now have a fully-formed project that numerous other people are finding useful and effective. You’ve formed a community around it, interacting with people from all over the world who are grateful and willing to offer their time and effort to help build up this project. It’s nothing short of exhilarating.

Then things start to happen that you didn’t necessarily want to happen.  

Competitors, critics and complaining users appear out of nowhere and seem to grow in number. Some even seem to take pleasure in bringing your project down. Forks start to appear, and what’s worse is that some of them claim to be better than the original.

At this point, it seems rational to react strongly; to defend your project with the same vigor and passion you had in creating it. But this is a totally different scenario. While passion makes for great fuel in initial project creation, it’s often not the best basis for logical thinking, which is what is required once you’re in the thick of project management and its accompanying problems.

Finding the Pros in the Cons

As with any major undertaking, an open source project will have its pros and cons. Completing the project and gaining support for it are often the biggest pros. Acquiring haters and producing diverting forks are, in many cases, the biggest cons. But there are positives to these seemingly negative things, and ways of handling them that can make them beneficial to you.

First, the Haters

Haters and critics are undoubtedly annoying. They know how to get right under your skin especially if you feel strongly about your project, which is usually the case with most developers.

The first thing you have to realize here is that these haters are a good thing. Why? It means that your project is actually meaningful and significant enough to be “hated”. This could also mean that several people actually used the software and that their numbers increased to a point where some of them have now noticed a few downsides to it.

Speaking of downsides, they’re much more visible when critics are around to point them out, aren’t they? So in this sense, critics are actually quite useful. They help you see the faults you need to work on to improve your software and make it serve the needs of the community better.

When criticisms don’t have this redeeming quality, however, and simply throw shade your way, the best course of action is inaction. Responding to these criticisms will only acknowledge and validate them, so it’s better to take a breath, turn away and move forward.

The Forks

In the open source world, forks are inevitable. Yet when they materialize, some developers can’t help but feel negatively towards them. They can feel threatened, annoyed, betrayed, bitter and even angry.

But as Apache co-founder Brian Behlendorf once said, “the most important requirement [in open source] is the right to fork.” Forking is a natural effect of open sourcing software, and one that is often beneficial. The creation of forks encourages developers to consistently improve their software and remain competitive, to the benefit of the entire community. Forks also make software more customized, which is a good thing, particularly for software that has a broad scope. Generally, they’re necessary for the healthy balance and continued development of software and the open source environment.

While there’s nothing you can do to stop forks, there are things you can do in response to them. You can focus on differentiation, setting yourself apart from these forks. You can focus on developing your software and making sure it answers the needs of users, so that it will remain the first choice instead of the forks. Whatever you do, don’t try to deliberately and publicly destroy the forks. Why? First, doing so would give you a bad reputation among community members, one that will most certainly drive users and contributors away. Second, you don’t know what changes the future may bring. You may find yourself having to re-merge with a fork, or being totally replaced by one later on.

Conclusion

Managing an open source project is hard work all on its own, and having to handle annoying forks and haters just makes it even harder. But with every challenge comes the promise of reward. Handled properly and with the right mindset, these things can in their own way become beneficial to your project and help you better embrace and nurture the very nature of open source.

Categories: Open Source

PostgreSQL 9.6 Beta 4 Released

PostgreSQL News - Thu, 08/11/2016 - 01:00

The PostgreSQL Global Development Group announces today that the fourth beta release of PostgreSQL 9.6 is available for download. This release contains previews of all of the features which will be available in the final release of version 9.6, including fixes to many of the issues found in the first and second betas. Users are encouraged to continue testing their applications against 9.6 beta 4.

Changes Since Beta 3

9.6 Beta 4 includes the security fixes in the 2016-08-11 Security Update, as well as the general bug fixes offered for stable versions. Additionally, it contains fixes for the following beta issues reported since the last beta:

  • Change minimum max_worker_processes from 1 to 0
  • Make array_to_tsvector() sort and de-duplicate the given strings
  • Fix ts_delete(tsvector, text[]) to cope with duplicate array entries
  • Fix hard to hit race condition in heapam's tuple locking code
  • Prevent "snapshot too old" from trying to return pruned TOAST tuples
  • Make INSERT-from-multiple-VALUES-rows handle targetlist indirection
  • Do not let PostmasterContext survive into background workers
  • Add missing casts in information schema
  • Fix assorted problems in recovery tests
  • Block interrupts during HandleParallelMessages()
  • Remove unused arguments from pg_replication_origin_xact_reset function
  • Correctly handle owned sequences with extensions
  • Many fixes for tsqueue.c
  • Eliminate a few more user-visible "cache lookup failed" errors
  • Teach parser to transform "x IS [NOT] DISTINCT FROM NULL" to a NullTest
  • Allow functions that return sets of tuples to return simple NULLs
  • Repair damage done by citext--1.1--1.2.sql
  • Correctly set up aggregate FILTER expression in partial-aggregation plans
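One item in the list above, transforming "x IS [NOT] DISTINCT FROM NULL" into a null test, rests on a simple equivalence in SQL's null-safe comparison. The sketch below models only those semantics in Python, with None standing in for NULL; the function names are made up for illustration and this says nothing about how the PostgreSQL parser implements the rewrite.

```python
# None stands in for SQL NULL in this sketch.

def is_distinct_from(a, b):
    """Null-safe inequality, mirroring SQL's `a IS DISTINCT FROM b`."""
    if a is None or b is None:
        # Two NULLs are not distinct; a NULL and a non-NULL value are.
        return not (a is None and b is None)
    return a != b


def is_not_null(a):
    """Equivalent of the SQL null test `a IS NOT NULL`."""
    return a is not None


# With NULL on the right-hand side, the comparison collapses to a null test.
for value in (None, 0, "", "text"):
    assert is_distinct_from(value, None) == is_not_null(value)
    assert (not is_distinct_from(value, None)) == (value is None)
```

Because the two forms always agree when one side is a literal NULL, replacing the comparison with the simpler null test does not change query results.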

This beta also includes many documentation updates and improvements.

Due to changes in system catalogs, a pg_upgrade or pg_dump and restore will be required for users migrating databases from earlier betas.

Note that some known issues remain unfixed. Before reporting a bug in the beta, please check the Open Items page.

Beta Schedule

This is the fourth beta release of version 9.6. The PostgreSQL Project will release additional betas as required for testing, followed by one or more release candidates, until the final release in late 2016. For further information please see the Beta Testing page.

Categories: Database, Open Source

2016-08-11 Security Update Release

PostgreSQL News - Thu, 08/11/2016 - 01:00

The PostgreSQL Global Development Group has released an update to all supported versions of our database system, including 9.5.4, 9.4.9, 9.3.14, 9.2.18 and 9.1.23. This release fixes two security issues. It also patches a number of other bugs reported over the last three months. Users who rely on security isolation between database users should update as soon as possible. Other users should plan to update at the next convenient downtime.

Security Issues

Two security holes have been closed by this release:

  • CVE-2016-5423: certain nested CASE expressions can cause the server to crash.
  • CVE-2016-5424: database and role names with embedded special characters can allow code injection during administrative operations like pg_dumpall.

The fix for the second issue also adds an option, -reuse-previous, to psql's \connect command. pg_dumpall will also refuse to handle database and role names containing line breaks after the update. For more information on these issues and how they affect backwards-compatibility, see the Release Notes.

Bug Fixes and Improvements

This update also fixes a number of bugs reported in the last few months. Some of these issues affect only version 9.5, but many affect all supported versions:

  • Fix misbehaviors of IS NULL/IS NOT NULL with composite values
  • Fix three areas where INSERT ... ON CONFLICT failed to work properly with other SQL features.
  • Make INET and CIDR data types properly reject bad IPv6 values
  • Prevent crash in "point ## lseg" operator for NaN input
  • Avoid possible crash in pg_get_expr()
  • Fix several one-byte buffer over-reads in to_number()
  • Don't needlessly plan query if WITH NO DATA is specified
  • Avoid crash-unsafe state in expensive heap_update() paths
  • Fix hint bit update during WAL replay of row locking operations
  • Avoid unnecessary "could not serialize access" with FOR KEY SHARE
  • Avoid crash in postgres -C when the specified variable is a null string
  • Fix two issues with logical decoding and subtransactions
  • Ensure that backends see up-to-date statistics for shared catalogs
  • Prevent possible failure when vacuuming multixact IDs in an upgraded database
  • When a manual ANALYZE specifies columns, don't reset changes_since_analyze
  • Fix ANALYZE's overestimation of n_distinct for columns with nulls
  • Fix bug in b-tree mark/restore processing
  • Fix building of large (bigger than shared_buffers) hash indexes
  • Prevent infinite loop in GiST index build with NaN values
  • Fix possible crash during a nearest-neighbor indexscan
  • Fix "PANIC: failed to add BRIN tuple" error
  • Prevent possible crash during background worker shutdown
  • Many fixes for issues in parallel pg_dump and pg_restore
  • Make pg_basebackup accept -Z 0 as no compression
  • Make regression tests safe for Danish and Welsh locales

The libpq client library has also been updated to support future two-part PostgreSQL version numbers. This update also contains tzdata release 2016f, with updates for Kemerovo, Novosibirsk, Azerbaijan, Belarus, and Morocco.
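As a rough illustration of what the move from three-part to two-part version numbers means for client code, the sketch below converts a release string into the integer form that libpq's PQserverVersion is documented to return (for example 90504 for 9.5.4 and 100001 for 10.1). The helper name is hypothetical, and the function assumes plain numeric release strings, so beta or release-candidate suffixes are not handled.

```python
def version_to_int(version):
    """Convert a numeric PostgreSQL release string to its integer form.

    Hypothetical helper for illustration; it is not part of libpq.
    """
    parts = [int(p) for p in version.split(".")]
    if parts[0] >= 10:
        # Two-part scheme: major.minor
        major, minor = (parts + [0])[:2]
        return major * 10000 + minor
    # Three-part scheme: the first two parts together form the major version.
    first, second, patch = (parts + [0, 0])[:3]
    return first * 10000 + second * 100 + patch


for release in ("9.1.23", "9.5.4", "10.1"):
    print(release, version_to_int(release))
```

Comparing these integers gives the right ordering across both schemes, which is harder to get by comparing the dotted strings directly.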

EOL Warning for Version 9.1

PostgreSQL version 9.1 will be End-of-Life in September 2016. The project expects to only release one more update for that version. We urge users to start planning an upgrade to a later version of PostgreSQL as soon as possible. See our Versioning Policy for more information.

Updating

All PostgreSQL update releases are cumulative. As with other minor releases, users are not required to dump and reload their database or use pg_upgrade in order to apply this update release; you may simply shut down PostgreSQL and update its binaries. Users who have skipped one or more update releases may need to run additional, post-update steps; please see the release notes for earlier versions for details.

Links: Download Release Notes Security Page Versioning Policy

Categories: Database, Open Source

Eclipse IoT Day @ ThingMonk | Program Live

Eclipse News - Wed, 08/10/2016 - 15:16
Plan to attend the Eclipse IoT Day at ThingMonk in London on Sept. 12!
Categories: Open Source

ECE 2016 Program Schedule Announced

Eclipse News - Mon, 08/08/2016 - 21:00
The ECE program schedule is now available on the website. Register early for best pick of the tutorials & lowest price.
Categories: Open Source

Eclipse Summit India 2016

Eclipse News - Mon, 08/08/2016 - 20:00
India's premier conference nurturing the Eclipse Ecosystem is just around the corner!
Categories: Open Source

Which languages convey the most information in the least space? Introducing the Unimorph dataset.

Google Open Source Blog - Mon, 08/08/2016 - 18:00
Several years ago a science journalist asked me which languages could pack the most information into a 140-character Tweet. Because Twitter defines a character roughly as a single Unicode code point, this turns out to be an easy question to answer. Chinese almost certainly rates as the most “compact” language from that point of view because a single Chinese character represents a whole morpheme (in linguistic terminology, a minimal unit of meaning) whereas an English letter only represents a part of a morpheme. The Chinese equivalent of I don’t eat meat, which in English takes 16 characters including spaces, is 我不吃肉, which takes just four.

But this question relates to a broader question that as a linguist I have often been asked: which languages are the most “efficient” at conveying information? Or, which languages can convey the same information in the smallest amount of space? Untethered by the idiosyncrasies of Twitter, this question becomes quite difficult to answer. What do you mean by “space”? Number of characters? Number of bytes? Number of syllables? Each of these has its own problems. And perhaps more crucially, what do you mean by “information”? The Shannon notion of information does not straightforwardly apply here.
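How much the answer depends on the definition of “space” can be seen with the example sentence from the first paragraph. The short sketch below counts Unicode code points, UTF-8 bytes and UTF-16 code units; the helper name is made up, and the English sentence is written with a plain ASCII apostrophe, which is an assumption (a typographic apostrophe would take three bytes in UTF-8).

```python
samples = {
    "English": "I don't eat meat",
    "Chinese": "我不吃肉",
}


def sizes(text):
    """Return (code points, UTF-8 bytes, UTF-16 code units) for text."""
    code_points = len(text)
    utf8_bytes = len(text.encode("utf-8"))
    utf16_units = len(text.encode("utf-16-le")) // 2
    return code_points, utf8_bytes, utf16_units


for language, sentence in samples.items():
    print(language, sizes(sentence))
```

By code points Chinese is four times more compact here (4 versus 16), but in UTF-8 bytes the gap shrinks to 12 versus 16, because each of these Chinese characters takes three bytes.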

A group of us at Google set out to answer this question, or at least to provide the form that an answer would have to take. We had the resources and experience needed to annotate data in multiple languages, and we were able to divert some of those resources to this task. The results were published in a paper presented at the 2014 International Conference on Language Resources and Evaluation in Reykjavík, Iceland.

We are now releasing the data on GitHub. The data consist of 85 sentences typical of the kinds of sentences generated by Google Now, translated into eight typologically diverse languages: English, French, Italian, German, Russian, Arabic, Korean, Chinese, which include some highly inflected and uninflected languages, and various types of morphology including inflectional and agglutinative. The data were annotated by one to three annotators depending on the language, with morphological information, counts of the marked features and other information. The main data file is in HTML, color coded by language, which makes it easy to browse but also easy to extract into other formats.

Since the basic information conveyed by each sentence can be assumed to be the same across languages, the main focus of the research was on the additional information that each language marks, and cannot avoid marking. For example, the English sentence:

Use my location for the search results and other services.
has the French translation:

Utilisez ma position pour les résultats de recherche et d'autres services.
The verb ending -ez in Utilisez marks “addressee respect”, a bit of information that is missing from the English original. One could have used a different ending on the French verb, but that would not avoid this bit of information: it would be choosing to mark lack of respect, or familiarity with the addressee.

In the paper we tried various ways of measuring the differing information content of the languages relative to various definitions of “space”. Considering all the factors together, we concluded that the languages that conveyed the most information in a given amount of space were highly inflected languages like Russian, with uninflected languages like Chinese actually being the “least efficient” at conveying information.

We don’t expect this to be the final answer, which is why we are releasing the data as open source in the hopes that others will find it useful and maybe can even extend it to more sentences or a wider variety of languages. Ultimately though, any answer to the question of which languages convey the most information in the smallest amount of space must seriously address what is meant by “information”, and must pay heed to the famous maxim by the Russian linguist Roman Jakobson (1959) that “languages differ essentially in what they must convey and not in what they may convey.”

By Richard Sproat, Research Scientist
Categories: Open Source

Projects of the Week, August 8, 2016

SourceForge.net: Front page news - Mon, 08/08/2016 - 05:12

Here are the featured projects for the week, which appear on the front page of SourceForge.net:

Simplicity Linux

Simplicity Linux uses Puppy Linux and derivatives as a base, uses the XFCE window manager, and comes in 3 editions: Netbook, Desktop and Media. Netbook features cloud based software, Desktop features locally based software and Media edition is designed to allow people who want a lounge PC to access their media with ease.
[ Download Simplicity Linux ]


Bodhi Linux

Bodhi is a minimalistic, enlightened, Linux desktop.
[ Download Bodhi Linux ]


Warzone 2100

You command the forces of “The Project” in a battle to rebuild the world after mankind has almost been destroyed by nuclear missiles. The game offers a full campaign with optional (but strongly recommended!) videos, battle against four factions, multi-player and single-player skirmish modes, an extensive tech tree and a full unit designer. Multi-player is also cross-platform: battle your friends on any OS (Windows, Linux or Mac) and it all works seamlessly! We also offer 100% portable Windows builds, so you can take the game and install it anywhere! Our source repo is now at https://github.com/Warzone2100/warzone2100. If you are using Linux and want a .deb, please get the latest version available from http://www.playdeb.net/app/Warzone2100 (they are not affiliated with us, but they do have the latest builds!). Warzone 2100 works on 32- or 64-bit Windows Vista or higher, 32- or 64-bit Linux, and 32- or 64-bit Macs.
[ Download Warzone 2100 ]


FCEUX

An open source NES Emulator for Windows and Unix that features solid emulation accuracy and state of the art tools for power users.
[ Download FCEUX ]


Battle for Wesnoth

The Battle for Wesnoth is a Free, turn-based tactical strategy game with a high fantasy theme, featuring both single-player and online/hotseat multiplayer combat. Fight a desperate battle to reclaim the throne of Wesnoth, or take a hand in any number of other adventures.
[ Download Battle for Wesnoth ]


Money Manager Ex

Money Manager Ex (mmex) is an easy-to-use personal finance manager. It can be used to track your net worth, income vs. expenses, etc. It runs on Windows, Linux and Mac OSX.
[ Download Money Manager Ex ]


DxWnd

Windows hooker: intercepts system calls to make fullscreen programs run in a window, to improve compatibility, to enhance video modes and to stretch timing. It is typically very useful for running old Windows games.
[ Download DxWnd ]


Pandora FMS: Flexible Monitoring System

Pandora FMS is an enterprise-ready monitoring solution that provides unparalleled flexibility for IT to address both immediate and unforeseen operational issues, including infrastructure and IT processes. It uniquely enables business and IT to adapt to changing needs through a flexible and rapid approach to IT and business deployment. Pandora FMS consolidates all the needs of modern monitoring (ITOM, APM, BAM) and provides status and performance metrics from different operating systems, virtual infrastructure (VMware, Hyper-V, XEN), Docker containers, applications, storage and hardware devices such as firewalls, proxies, databases, web servers or routers. It’s highly scalable (up to 2000 nodes with one single server), 100% web and with multi-tenant capabilities. It has a very flexible ACL system and several different graphical reports and user-defined control screens.
[ Download Pandora FMS: Flexible Monitoring System ]


Rescatux

Rescatux is a repair CD for GNU/Linux (and eventually also Windows), but it is not like other rescue disks. Rescatux comes with Rescapp, a wizard that guides you through your rescue and repair tasks. When the wizard is not able to solve your problem, you can also use Rescatux's unique support features:

  • Chat: Open the chat to ask for help directly in the Rescatux channel.
  • Share log: After running an option you can share its log (the record of the actions it has performed) so that people in the chat can help you better. Better still, you can help debug and fix Rescatux bugs on the fly.
  • Share log on forum: Prepares forum-post-style text so that you can just copy and paste it into your favourite forum. Logs are inserted into it with [CODE] tags.
  • Boot Info Script: Run the Boot Info Script option to share your computer configuration (especially the boot configuration).
[ Download Rescatux ]

Categories: Open Source

Programming Basics: The Function Signature

DevX: Open Source Articles - Fri, 08/05/2016 - 19:21
See how paying attention to your function signature, utilizing language features where possible and using immutable data structures and pure functions can get you pretty far.
Categories: Open Source

Making Rubyists more comfortable on Google Cloud Platform

Google Open Source Blog - Fri, 08/05/2016 - 18:00
One of the many open source efforts at Google is the Google Cloud Platform (GCP) native libraries for our most popular languages. One of these libraries is the gcloud-ruby project on GitHub which is released as the gcloud gem on rubygems.org. There are several gems for accessing Google Cloud Platform resources from Ruby but this gem is different. It is hand coded by Rubyists for Rubyists and that has some distinct advantages.

Many of us have had experience working with libraries that are clearly ported from another language. I usually talk about them as Ruby with a Java accent or Python with a Perl accent. Generally they work just fine but you can run into some low level friction — sometimes things just don’t feel right. Native gems written by members of the community solve this problem. In the case of gcloud-ruby there are some really concrete examples.

First, gcloud-ruby uses syntax that is similar to other popular Ruby libraries. For example, the syntax for specifying a table schema in BigQuery (Google Cloud Platform's very large scale data warehouse) looks like this:

table = dataset.create_table "baby_names" do |schema|
  schema.string "name"
  schema.string "sex"
  schema.integer "number"
end

Creating the same table in the popular Ruby on Rails framework looks like this:

create_table "baby_names" do |schema|
  schema.string "name"
  schema.string "sex"
  schema.integer "number"
end

The two are nearly identical. That makes getting up to speed on BigQuery easier and quicker than it would be if the Ruby library didn't use patterns that are already known to the majority of Rubyists. 
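For readers who have not met this block-based style before, here is a rough sketch of how such a schema DSL can be built in plain Ruby. ToySchema, ToyTable and toy_create_table are invented names for illustration only; this is not how gcloud-ruby or Rails implement create_table.

```ruby
# Hypothetical toy classes; not the gcloud-ruby or Rails implementation.
class ToySchema
  attr_reader :fields

  def initialize
    @fields = []
  end

  # Each column method records a field name together with its type.
  def string(name)
    @fields << { name: name, type: :string }
  end

  def integer(name)
    @fields << { name: name, type: :integer }
  end
end

ToyTable = Struct.new(:name, :fields)

# Yields a schema object to the caller's block, then builds the table
# from whatever fields the block declared.
def toy_create_table(name)
  schema = ToySchema.new
  yield schema if block_given?
  ToyTable.new(name, schema.fields)
end

table = toy_create_table "baby_names" do |schema|
  schema.string "name"
  schema.string "sex"
  schema.integer "number"
end

puts table.fields.map { |f| "#{f[:name]}:#{f[:type]}" }.join(", ")
```

The calling code has the same shape as both snippets above: a method that takes a name and a block, and a block argument whose methods are named after column types. That shared shape is what makes the syntax feel familiar.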
The gcloud-ruby library also meets the community where it is by embracing the community's fondness for doing things several different ways. In Ruby there are often several correct ways to do a given task.
The gcloud-ruby library is no exception. There are a few different ways to authenticate and create the objects you use to interact with the API. Ruby also has many common methods that have aliases. In the standard library, for example, Enumerable#map and Enumerable#collect run the same code path. In gcloud-ruby, the Vision API uses aliases. Google Cloud Vision provides a single endpoint: annotate. gcloud-ruby has an annotate method but also aliases this method as mark and detect if those make more sense to you (detect is the method that makes the most sense to my brain so that's the one I use). Providing a couple of different aliases means the first thing you try is more likely to work. This speeds up development and makes learning the library easier.
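As a small illustration of method aliasing, the sketch below first shows the built-in map/collect pair and then defines aliases on a made-up class. ToyVision and its return value are hypothetical and only mimic the naming described above; they are not the gcloud-ruby API.

```ruby
# Built-in aliases: map and collect produce the same result.
squares_map     = [1, 2, 3].map     { |n| n * n }
squares_collect = [1, 2, 3].collect { |n| n * n }

# Hypothetical class for illustration; not the gcloud-ruby API.
class ToyVision
  def annotate(image_name)
    "annotated #{image_name}"
  end

  # alias_method gives the same method body additional names.
  alias_method :mark, :annotate
  alias_method :detect, :annotate
end

vision = ToyVision.new
puts squares_map == squares_collect
puts vision.detect("face.png")
```

Whichever of the three names a caller reaches for first, the same code runs, which is the convenience the aliases are meant to provide.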
The last way the gcloud-ruby gem makes Rubyists feel at home is by having comprehensive tests, a common value and popular discussion topic for the Ruby community. gcloud-ruby uses minitest-spec for testing, a popular choice that most Rubyists can easily read. When I was learning the storage API I looked at the tests for storage to learn how to use the library. There is outstanding documentation as well for those who prefer learning that way but I'm so used to looking at tests that I really appreciated that gcloud-ruby has well written and easily accessible tests.
Above are three examples of how hand-coded libraries from within the community can improve the user experience when learning to use tools. Of course, doing all the development on GitHub in the open also helps. Users can easily see what bugs people have run into and what features are next up in the production queue. And if a user has a feature request (like the previously mentioned Cloud Vision support) they can create a GitHub issue.
If you’re a Rubyist, give gcloud-ruby a shot and let us know what you think!
By Aja Hammerly, Developer Advocate
Categories: Open Source

How to Overcome the Biggest Hurdle of Any Open Source Project

SourceForge.net: Front page news - Fri, 08/05/2016 - 05:45

It isn’t the coding; it’s not even the starting that’s the hardest to do when developing open source software.

It’s making people care.

Making people care enough that they use the software. Making people care enough that they contribute to the project. Making people care enough that they voluntarily form a community around the project.

Making people care is (or should be) the key that starts the whole open source project engine, and keeps it going.

Why Should People Care?

Perhaps you feel that your software will be awesome, and it doesn’t matter that people won’t care initially. But for any open source project to succeed, a large part of its inception should focus on who will actually use the software. Of course the software matters as well; but as we all know, in community-based open source it ceases to matter when you’re the only one who thinks it matters.

So why should people care about your project? This is a question you need to be able to answer right from the start if you ever hope to achieve growth and longevity for your project.

How to Make People Care

Of course you can’t force people to care. However, you can persuade them. There are several ways you can do this:

  • Select technology with broad usage. Creating niche projects is fine, as long as you’re sure that you’ve got a good base of interested users and contributors there. If you’re unsure however, it’s best to choose a project with a number of different applications, or technology that most people use every day like operating systems, databases, etc. These are more likely to generate outsider interest and contributions.
  • Zero in on a real need. Let your software meet a real need in the market and meet it exceptionally. This will guarantee that people take notice of it. Meeting a need could be a matter of timing, or it could be uncovered with research. A need that is uncovered through diligent research is more likely to have long term applications, but will take some time and effort on your part.
  • Clearly specify the value of the project, and the value that people can add to and get from the project. When people understand the value of a project, the value they can get from it and how they can be valuable to it, getting them to care and contribute becomes easy. So make sure you do your part. Clearly specify in your descriptions the many applications of your software and how it can benefit users and contributors. Make sure you use jargon-free, easy-to-understand language.
  • Develop a culture and architecture of inclusion. Sometimes people just want to feel welcome in order to start caring and contributing. Some people just need encouragement. Make sure you give it. Invite participation constantly, and make it easy for people to participate by writing good documentation and creating modular code that’s easier for contributors to work on.

Once you get people to care about your project, don’t leave them behind. Keep nurturing your community by being supportive of their efforts and sympathetic to their needs. Caring can be contagious, so when you show you care the rest of the community can follow suit.

Categories: Open Source

August 2016, “Staff Pick” Project of the Month – LibreCAD

SourceForge.net: Front page news - Wed, 08/03/2016 - 05:20

For our August “Staff Pick” Project of the Month, we selected LibreCAD, a free Open Source CAD application for Windows, Apple, and Linux. Ries van Twisk and Ravas Mi, two of the developers behind the project, shared some of their thoughts about the project’s history, purpose, and direction.

SourceForge (SF): What made you start this project?
Ries van Twisk (RVT): I was working on a CNC machine and my wife asked me why it took so long to let the machine do its job. I told her that I needed to make a drawing, then open another application to generate g-code. So I decided to add a module to QCad so I could do it in one go. After working a bit on this project I came to the conclusion that I needed to rewrite the code for Qt4, because Qt3, which QCad was based on, was old. I showed this to a few people on the LinuxCNC IRC channel and they urged me to put the codebase online. One thing led to another and we gave this project a name, ‘LibreCAD’. I never was able to finish the g-code creation module, but instead we now have a continuation of open source CAD that is available for everybody.

SF: Has the original vision been achieved?
RVT: The very original vision (see previous answer), no. But after we decided to create the LibreCAD fork, I do believe we achieved our goal.

SF: Who can benefit the most from your project?
RVT: Anybody that needs to make simple CAD drawings.

SF: What’s the best way to get the most out of using LibreCAD?
RVT: Read our wiki and start using the manual, and of course have a project in mind you want to work on.

SF: What has your project team done to help build and nurture your community?
RVT: Most important for us is to be friendly, help each other out, fix bugs and continue improving LibreCAD.
Our community is active on IRC as well as our forum. We are not huge compared to other projects, but we can sustain ourselves.

SF: Have you all found that more frequent releases help build up your community of users?
RVT: We are too small for frequent releases. This is due to our team size, and we are all doing this in our free time; most of us have a partner and/or children, and they come first. We try to release as often as possible but within reason.

SF: What was the first big thing that happened for your project?
RVT: We don’t have ‘the big thing’, but if I have to think of something, for me personally it is that LibreCAD is a working project with a solid past and future.

Ravas Mi: (For me it was) support for reading DWG files and exporting MakerCAM SVG files; these are not part of the free version of QCAD, which LibreCAD forked from.

Those came first but I think some of my contributions are worth mentioning.

The new custom toolbar and custom menu system greatly improves user efficiency. With custom menus (think right-click) you don’t need to move out of the drawing area to switch tools or snap modes. Command users might even find themselves more efficient.

Users can now select their own Qt style sheets, which allows for dramatically changing the program’s appearance.

The release which included those also included many other new features and important bug fixes.
(https://github.com/LibreCAD/LibreCAD/releases/tag/2.1.0)

SF: What’s been your mantra throughout the development process?
RVT: Keep doing what you do and make LibreCAD better with each release. It doesn’t have to go fast, it doesn’t have to go frequently as long as it happens.

SF: How has SourceForge and its tools helped your project reach success?
RVT: SF has always been a stable and working platform for us where our user base could download the binaries.

SF: What is the next big thing for LibreCAD?
RVT: We are working on a new version of LibreCAD; currently we call it LibreCAD 3 for lack of better naming. This year we are making big steps with the current two GSoC students. We are far from having a product, but we keep improving and working on it and one day this will be our new LibreCAD.

SF: How long do you think that will take?
RVT: A few years more. Sometimes the project stands still due to lack of time, but we try during GSoC to get one or two developers working on the project, and I am working on it in my free time because I like C++.

SF: Do you have the resources you need to make that happen?
RVT: At this moment we lack resources, but we will manage. Obviously we can use more help.

SF: If you had to do it over again, what would you do differently for LibreCAD?
RVT: I would have more actively asked for help from developers.

SF: Is there anything else we should know?
RVT: We need a few (2 at most) developers that have good knowledge of math and C++ and perhaps a bit of Lua.

[ Download LibreCAD ]

Categories: Open Source

SourceForge Improvements: It’s easier than ever to start a project

SourceForge.net: Front page news - Tue, 08/02/2016 - 05:32

Over the past few weeks, we’ve rolled out a series of improvements to make it easier to start a project on SourceForge. We started by adding a “Create” button to the header of every page, so you can always find it.

On the project registration form we now give you faster name suggestions and show more available tools & features. SourceForge projects have a lot of tools available, and now we show them all – including Web Hosting and Mailing Lists. Bonus: if you’re not logged in when you get to the registration form, we show a nice login overlay so you can still see what the form is like while you log in.

Screenshot of project registration form

As soon as you’ve created your project, the new welcome tour guides you through some of the key parts of your project. For example, you’ll see how to customize the tools you want to use on your project, categorize and describe your project, and more.

Screenshot of project welcome tour

We also send you a nice project welcome email, so you’ve got a reference in case you forget where your project is. And even better – when you’re on SourceForge, your account menu lists your projects, so you’ve got easy access to all of your projects.

Have a wonderful time making open source!

Categories: Open Source