Software Development News: .NET, Java, PHP, Ruby, Agile, Databases, SOA, JavaScript, Open Source



Eight scenarios with Apache Spark on Azure that will transform any business

This post was authored by Rimma Nehme, Technical Assistant, Data Group.


Since its birth in 2009, and since it was open sourced in 2010, Apache Spark has grown into one of the largest open source communities in big data, with over 400 contributors from 100 companies. Spark stands out for its ability to process large volumes of data up to 100x faster than disk-based frameworks, because data is persisted in-memory. The Azure cloud makes Apache Spark incredibly easy and cost effective to deploy: there is no hardware to buy and no software to configure, and it comes with a full notebook experience for authoring compelling narratives and with integration with partner business intelligence tools. In this blog post, I am going to review some of the truly game-changing usage scenarios with Apache Spark on Azure that companies can employ in their context.

Scenario #1: Streaming data, IoT and real-time analytics

A key use case for Apache Spark is its ability to process streaming data. With so much data being processed on a daily basis, it has become essential for companies to be able to stream and analyze it all in real time. Spark Streaming handles this type of workload exceptionally well. As shown in the image below, a user can create an Azure Event Hub (or an Azure IoT Hub) to ingest rapidly arriving data into the cloud; both Event and IoT Hubs can take in millions of events and sensor updates per second, which can then be processed in real time by Spark.

Scenario 1_Spark Streaming

Businesses can use this scenario today for:

  • Streaming ETL: In traditional ETL (extract, transform, load) scenarios, the tools are used for batch processing: data must first be read in its entirety, converted to a database-compatible format, and then written to the target database. With streaming ETL, data is continually cleaned and aggregated before it is pushed into data stores or used for further analysis.
  • Data enrichment: Streaming capability can be used to enrich live data by combining it with static or ‘stationary’ data, thus allowing businesses to conduct more complete real-time data analysis. Online advertisers use data enrichment to combine historical customer data with live customer behavior data and deliver more personalized and targeted ads in real-time and in the context of what customers are doing. Since advertising is so time-sensitive, companies have to move fast if they want to capture mindshare. Spark on Azure is one way to help achieve that.
  • Trigger event detection: Spark Streaming can allow companies to detect and respond quickly to rare or unusual behaviors (“trigger events”) that could indicate a potentially serious problem within the system. For instance, financial institutions can use triggers to detect fraudulent transactions and stop fraud in its tracks. Hospitals can also use triggers to detect potentially dangerous health changes while monitoring patient vital signs and sending automatic alerts to the right caregivers who can then take immediate and appropriate action.
  • Complex session analysis: Using Spark Streaming, events relating to live sessions, such as user activity after logging into a website or application, can be grouped together and quickly analyzed. Session information can also be used to continuously update machine learning models. Companies can then use this functionality to gain immediate insights into how users are engaging on their site and provide more real-time personalized experiences.
Scenario #2: Visual data exploration and interactive analysis

Using Spark SQL running against data stored in Azure, companies can use BI tools such as Power BI, PowerApps, Flow, SAP Lumira, QlikView and Tableau to analyze and visualize their big data. Spark’s interactive analytics capability is fast enough to perform exploratory queries without sampling. By combining Spark with visualization tools, complex data sets can be processed and visualized interactively. These easy-to-use interfaces then allow even non-technical users to visually explore data, create models and share results. Because a wider audience can analyze big data without preconceived notions, companies can test new ideas and visualize important findings in their data earlier than ever before. Companies can identify new trends and new relationships that were not apparent before, quickly drill down into them, ask new questions and find ways to innovate in new and smarter ways.

Scenario 2_Spark visual data exploration and interactive analysis

This scenario is even more powerful when interactive data discovery is combined with predictive analytics (more on this later in this blog). Based on relationships and trends identified during discovery, companies can use logistic regression or decision tree techniques to predict the probability of certain events in the future (e.g., customer churn probability). Companies can then take specific, targeted actions to control or avert certain events.

Scenario #3: Spark with NoSQL (HBase and Azure DocumentDB)

This scenario provides scalable and reliable Spark access to NoSQL data stored either in HBase or our blazing fast, planet-scale Azure DocumentDB, through “native” data access APIs. Apache HBase is an open-source NoSQL database that is built on Hadoop and modeled after Google BigTable. DocumentDB is a true schema-free managed NoSQL database service running in Azure designed for modern mobile, web, gaming, and IoT scenarios. DocumentDB ensures 99% of your reads are served under 10 milliseconds and 99% of your writes are served under 15 milliseconds. It also provides schema flexibility, and the ability to easily scale a database up and down on demand.

The Spark with NoSQL scenario enables ad-hoc, interactive queries on big data. NoSQL can be used for capturing data that is collected incrementally from various sources across the globe. This includes social analytics, time series, game or application telemetry, retail catalogs, up-to-date trends and counters, and audit log systems. Spark can then be used for running advanced analytics algorithms at scale on top of the data coming from NoSQL.

Scenario 3_Spark NoSQL

Companies can employ this scenario in online shopping recommendations, spam classifiers for real-time communication applications, predictive analytics for personalization, and fraud detection models for mobile applications that need to make instant decisions to accept or reject a payment. I would also include in this category a broad group of applications that are really “next-gen” data warehousing, where large amounts of data need to be processed inexpensively and then served in an interactive form to many users globally. Finally, Internet of Things scenarios fit in here as well, with the obvious difference that the data represents the actions of machines instead of people.

Scenario #4: Spark with Data Lake

Spark on Azure can be configured to use Azure Data Lake Store (ADLS) as additional storage. ADLS is an enterprise-class, hyper-scale repository for big data analytic workloads. Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts in an enterprise environment to store data of any size, shape and speed, and do all types of processing and analytics across platforms and languages. Because ADLS is a file system compatible with Hadoop Distributed File System (HDFS), it is very easy to combine with Spark for running computations at scale using pre-existing Spark queries.

Scenario 4_Spark with Data Lake

The data lake scenario arose because companies needed to capture and exploit new types of data while still preserving all of the enterprise-level requirements like security, availability, compliance, and failover. The Spark with data lake scenario enables truly scalable advanced analytics on healthcare data, financial data, business-sensitive data, geo-location coordinates, clickstream data, server logs, social media, and machine and sensor data. If companies want an easy way of building data pipelines, unparalleled performance, assured data quality, managed access control, change data capture (CDC) processing, seamless enterprise-level security, and world-class management and debugging tools, this is the scenario they need to implement.

Scenario #5: Spark with SQL Data Warehouse

While there is still a lot of confusion on this point, Spark and big data analytics are not a replacement for traditional data warehousing. Instead, Spark on Azure can complement and enhance a company’s data warehousing efforts by modernizing its approaches to analytics. A data warehouse can be viewed as an ‘information archive’ that supports business intelligence (BI) users and reporting tools for the mission-critical functions of a company. My definition of mission-critical is any system that supports revenue generation or cost control. If such a system fails, companies would have to manually perform these tasks to prevent loss of revenue or increased cost. Big data analytics systems like Spark help augment such systems by running more sophisticated computations and smarter analytics, and by delivering deeper insights using larger and more diverse datasets.

Azure SQL Data Warehouse (SQLDW) is a cloud-based, scale-out database capable of processing massive volumes of data, both relational and non-relational. Built on our massively parallel processing (MPP) architecture, SQLDW combines the power of the SQL Server relational database with Azure cloud scale-out capabilities. You can increase, decrease, pause, or resume a data warehouse in seconds with SQLDW. Furthermore, you save costs by scaling out CPU when you need it and cutting back usage during non-peak times. SQLDW is the manifestation of elastic future of data warehousing in the cloud.
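Spark reads SQLDW through its generic JDBC data source. This sketch only assembles the connection settings (the server, database, credentials, and table name are placeholders) and leaves the actual read call commented out, since it requires a live warehouse.

```python
# Placeholder connection details for an Azure SQL Data Warehouse instance
server = "myserver.database.windows.net"
database = "mydw"

jdbc_url = f"jdbc:sqlserver://{server}:1433;database={database};encrypt=true"
options = {
    "user": "analyst",        # placeholder credentials
    "password": "<secret>",
    "dbtable": "dbo.FactSales",  # hypothetical fact table
}

# With a running SQLDW instance you would then load the table into Spark:
# df = (spark.read.format("jdbc")
#            .option("url", jdbc_url)
#            .options(**options)
#            .load())
print(jdbc_url)
```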

Scenario 5_Spark with SQLDW

Some of the use cases for the Spark with SQLDW scenario include: using the data warehouse to get a better understanding of customers across product groups, then using Spark for predictive analytics on top of that data; or running advanced analytics with Spark on top of an enterprise data warehouse containing sales, marketing, store management, point of sale, customer loyalty, and supply chain data, to drive more informed business decisions at the corporate, regional, and store levels. Using Spark with data warehouse data, companies can do anything from risk modeling, to parallel processing of large graphs, to advanced analytics and text processing, all on top of their elastic data warehouse.

Scenario #6: Machine Learning using R Server, MLlib

Another, and probably one of the most prominent, Spark use cases in Azure is machine learning. By keeping datasets in-memory during a job, Spark achieves great performance on the iterative queries common in machine learning workloads. Common machine learning tasks that can be run with Spark in Azure include (but are not limited to) classification, regression, clustering, topic modeling, singular value decomposition (SVD), principal component analysis (PCA), hypothesis testing, and calculating sample statistics.

Typically, if you want to train a statistical model on very large amounts of data, you need three things:

  • Storage platform capable of holding all of the training data
  • Computational platform capable of efficiently performing the heavy-duty mathematical computations required
  • Statistical computing language with algorithms that can take advantage of the storage and computation power

Microsoft R Server, running on HDInsight with Apache Spark, provides all three. Microsoft R Server runs within the HDInsight Hadoop nodes on Microsoft Azure. Better yet, the big-data-capable algorithms of ScaleR take advantage of the in-memory architecture of Spark, dramatically reducing the time needed to train models on large data. With multi-threaded math libraries and transparent parallelization in R Server, customers can handle up to 1000x more data at up to 50x faster speeds than open source R. And if your data grows or you just need more power, you can dynamically add nodes to the Spark cluster using the Azure portal. Spark in Azure also includes MLlib for a variety of scalable machine learning algorithms, or you can use your own libraries. Some common applications of the machine learning scenario with Spark on Azure are listed in the table below.

Vertical | Sales and Marketing | Finance and Risk | Customer and Channel | Operations and Workforce
--- | --- | --- | --- | ---
Retail | Demand forecasting; loyalty programs; cross-sell and upsell; customer acquisition | Fraud detection; pricing strategy | Personalization; lifetime customer value; product segmentation | Store location demographics; supply chain management; inventory management
Financial Services | Customer churn; loyalty programs; cross-sell and upsell; customer acquisition | Fraud detection; risk and compliance; loan defaults | Personalization; lifetime customer value | Call center optimization; pay for performance
Healthcare | Marketing mix optimization; patient acquisition | Fraud detection; bill collection | Population health; patient demographics | Operational efficiency; pay for performance
Manufacturing | Demand forecasting; marketing mix optimization | Pricing strategy; perf risk management | Supply chain optimization; personalization | Remote monitoring; predictive maintenance; asset management

Scenario 6_Spark Machine Learning

Examples with just a few lines of code that you can try out right now:

Scenario #7: Putting it all together in a notebook experience

For data scientists, we provide out-of-the-box integration with Jupyter (iPython), the most popular open source notebook in the world. Unlike other managed Spark offerings that might require you to install your own notebooks, we worked with the Jupyter OSS community to enhance the kernel to allow Spark execution through a REST endpoint.

We co-led “Project Livy” with Cloudera and other organizations to create an open source Apache licensed REST web service that makes Spark a more robust back-end for running interactive notebooks.  As a result, Jupyter notebooks are now accessible within HDInsight out-of-the-box. In this scenario, we can use all of the services in Azure mentioned above with Spark with a full notebook experience to author compelling narratives and create data science collaborative spaces. Jupyter is a multi-lingual REPL on steroids. Jupyter notebook provides a collection of tools for scientific computing using powerful interactive shells that combine code execution with the creation of a live computational document. These notebook files can contain arbitrary text, mathematical formulas, input code, results, graphics, videos and any other kind of media that a modern web browser is capable of displaying. So, whether you’re absolutely new to R or Python or SQL or do some serious parallel/technical computing, the Jupyter Notebook in Azure is a great choice.
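Under the hood, a notebook kernel talks to Livy with small JSON requests. This sketch only builds the two request bodies involved (the cluster URL is a placeholder); actually sending them requires a live HDInsight cluster.

```python
import json

# Placeholder endpoint for an HDInsight cluster's Livy service
livy_url = "https://mycluster.azurehdinsight.net/livy"

# 1) Create an interactive PySpark session
create_session = json.dumps({"kind": "pyspark"})

# 2) Submit a statement to that session (the session id is returned
#    by the create call and substituted into the statements URL)
run_statement = json.dumps({"code": "sc.parallelize(range(100)).sum()"})

# With e.g. urllib.request you would POST these bodies, with
# Content-Type: application/json, to /sessions and /sessions/<id>/statements.
print(create_session)
print(run_statement)
```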

Scenario 7_Spark with Notebook

You can also use Zeppelin notebooks on Spark clusters in Azure to run Spark jobs. The Zeppelin notebook offering for HDInsight Spark clusters exists mainly to showcase how to use Zeppelin in an Azure HDInsight Spark environment; if you want to use notebooks to work with HDInsight Spark, I recommend Jupyter notebooks. To make development on Spark easier, we also support IntelliJ Spark Tooling, which introduces native authoring support for Scala and Java, local testing, remote debugging, and the ability to submit Spark applications to the Azure cloud.

Scenario #8: Using Excel with Spark

As a final example, I wanted to describe the ability to connect Excel to a Spark cluster running in Azure using the Microsoft Open Database Connectivity (ODBC) Spark Driver.

Scenario 8_Spark with Excel

Excel is one of the most popular clients for data analytics on Microsoft platforms. In Excel, our primary BI tools, such as PowerPivot data-modeling tools and Power View data-visualization tools, are built right into the software, with no additional downloads required. This enables users of all levels to do self-service BI using the familiar interface of Excel. Through a Spark add-in for Excel, users can easily analyze massive amounts of structured or unstructured data with a very familiar tool.
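For orientation, an ODBC connection to a Spark cluster is defined by a handful of keywords along these lines; the exact keyword set depends on the driver version, and the host name and credentials below are placeholders, so treat this as a sketch rather than a working configuration:

```
Driver=Microsoft Spark ODBC Driver;
Host=mycluster.azurehdinsight.net;
Port=443;
UID=admin;
PWD=<password>;
SSL=1
```

Once a data source is configured with values like these, Excel connects to it like any other ODBC source.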


Above, I’ve described some of the amazing, game-changing scenarios for real-time big data processing with Spark on Azure. Any company across the globe, from a huge enterprise to a small startup can take their business to the next level with these scenarios and solutions. The question is, what are you waiting for?

Categories: Database

This Oracle Bug Could Bite You

Database Journal News - Mon, 08/29/2016 - 08:01

In Oracle, creating a table may fail even though space is available. Read on to see the conditions that can make this happen.

Categories: Database

Find Objects Referenced Inside a Stored Procedure

Database Journal News - Mon, 08/22/2016 - 19:45

Not all tools provide a GUI interface, making it difficult to find the Oracle objects referenced inside a PL/SQL stored procedure. This script finds those objects.

Categories: Database

Real-time Operational Analytics in SQL Server 2016 - Part 1

Database Journal News - Mon, 08/22/2016 - 08:01

Arshad Ali takes a look at traditional analytics architecture, the challenges it faces, and how the newly introduced Real-time Operational Analytics feature overcomes those challenges.

Categories: Database

DB2 Data Warehouse Capacity Planning

Database Journal News - Mon, 08/15/2016 - 08:01

Early data warehouse implementations began as collections of financial and customer data that accumulated over time. Modern warehouses have evolved into complex and elegant enterprise analytics platforms, hosting a broad collection of multiple data types, queried by advanced business intelligence software. As the warehouse environment becomes more valuable, capacity planning becomes critical. In this article we present several strategies for managing data warehouse capacity planning and performance tuning.

Categories: Database


Oracle Database News - Fri, 08/12/2016 - 12:00
Press Release: Oracle Launches Largest B2B Audience Data Marketplace

Oracle Data Cloud’s B2B Audience Solution Includes 400+ Million Business Users and 1 Million Addressable US Companies to Support Account-Based Marketing at Scale

Redwood Shores Calif—Aug 12, 2016

Oracle Data Cloud today launched the largest business-to-business (B2B) audience data marketplace to help make programmatic and data-driven B2B marketing easier.

To help B2B marketers improve their targeting throughout the marketing funnel, Oracle Data Cloud’s B2B audience solution provides access to more than 400 million business profiles through thousands of B2B audience segments, thus creating a highly scalable and customizable targeting solution. In addition, more than one million addressable US companies add powerful account-based marketing (ABM) capabilities to a marketer’s targeting toolkit.

Oracle Data Cloud’s B2B audience solution is designed to meet specific B2B marketing needs:
  • Account-Based Marketing – Reach buyers and decision makers at specific companies to align B2B marketing and sales efforts
  • Company Past Purchases – Build audiences based on companies that have purchased a specific enterprise solution in the past
  • Event-Based Marketing – Digitally target professionals who have attended or are considering attending specific industry events related to a business’ products
  • OnRamp for B2B – Upload and reach their prospect and customer databases through digital marketing campaigns
“Our B2B audience solution is designed to provide the digital targeting flexibility and scale that B2B marketers need,” said Rob Holland, Group Vice President of the Oracle Data Cloud. “Our account-based marketing backbone recognizes that effective digital B2B marketing should support a company’s sales goals by focusing on the accounts it is trying to reach.”

The B2B solution integrates proprietary insights from Oracle BlueKai, Datalogix, and AddThis. Oracle Data Cloud’s B2B data is further enriched through strategic partnerships with leading B2B data providers like Bombora, Dun & Bradstreet, FullContact, Gravy Analytics, HG Data, Infogroup, PlaceIQ, and TransUnion, and predictive analytics from Leadspace. B2B marketers can now take advantage of more than 700 enhanced Oracle B2B audience segments, as well as a robust B2B audience marketplace boasting over 4000 pre-built audiences from partners.

“The challenge for B2B marketers has been connecting the account-specific needs of sales with their broader digital marketing campaigns, so their campaigns reach their targets,” said Sean Beierly, Data Scientist and Marketing Manager at Cisco Systems. “Oracle Data Cloud is helping us reach the right decision makers in the right companies across the many devices they use at scale.”

Oracle Data Cloud’s B2B audience solution allows marketers to align digital spend with both campaign objectives and sales outreach, providing a regular flow of relevant and qualified leads from target accounts. That ability to combine granular B2B targeting segments with an account-based filter makes it easier for B2B brands to take full advantage of the digital channel.

“Effective B2B marketing requires both accuracy and scale, and Oracle Data Cloud’s B2B audience solution provides both the reach and the targeting we need for our account-based marketing efforts,” said Patrice Lagrange, Senior Director, Digital Demand Nurturing Services, Hewlett Packard Enterprise. “We are pleased to be working with Oracle Data Cloud to support our enterprise sales efforts with robust data-driven marketing campaigns.”

Oracle Data Cloud gives marketers the ability to access, blend and activate audiences from Datalogix and BlueKai as well as the industry’s leading B2B data providers in one place. Marketers can now work with a single partner to build highly customized audiences leveraging a broad spectrum of data sources and deliver them to hundreds of publishers and consumer platforms.

“Through our data cooperative of premium media companies, Bombora’s data helps B2B marketers reach influencers and decision makers at companies that are in-market for their products and services,” said Greg Herbst, VP Programmatic Data, Bombora. “We are delighted to deepen our partnership with Oracle Data Cloud to include our account-based data points, and to help fuel a powerful new industry solution.”

About Oracle Data Cloud: Oracle Data Cloud operates the BlueKai Marketplace, the world’s largest audience data marketplace. Oracle Data Cloud is the leading global Data as a Service (DaaS) solution, offering access to more than $3 trillion in consumer transaction data, two billion global consumer profiles, and 1,500+ data partners. Oracle Data Cloud integrates that data with more than 200 major media companies, including publisher exchanges, ad networks, DSPs, DMPs, and agency trading desks. For more information and a free data consultation, contact The Data Hotline.

Contact Info

Erik Kingham
1.650.506.8298

About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit oracle.com.


Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Erik Kingham

  • 1.650.506.8298

Follow Oracle Corporate

Categories: Database, Vendor

Index Sanity in Oracle

Database Journal News - Thu, 08/11/2016 - 08:01

Careful testing and planning are crucial when deciding if an index is truly ‘unused’.  Read on to see what could happen if testing isn’t thorough.

Categories: Database

PostgreSQL 9.6 Beta 4 Released

PostgreSQL News - Thu, 08/11/2016 - 01:00

The PostgreSQL Global Development Group announces today that the fourth beta release of PostgreSQL 9.6 is available for download. This release contains previews of all of the features which will be available in the final release of version 9.6, including fixes to many of the issues found in the first and second betas. Users are encouraged to continue testing their applications against 9.6 beta 4.

Changes Since Beta 3

9.6 Beta 4 includes the security fixes in the 2016-08-11 Security Update, as well as the general bug fixes offered for stable versions. Additionally, it contains fixes for the following beta issues reported since the last beta:

  • Change minimum max_worker_processes from 1 to 0
  • Make array_to_tsvector() sort and de-duplicate the given strings
  • Fix ts_delete(tsvector, text[]) to cope with duplicate array entries
  • Fix hard to hit race condition in heapam's tuple locking code
  • Prevent "snapshot too old" from trying to return pruned TOAST tuples
  • Make INSERT-from-multiple-VALUES-rows handle targetlist indirection
  • Do not let PostmasterContext survive into background workers
  • Add missing casts in information schema
  • Fix assorted problems in recovery tests
  • Block interrupts during HandleParallelMessages()
  • Remove unused arguments from pg_replication_origin_xact_reset function
  • Correctly handle owned sequences with extensions
  • Many fixes for tsqueue.c
  • Eliminate a few more user-visible "cache lookup failed" errors
  • Teach parser to transform "x IS [NOT] DISTINCT FROM NULL" to a NullTest
  • Allow functions that return sets of tuples to return simple NULLs
  • Repair damage done by citext--1.1--1.2.sql
  • Correctly set up aggregate FILTER expression in partial-aggregation plans

This beta also includes many documentation updates and improvements.

Due to changes in system catalogs, a pg_upgrade or pg_dump and restore will be required for users migrating databases from earlier betas.

Note that some known issues remain unfixed. Before reporting a bug in the beta, please check the Open Items page.

Beta Schedule

This is the fourth beta release of version 9.6. The PostgreSQL Project will release additional betas as required for testing, followed by one or more release candidates, until the final release in late 2016. For further information please see the Beta Testing page.

Categories: Database, Open Source

2016-08-11 Security Update Release

PostgreSQL News - Thu, 08/11/2016 - 01:00

The PostgreSQL Global Development Group has released an update to all supported versions of our database system, including 9.5.4, 9.4.9, 9.3.14, 9.2.18 and 9.1.23. This release fixes two security issues. It also patches a number of other bugs reported over the last three months. Users who rely on security isolation between database users should update as soon as possible. Other users should plan to update at the next convenient downtime.

Security Issues

Two security holes have been closed by this release:

  • CVE-2016-5423: certain nested CASE expressions can cause the server to crash.
  • CVE-2016-5424: database and role names with embedded special characters can allow code injection during administrative operations like pg_dumpall.

The fix for the second issue also adds an option, -reuse-previous, to psql's \connect command. pg_dumpall will also refuse to handle database and role names containing line breaks after the update. For more information on these issues and how they affect backwards-compatibility, see the Release Notes.

Bug Fixes and Improvements

This update also fixes a number of bugs reported in the last few months. Some of these issues affect only version 9.5, but many affect all supported versions:

  • Fix misbehaviors of IS NULL/IS NOT NULL with composite values
  • Fix three areas where INSERT ... ON CONFLICT failed to work properly with other SQL features.
  • Make INET and CIDR data types properly reject bad IPv6 values
  • Prevent crash in "point ## lseg" operator for NaN input
  • Avoid possible crash in pg_get_expr()
  • Fix several one-byte buffer over-reads in to_number()
  • Don't needlessly plan query if WITH NO DATA is specified
  • Avoid crash-unsafe state in expensive heap_update() paths
  • Fix hint bit update during WAL replay of row locking operations
  • Avoid unnecessary "could not serialize access" with FOR KEY SHARE
  • Avoid crash in postgres -C when the specified variable is a null string
  • Fix two issues with logical decoding and subtransactions
  • Ensure that backends see up-to-date statistics for shared catalogs
  • Prevent possible failure when vacuuming multixact IDs in an upgraded database
  • When a manual ANALYZE specifies columns, don't reset changes_since_analyze
  • Fix ANALYZE's overestimation of n_distinct for columns with nulls
  • Fix bug in b-tree mark/restore processing
  • Fix building of large (bigger than shared_buffers) hash indexes
  • Prevent infinite loop in GiST index build with NaN values
  • Fix possible crash during a nearest-neighbor indexscan
  • Fix "PANIC: failed to add BRIN tuple" error
  • Prevent possible crash during background worker shutdown
  • Many fixes for issues in parallel pg_dump and pg_restore
  • Make pg_basebackup accept -Z 0 as no compression
  • Make regression tests safe for Danish and Welsh locales

The libpq client library has also been updated to support future two-part PostgreSQL version numbers. This update also contains tzdata release 2016f, with updates for Kemerovo, Novosibirsk, Azerbaijan, Belarus, and Morocco.

EOL Warning for Version 9.1

PostgreSQL version 9.1 will be End-of-Life in September 2016. The project expects to only release one more update for that version. We urge users to start planning an upgrade to a later version of PostgreSQL as soon as possible. See our Versioning Policy for more information.


All PostgreSQL update releases are cumulative. As with other minor releases, users are not required to dump and reload their database or use pg_upgrade in order to apply this update release; you may simply shut down PostgreSQL and update its binaries. Users who have skipped one or more update releases may need to run additional, post-update steps; please see the release notes for earlier versions for details.

Links: Download Release Notes Security Page Versioning Policy

Categories: Database, Open Source

Five must-see speakers at the Microsoft Data Science Summit

The Microsoft Data Science Summit is filled with leading thinkers in big data, machine learning, AI, and open-source technologies. Join us, and get their insights and technical expertise as they discuss real-world challenges and innovative solutions emerging across data science. Here’s a sample of some of the speakers you’ll see—and what they’ll be talking about:

Rafal Lukawiecki, data scientist at Project Botticelli

Rafal will discuss the business opportunity of advanced analytics and the new landscape of data. He’ll speak about data science in practice and the cloud-based Cortana Intelligence Suite, especially Azure Machine Learning and the pros and cons of a variety of data storage approaches.

David Smith, R community lead at Microsoft

Whether it’s called data science, machine learning, or predictive analytics, the combination of new data sources and statistical modeling has produced some truly revolutionary applications. Many of these applications incorporate open-source technologies and research from academic institutions.

In his talk, David will share a few of the ways Microsoft is improving the lives of people around the world—and in particular, people with disabilities—by applying statistics, research, and open-source software in applications and devices. He’ll also share how you can develop such applications yourself, using the open-source R language with Microsoft’s advanced analytics products.

Danielle Dean, senior data scientist lead at Microsoft, and Wee Hyong Tok, senior data science manager at Microsoft

How do businesses and data scientists work together to turn raw data into intelligent action? Why do some companies drown in volumes of data, while others thrive on turning the data into golden strategic advantages?

With Wee Hyong Tok and Danielle Dean, unlock the super powers that data scientists use to turn raw data into big results. This talk will draw on practical experiences from working on various exciting data science projects, such as:

  • Understanding the galaxies by working with citizen astronomers to create labeled datasets, and performing classification of the galaxies
  • Understanding the brain and figuring out how to decode signals from the brain using machine learning
  • Empowering aero engine manufacturers to improve aircraft efficiency, drive up aircraft availability, and reduce engine maintenance cost

The session is targeted at data scientists, developers, and database professionals with a keen interest in evolving existing skillsets and creating new value for their organizations.

Frank Seide, principal researcher at Microsoft Research

This talk will introduce CNTK, Microsoft’s cutting-edge, deep-learning toolkit.

CNTK is used to train and evaluate deep neural networks used in Microsoft products, such as the Cortana speech models. It supports feed-forward, convolutional, and recurrent networks for speech, image, and text workloads.

Frank, a key contributor to the development of CNTK, will walk us through it. He’ll discuss what you can and cannot do with CNTK, what a typical use might look like, how it works, and what algorithms it implements.

Join us. Connect in person—and dive deep.

The Microsoft Data Science Summit includes three in-depth tracks you can choose from to get the expertise you want: Advanced Analytics, Big Data, and Solutions. So if you’re a data scientist, big data engineer, or machine learning practitioner who is looking to expand your knowledge with expert insights, join us in Atlanta, September 26–27. But register soon. The summit only happens once a year, and it’s just around the corner!

> Register for Microsoft Data Science Summit
Categories: Database

Selected common SQL features for developers of portable DB2 applications

IBM - DB2 and Informix Articles - Tue, 08/09/2016 - 05:00
Are you writing SQL applications that need to be portable across platforms? Here's the information you need to make sure your applications are portable. The tables in this article summarize the common SQL application features and make it easy for you to develop applications using SQL that is portable across the DB2 family, including DB2 for z/OS, DB2 for i, and DB2 for Linux, UNIX, and Windows.
Categories: Database

Asahi Refining Selects Oracle Cloud to Improve Financial Visibility and Accelerate Business Growth

Oracle Database News - Mon, 08/08/2016 - 18:13
Press Release

Asahi Refining Selects Oracle Cloud to Improve Financial Visibility and Accelerate Business Growth

Modern Finance Platform Enables Asahi Refining to Embrace Industry Change

Redwood Shores, Calif.—Aug 8, 2016

Asahi Refining, the world’s leading provider of precious metal assaying, refining, and bullion products, selected Oracle Cloud Applications and Oracle Cloud Platform to streamline its procurement and financial processes, giving it a more comprehensive and accurate picture of its financials and better visibility into the business. By moving to the cloud, Asahi Refining has been able to shift its full attention to its core business of refining gold and silver and accelerate business growth.

The ongoing digitization of the refining industry means that organizations need an integrated financial platform to leverage data insights that can help evolve their business models and retain their competitive advantage. To address this market shift, Asahi Refining needed to overhaul its legacy enterprise resource planning (ERP) system, which was difficult to maintain, had limited reporting capabilities and contained fragmented data spread across various silos. The company needed a modern, integrated system to gain the insights needed for swift approvals and decision making.

“In order to update our outdated and over-extended IT infrastructure, we needed to move our financials to a centralized and secure environment,” said Kevin Braddy, IT director, Asahi Refining. “The Oracle ERP Cloud gives us real-time visibility into finance operations across the company and helps drive efficiencies across our financial processes. With this accurate financial information easily at hand, we are able to focus on growing our business.” 

Using the Oracle ERP Cloud and Oracle Cloud Platform, Asahi Refining was able to replace its legacy ERP environment with an integrated cloud-based financial system. Within three months, Asahi Refining was able to fully implement the solution and transition to Oracle Self-Service Procurement Cloud, Oracle Financials Cloud, and Oracle Purchasing Cloud.  The company now has a highly accurate, 360-degree view of its financial systems and operations. In addition, Asahi Refining was able to standardize reporting and reduce month-end reporting from a week to just three days, while increasing its efficiency in processing receivable transactions.

“We are happy to be working with Asahi Refining to help them transform their business with the Oracle Cloud,” said Amit Zavery, senior vice president, cloud platform and integration, Oracle.  “Moving from legacy systems to the cloud enabled Asahi Refining to modernize its technology systems, improving visibility into the business and ultimately accelerating growth and increasing efficiency.”

Asahi Refining used the Oracle Java Cloud and Oracle Database Cloud to seamlessly integrate its Oracle ERP Cloud applications with its legacy ERP system and third-party payroll applications, as well as to validate all data coming into the Oracle ERP Cloud from those legacy applications. Additionally, Asahi Refining has been able to lower its total cost-of-ownership by moving to the cloud, which the company can now leverage to realize additional business efficiencies in the future.

The Oracle Cloud runs in 19 data centers around the world and supports 70+ million users and more than 34 billion transactions each day. With the Oracle Cloud, Oracle delivers the industry’s broadest suite of enterprise-grade cloud services, including Software as a Service (SaaS), Platform as a Service (PaaS), Infrastructure as a Service (IaaS), and Data as a Service (DaaS).

Contact Info: Nicole Maloney, +1.650.506.0806

About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit


Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Nicole Maloney

  • +1.650.506.0806

Follow Oracle Corporate

Categories: Database, Vendor

Querying Multiple MySQL Tables

Database Journal News - Mon, 08/08/2016 - 08:01

It’s been said that one of the drawbacks of normalization to third normal form (3NF) is more cumbersome data extraction due to the greater number of tables, which require careful linking via JOIN clauses. Improper table joining can easily produce erroneous results or even the dreaded Cartesian product. In today’s article, we’ll explore how table joins are achieved in MySQL.
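As a minimal sketch of the idea (using Python’s built-in sqlite3 as a stand-in for MySQL; the schema and data are hypothetical), here is the difference between a proper inner join and an accidental Cartesian product:

```python
import sqlite3

# In-memory database standing in for a normalized MySQL schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)])

# Proper join: rows are matched on the foreign key.
joined = cur.execute(
    "SELECT c.name, o.total FROM customers c "
    "JOIN orders o ON o.customer_id = c.id"
).fetchall()
print(len(joined))     # 3 rows, one per order

# Missing join condition: every customer pairs with every order —
# the dreaded Cartesian product.
cartesian = cur.execute(
    "SELECT c.name, o.total FROM customers c, orders o"
).fetchall()
print(len(cartesian))  # 2 customers x 3 orders = 6 rows
```

The row counts make the failure mode obvious: the Cartesian product grows multiplicatively with table sizes, which on real tables can be catastrophic.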

Categories: Database

Predictive Cloud Computing for professional golf and tennis, Part 7: Big Data Storage & Analytics—IBM DB2 and Graphite

IBM - DB2 and Informix Articles - Fri, 08/05/2016 - 05:00
Management of large amounts of data is a challenge that provides the opportunity to explore different approaches to managing complex states and analyzing numerous metrics. Our professional golf and tennis tournaments generate terabytes of data from site traffic alone. The Predictive Cloud Computing (PCC) system utilizes IBM DB2 to store aggregated information generated from the source data and Graphite to analyze metrics and profile our codebase. Each of these tools gives the PCC system the ability to store and analyze large amounts of data and provide straightforward retrieval for latent analysis.
Categories: Database

Forbes Media Selects Oracle Marketing Cloud to Increase Advertising Revenue

Oracle Database News - Thu, 08/04/2016 - 13:00
Press Release

Forbes Media Selects Oracle Marketing Cloud to Increase Advertising Revenue

Global publisher to leverage high-quality data and superior analytics capabilities to help advertisers reach and engage new audiences

Redwood Shores Calif—Aug 4, 2016

Forbes Media has selected the Oracle Marketing Cloud to take advantage of new digital channels to increase advertising revenue. The deep insights delivered by the Oracle Marketing Cloud’s data analytics and management platform will enable Forbes Media to help advertisers increase audience reach and engagement.

Digital technologies have transformed the competitive landscape in the publishing industry. Free, easily accessible content makes it more difficult for publishers to grow paid subscriptions; new digital publishing platforms have increased the number of publishers by eliminating many of the traditional barriers to market entry; and an explosion of digital channels such as search, social, and video has created new competition for ad buys. These changes have forced publishers to rethink how they work with advertisers to reach and engage audiences.

"We knew we needed a new platform that could help our advertisers increase audience reach and engagement,” said Mark Howard, chief revenue officer, Forbes Media. “The Oracle Marketing Cloud provides us with the tools we need to expand and customize our advertisers’ experience on our site and to offer marketers deeper insight into the performance of their digital campaigns. The breadth and scale of Oracle’s third-party data and the seamless integration it has with our systems has created a product that will allow us to provide advertisers with a level of analytics that was previously unachievable.”

Forbes will use the Oracle Data Management Platform (DMP), part of Oracle Marketing Cloud, to analyze its core user base and to provide advertisers, including those running a native BrandVoice program, with enriched detail about the audience that their campaign has reached. The Oracle DMP’s advanced analytics will allow Forbes to work with its advertising partners to develop custom, niche segments tailored to the marketers’ ideal audience.

Finally, the Oracle ID Graph, a technology that helps marketers connect identities across disparate marketing channels and devices to one customer, will help Forbes target customers and prospects across all channels and devices, ensuring a relevant, personalized customer experience for each individual.

“Savvy publishers like Forbes are becoming more sophisticated in how they tap their first-party data to make their sites’ audiences more valuable to advertisers,” said Andrea Ward, Vice President of Marketing, Oracle Marketing Cloud. “Leveraging the power of the Oracle data management platform to scale their first-party audience data, Forbes will be better able to package audience inventory to align with advertisers’ needs. This will enhance both the advertisers’ and Forbes’ revenue streams.”

For more information on the Oracle Marketing Cloud’s data management platform, please see the Oracle Marketing Cloud website.

About Forbes Media

Forbes Media is a global media, branding and technology company, with a focus on news and information about business, investing, technology, entrepreneurship, leadership and affluent lifestyles. The company publishes Forbes, Forbes Asia, and Forbes Europe magazines. The Forbes brand today reaches more than 94 million people worldwide with its business message each month through its magazines, 36 licensed local editions around the globe, TV, conferences, research, and social and mobile platforms. Forbes Media’s brand extensions include conferences, real estate, education, financial services, and technology license agreements.

Contact Info: Erik Kingham, 1.650.506.8298; Mia Carbonell, Forbes Media, 1.212.620.2288

About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit


Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Erik Kingham

  • 1.650.506.8298

Mia Carbonell

  • 1.212.620.2288

Follow Oracle Corporate

Categories: Database, Vendor

Introduction to SQL Server Stretch Database

Database Journal News - Thu, 08/04/2016 - 08:01

In many cases Azure SQL Database offers an economically and functionally viable alternative to SQL Server deployments. However, there are also scenarios where, rather than serving as a replacement, it provides synergy, working side by side with your on-premises databases. One of the technologies that illustrates this paradigm is Stretch Database, introduced in SQL Server 2016. In this article, we will describe its basic characteristics and review its implementation steps.

Categories: Database

ODBC Driver 13.1 for SQL Server released

We are pleased to announce the full release of the Microsoft ODBC Driver 13.1 for SQL Server. The updated driver provides robust data access to Microsoft SQL Server and Microsoft Azure SQL Database for C/C++ based applications.

What’s new Always Encrypted

You can now use Always Encrypted with the Microsoft ODBC Driver 13.1 for SQL Server. Always Encrypted is a new SQL Server 2016 and Azure SQL Database security feature that prevents sensitive data from being seen in plaintext in a SQL instance. You can now transparently encrypt the data in the application, so that SQL Server or Azure SQL Database will only handle the encrypted data and not plaintext values. If a SQL instance or host machine is compromised, an attacker can only access ciphertext of your sensitive data. Use the ODBC Driver 13.1 to encrypt plaintext data and store the encrypted data in SQL Server 2016 or Azure SQL Database. Likewise, use the driver to decrypt your encrypted data.
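As a sketch of what this looks like from client code, the documented connection-string switch is the `ColumnEncryption` keyword; the server, database, and credential values below are placeholders, and the actual connection call (shown commented out) assumes the third-party pyodbc package:

```python
# Building an ODBC connection string that enables Always Encrypted.
# All server/credential values are hypothetical placeholders.
conn_str = ";".join([
    "Driver={ODBC Driver 13 for SQL Server}",
    "Server=myserver.database.windows.net",
    "Database=mydb",
    "Uid=myuser",
    "Pwd=mypassword",
    "ColumnEncryption=Enabled",  # driver encrypts/decrypts column data transparently
])

# With pyodbc installed and a reachable server, one would connect with:
# import pyodbc
# cnxn = pyodbc.connect(conn_str)

print("ColumnEncryption=Enabled" in conn_str)  # True
```

With this keyword set, queries against encrypted columns return plaintext to the authorized application while the server only ever sees ciphertext.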

Azure Active Directory (AAD)

AAD authentication is a mechanism of connecting to Azure SQL Database v12 using identities in AAD. Use AAD authentication to centrally manage identities of database users and as an alternative to SQL Server authentication. The ODBC Driver 13.1 allows you to specify your AAD credentials in the ODBC connection string to connect to Azure SQL DB.

Internationalized Domain Names (IDNs)

IDNs allow your web server to use Unicode characters in server names, enabling support for more languages. Using the new Microsoft ODBC Driver 13.1 for SQL Server, you can convert a Unicode server name to ASCII-compatible encoding (Punycode) when required during a connection.
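Python’s standard `idna` codec performs the same Unicode-to-Punycode conversion the driver applies internally, so you can see what the driver sends on the wire; the hostname here is a hypothetical example:

```python
# Convert a Unicode server name to its ASCII-compatible (Punycode) form,
# as the ODBC driver does when connecting to an IDN host.
server_name = "bücher.example"
ascii_name = server_name.encode("idna").decode("ascii")
print(ascii_name)  # xn--bcher-kva.example
```

Each dot-separated label is encoded independently; labels that are already pure ASCII (like `example`) pass through unchanged.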

AlwaysOn Availability Groups (AG)

The driver now supports transparent connections to AlwaysOn Availability Groups. The driver quickly discovers the current AlwaysOn topology of your server infrastructure and connects to the current active server transparently.

Note: You can also download ODBC Driver 13 for SQL Server from the download center. ODBC Driver 13 for SQL Server was released with SQL Server 2016 and does not include new features such as Always Encrypted and Azure Active Directory Authentication.

Next steps

Download the ODBC Driver 13.1 for SQL Server.


We are committed to bringing more feature support for connecting to SQL Server, Azure SQL Database and Azure SQL DW. We invite you to explore the latest the Microsoft Data Platform has to offer via a trial of Azure SQL Database or by trying the new SQL Server 2016.

Please stay tuned for upcoming releases with additional feature support across our wide range of client drivers, including the PHP 7.0, JDBC, and ADO.NET drivers that are already available.

Categories: Database

SUPERVALU Uses Oracle Cloud to Provide Expanded Service Offerings to its Customers

Oracle Database News - Wed, 08/03/2016 - 13:00
Press Release

SUPERVALU Uses Oracle Cloud to Provide Expanded Service Offerings to its Customers

Oracle Cloud to enable SUPERVALU to deliver new capabilities, reduce costs, and provide more efficient service offerings to retailers

Eden Prairie, Minn. and Redwood Shores, Calif.—Aug 3, 2016

SUPERVALU INC. (NYSE: SVU), one of the largest grocery wholesalers and retailers in the United States, today announced it has selected the Oracle Cloud as its new technology platform that will enable the company to deliver more robust business management and data analytics capabilities to its customers. With Oracle Cloud, SUPERVALU will be able to provide an integrated portfolio of enterprise-grade cloud services to enhance the performance of its Human Resources and Finance functions.

“We have a terrific opportunity to deliver more professional services and back-end support to our existing wholesale customers, as well as leverage our scale and expertise to reach new customers,” said Randy Burdick, SUPERVALU’s Executive Vice President, Chief Information Officer. “The Oracle Cloud provides us with a more robust infrastructure and a comprehensive solution that we believe will help us drive increased efficiencies, speed decision-making, and enhance the overall customer experience. We’re excited to offer this solution to our customers as another example of how we’re using technology to further enhance their businesses for the future.”

Initially, SUPERVALU will implement Oracle ERP Cloud and Oracle HCM Cloud, part of a broad suite of modern Oracle Cloud applications that are integrated with social, mobile and analytic capabilities, resulting in improved process integration and more complete and impactful reporting. The Oracle Cloud provides an enhanced user experience via a simple, scalable and intuitive design that can be delivered to users on desktop, tablet and mobile applications.  SUPERVALU will use Oracle HCM Cloud’s suite of talent and workforce management offerings designed to provide an engaging and collaborative HR experience to find and retain quality employees.

“Oracle Cloud provides SUPERVALU with a complete set of enterprise-grade cloud applications, offering both breadth and depth of functionality,” said Rondy Ng, Senior Vice President, Applications Development, Oracle. “Our integrated ERP and HCM cloud solutions are uniquely placed to cater to today’s changing business environment and enable our customers to modernize back-office operations and empower their people.”

SUPERVALU will begin a phase-in of these new Oracle Cloud offerings across three key parts of its business: internal administration of its wholesale distribution and retail grocery businesses, and in services the company provides externally to its customers.

Contact Info: Nicole Maloney, +1.650.506.0806; Jeff Swanson, 1.952.930.1645

About SUPERVALU

SUPERVALU INC. is one of the largest grocery wholesalers and retailers in the U.S. with annual sales of approximately $18 billion. SUPERVALU serves customers across the United States through a network of 3,588 stores composed of 1,796 independent stores serviced primarily by the Company’s food distribution business; 1,360 Save-A-Lot stores, of which 897 are operated by licensee owners; and 200 traditional retail grocery stores (store counts as of February 27, 2016). Headquartered in Minnesota, SUPERVALU has approximately 40,000 employees. For more information about SUPERVALU visit

About Oracle

Oracle offers a comprehensive and fully integrated stack of cloud applications and platform services. For more information about Oracle (NYSE:ORCL), visit


Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Safe Harbor

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle Corporation. 

Talk to a Press Contact

Nicole Maloney

  • +1.650.506.0806

Jeff Swanson

  • 1.952.930.1645

Follow Oracle Corporate

Categories: Database, Vendor

Temporal Data Part 3 – Reporting Out Current and Historical Information

Database Journal News - Mon, 08/01/2016 - 08:01

With the introduction of temporal table support in SQL Server 2016, Microsoft also added functionality that makes it easy to join the current and history records of a system-versioned table. Greg Larsen shows you some of the different ways to analyze your system-versioned records over time.

Categories: Database

Don’t miss SQL Server Geeks Annual Summit 2016!

This post was authored by Rimma Nehme, Technical Assistant, Data Group.

I am really excited to both attend and speak at the SQL Server Geeks Annual Summit (#SSGAS2016), Asia’s Premier Data & Analytics Conference taking place on 11-13 August in Bangalore, India. SQLServerGeeks Annual Summit 2016 is a full 3-day conference with more than 100 breakout sessions and deep dive pre-con sessions on SQL Server, BI & Analytics, Cloud, Big Data, and related technologies.

This is a truly unique conference (see this video), comprising multiple tracks on Database Management, Database Development, Business Intelligence, Advanced Analytics, Cloud, and Big Data. The summit attracts SQL experts from around the globe. SSGAS 2016 is the only Data/Analytics event in Asia where product teams from Microsoft’s Data Group fly in from Redmond to deliver advanced sessions on the latest technologies. Beyond engineering, the conference gets full participation from Microsoft’s SQL CAT and TIGER teams.

Last year’s summit was a great success, and you can see some of the feedback below.

Why should you attend?

•    To get real-world training from industry experts.
•    To hear a very thought-provoking keynote by Joseph Sirosh, CVP of Data Group at Microsoft.
•    To learn directly from our engineering and customer experts on how you can build data-driven intelligent solutions, on-premises and in the cloud.
•    To learn how SQL Server 2016 with R, Hadoop, and other advanced technologies can drive new and exciting services for your customers.
•    To network and connect with the MVPs and MCMs.
•    To talk directly to our product team members.
•    To ask questions during Open-Talks and Chalk-Talks.
•    To learn about Advanced Analytics, Cloud, and Big Data.
•    To see expert-level demo-oriented sessions.
•    To learn about the latest trends in the Data & Analytics world.
•    To hear me talk about our Planet-Scale NoSQL DocumentDB.
•    Do you really need to hear more reasons to attend?

I invite you to watch the #SSGAS2016 hashtag on Twitter for new and exciting updates. Hopefully I’ll see some of you there!

Categories: Database