## Probabilistic Programming with monad‑bayes (3)

Blog: data-science (14 posts)

26 February 2020

In this blog post series, we're going to lead you through Bayesian modeling in Haskell with the monad-bayes library. In the third part of the series, we set up a simple Bayesian neural network.

9 January 2020

In this second post of Tweag's four-part series, we discuss Gibbs sampling, an important MCMC algorithm that can be advantageous when sampling from multivariate distributions. Two different examples and, again, an interactive Python notebook illustrate use cases and the issue of heavily correlated samples.
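To give a taste of the idea (this is a plain-Python sketch with names of our own choosing, not code from the post's notebook): for a bivariate standard normal with correlation `rho`, both full conditionals are themselves normal, so a Gibbs sampler just alternates direct draws from them.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampling from a bivariate standard normal with correlation rho.

    Each full conditional is itself normal:
        x | y ~ N(rho * y, 1 - rho**2)
        y | x ~ N(rho * x, 1 - rho**2)
    so each coordinate can be updated by a direct draw.
    """
    rng = random.Random(seed)
    sd = (1.0 - rho * rho) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)  # resample x given the current y
        y = rng.gauss(rho * x, sd)  # resample y given the new x
        samples.append((x, y))
    return samples

samples = gibbs_bivariate_normal(rho=0.9, n_samples=20000)
```

With `rho` close to 1, consecutive draws barely move — exactly the heavily correlated samples the post discusses.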

8 November 2019

Here's Part 2 in Tweag's Series about Bayesian modeling in Haskell with the monad-bayes library.

30 October 2019

We're happy to announce the first release of Porcupine, an open source framework to express portable and customizable data pipelines.

25 October 2019

In this first post of Tweag's four-part series on Markov chain Monte Carlo sampling algorithms, you will learn why and when to use them, as well as the theoretical underpinnings of this powerful class of sampling methods. We discuss the famous Metropolis-Hastings algorithm and give an intuition on the choice of its free parameters. Interactive Python notebooks invite you to play around with MCMC yourself and thus deepen your understanding of the Metropolis-Hastings algorithm.
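The core algorithm fits in a few lines. Here is a minimal random-walk Metropolis-Hastings sketch in plain Python (function and parameter names are our own, not from the post's notebooks), targeting a standard normal; the proposal step size is the free parameter whose choice the post builds intuition for.

```python
import math
import random

def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings with a Gaussian proposal.

    `step` is the proposal standard deviation -- the main free parameter:
    too small and the chain crawls, too large and proposals are rejected.
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)),
        # computed in log space for numerical stability.
        if math.log(rng.random()) < log_density(proposal) - log_density(x):
            x = proposal
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples

# Target: standard normal; an unnormalized log-density is enough.
samples, acc_rate = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000)
```

Note that the normalizing constant of the target cancels in the acceptance ratio, which is what makes the method so widely applicable.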

20 September 2019

In this blog post series, we're going to lead you through Bayesian modeling in Haskell with the monad-bayes library. In the first part of the series, we introduce two fundamental concepts of `monad-bayes`: `sampling` and `scoring`.
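monad-bayes is a Haskell library, but the two concepts carry over to any language. The sketch below is plain Python with names of our own invention — it is emphatically not the monad-bayes API — and illustrates sampling (drawing a coin bias from a uniform prior) and scoring (weighting each draw by the likelihood of the observed tosses) via simple importance sampling.

```python
import random

def posterior_mean_bias(heads, tosses, n_particles=50000, seed=0):
    """Importance-sampling illustration of sampling and scoring:
    draw a coin bias from a uniform prior (sampling), then weight
    each draw by the likelihood of the observed tosses (scoring)."""
    rng = random.Random(seed)
    total_w = 0.0
    total_wp = 0.0
    for _ in range(n_particles):
        p = rng.random()  # sampling: draw from the Uniform(0, 1) prior
        # scoring: weight by the likelihood of `heads` out of `tosses`
        w = p ** heads * (1 - p) ** (tosses - heads)
        total_w += w
        total_wp += w * p
    return total_wp / total_w  # weighted (posterior) mean of the bias

# Two heads in two tosses: the exact posterior is Beta(3, 1), mean 0.75.
estimate = posterior_mean_bias(heads=2, tosses=2)
```

In monad-bayes these two operations are first-class effects of the probability monad, which is what lets the same model run under different inference algorithms.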

1 August 2019

We visualize large collections of Haskell and Python source codes as 2D maps using methods from Natural Language Processing (NLP) and dimensionality reduction and find a surprisingly rich structure for both languages. Clustering on the 2D maps allows us to identify common patterns in source code which give rise to these structures. Finally, we discuss this first analysis in the context of advanced machine learning-based tools performing automatic code refactoring and code completion.

17 July 2019

Every day we write repetitive code. A lot of it is boilerplate, written only to satisfy the compiler or interpreter. But how do languages differ in their boilerplate content? We explore these questions using data sets of Python and Haskell code.

10 April 2019

Inspired by the Event Horizon Telescope images, we develop a quick exploratory study of the future possibilities of a technology called the sneakernet: could massive data transfer give new life to the homing pigeon industry? How about using means of transportation optimized to carry incredible amounts of weight? Or ones designed to be as fast as a bullet?

28 February 2019

Millions of Jupyter notebooks are spread over the internet - machine learning, astrophysics, biology, economics, you name it. What a great age for reproducible science! Or so you think, until you try to actually run these notebooks. Then you realize that having understandable high-level code alone is not enough to reproduce something on a computer. JupyterWith is a solution to this problem.

6 February 2019

The repositories of distributions such as Debian and Nixpkgs are among the largest collections of open source (and some unfree) software. They are complex systems that connect and organize many interdependent packages. In this blog post I'll try to shed some light on them from the perspective of Nixpkgs, mostly with visualizations of its complete dependency graph.

23 January 2019

Haskell and data science - at first sight a great match: native function composition, lazy evaluation, fast execution times, and lots of code checks. These sound like ingredients for scalable, production-ready data transformation pipelines. What is missing then? Why…

20 June 2016

Maintaining a compute cluster for batch or stream data processing is hard work. Connecting it up to storage facilities and time-sharing resources across multiple demands is even more so. Fortunately, cloud service providers these days typically upscale their offering to not just…

25 February 2016

Large scale distributed applications are complex: there are effects at scale that matter far more than when your application is basked in the warmth of a single machine. Messages between any two processes may or may not make it to their final destination. If reading from a memory…