<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2023-02-22T15:57:40+00:00</updated><id>/feed.xml</id><title type="html">Gijs Koot</title><subtitle>Homepage and occasional blog. I am a Data Scientist at SpotR.ai in The Hague
</subtitle><entry><title type="html">Partial batch failure with SQS driven Lambda functions</title><link href="/aws/sqs/2022/05/09/partial-sqs-batch-failure.html" rel="alternate" type="text/html" title="Partial batch failure with SQS driven Lambda functions" /><published>2022-05-09T00:00:00+00:00</published><updated>2022-05-09T00:00:00+00:00</updated><id>/aws/sqs/2022/05/09/partial-sqs-batch-failure</id><content type="html" xml:base="/aws/sqs/2022/05/09/partial-sqs-batch-failure.html">&lt;p&gt;
When you have a lambda that is driven by an SQS queue, like &lt;a href=&quot;https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html&quot;&gt;this&lt;/a&gt;, your lambda
receives messages in batches, by default up to ten messages per batch. Your handler can look like this, handling
all of the messages in a single event in a for loop.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #bc6ec5; font-weight: bold;&quot;&gt;lambda_handler&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;event: &lt;span style=&quot;color: #4f97d7;&quot;&gt;dict&lt;/span&gt;, context: &lt;span style=&quot;color: #4f97d7;&quot;&gt;dict&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt; -&amp;gt; &lt;span style=&quot;color: #a45bad;&quot;&gt;None&lt;/span&gt;:
    &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;for&lt;/span&gt; msg &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;in&lt;/span&gt; event&lt;span style=&quot;color: #4f97d7;&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;Records&quot;&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;]&lt;/span&gt;:
        handle_msg&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;msg&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
If all messages are handled without exceptions, the messages
are deleted from the SQS queue automatically for you. If your handler raises
an exception, nothing is deleted: the messages become visible on the queue again,
or, depending on the arrangement, they are sent to the &lt;a href=&quot;https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html&quot;&gt;dead letter queue&lt;/a&gt;.
The catch is that the whole batch fails, including the messages that
were already processed successfully. This is
typically not the behaviour you want. To solve this, you have to delete the
successfully handled messages from the queue yourself, keeping track of failures
and successes in your own code. This is not very obvious, and you can find some questions on
how to handle this properly &lt;a href=&quot;https://stackoverflow.com/questions/55497907/how-do-i-fail-a-specific-sqs-message-in-a-batch-from-a-lambda&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;https://stackoverflow.com/questions/56234199/splittling-sqs-lambda-batch-into-partial-success-partial-failure&quot;&gt;here&lt;/a&gt;.
&lt;/p&gt;

&lt;p&gt;
AWS introduced a cleaner way to handle this fairly recently, in
&lt;a href=&quot;https://aws.amazon.com/about-aws/whats-new/2021/11/aws-lambda-partial-batch-response-sqs-event-source/&quot;&gt;November 2021&lt;/a&gt;. If your handler returns a response with a
&lt;code&gt;batchItemFailures&lt;/code&gt; list that names the failed messages, only those are returned to the queue (or end up in the dead
letter queue). For this to work, &lt;code&gt;ReportBatchItemFailures&lt;/code&gt; has to be enabled on the event source mapping. In &lt;code&gt;python&lt;/code&gt;, this looks like this.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #bc6ec5; font-weight: bold;&quot;&gt;lambda_handler&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;event: &lt;span style=&quot;color: #4f97d7;&quot;&gt;dict&lt;/span&gt;, context: &lt;span style=&quot;color: #4f97d7;&quot;&gt;dict&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt; -&amp;gt; &lt;span style=&quot;color: #4f97d7;&quot;&gt;dict&lt;/span&gt;:
    batch_item_failures = &lt;span style=&quot;color: #4f97d7;&quot;&gt;[]&lt;/span&gt;  &lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;list of things that failed&lt;/span&gt;

    &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;for&lt;/span&gt; msg &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;in&lt;/span&gt; event&lt;span style=&quot;color: #4f97d7;&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;Records&quot;&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;]&lt;/span&gt;:
        &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;try&lt;/span&gt;:
            handle_msg&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;msg&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt;
        &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;except&lt;/span&gt; &lt;span style=&quot;color: #ce537a; font-weight: bold;&quot;&gt;Exception&lt;/span&gt; &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;as&lt;/span&gt; e:  &lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;more specific is better&lt;/span&gt;
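            &lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;each entry is an object with an itemIdentifier key&lt;/span&gt;
            batch_item_failures.append&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #bc6ec5;&quot;&gt;{&lt;/span&gt;&lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;itemIdentifier&quot;&lt;/span&gt;: msg&lt;span style=&quot;color: #2d9574;&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;messageId&quot;&lt;/span&gt;&lt;span style=&quot;color: #2d9574;&quot;&gt;]&lt;/span&gt;&lt;span style=&quot;color: #bc6ec5;&quot;&gt;}&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt;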

    &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;return&lt;/span&gt; &lt;span style=&quot;color: #4f97d7;&quot;&gt;{&lt;/span&gt;
        &lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;batchItemFailures&quot;&lt;/span&gt;: batch_item_failures
    &lt;span style=&quot;color: #4f97d7;&quot;&gt;}&lt;/span&gt;
&lt;/pre&gt;
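&lt;/div&gt;

&lt;p&gt;
To see what such a response looks like without deploying anything, the handler can be called locally
with a hand-written event. This is only a sketch: &lt;code&gt;handle_msg&lt;/code&gt; is a made-up stand-in that
fails on one message, and the event contains just the two fields used here, whereas real SQS events carry more.
Lambda expects every entry in &lt;code&gt;batchItemFailures&lt;/code&gt; to be an object with an &lt;code&gt;itemIdentifier&lt;/code&gt; key,
so the response should name only the message with id &lt;code&gt;id-2&lt;/code&gt;.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;def handle_msg(msg):
    # Hypothetical handler: fails on any message whose body is 'bad'
    if msg['body'] == 'bad':
        raise ValueError('cannot process message ' + msg['messageId'])


def lambda_handler(event, context):
    batch_item_failures = []
    for msg in event['Records']:
        try:
            handle_msg(msg)
        except Exception:
            # Report the failure as an object with an itemIdentifier key
            batch_item_failures.append({'itemIdentifier': msg['messageId']})
    return {'batchItemFailures': batch_item_failures}


# Hand-written event with only the fields this sketch needs
event = {'Records': [
    {'messageId': 'id-1', 'body': 'ok'},
    {'messageId': 'id-2', 'body': 'bad'},
    {'messageId': 'id-3', 'body': 'ok'},
]}

response = lambda_handler(event, None)
print(response)
&lt;/pre&gt;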
&lt;/div&gt;</content><author><name></name></author><category term="AWS" /><category term="SQS" /><summary type="html">When you have a lambda that is driven by a SQS queue, like this, your lambda can receive up to ten messages per batch. Your handler can look like this, handling all of the messages in a single event in a for loop.</summary></entry><entry><title type="html">Reading large tiles from S3 directly with `rasterio`</title><link href="/aws/geo/2022/05/06/rasterio-gdal-s3.html" rel="alternate" type="text/html" title="Reading large tiles from S3 directly with `rasterio`" /><published>2022-05-06T00:00:00+00:00</published><updated>2022-05-06T00:00:00+00:00</updated><id>/aws/geo/2022/05/06/rasterio-gdal-s3</id><content type="html" xml:base="/aws/geo/2022/05/06/rasterio-gdal-s3.html">&lt;p&gt;
At SpotR, we make heavy use of raster data containing gridded height measurements.
&lt;/p&gt;

&lt;p&gt;
When working in &lt;code&gt;python&lt;/code&gt;, the &lt;a href=&quot;https://rasterio.readthedocs.io/en/latest/index.html&quot;&gt;&lt;code&gt;rasterio&lt;/code&gt;&lt;/a&gt; package is useful. This package is
essentially a more pythonic binding to the GDAL library, as explained in their &lt;a href=&quot;https://rasterio.readthedocs.io/en/latest/intro.html&quot;&gt;introduction&lt;/a&gt;. The file below was obtained from &lt;a href=&quot;https://data.gov.uk/dataset/f0db0249-f17b-4036-9e65-309148c97ce4/national-lidar-programme&quot;&gt;data.gov.uk&lt;/a&gt; and shows a 1x1 km
patch of height measurements in the UK. The resolution is 1000x1000 pixels, so
every pixel represents the (maximum) height of a 1x1 m cell. The lighter dots along the
top of the image are houses, and there are some ragged parts where there is no data.
In the middle there is a depression, which could be a riverbed, and at the bottom
the terrain rises a little.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;import&lt;/span&gt; rasterio
&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;import&lt;/span&gt; matplotlib.pyplot &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;as&lt;/span&gt; plt
&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;import&lt;/span&gt; numpy &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;as&lt;/span&gt; np

&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;with&lt;/span&gt; rasterio.&lt;span style=&quot;color: #4f97d7;&quot;&gt;open&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;/tmp/sd9863_DSM_1M.tiff&quot;&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt; &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;as&lt;/span&gt; &lt;span style=&quot;color: #7590db;&quot;&gt;dataset&lt;/span&gt;:
   heights = dataset.read&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt;

&lt;span style=&quot;color: #7590db;&quot;&gt;fn&lt;/span&gt; = &lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;images/heights.png&quot;&lt;/span&gt;

&lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;replace nodata values with nan&lt;/span&gt;
&lt;span style=&quot;color: #7590db;&quot;&gt;heights&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;[&lt;/span&gt;np.where&lt;span style=&quot;color: #bc6ec5;&quot;&gt;(&lt;/span&gt;heights==dataset.nodata&lt;span style=&quot;color: #bc6ec5;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;]&lt;/span&gt; = np.nan

plt.imshow&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;heights&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt;
plt.title&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;Example of a 1km x 1km raster&quot;&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt;
plt.tight_layout&lt;span style=&quot;color: #4f97d7;&quot;&gt;()&lt;/span&gt;
plt.savefig&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;fn&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt;
fn
&lt;/pre&gt;
&lt;/div&gt;

&lt;div id=&quot;org18e4de7&quot; class=&quot;figure&quot;&gt;
&lt;p&gt;&lt;img src=&quot;/assets/images/heights.png&quot; alt=&quot;heights.png&quot; /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
Raster tiles, typically GeoTIFF files, can become quite large on disk.
The grid above takes up ~4 MB as an uncompressed GeoTIFF file, down from
6.5 MB as a &lt;code&gt;.asc&lt;/code&gt; file, which is a simple text-based format. There are a couple
of interesting compression techniques, like &lt;a href=&quot;https://en.wikipedia.org/wiki/Deflate&quot;&gt;DEFLATE&lt;/a&gt; and &lt;a href=&quot;https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch&quot;&gt;LZW&lt;/a&gt;, that can bring the
size of the data down further. It is possible to convert rasters with
&lt;code&gt;rasterio&lt;/code&gt;, but the &lt;code&gt;gdal_translate&lt;/code&gt; utility is the tool for the job.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;gdal_translate /tmp/sd9863_DSM_1M.asc /tmp/sd9863_DSM_1M.tiff &amp;gt; /dev/null
gdal_translate /tmp/sd9863_DSM_1M.asc /tmp/sd9863_DSM_1M_lzw.tiff -co &lt;span style=&quot;color: #7590db;&quot;&gt;COMPRESS&lt;/span&gt;=LZW &amp;gt; /dev/null
gdal_translate /tmp/sd9863_DSM_1M.asc /tmp/sd9863_DSM_1M_def.tiff -co &lt;span style=&quot;color: #7590db;&quot;&gt;COMPRESS&lt;/span&gt;=DEFLATE &amp;gt; /dev/null
gdal_translate /tmp/sd9863_DSM_1M.asc /tmp/sd9863_DSM_1M_def_pred.tiff -co &lt;span style=&quot;color: #7590db;&quot;&gt;COMPRESS&lt;/span&gt;=DEFLATE -co &lt;span style=&quot;color: #7590db;&quot;&gt;PREDICTOR&lt;/span&gt;=&lt;span style=&quot;color: #a45bad;&quot;&gt;2&lt;/span&gt; &amp;gt; /dev/null
ls -lha /tmp/sd*
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;-rw-rw-r-- &lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt; gijs gijs &lt;span style=&quot;color: #a45bad;&quot;&gt;6,5M&lt;/span&gt; jun &lt;span style=&quot;color: #a45bad;&quot;&gt;13&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2018&lt;/span&gt; /tmp/sd9863_DSM_1M.asc
-rw-rw-r-- &lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt; gijs gijs &lt;span style=&quot;color: #a45bad;&quot;&gt;1,1M&lt;/span&gt; mei  &lt;span style=&quot;color: #a45bad;&quot;&gt;9&lt;/span&gt; &lt;span style=&quot;color: #a45bad;&quot;&gt;09:11&lt;/span&gt; /tmp/sd9863_DSM_1M_def_pred.tiff
-rw-rw-r-- &lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt; gijs gijs &lt;span style=&quot;color: #a45bad;&quot;&gt;1,5M&lt;/span&gt; mei  &lt;span style=&quot;color: #a45bad;&quot;&gt;9&lt;/span&gt; &lt;span style=&quot;color: #a45bad;&quot;&gt;09:11&lt;/span&gt; /tmp/sd9863_DSM_1M_def.tiff
-rw-rw-r-- &lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt; gijs gijs &lt;span style=&quot;color: #a45bad;&quot;&gt;1,8M&lt;/span&gt; mei  &lt;span style=&quot;color: #a45bad;&quot;&gt;9&lt;/span&gt; &lt;span style=&quot;color: #a45bad;&quot;&gt;09:11&lt;/span&gt; /tmp/sd9863_DSM_1M_lzw.tiff
-rw-rw-r-- &lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt; gijs gijs &lt;span style=&quot;color: #a45bad;&quot;&gt;3,9M&lt;/span&gt; mei  &lt;span style=&quot;color: #a45bad;&quot;&gt;9&lt;/span&gt; &lt;span style=&quot;color: #a45bad;&quot;&gt;09:11&lt;/span&gt; /tmp/sd9863_DSM_1M.tiff
&lt;/pre&gt;
&lt;/div&gt;
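
&lt;p&gt;
To make the comparison concrete, the sizes from the listing above can be turned into compression ratios
relative to the &lt;code&gt;.asc&lt;/code&gt; file. The numbers are copied by hand from that listing, and the labels are only there for readability.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;# File sizes in MB, copied from the listing above
sizes_mb = {
    'asc': 6.5,
    'tiff (uncompressed)': 3.9,
    'tiff (LZW)': 1.8,
    'tiff (DEFLATE)': 1.5,
    'tiff (DEFLATE, PREDICTOR=2)': 1.1,
}

baseline = sizes_mb['asc']

# How many times smaller each GeoTIFF is than the text file
ratios = {
    name: round(baseline / size, 1)
    for name, size in sizes_mb.items()
    if name != 'asc'
}

for name, ratio in ratios.items():
    print(f'{name}: {ratio}x smaller than the .asc file')
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
For this tile, DEFLATE with the predictor comes out at roughly a factor of six compared to the text format.
&lt;/p&gt;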

&lt;p&gt;
Interestingly, the compression techniques used here are all lossless.
&lt;code&gt;GDAL&lt;/code&gt; also has JPEG based compression, which is lossy, but it can only be applied to 8bit
unsigned data, in other words, images. These height measurements, which are
stored as floating point numbers, cannot be stored using JPEG compression. I
can definitely think of some use cases where some distortion of these
measurements is fine, as long as it's bounded somehow, but I haven't come across
examples of lossy compression for floating point rasters.
&lt;/p&gt;

&lt;div id=&quot;outline-container-org43128ac&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;org43128ac&quot;&gt;Partial reads&lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-org43128ac&quot;&gt;
&lt;p&gt;
Compression can save us a factor of almost six compared to the text format, but to store this data at
our scale, things still add up. I live in the Netherlands, which has an area of
41,543 km&lt;sup&gt;2&lt;/sup&gt;. That's 40k+ tiles at 1 MB+ each, roughly 50 GB in total. That is a good fit for
cloud storage such as S3.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;aws s3 ls s3://heights-tiles/tiles/sd980
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:55&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2903641&lt;/span&gt;  sd9800_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:54&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2871755&lt;/span&gt;  sd9801_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:54&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2938302&lt;/span&gt;  sd9802_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:55&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2719476&lt;/span&gt;  sd9803_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:55&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2643684&lt;/span&gt;  sd9804_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:55&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2533681&lt;/span&gt;  sd9805_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:55&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2715498&lt;/span&gt;  sd9806_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:55&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2818095&lt;/span&gt;  sd9807_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:55&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;2755601&lt;/span&gt;  sd9808_DSM_1M.tiff 
&lt;span style=&quot;color: #a45bad;&quot;&gt;2022-04-29&lt;/span&gt;  &lt;span style=&quot;color: #a45bad;&quot;&gt;23:08:56&lt;/span&gt;   &lt;span style=&quot;color: #a45bad;&quot;&gt;468739&lt;/span&gt;  sd9809_DSM_1M.tiff 
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
When doing a calculation, we're typically not interested in the whole of the
tile. For example, we may only want to know the height of a single pixel in the
raster file. In that case we can avoid downloading the whole file by doing
a partial read. This is possible because S3 supports byte-range requests, which
amount to random-access reads, and GDAL supports reading over a network with
&lt;a href=&quot;https://gdal.org/user/virtual_file_systems.html&quot;&gt;virtual file systems&lt;/a&gt;.
&lt;/p&gt;
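
&lt;p&gt;
The idea behind a partial read can be shown with a toy example on a local file. The sketch below writes a
1000x1000 grid of 32-bit floats as a flat binary file and then reads a single pixel by seeking to its byte offset.
This is not how GeoTIFF actually lays out its data, and the helper names are made up for the illustration. Over S3,
the seek would correspond to a request for a byte range.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;import os
import struct
import tempfile

WIDTH, HEIGHT = 1000, 1000   # same shape as the tile above
ITEM = struct.calcsize('f')  # bytes per 32-bit float pixel


def write_raw(path):
    # Every pixel stores its own position, row * WIDTH + col, as a float
    with open(path, 'wb') as f:
        for row in range(HEIGHT):
            values = [float(row * WIDTH + col) for col in range(WIDTH)]
            f.write(struct.pack(f'{WIDTH}f', *values))


def read_pixel(path, row, col):
    # Jump straight to the pixel and read only ITEM bytes
    with open(path, 'rb') as f:
        f.seek((row * WIDTH + col) * ITEM)
        return struct.unpack('f', f.read(ITEM))[0]


with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, 'tile.raw')
    write_raw(raw_path)
    file_size = os.path.getsize(raw_path)
    pixel = read_pixel(raw_path, 500, 500)

print(file_size, pixel)
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Only a handful of bytes are read, however large the file is. With compressed or tiled files the reader needs to
fetch a whole block around the pixel, but the principle is the same.
&lt;/p&gt;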

&lt;p&gt;
Depending on how large your tiles are, this can make a big difference. Let's
benchmark this.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;import&lt;/span&gt; rasterio
&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;from&lt;/span&gt; rasterio.windows &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;import&lt;/span&gt; Window

&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;with&lt;/span&gt; rasterio.&lt;span style=&quot;color: #4f97d7;&quot;&gt;open&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #2d9574;&quot;&gt;&quot;s3://heights-tiles/tiles/sd9800_DSM_1M.tiff&quot;&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt; &lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;as&lt;/span&gt; &lt;span style=&quot;color: #7590db;&quot;&gt;raster&lt;/span&gt;:
  &lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #2aa1ae; background-color: #292e34;&quot;&gt;Window(col_off, row_off, width, height): a single pixel at column 500, row 500&lt;/span&gt;
  dt = raster.read&lt;span style=&quot;color: #4f97d7;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt;, window=Window&lt;span style=&quot;color: #bc6ec5;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #a45bad;&quot;&gt;500&lt;/span&gt;, &lt;span style=&quot;color: #a45bad;&quot;&gt;500&lt;/span&gt;, &lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt;, &lt;span style=&quot;color: #a45bad;&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color: #bc6ec5;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #4f97d7;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;time&lt;/span&gt; python src/read_raster_window.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;real    &lt;span style=&quot;color: #a45bad;&quot;&gt;0m17,300s&lt;/span&gt;
user    &lt;span style=&quot;color: #a45bad;&quot;&gt;0m3,026s&lt;/span&gt;
sys     &lt;span style=&quot;color: #a45bad;&quot;&gt;0m1,038s&lt;/span&gt;                                        
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Wait a minute .. 17 seconds is still a long time. It turns out that &lt;code&gt;GDAL&lt;/code&gt;
lists the whole folder, looking for other files, before opening a file. This
behaviour makes sense because geodata files are often accompanied by sidecar
files that include information about transformation, possibly some indexes and
more. We can disable this behaviour by setting an environment variable.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;&lt;span style=&quot;color: #4f97d7; font-weight: bold;&quot;&gt;time&lt;/span&gt; &lt;span style=&quot;color: #7590db;&quot;&gt;GDAL_DISABLE_READDIR_ON_OPEN&lt;/span&gt;=YES python src/read_raster_window.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;real    &lt;span style=&quot;color: #a45bad;&quot;&gt;0m1,230s&lt;/span&gt;
user    &lt;span style=&quot;color: #a45bad;&quot;&gt;0m0,400s&lt;/span&gt;
sys     &lt;span style=&quot;color: #a45bad;&quot;&gt;0m0,948s&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;
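
&lt;p&gt;
If you would rather not rely on the shell, the same variable can be set from within Python before the file is
opened. The sketch below is one way to do that, assuming &lt;code&gt;rasterio&lt;/code&gt; is installed and AWS credentials
are configured. The helper &lt;code&gt;read_pixel&lt;/code&gt; is a hypothetical name, not part of &lt;code&gt;rasterio&lt;/code&gt;.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;import os

# Set before GDAL opens the file; 'YES' matches the shell example above
os.environ.setdefault('GDAL_DISABLE_READDIR_ON_OPEN', 'YES')


def read_pixel(path, col, row):
    '''Read one pixel of band 1 from a local path or an s3:// URL.'''
    # Imported here so that the variable above is in place first
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(path) as raster:
        # Window(col_off, row_off, width, height): a 1x1 window is a single pixel
        return raster.read(1, window=Window(col, row, 1, 1))[0, 0]
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
A call such as &lt;code&gt;read_pixel('s3://heights-tiles/tiles/sd9800_DSM_1M.tiff', 500, 500)&lt;/code&gt; would then
skip the directory listing.
&lt;/p&gt;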
&lt;/div&gt;
&lt;/div&gt;</content><author><name></name></author><category term="AWS" /><category term="geo" /><summary type="html">At SpotR, we make heavy use of rasterdata containing gridded height measurements.</summary></entry><entry><title type="html">Adjusting age group for local vaccination rate (2)</title><link href="/julia/statistics/covid/2021/10/09/covid-correction.html" rel="alternate" type="text/html" title="Adjusting age group for local vaccination rate (2)" /><published>2021-10-09T00:00:00+00:00</published><updated>2021-10-09T00:00:00+00:00</updated><id>/julia/statistics/covid/2021/10/09/covid-correction</id><content type="html" xml:base="/julia/statistics/covid/2021/10/09/covid-correction.html">&lt;script type=&quot;text/javascript&quot; src=&quot;http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;&lt;/script&gt;

&lt;div id=&quot;outline-container-org7485290&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;org7485290&quot;&gt;Adjusting percentages for local vaccination rate&lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-org7485290&quot;&gt;
&lt;p&gt;
This is an answer to a &lt;a href=&quot;https://stats.stackexchange.com/questions/546774/how-to-combine-state-level-covid-19-vaccination-rates-with-national-demographic&quot;&gt;question&lt;/a&gt; on Cross Validated, the statistics Stack Exchange site.
&lt;/p&gt;

&lt;p&gt;
I want to estimate the probability of a person aged 40-49 in Delaware
to be vaccinated, but I only have nationwide statistics on vaccination
levels by age, and a level of vaccination in Delaware, but no age
breakdown for that state. So the question is, how can we combine those
percentages, for the age group and for Delaware, into a specific percentage
for that age group in Delaware?
&lt;/p&gt;

&lt;p&gt;
I tried doing that in my &lt;a href=&quot;/julia/statistics/covid/2021/10/07/covid-logit-addition.html&quot;&gt;previous&lt;/a&gt;
post, but I have since come up with a more straightforward method.
&lt;/p&gt;

&lt;p&gt;
To begin, from the &lt;a href=&quot;https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total&quot;&gt;official statistics&lt;/a&gt;, we get the percentage of
people vaccinated in Delaware, which is 56.6%. Let \(D\) be the total
population of Delaware. Then there are \(0.566 \cdot D\) vaccinated
persons in Delaware.
&lt;/p&gt;

&lt;p&gt;
People in the age group 40-49 make up 12.2% of the US population, but
14.2% of the people vaccinated. Let's assume
these percentages hold in Delaware as well.
&lt;/p&gt;

&lt;p&gt;
Then the total number of people aged 40-49 living in Delaware is
\(0.122\cdot D\). And the number of people vaccinated aged between 40-49
is 14.2% of vaccinated subjects. So the final percentage is
&lt;/p&gt;

&lt;p&gt;
\[
\frac{0.142 \cdot 0.566 \cdot D}{0.122 \cdot D} = \frac{0.142 \cdot 0.566}{0.122} \approx 65.9\%
\]
&lt;/p&gt;
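
&lt;p&gt;
The same arithmetic as a few lines of Python, as a quick check. The variable names are only chosen for this
sketch, and the percentages are the ones quoted above.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;del_vacc_p = 0.566      # share of Delaware's population that is vaccinated
age_pop_share = 0.122   # share of the population aged 40-49
age_vacc_share = 0.142  # share of vaccinated people aged 40-49

# The population D cancels out, so only the three percentages are needed
age_del_p = age_vacc_share * del_vacc_p / age_pop_share
print(round(age_del_p, 3))
&lt;/pre&gt;
&lt;/div&gt;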
&lt;/div&gt;
&lt;/div&gt;</content><author><name></name></author><category term="julia" /><category term="statistics" /><category term="covid" /><summary type="html"></summary></entry><entry><title type="html">Adding log odds to combine statistics</title><link href="/julia/statistics/covid/2021/10/07/covid-logit-addition.html" rel="alternate" type="text/html" title="Adding log odds to combine statistics" /><published>2021-10-07T00:00:00+00:00</published><updated>2021-10-07T00:00:00+00:00</updated><id>/julia/statistics/covid/2021/10/07/covid-logit-addition</id><content type="html" xml:base="/julia/statistics/covid/2021/10/07/covid-logit-addition.html">&lt;script type=&quot;text/javascript&quot; src=&quot;http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;&lt;/script&gt;

&lt;div id=&quot;outline-container-org3c385eb&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;org3c385eb&quot;&gt;Adding log odds to combine statistics&lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-org3c385eb&quot;&gt;
&lt;p&gt;
This is an answer to a &lt;a href=&quot;https://stats.stackexchange.com/questions/546774/how-to-combine-state-level-covid-19-vaccination-rates-with-national-demographic&quot;&gt;question&lt;/a&gt; on Cross Validated, the statistics Stack Exchange site.
&lt;/p&gt;

&lt;p&gt;
I want to estimate the probability of a person aged 40-49 in Delaware
to be vaccinated, but I only have nationwide statistics on
vaccination levels by age, and a level of vaccination in Delaware, but
no age breakdown for that state.
&lt;/p&gt;

&lt;p&gt;
I will need to make some independence assumptions, notably, that the
age distribution of vaccinations is the same in Delaware. See below
for another assumption I have to make to work with the provided data.
&lt;/p&gt;

&lt;p&gt;
The method I use is to manually calculate the coefficients in a
logistic regression model. As you will see, we
cannot add and subtract percentages directly, but we can add and
subtract log odds.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-org6a3af3a&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;org6a3af3a&quot;&gt;Logistic regression model&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-org6a3af3a&quot;&gt;
&lt;p&gt;
To begin, we need two percentages from the &lt;a href=&quot;https://covid.cdc.gov/covid-data-tracker/#vaccinations_vacc-total-admin-rate-total&quot;&gt;official statistics&lt;/a&gt;: the
nationwide (full) vaccination rate, and the percentage in Delaware.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;us_vacc_p = .561
del_vacc_p = .566
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
0.566
&lt;/pre&gt;


&lt;p&gt;
I'm going to be using the following functions. The programming language
I'm using is Julia, but I'm using only two basic functions and
assignments so the code is going to be pretty much the same as in
Python or R. 
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;function logit(p)
     log(p / (1 - p))
end

function logistic(x)
    exp(x) / (exp(x) + 1)
end
&lt;/pre&gt;
&lt;/div&gt;
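
&lt;p&gt;
For readers who prefer Python, a rough equivalent of these two functions could look like the sketch below. It
uses only the standard library, and the round trip at the end is just a sanity check that the two functions
are each other's inverse.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;import math


def logit(p):
    # Log odds of a probability p, for p strictly between 0 and 1
    return math.log(p / (1 - p))


def logistic(x):
    # Inverse of logit: maps log odds back to a probability
    return math.exp(x) / (math.exp(x) + 1)


# Going to log odds and back should return the original probability
round_trip = logistic(logit(0.561))
print(round_trip)
&lt;/pre&gt;
&lt;/div&gt;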

&lt;p&gt;
The &lt;code&gt;logit&lt;/code&gt; function calculates the so-called log odds of a probability.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;us_vacc_logodds = logit(us_vacc_p)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
0.24522149244752528
&lt;/pre&gt;


&lt;p&gt;
The &lt;code&gt;logistic&lt;/code&gt; function inverts this operation. A logistic regression
model for this looks like
&lt;/p&gt;

&lt;p&gt;
\[
\text{logit}(p) = \text{base}
\]
&lt;/p&gt;

&lt;p&gt;
for the general population, and
&lt;/p&gt;

&lt;p&gt;
\[
\text{logit}(p) = \text{base} + \text{coefficient for Delaware}
\]
&lt;/p&gt;

&lt;p&gt;
for persons living in Delaware, where \(p\) is the probability of that
person being vaccinated. Because the logistic function is the inverse
of the logit function, we can calculate \(p\), the probability we are
after, with the formula
&lt;/p&gt;

&lt;p&gt;
\[
p = \text{logistic}\left(\text{base} + \text{coefficient for Delaware}\right)
\]
&lt;/p&gt;

&lt;p&gt;
Now the trick is that we can manually calculate the coefficient for
Delaware using the following formula. 
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;del_vacc_coef = logit(del_vacc_p) - logit(us_vacc_p)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
0.020328051655252644
&lt;/pre&gt;


&lt;p&gt;
To check this, let's use this model to calculate the vaccination
probability of the general US population,
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;logistic(us_vacc_logodds)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
0.561
&lt;/pre&gt;


&lt;p&gt;
and for Delaware we use the coefficient as well, and we get
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;logistic(us_vacc_logodds + del_vacc_coef)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
0.566
&lt;/pre&gt;


&lt;p&gt;
Now the next step is to calculate the coefficient for the age group,
and add that to our model as well. 
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-orgd901c70&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;orgd901c70&quot;&gt;Calculating the age coefficient for 40-49&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-orgd901c70&quot;&gt;
&lt;p&gt;
The official &lt;a href=&quot;https://covid.cdc.gov/covid-data-tracker/#vaccination-demographic&quot;&gt;statistics&lt;/a&gt; aren't yet in the form we need them. On the
graphs, you can find that 14.2% of those vaccinated are in the age
group 40-49. What we want to know is what fraction of this age group is
vaccinated. A complication here is that only 91% of those vaccinated
have reported their age. We need another assumption here, namely that
this nonresponse is independent of age group. If we assume that, we
know that 14.2% of all vaccinated are in the age group 40-49.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;vac_n = 186387228           # total number of vaccinated
age_vacc_n = .142 * vac_n   # in the age group 40-49
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
26466986.376
&lt;/pre&gt;


&lt;p&gt;
Also, we need the total number of people in the US in this age group,
which isn't listed directly either. From the graph, it's 12.2% of the
total population. The total population isn't listed either, but 56.1%
of the population is vaccinated, so we can calculate the total
population from that.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;us_n = vac_n / .561
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
332241048.1283422
&lt;/pre&gt;


&lt;p&gt;
So the percentage vaccinated in the age group 40-49 is
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;age_n = .122 * us_n
age_p = age_vacc_n / age_n
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
0.6529672131147541
&lt;/pre&gt;


&lt;p&gt;
Converting this to log odds, the calculation of the coefficient for the age
group 40-49 is the same as earlier for Delaware
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;age_vacc_coef = logit(age_p) - logit(us_vacc_p)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
0.38688616375890655
&lt;/pre&gt;


&lt;p&gt;
In the final calculation I combine the age-based coefficient with the
coefficient for Delaware. This is the step where I need the assumption
that the age distribution is the same in Delaware.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;logistic(us_vacc_logodds + del_vacc_coef + age_vacc_coef)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
0.6575591337731559
&lt;/pre&gt;


&lt;p&gt;
So with the listed assumptions I estimate the probability that a person
aged 40-49 living in Delaware is vaccinated at 65.8%.
&lt;/p&gt;

&lt;p&gt;
It is interesting to compare this to the original probabilities: 65.3%
of this age group is vaccinated in general, and that figure is then
corrected slightly upwards by comparing the 56.6% Delaware population
average to the 56.1% general US population average.
&lt;/p&gt;
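
&lt;p&gt;
As a cross-check, the whole chain can be recomputed in a few lines of
Python. This is only a sketch: &lt;code&gt;logit&lt;/code&gt; and &lt;code&gt;logistic&lt;/code&gt; are defined by
hand here, and the inputs are the rounded figures quoted above (I am
assuming 0.566 for Delaware), so the results should agree with the
Julia output to about three decimals.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
```python
import math


def logit(p):
    # probability to log odds
    return math.log(p / (1 - p))


def logistic(x):
    # log odds back to probability
    return 1 / (1 + math.exp(-x))


# Rounded figures taken from the text above.
us_vacc_p = 0.561       # share of the US population that is vaccinated
del_vacc_p = 0.566      # assumed: rounded share for Delaware
age_share_vacc = 0.142  # share of the vaccinated aged 40-49
age_share_pop = 0.122   # share of the US population aged 40-49

# P(vaccinated | 40-49) = P(40-49 | vaccinated) * P(vaccinated) / P(40-49)
age_p = age_share_vacc * us_vacc_p / age_share_pop

us_vacc_logodds = logit(us_vacc_p)
del_vacc_coef = logit(del_vacc_p) - us_vacc_logodds
age_vacc_coef = logit(age_p) - us_vacc_logodds

estimate = logistic(us_vacc_logodds + del_vacc_coef + age_vacc_coef)
print(f'{age_p:.4f} {age_vacc_coef:.4f} {estimate:.4f}')
```
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Note that the total number of vaccinated people cancels out in
&lt;code&gt;age_p&lt;/code&gt;, so only the three percentages matter for the result.
&lt;/p&gt;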

&lt;p&gt;
Thanks for reading! If you want to reach out, post an issue to the
&lt;a href=&quot;https://github.com/Gijs-Koot/Gijs-Koot.github.io&quot;&gt;Github repository of this website&lt;/a&gt; or contact me on Twitter!
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</content><author><name></name></author><category term="julia" /><category term="statistics" /><category term="covid" /><summary type="html"></summary></entry><entry><title type="html">Using Futures and the ProcessPoolExecutor in python</title><link href="/python/multiprocessing/2021/09/16/python-multiprocessing-pool.html" rel="alternate" type="text/html" title="Using Futures and the ProcessPoolExecutor in python" /><published>2021-09-16T00:00:00+00:00</published><updated>2021-09-16T00:00:00+00:00</updated><id>/python/multiprocessing/2021/09/16/python-multiprocessing-pool</id><content type="html" xml:base="/python/multiprocessing/2021/09/16/python-multiprocessing-pool.html">&lt;div id=&quot;outline-container-org0804d4f&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;org0804d4f&quot;&gt;When to use &lt;code&gt;ProcessPoolExecutor&lt;/code&gt;&lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-org0804d4f&quot;&gt;
&lt;p&gt;
Using the &lt;code&gt;ProcessPoolExecutor&lt;/code&gt; in &lt;code&gt;concurrent.futures&lt;/code&gt; is a quick way
to divide your workload over multiple processes. This is useful if you
have a couple of tasks that you want to run in parallel to save
time. Compared to the &lt;code&gt;ThreadPoolExecutor&lt;/code&gt;, the process pool is a bit
more primitive: the whole process is forked into multiple
copies that each do their own business, with the
&lt;code&gt;concurrent.futures.ProcessPoolExecutor&lt;/code&gt; taking care of cleaning up
and basic communication between the tasks.
&lt;/p&gt;

&lt;table border=&quot;2&quot; cellspacing=&quot;0&quot; cellpadding=&quot;6&quot; rules=&quot;groups&quot; frame=&quot;hsides&quot;&gt;


&lt;colgroup&gt;
&lt;col class=&quot;org-left&quot; /&gt;

&lt;col class=&quot;org-left&quot; /&gt;

&lt;col class=&quot;org-left&quot; /&gt;
&lt;/colgroup&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th scope=&quot;col&quot; class=&quot;org-left&quot;&gt;&amp;#xa0;&lt;/th&gt;
&lt;th scope=&quot;col&quot; class=&quot;org-left&quot;&gt;&lt;code&gt;ProcessPoolExecutor&lt;/code&gt;&lt;/th&gt;
&lt;th scope=&quot;col&quot; class=&quot;org-left&quot;&gt;&lt;code&gt;ThreadPoolExecutor&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;org-left&quot;&gt;strength&lt;/td&gt;
&lt;td class=&quot;org-left&quot;&gt;Parallel CPU bound tasks&lt;/td&gt;
&lt;td class=&quot;org-left&quot;&gt;Parallel IO bound tasks&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class=&quot;org-left&quot;&gt;weakness&lt;/td&gt;
&lt;td class=&quot;org-left&quot;&gt;Memory usage&lt;/td&gt;
&lt;td class=&quot;org-left&quot;&gt;Limited to single CPU due to GIL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;
You should use the &lt;code&gt;ProcessPoolExecutor&lt;/code&gt; over the &lt;code&gt;ThreadPoolExecutor&lt;/code&gt;
if your tasks are CPU bound. The weakness of copying the process is
that you also copy its memory, which may add up, and there is a bit
more overhead compared to splitting into threads. The big
advantage is that each process can use up to 100% of a
single CPU core, while, due to the limitations of the Global
Interpreter Lock, multiple threads will not saturate multiple
CPUs. There are a couple of holes in this simplified model. For
example, Python code can sometimes release the GIL, notably &lt;code&gt;numpy&lt;/code&gt;
code, in which case threads can also effectively use multiple
CPUs. But for now, let's not worry about those details too much and
learn how to use multiprocessing easily in Python.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-orgecc74a1&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;orgecc74a1&quot;&gt;Basic multiprocessing with &lt;code&gt;os.fork&lt;/code&gt;&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-orgecc74a1&quot;&gt;
&lt;p&gt;
First, to get started, have a look at this demonstration of
&lt;code&gt;os.fork&lt;/code&gt;. In practical terms, this duplicates the running
process. Two copies of the same program will run, basically identical,
except for their &lt;code&gt;pid&lt;/code&gt;, their process id.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;cat ./src/fork.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; os

os.fork&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;

&lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;os.getpid&lt;span style=&quot;color: #ff62d4;&quot;&gt;()&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;python3 ./src/fork.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #00bcff;&quot;&gt;18359&lt;/span&gt;
&lt;span style=&quot;color: #00bcff;&quot;&gt;18360&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
It is definitely possible to write some multiprocessing code directly
on this primitive system.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;cat ./src/fork_mp.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; os
&lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; time
&lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; sys

&lt;span style=&quot;color: #00d3d0;&quot;&gt;tasks&lt;/span&gt; = &lt;span style=&quot;color: #f78fe7;&quot;&gt;list&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #f78fe7;&quot;&gt;range&lt;/span&gt;&lt;span style=&quot;color: #ff62d4;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #00bcff;&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;color: #ff62d4;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;

&lt;span style=&quot;color: #00d3d0;&quot;&gt;part_a&lt;/span&gt; = tasks&lt;span style=&quot;color: #ffffff;&quot;&gt;[&lt;/span&gt;:&lt;span style=&quot;color: #00bcff;&quot;&gt;5&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;]&lt;/span&gt;
&lt;span style=&quot;color: #00d3d0;&quot;&gt;part_b&lt;/span&gt; = tasks&lt;span style=&quot;color: #ffffff;&quot;&gt;[&lt;/span&gt;&lt;span style=&quot;color: #00bcff;&quot;&gt;5&lt;/span&gt;:&lt;span style=&quot;color: #ffffff;&quot;&gt;]&lt;/span&gt;

&lt;span style=&quot;color: #00d3d0;&quot;&gt;res&lt;/span&gt; = os.fork&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;

&lt;span style=&quot;color: #b6a0ff;&quot;&gt;if&lt;/span&gt; res == &lt;span style=&quot;color: #00bcff;&quot;&gt;0&lt;/span&gt;:
    &lt;span style=&quot;color: #a8a8a8;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #a8a8a8;&quot;&gt;child process: os.fork() returned 0&lt;/span&gt;
    &lt;span style=&quot;color: #b6a0ff;&quot;&gt;for&lt;/span&gt; task &lt;span style=&quot;color: #b6a0ff;&quot;&gt;in&lt;/span&gt; part_a:
        &lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;f&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;I am process &lt;/span&gt;{os.getpid()}&lt;span style=&quot;color: #79a8ff;&quot;&gt; working on task &lt;/span&gt;{task}&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
        time.sleep&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;.&lt;span style=&quot;color: #00bcff;&quot;&gt;2&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
        sys.stdout.flush&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;
&lt;span style=&quot;color: #b6a0ff;&quot;&gt;else&lt;/span&gt;:
    &lt;span style=&quot;color: #a8a8a8;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #a8a8a8;&quot;&gt;main process: os.fork() returned the pid of the child&lt;/span&gt;
    &lt;span style=&quot;color: #b6a0ff;&quot;&gt;for&lt;/span&gt; task &lt;span style=&quot;color: #b6a0ff;&quot;&gt;in&lt;/span&gt; part_b:
        &lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;f&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;I am process &lt;/span&gt;{os.getpid()}&lt;span style=&quot;color: #79a8ff;&quot;&gt; working on task &lt;/span&gt;{task}&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
        time.sleep&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;.&lt;span style=&quot;color: #00bcff;&quot;&gt;2&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
        sys.stdout.flush&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;

&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
This program divides the tasks between the two processes. I added a
&lt;code&gt;time.sleep(0.2)&lt;/code&gt; for every task, so the tasks take two seconds in
total. The script, however, takes approximately one second to run,
saving roughly one second over running the tasks in a single process.
&lt;/p&gt;

&lt;p&gt;
We use the return value of &lt;code&gt;os.fork&lt;/code&gt; to determine which of the processes we
are in: if the result is 0, we are in the child process, and if the result is
different (it is then the pid of the child), we are in the main process.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;python3 ./src/fork_mp.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18377&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;0&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18376&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;5&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18377&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;1&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18376&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;6&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18377&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;2&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18376&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;7&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18376&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;8&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18377&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;3&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18376&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;9&lt;/span&gt;
I am process &lt;span style=&quot;color: #00bcff;&quot;&gt;18377&lt;/span&gt; working on task &lt;span style=&quot;color: #00bcff;&quot;&gt;4&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;
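
&lt;p&gt;
To make the meaning of the return value of &lt;code&gt;os.fork&lt;/code&gt; concrete, here is
a minimal sketch, assuming a Unix system. The helper name
&lt;code&gt;fork_and_wait&lt;/code&gt; and the exit code 7 are made up for this illustration.
The child sees 0 and exits right away, while the main process sees the
pid of the child and waits for it.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
```python
import os
import sys


def fork_and_wait(exit_code=7):
    # Flush first, so buffered output is not duplicated in the child.
    sys.stdout.flush()
    pid = os.fork()
    if pid == 0:
        # Child: os.fork() returned 0. Exit immediately, skipping cleanup
        # handlers that belong to the main process.
        os._exit(exit_code)
    # Main process: os.fork() returned the pid of the new child.
    waited_pid, status = os.waitpid(pid, 0)
    return pid, waited_pid, os.WEXITSTATUS(status)


if __name__ == '__main__':
    child_pid, waited_pid, code = fork_and_wait()
    print(f'main {os.getpid()} saw child {child_pid} exit with code {code}')
```
&lt;/pre&gt;
&lt;/div&gt;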

&lt;p&gt;
These examples show how multiprocessing works at a lower level than the
&lt;code&gt;ProcessPoolExecutor&lt;/code&gt;. However, if you want to extend the latter
approach into working code that deals with failures, passes the tasks
to the processes consistently and also collects the results, you can
see it'll get quite a bit more complicated. Enter the
&lt;code&gt;ProcessPoolExecutor&lt;/code&gt;, which does all those hard things for you!
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;cat ./src/ppool_demo.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;from&lt;/span&gt; concurrent.futures &lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; ProcessPoolExecutor
&lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; time
&lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; os

&lt;span style=&quot;color: #00d3d0;&quot;&gt;tasks&lt;/span&gt; = &lt;span style=&quot;color: #f78fe7;&quot;&gt;range&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #00bcff;&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;span style=&quot;color: #00d3d0;&quot;&gt;start&lt;/span&gt; = time.time&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;


&lt;span style=&quot;color: #b6a0ff;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #feacd0;&quot;&gt;do_work&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;task&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;:
    &lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;f&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;I am process &lt;/span&gt;{os.getpid()}&lt;span style=&quot;color: #79a8ff;&quot;&gt; working on task &lt;/span&gt;{task}&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
    time.sleep&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;.&lt;span style=&quot;color: #00bcff;&quot;&gt;2&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;


&lt;span style=&quot;color: #b6a0ff;&quot;&gt;with&lt;/span&gt; ProcessPoolExecutor&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;max_workers=&lt;span style=&quot;color: #00bcff;&quot;&gt;4&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt; &lt;span style=&quot;color: #b6a0ff;&quot;&gt;as&lt;/span&gt; pool:
    &lt;span style=&quot;color: #b6a0ff;&quot;&gt;for&lt;/span&gt; task &lt;span style=&quot;color: #b6a0ff;&quot;&gt;in&lt;/span&gt; tasks:
        pool.submit&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;do_work, task&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;

&lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;f&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;Main process done after &lt;/span&gt;{time.time() - start:.2f}&lt;span style=&quot;color: #79a8ff;&quot;&gt;s&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
In only three lines you can start 4 workers that divide the work as
evenly as possible.
&lt;/p&gt;

&lt;ul class=&quot;org-ul&quot;&gt;
&lt;li&gt;&lt;code&gt;pool.submit&lt;/code&gt; sends a task to one of the workers&lt;/li&gt;
&lt;li&gt;The context manager (&lt;code&gt;with&lt;/code&gt; block) waits until all the workers are
done before proceeding&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
This example takes about 0.6 seconds. We have four workers and 10 tasks, so
some workers handle 2 tasks and some handle 3, and the workers with 3
tasks take 3 * 0.2 seconds to finish.
&lt;/p&gt;
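
&lt;p&gt;
The same back-of-the-envelope estimate can be written down as a tiny
function. This is a rough sketch that assumes every task takes equally
long and ignores the overhead of starting processes; the name
&lt;code&gt;estimated_wall_time&lt;/code&gt; is made up for this illustration.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
```python
import math


def estimated_wall_time(n_tasks, n_workers, task_seconds):
    # The busiest worker handles ceil(n_tasks / n_workers) tasks in a row.
    rounds = math.ceil(n_tasks / n_workers)
    return rounds * task_seconds


if __name__ == '__main__':
    # 10 tasks of 0.2 seconds on 4, 2 and 1 workers
    for workers in (4, 2, 1):
        print(workers, round(estimated_wall_time(10, workers, 0.2), 2))
```
&lt;/pre&gt;
&lt;/div&gt;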

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;python ./src/ppool_demo.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Note that in the output, the processes seem ordered. However, this is
a subtle effect of buffering of the &lt;code&gt;stdout&lt;/code&gt;. Each process has its own
buffer and flushes all its output in one go, making it look like
the processes run in succession. You can suppress this behaviour by manually
triggering the flushes as I showed earlier with &lt;code&gt;sys.stdout.flush()&lt;/code&gt;.
&lt;/p&gt;

&lt;p&gt;
Here, it is crucial that &lt;code&gt;pool.submit&lt;/code&gt; is non-blocking: when you call
it, the main process doesn't wait until the worker is
done. This allows us to hand all the work to the workers quickly.
&lt;/p&gt;
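
&lt;p&gt;
You can see the non-blocking behaviour by timing the call to
&lt;code&gt;pool.submit&lt;/code&gt; separately from the wait for the result. A minimal sketch,
assuming a Unix system where the fork start method is available; the
names &lt;code&gt;slow_task&lt;/code&gt; and &lt;code&gt;time_submit&lt;/code&gt; are made up for this illustration.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
```python
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import time


def slow_task(seconds):
    time.sleep(seconds)
    return seconds


def time_submit(seconds=0.5):
    # Assumption: Unix only, the pool forks its workers explicitly.
    ctx = multiprocessing.get_context('fork')
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        start = time.perf_counter()
        future = pool.submit(slow_task, seconds)  # does not wait for the task
        submit_time = time.perf_counter() - start
        result = future.result()                  # waits until the task is done
        total_time = time.perf_counter() - start
    return submit_time, total_time, result


if __name__ == '__main__':
    submit_time, total_time, result = time_submit()
    print(f'submit took {submit_time:.3f}s, result arrived after {total_time:.3f}s')
```
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The time spent in &lt;code&gt;pool.submit&lt;/code&gt; should be a small fraction of the time
it takes for the result to arrive.
&lt;/p&gt;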

&lt;p&gt;
There are two things I want to explain in the rest of this post:
&lt;/p&gt;

&lt;ul class=&quot;org-ul&quot;&gt;
&lt;li&gt;How you can collect return values of the tasks (using
&lt;code&gt;concurrent.futures.Future&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;What happens if workers run into an exception and how you can deal
with it&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-org6424cf4&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;org6424cf4&quot;&gt;Collecting return values&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-org6424cf4&quot;&gt;
&lt;p&gt;
In the examples above, I simply fired off the tasks and showed that
they were doing something by printing statements to &lt;a href=&quot;https://en.wikipedia.org/wiki/Standard_streams#Standard_output_(stdout)&quot;&gt;stdout&lt;/a&gt;. But in a
practical situation, you typically want to collect the results of the
work. To do that with a &lt;code&gt;ProcessPoolExecutor&lt;/code&gt;, you will need to deal
with &lt;code&gt;concurrent.futures.Future&lt;/code&gt;. If you have experience with
JavaScript, for example, dealing with futures (promises) is very common, but you
can write quite a bit of Python code without encountering them.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;from&lt;/span&gt; concurrent.futures &lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; ProcessPoolExecutor

&lt;span style=&quot;color: #b6a0ff;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #feacd0;&quot;&gt;work&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;word&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;:
    &lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;word&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
    &lt;span style=&quot;color: #b6a0ff;&quot;&gt;return&lt;/span&gt; &lt;span style=&quot;color: #f78fe7;&quot;&gt;len&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;word&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;

&lt;span style=&quot;color: #b6a0ff;&quot;&gt;with&lt;/span&gt; ProcessPoolExecutor&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;max_workers=&lt;span style=&quot;color: #00bcff;&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt; &lt;span style=&quot;color: #b6a0ff;&quot;&gt;as&lt;/span&gt; pool:
    result = pool.submit&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;work, &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;hello&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
    &lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;result&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
The result of a &lt;code&gt;pool.submit&lt;/code&gt; call is an instance of &lt;a href=&quot;https://docs.python.org/3/library/concurrent.futures.html#future-objects&quot;&gt;&lt;code&gt;Future&lt;/code&gt;&lt;/a&gt;. A &lt;code&gt;Future&lt;/code&gt; is a
reference to work in progress. Its most fundamental method is
&lt;code&gt;Future.result&lt;/code&gt;. From the official documentation:
&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;
Return the value returned by the call. If the call hasn’t yet
completed then this method will wait up to timeout seconds. If the
call hasn’t completed in timeout seconds, then a
concurrent.futures.TimeoutError will be raised. timeout can be an int
or float. If timeout is not specified or None, there is no limit to
the wait time.
&lt;/p&gt;

&lt;p&gt;
If the future is cancelled before completing then CancelledError will
be raised.
&lt;/p&gt;

&lt;p&gt;
If the call raised an exception, this method will raise the same
exception.
&lt;/p&gt;
&lt;/blockquote&gt;
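
&lt;p&gt;
The timeout behaviour described in the quote can be tried out with a
short sketch. I am assuming a Unix system with the fork start method
here, and the names &lt;code&gt;slow_square&lt;/code&gt; and &lt;code&gt;poll_then_wait&lt;/code&gt; are made up for
this illustration. The first call to &lt;code&gt;Future.result&lt;/code&gt; gives up after a
short timeout, the second one waits without a limit.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
```python
from concurrent.futures import ProcessPoolExecutor, TimeoutError
import multiprocessing
import time


def slow_square(x):
    time.sleep(0.5)
    return x ** 2


def poll_then_wait(x, timeout=0.05):
    # Assumption: Unix only, the pool forks its workers explicitly.
    ctx = multiprocessing.get_context('fork')
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(slow_square, x)
        try:
            early = future.result(timeout=timeout)
        except TimeoutError:
            early = None         # the task did not finish within the timeout
        final = future.result()  # no timeout: wait as long as it takes
    return early, final


if __name__ == '__main__':
    print(poll_then_wait(3))
```
&lt;/pre&gt;
&lt;/div&gt;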

&lt;p&gt;
It is important to understand that this method is &lt;b&gt;blocking&lt;/b&gt;, as
opposed to the &lt;code&gt;pool.submit&lt;/code&gt; method I used earlier. A mistake I have
seen often is the following:
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;cat ./src/block_mistake.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;from&lt;/span&gt; concurrent.futures &lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; ProcessPoolExecutor
&lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; time

&lt;span style=&quot;color: #00d3d0;&quot;&gt;tasks&lt;/span&gt; = &lt;span style=&quot;color: #f78fe7;&quot;&gt;range&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #00bcff;&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;span style=&quot;color: #00d3d0;&quot;&gt;results&lt;/span&gt; = &lt;span style=&quot;color: #f78fe7;&quot;&gt;list&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;
&lt;span style=&quot;color: #00d3d0;&quot;&gt;start&lt;/span&gt; = time.time&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;


&lt;span style=&quot;color: #b6a0ff;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #feacd0;&quot;&gt;do_work&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;task&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;:
    time.sleep&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #00bcff;&quot;&gt;0&lt;/span&gt;.&lt;span style=&quot;color: #00bcff;&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
    &lt;span style=&quot;color: #b6a0ff;&quot;&gt;return&lt;/span&gt; task ** &lt;span style=&quot;color: #00bcff;&quot;&gt;2&lt;/span&gt;


&lt;span style=&quot;color: #b6a0ff;&quot;&gt;with&lt;/span&gt; ProcessPoolExecutor&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;max_workers=&lt;span style=&quot;color: #00bcff;&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt; &lt;span style=&quot;color: #b6a0ff;&quot;&gt;as&lt;/span&gt; pool:
    &lt;span style=&quot;color: #b6a0ff;&quot;&gt;for&lt;/span&gt; task &lt;span style=&quot;color: #b6a0ff;&quot;&gt;in&lt;/span&gt; tasks:
        future = pool.submit&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;do_work, task&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
        results.append&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;future.result&lt;span style=&quot;color: #ff62d4;&quot;&gt;()&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;  &lt;span style=&quot;color: #a8a8a8;&quot;&gt;# &lt;/span&gt;&lt;span style=&quot;color: #a8a8a8;&quot;&gt;collect results&lt;/span&gt;

&lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;f&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;Done after &lt;/span&gt;{time.time() - start}&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Can you spot the mistake? The problem is that before scheduling the
next task, the main process waits for the result of the task it just
scheduled. This script takes about 1 second to run, because in effect, all
tasks are run in succession.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;python ./src/block_mistake.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Instead, the results should be collected after the process pool
context is done scheduling the tasks.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;cat ./src/block_fixed.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;from&lt;/span&gt; concurrent.futures &lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; ProcessPoolExecutor
&lt;span style=&quot;color: #b6a0ff;&quot;&gt;import&lt;/span&gt; time

&lt;span style=&quot;color: #00d3d0;&quot;&gt;start&lt;/span&gt; = time.time&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;

&lt;span style=&quot;color: #00d3d0;&quot;&gt;tasks&lt;/span&gt; = &lt;span style=&quot;color: #f78fe7;&quot;&gt;range&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #00bcff;&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;span style=&quot;color: #00d3d0;&quot;&gt;futures&lt;/span&gt; = &lt;span style=&quot;color: #f78fe7;&quot;&gt;list&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;()&lt;/span&gt;


&lt;span style=&quot;color: #b6a0ff;&quot;&gt;def&lt;/span&gt; &lt;span style=&quot;color: #feacd0;&quot;&gt;do_work&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;task&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;:
    time.sleep&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #00bcff;&quot;&gt;0&lt;/span&gt;.&lt;span style=&quot;color: #00bcff;&quot;&gt;1&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
    &lt;span style=&quot;color: #b6a0ff;&quot;&gt;return&lt;/span&gt; task ** &lt;span style=&quot;color: #00bcff;&quot;&gt;2&lt;/span&gt;


&lt;span style=&quot;color: #b6a0ff;&quot;&gt;with&lt;/span&gt; ProcessPoolExecutor&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;max_workers=&lt;span style=&quot;color: #00bcff;&quot;&gt;10&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt; &lt;span style=&quot;color: #b6a0ff;&quot;&gt;as&lt;/span&gt; pool:
    &lt;span style=&quot;color: #b6a0ff;&quot;&gt;for&lt;/span&gt; task &lt;span style=&quot;color: #b6a0ff;&quot;&gt;in&lt;/span&gt; tasks:
        futures.append&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;pool.submit&lt;span style=&quot;color: #ff62d4;&quot;&gt;(&lt;/span&gt;do_work, task&lt;span style=&quot;color: #ff62d4;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;

results = &lt;span style=&quot;color: #ffffff;&quot;&gt;[&lt;/span&gt;future.result&lt;span style=&quot;color: #ff62d4;&quot;&gt;()&lt;/span&gt; &lt;span style=&quot;color: #b6a0ff;&quot;&gt;for&lt;/span&gt; future &lt;span style=&quot;color: #b6a0ff;&quot;&gt;in&lt;/span&gt; futures&lt;span style=&quot;color: #ffffff;&quot;&gt;]&lt;/span&gt;

&lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;results&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;span style=&quot;color: #f78fe7;&quot;&gt;print&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;f&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;Done after &lt;/span&gt;{time.time() - start}&lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;python ./src/block_fixed.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
&lt;/pre&gt;
&lt;/div&gt;
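
&lt;p&gt;
If all you need is the list of results in the same order as the
tasks, &lt;code&gt;Executor.map&lt;/code&gt; is a shorter alternative to keeping a list of
futures yourself. It is a different technique than the one shown above,
not a replacement for understanding futures. A minimal sketch, again
assuming a Unix system with the fork start method; the name
&lt;code&gt;squares_with_map&lt;/code&gt; is made up for this illustration.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
```python
from concurrent.futures import ProcessPoolExecutor
import multiprocessing
import time


def do_work(task):
    time.sleep(0.1)
    return task ** 2


def squares_with_map(tasks, workers=10):
    # Assumption: Unix only, the pool forks its workers explicitly.
    ctx = multiprocessing.get_context('fork')
    with ProcessPoolExecutor(max_workers=workers, mp_context=ctx) as pool:
        # map schedules all tasks up front and yields results in input order
        return list(pool.map(do_work, tasks))


if __name__ == '__main__':
    print(squares_with_map(range(10)))
```
&lt;/pre&gt;
&lt;/div&gt;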
&lt;/div&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-org1d86d84&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;org1d86d84&quot;&gt;Dealing with exceptions in child processes&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-org1d86d84&quot;&gt;
&lt;p&gt;
If a child process raises an unhandled &lt;code&gt;Exception&lt;/code&gt;, this exception is
passed to the main process and raised again when you call &lt;code&gt;Future.result&lt;/code&gt;. In the
example below, you can see how to catch those errors when unpacking
the results of the process workers.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;cat ./src/error_example.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
from concurrent.futures import ProcessPoolExecutor
from random import random

tasks = range(10)
futures = list()


def do_work(task):
    if random() &amp;gt; .5:
        return task ** 2
    else:
        raise Exception(&quot;OW!&quot;)


with ProcessPoolExecutor(max_workers=10) as pool:
    for task in tasks:
        futures.append(pool.submit(do_work, task))

results = list()

for future in futures:
    try:
        results.append(future.result())
    except Exception as e:
        results.append(f&quot;Failed with {e}!&quot;)

print(results)
&lt;/pre&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-bash&quot;&gt;python ./src/error_example.py
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
[0, 1, 4, 9, 16, 'Failed with OW!!', 'Failed with OW!!', 'Failed with OW!!', 64, 81]
&lt;/pre&gt;
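
&lt;p&gt;
An alternative to the &lt;code&gt;try&lt;/code&gt; block is &lt;code&gt;Future.exception&lt;/code&gt;, which returns
the exception object (or &lt;code&gt;None&lt;/code&gt; if the task succeeded) instead of raising
it. A minimal sketch, assuming a Unix system with the fork start
method. To keep the outcome predictable I let the odd tasks fail, which
is a made-up rule for this illustration, as is the name &lt;code&gt;collect&lt;/code&gt;.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-python&quot;&gt;
```python
from concurrent.futures import ProcessPoolExecutor
import multiprocessing


def do_work(task):
    # Hypothetical rule for the demo: odd tasks fail, even tasks succeed.
    if task % 2 == 1:
        raise ValueError(f'task {task} failed')
    return task ** 2


def collect(tasks):
    # Assumption: Unix only, the pool forks its workers explicitly.
    ctx = multiprocessing.get_context('fork')
    with ProcessPoolExecutor(max_workers=4, mp_context=ctx) as pool:
        futures = [pool.submit(do_work, task) for task in tasks]

    results = []
    for future in futures:
        error = future.exception()  # blocks like result(), but does not raise
        if error is None:
            results.append(future.result())
        else:
            results.append(f'{type(error).__name__}: {error}')
    return results


if __name__ == '__main__':
    print(collect(range(4)))
```
&lt;/pre&gt;
&lt;/div&gt;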

&lt;p&gt;
Thanks for reading! If you want to reach out, post an issue to the
&lt;a href=&quot;https://github.com/Gijs-Koot/Gijs-Koot.github.io&quot;&gt;Github repository of this website&lt;/a&gt; or contact me on Twitter!
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</content><author><name></name></author><category term="python" /><category term="multiprocessing" /><summary type="html">When to use ProcessPoolExecutor Using the ProcessPoolExecutor in concurrent.futures is a quick way to divide your workload over multiple processes. This is useful if you have a couple of tasks that you want to run in parallel to save time. Compared to the ThreadPoolExecutor, the process pool is a bit more primitive, basically, the whole process is forked into multiple copies that each do their own business, with the concurrent.futures.ProcessPoolExecutor taking care of cleaning up and basic communication between the tasks.</summary></entry><entry><title type="html">Pommodoro timer in Elisp</title><link href="/elisp,/emacs,/time/2021/09/09/emacs-pommodoro.html" rel="alternate" type="text/html" title="Pommodoro timer in Elisp" /><published>2021-09-09T00:00:00+00:00</published><updated>2021-09-09T00:00:00+00:00</updated><id>/elisp,/emacs,/time/2021/09/09/emacs-pommodoro</id><content type="html" xml:base="/elisp,/emacs,/time/2021/09/09/emacs-pommodoro.html">&lt;p&gt;
I wrote a &lt;a href=&quot;https://todoist.com/productivity-methods/pomodoro-technique&quot;&gt;pommodoro&lt;/a&gt; timer in &lt;a href=&quot;https://www.emacswiki.org/emacs/LearnEmacsLisp&quot;&gt;elisp&lt;/a&gt;. Elisp is the language of Emacs,
the 40-year-old editor that is still going! Elisp is to Emacs what
JavaScript is to Visual Studio Code, if that means more to you ;).
&lt;/p&gt;

&lt;p&gt;
The pomodoro technique is new to me. It is a time management
technique that consists of five simple steps:
&lt;/p&gt;

&lt;ol class=&quot;org-ol&quot;&gt;
&lt;li&gt;Get a to-do list and a timer.&lt;/li&gt;
&lt;li&gt;Set your timer for 25 minutes, and focus on a single task until the
timer rings.&lt;/li&gt;
&lt;li&gt;When your session ends, mark off one pomodoro and record what you
completed.&lt;/li&gt;
&lt;li&gt;Then enjoy a five-minute break.&lt;/li&gt;
&lt;li&gt;After four pomodoros, take a longer, more restorative 15-30 minute
break.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;
The functionality consists of two global variables and two
functions. One variable holds the timer itself, and the other
can be used to customize the duration. I use &lt;code&gt;defvar&lt;/code&gt;, which allows me to
add documentation to a variable as well. I added all the code to
&lt;code&gt;init.el&lt;/code&gt;.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-elisp&quot;&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;defvar&lt;/span&gt; &lt;span style=&quot;color: #00d3d0;&quot;&gt;pommodoro-timeout&lt;/span&gt; &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;25m&quot;&lt;/span&gt; &lt;span style=&quot;color: #b0d6f5;&quot;&gt;&quot;Duration of a pommodoro timer&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;defvar&lt;/span&gt; &lt;span style=&quot;color: #00d3d0;&quot;&gt;pommodoro-current-timer&lt;/span&gt; nil &lt;span style=&quot;color: #b0d6f5;&quot;&gt;&quot;The current pommodoro timer&quot;&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
Then I have two functions that I can call from within Emacs. I have
bound these to &lt;code&gt;F2&lt;/code&gt; and &lt;code&gt;F3&lt;/code&gt;, so that they are really easy to reach. With
&lt;code&gt;pommodoro-start-timer&lt;/code&gt; I am prompted for a task description and then a timer is
set with that description. With &lt;code&gt;pommodoro-show-timer&lt;/code&gt; a message appears in
the minibuffer that tells me when the current timer will
be finished.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-elisp&quot;&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;defun&lt;/span&gt; &lt;span style=&quot;color: #feacd0;&quot;&gt;pommodoro-start-timer&lt;/span&gt; &lt;span style=&quot;color: #ff62d4;&quot;&gt;()&lt;/span&gt;
  &lt;span style=&quot;color: #b0d6f5;&quot;&gt;&quot;Start a pommodoro timer which shows a notification after 25 minutes&quot;&lt;/span&gt;
  &lt;span style=&quot;color: #ff62d4;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;interactive&lt;/span&gt;&lt;span style=&quot;color: #ff62d4;&quot;&gt;)&lt;/span&gt;
  &lt;span style=&quot;color: #ff62d4;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;catch&lt;/span&gt; '&lt;span style=&quot;color: #00bcff;&quot;&gt;cancel&lt;/span&gt;
    &lt;span style=&quot;color: #3fdfd0;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;progn&lt;/span&gt;
      &lt;span style=&quot;color: #fba849;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #9f80ff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;and&lt;/span&gt; pommodoro-current-timer &lt;span style=&quot;color: #4fe42f;&quot;&gt;(&lt;/span&gt;time-less-p &lt;span style=&quot;color: #fe6060;&quot;&gt;(&lt;/span&gt;current-time&lt;span style=&quot;color: #fe6060;&quot;&gt;)&lt;/span&gt;
                &lt;span style=&quot;color: #fe6060;&quot;&gt;(&lt;/span&gt;timer--time pommodoro-current-timer&lt;span style=&quot;color: #fe6060;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #4fe42f;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #9f80ff;&quot;&gt;)&lt;/span&gt;
    &lt;span style=&quot;color: #9f80ff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;if&lt;/span&gt; &lt;span style=&quot;color: #4fe42f;&quot;&gt;(&lt;/span&gt;yes-or-no-p &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;There is a current pommodoro running, do you want to cancel it? &quot;&lt;/span&gt;&lt;span style=&quot;color: #4fe42f;&quot;&gt;)&lt;/span&gt;
        &lt;span style=&quot;color: #4fe42f;&quot;&gt;(&lt;/span&gt;cancel-timer pommodoro-current-timer&lt;span style=&quot;color: #4fe42f;&quot;&gt;)&lt;/span&gt; &lt;span style=&quot;color: #4fe42f;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;throw&lt;/span&gt; '&lt;span style=&quot;color: #00bcff;&quot;&gt;cancel&lt;/span&gt; t&lt;span style=&quot;color: #4fe42f;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #9f80ff;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #fba849;&quot;&gt;)&lt;/span&gt;
      &lt;span style=&quot;color: #fba849;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;setq&lt;/span&gt; pommodoro-current-timer
      &lt;span style=&quot;color: #9f80ff;&quot;&gt;(&lt;/span&gt;run-at-time pommodoro-timeout nil #'shell-command
       &lt;span style=&quot;color: #4fe42f;&quot;&gt;(&lt;/span&gt;format &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;notify-send -i messagebox_info -u critical 'Pommodoro done' %s&quot;&lt;/span&gt;
         &lt;span style=&quot;color: #fe6060;&quot;&gt;(&lt;/span&gt;read-string &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;Task description: &quot;&lt;/span&gt;&lt;span style=&quot;color: #fe6060;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #4fe42f;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #9f80ff;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #fba849;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #3fdfd0;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #ff62d4;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
This function sets a timer with &lt;code&gt;run-at-time&lt;/code&gt;, based on a description
that you have to enter (&lt;code&gt;read-string&lt;/code&gt;). There is one additional
feature: if a timer is already running, I am asked to
confirm that I want to cancel it. The notification is sent with
&lt;code&gt;shell-command&lt;/code&gt;, using the Linux utility &lt;code&gt;notify-send&lt;/code&gt;. There is a
package &lt;code&gt;notifications&lt;/code&gt; in Emacs which works well, except for one
thing: I couldn't get the notifications to show in full-screen
mode.
&lt;/p&gt;

&lt;p&gt;
This is the function that shows when the current pommodoro will
finish. It is very simple at the moment.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-elisp&quot;&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;defun&lt;/span&gt; &lt;span style=&quot;color: #feacd0;&quot;&gt;pommodoro-show-timer&lt;/span&gt; &lt;span style=&quot;color: #ff62d4;&quot;&gt;()&lt;/span&gt;
  &lt;span style=&quot;color: #ff62d4;&quot;&gt;(&lt;/span&gt;&lt;span style=&quot;color: #b6a0ff;&quot;&gt;interactive&lt;/span&gt;&lt;span style=&quot;color: #ff62d4;&quot;&gt;)&lt;/span&gt;
  &lt;span style=&quot;color: #ff62d4;&quot;&gt;(&lt;/span&gt;message &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;Pommodoro done at %s&quot;&lt;/span&gt;
     &lt;span style=&quot;color: #3fdfd0;&quot;&gt;(&lt;/span&gt;format-time-string &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;%T&quot;&lt;/span&gt; &lt;span style=&quot;color: #fba849;&quot;&gt;(&lt;/span&gt;timer--time pommodoro-current-timer&lt;span style=&quot;color: #fba849;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #3fdfd0;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #ff62d4;&quot;&gt;)&lt;/span&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
And finally the keybindings are set with
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-elisp&quot;&gt;&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;global-set-key &lt;span style=&quot;color: #ff62d4;&quot;&gt;(&lt;/span&gt;kbd &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;&amp;lt;f2&amp;gt;&quot;&lt;/span&gt;&lt;span style=&quot;color: #ff62d4;&quot;&gt;)&lt;/span&gt; #'pommodoro-start-timer&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;span style=&quot;color: #ffffff;&quot;&gt;(&lt;/span&gt;global-set-key &lt;span style=&quot;color: #ff62d4;&quot;&gt;(&lt;/span&gt;kbd &lt;span style=&quot;color: #79a8ff;&quot;&gt;&quot;&amp;lt;f3&amp;gt;&quot;&lt;/span&gt;&lt;span style=&quot;color: #ff62d4;&quot;&gt;)&lt;/span&gt; #'pommodoro-show-timer&lt;span style=&quot;color: #ffffff;&quot;&gt;)&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;
I'll be trying to stick with the technique and this home-brewed
functionality for a while! Let's see how it works. Three improvements
I want to add right away are:
&lt;/p&gt;

&lt;ul class=&quot;org-ul&quot;&gt;
&lt;li&gt;The &lt;code&gt;pommodoro-show-timer&lt;/code&gt; function should show the name of the
current task as well&lt;/li&gt;
&lt;li&gt;The icon should be a tomato&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;pommodoro-show-timer&lt;/code&gt; function should show the relative time
until running out instead of the actual time (&quot;5m to go!&quot; instead of
&quot;done at 22:04:23&quot;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
Thanks for reading! If you want to
reach out, post an issue to the &lt;a href=&quot;https://github.com/Gijs-Koot/Gijs-Koot.github.io&quot;&gt;GitHub repository of this website&lt;/a&gt; or
contact me on Twitter!
&lt;/p&gt;</content><author><name></name></author><category term="elisp," /><category term="emacs," /><category term="time" /><summary type="html">I wrote a pommodoro timer in elisp. Elisp is the language of Emacs, the 40 year old editor that is still going! Elisp is to Emacs what Javascript is to Visual Studio Code if that means more to you ;).</summary></entry><entry><title type="html">Popping balloons (2)</title><link href="/julia/probability/risk/2021/09/08/popping-balloons-2.html" rel="alternate" type="text/html" title="Popping balloons (2)" /><published>2021-09-08T00:00:00+00:00</published><updated>2021-09-08T00:00:00+00:00</updated><id>/julia/probability/risk/2021/09/08/popping-balloons-2</id><content type="html" xml:base="/julia/probability/risk/2021/09/08/popping-balloons-2.html">&lt;script type=&quot;text/javascript&quot; src=&quot;http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;&lt;/script&gt;

&lt;div id=&quot;outline-container-org9a5ccb7&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;org9a5ccb7&quot;&gt;Expected value of a single balloon&lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-org9a5ccb7&quot;&gt;
This is a follow-up to the &lt;a class=&quot;prev&quot; href=&quot;/2021/08/27/balloons.html&quot;&gt;previous&lt;/a&gt; post on the same game.

&lt;p&gt;
Let us assume, as before, that the balloon pops at a certain level
\(M\), and we believe \(M \sim \text{uniform}(1, 21)\). \(M = 1\) means the
balloon will pop immediately after we pump it once. 
&lt;/p&gt;

&lt;p&gt;
I figured out in the last post that the optimal strategy in this case
is to push 10 or 11 times, then stop. But what is then the expected
value of this game?  There are two options, either we reach 10 pushes
without popping the balloon, or it pops before we get there. Let's
call our cashout \(B\), and the expected value is
&lt;/p&gt;

\begin{align}
\mathbb{E}\left(B\right) &amp;amp;= \mathbb{P}\left(\text{reach 10
pushes}\right) \cdot 10 \\
&amp;amp;= \mathbb{P}(M &amp;gt; 10) \cdot 10 = 5
\end{align}

&lt;p&gt;
This is a simple formula: the expected value is the probability of
reaching the target times the target itself. Using this, we can easily
evaluate other strategies. For example, what if our
strategy is to push 11 times?
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;11 * (9 / 20)
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
4.95
&lt;/pre&gt;
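
&lt;p&gt;
As a cross-check, here is a minimal Python sketch of the same formula. Python is only an illustrative choice here, and the names &lt;code&gt;expected_value&lt;/code&gt; and &lt;code&gt;max_pumps&lt;/code&gt; are made up for this example. It assumes, as in the calculation above, that the probability of surviving a given number of pushes is (20 - pushes) / 20.
&lt;/p&gt;

```python
from fractions import Fraction


def expected_value(target, max_pumps=20):
    """Expected cashout when we plan to stop after `target` pushes.

    Assumption taken from the post: the chance of reaching `target`
    pushes without a pop is (max_pumps - target) / max_pumps.
    """
    survive = Fraction(max_pumps - target, max_pumps)
    return survive * target


# Evaluate every strategy from 0 to 20 pushes and pick the best one.
values = {target: expected_value(target) for target in range(21)}
best = max(values, key=values.get)

print(float(values[10]), float(values[11]))  # 5.0 4.95
print(best)  # 10
```

&lt;p&gt;
Exact fractions are used so that comparing neighbouring strategies does not depend on floating point rounding.
&lt;/p&gt;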


&lt;p&gt;
When plotted for all strategies, the expected value follows a parabola,
peaking at 10 pushes as calculated before.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;using Plots

bar(x -&amp;gt; (x * (20 - x) / 20), 0:20, labels=&quot;expected value&quot;, color=&quot;purple&quot;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;div id=&quot;org79c39f2&quot; class=&quot;figure&quot;&gt;
&lt;p&gt;&lt;img src=&quot;/assets/images/uniformexpected.png&quot; alt=&quot;uniformexpected.png&quot; /&gt;
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-orged79f8c&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;orged79f8c&quot;&gt;A different distribution for \(M\)&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-orged79f8c&quot;&gt;
&lt;p&gt;
Now what if I believe \(M\) to follow a Poisson distribution, and let's
take 11 as the parameter as an example. First a plot of the distribution of \(M\).
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;using Distributions

belief = Poisson(11)

bar(x -&amp;gt; pdf(belief, x), -10:30, labels=&quot;probability&quot;, color=&quot;brown&quot;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;div id=&quot;org73d76b0&quot; class=&quot;figure&quot;&gt;
&lt;p&gt;&lt;img src=&quot;/assets/images/poissondist.png&quot; alt=&quot;poissondist.png&quot; /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
And these are the expected values of playing on until a certain payoff:
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;belief = Poisson(11)

expected = x -&amp;gt; (1 - cdf(belief, x)) * x

bar(expected, 0:20, labels=&quot;expected value&quot;, color=&quot;yellow&quot;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;div id=&quot;orga4277e9&quot; class=&quot;figure&quot;&gt;
&lt;p&gt;&lt;img src=&quot;/assets/images/poissonexp.png&quot; alt=&quot;poissonexp.png&quot; /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
The optimal strategy is to play 8 rounds. The interesting thing is
that the two underlying distributions I analyzed have the same
average, but with a Poisson distribution it is optimal to stop before
reaching the average round at which the balloon pops. 
&lt;/p&gt;
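
&lt;p&gt;
The optimum can be double-checked without a plotting library. The following is a rough Python sketch, not the code used for the figure: &lt;code&gt;poisson_cdf&lt;/code&gt; and &lt;code&gt;expected_value&lt;/code&gt; are hypothetical helper names, and the payoff formula is the same (1 - cdf) times target expression as in the Julia snippet.
&lt;/p&gt;

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(M is at most k) for a Poisson distribution with mean lam."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k + 1))


def expected_value(target, lam=11):
    """Same formula as the Julia snippet: (1 - cdf(target)) * target."""
    return (1 - poisson_cdf(target, lam)) * target


# Evaluate the strategies 0..20 and report the one with the highest payoff.
values = {target: expected_value(target) for target in range(21)}
best = max(values, key=values.get)

print(best)  # 8
```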

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;mean(Uniform(1, 21)), mean(Poisson(11))
&lt;/pre&gt;
&lt;/div&gt;

&lt;table border=&quot;2&quot; cellspacing=&quot;0&quot; cellpadding=&quot;6&quot; rules=&quot;groups&quot; frame=&quot;hsides&quot;&gt;


&lt;colgroup&gt;
&lt;col class=&quot;org-right&quot; /&gt;
&lt;/colgroup&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td class=&quot;org-right&quot;&gt;11.0&lt;/td&gt;
&lt;/tr&gt;

&lt;tr&gt;
&lt;td class=&quot;org-right&quot;&gt;11.0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-orgaaa7fde&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;orgaaa7fde&quot;&gt;Summary&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-orgaaa7fde&quot;&gt;
&lt;p&gt;
In this post I used a much more straightforward formula to find the
optimal strategy, and was able to calculate expected payoffs for
different strategies for two different distributions. How the shape of
the distribution influences the optimal strategy is interesting. Can
this be generalized to other distributions? Thanks for reading this
post! If you want to reach out, post an issue to the &lt;a href=&quot;https://github.com/Gijs-Koot/Gijs-Koot.github.io&quot;&gt;GitHub repository
of this website&lt;/a&gt; or contact me on Twitter!
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</content><author><name></name></author><category term="julia" /><category term="probability" /><category term="risk" /><summary type="html"></summary></entry><entry><title type="html">Popping balloons</title><link href="/2021/08/27/balloons.html" rel="alternate" type="text/html" title="Popping balloons" /><published>2021-08-27T00:00:00+00:00</published><updated>2021-08-27T00:00:00+00:00</updated><id>/2021/08/27/balloons</id><content type="html" xml:base="/2021/08/27/balloons.html">&lt;script type=&quot;text/javascript&quot; src=&quot;http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;&lt;/script&gt;

&lt;div id=&quot;outline-container-org053b0c6&quot; class=&quot;outline-2&quot;&gt;
&lt;h2 id=&quot;org053b0c6&quot;&gt;Popping balloons risk assessment game&lt;/h2&gt;
&lt;div class=&quot;outline-text-2&quot; id=&quot;text-org053b0c6&quot;&gt;
&lt;p&gt;
Today I played an interesting game as part of a test at work. This
game was for testing my risk aversion and risk assessment skills.
&lt;/p&gt;

&lt;p&gt;
In the game, you are pumping balloons one at a time. At any moment
there are two options.
&lt;/p&gt;

&lt;ul class=&quot;org-ul&quot;&gt;
&lt;li&gt;Cash in the balloon&lt;/li&gt;
&lt;li&gt;Pump the balloon&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;
If you cash in the balloon, the amount you gain is the number of times
the balloon was pumped. Then you get a new balloon to try. If you
pump the balloon, it may pop, in which case you get nothing and a new balloon
starts. If it doesn't pop, you can choose again with the same
balloon. You have a starting total of 30 balloons and the goal is to
maximize your gains.
&lt;/p&gt;

&lt;p&gt;
The risk comes from the fact that you don't know how many pumps the
balloon can take.
&lt;/p&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-orga9e2ba3&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;orga9e2ba3&quot;&gt;Optimal strategy&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-orga9e2ba3&quot;&gt;
&lt;p&gt;
Can we come up with an optimal strategy for this game? The problem is
a bit like the &lt;a href=&quot;https://en.wikipedia.org/wiki/Multi-armed_bandit&quot;&gt;Multi-armed bandit problem&lt;/a&gt;, perhaps even equivalent to
some form of it, but I am not an expert on the topic. In this post I
want to analyze a couple of strategies and assumptions of the pumping
balloon game.
&lt;/p&gt;

&lt;p&gt;
What makes this problem really hard is the tradeoff between
exploration and getting as much out of the current balloon as
possible. In the game, it is probably worth sacrificing a couple of
balloons to learn about how many pumps they can take. But how many you
want to sacrifice will depend on the total number of balloons you can
spend. So first, let's calculate some numbers on the best strategy if
you have only one balloon. I will leave the analysis of the
exploration tradeoffs for some other time.
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-orgddfa4f2&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;orgddfa4f2&quot;&gt;Uniform prior&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-orgddfa4f2&quot;&gt;
&lt;p&gt;
If you know the exact number of pumps a balloon can take, the optimal
strategy is easy, you just pump until one below its maximum. Let's
call the maximum pressure the balloon can take &lt;code&gt;M&lt;/code&gt;. Now let's assume
you have some kind of idea of &lt;code&gt;M&lt;/code&gt;, you don't know what it is, but you
have a &quot;prior&quot; belief about &lt;code&gt;M&lt;/code&gt;. For example, you believe the maximum
pressure is not above 20, but any number below is equally likely.
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;using Plots
using Distributions

belief = Uniform(1, 21)
bar(x -&amp;gt; pdf(belief, x), -10:30, labels=&quot;probability&quot;)
&lt;/pre&gt;
&lt;/div&gt;

&lt;div id=&quot;orge47b4b8&quot; class=&quot;figure&quot;&gt;
&lt;p&gt;&lt;img src=&quot;/assets/images/uniformbelief.png&quot; alt=&quot;uniformbelief.png&quot; /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
There is one situation in which you are absolutely sure you want to
cash in: when you have pumped it 20 times. If you are at 19, what is
your updated belief about \(M\)? You've discarded all the possibilities
\(M \le 19\), and the other two cases are equally likely, so
&lt;/p&gt;

&lt;p&gt;
\[
\mathbb{P}(M = 21) = \mathbb{P}(M = 20) = \frac{1}{2}
\]
&lt;/p&gt;
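
&lt;p&gt;
The update step can be sketched in a few lines of Python. This is only an illustration under the assumptions of this post: \(M\) takes the integer values 1 to 21 with equal probability, and a balloon that has survived a number of pumps must have an \(M\) above that number. The function name &lt;code&gt;updated_belief&lt;/code&gt; is invented for the example.
&lt;/p&gt;

```python
from fractions import Fraction


def updated_belief(pumps_survived, highest=21):
    """Belief about M after the balloon survived `pumps_survived` pumps.

    Every level up to `pumps_survived` is ruled out; the remaining
    levels stay equally likely.
    """
    remaining = range(pumps_survived + 1, highest + 1)
    return {m: Fraction(1, len(remaining)) for m in remaining}


belief = updated_belief(19)
print(belief[20], belief[21])  # 1/2 1/2

# Chance that the very next pump pops the balloon after 18 pumps.
print(updated_belief(18)[19])  # 1/3
```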

&lt;p&gt;
Now the expected value of cashing in is exactly 19, and the expected
value of pumping is 10,
&lt;/p&gt;

\begin{align}
\mathbb{E}(\text{pump}) = \mathbb{P}(M = 20) \cdot 0 + \mathbb{P}(M = 21) \cdot 20
= \frac{1}{2}\cdot 20 = 10
\end{align}

&lt;p&gt;
Let's do this calculation for 18. Your updated belief is that the
probability that it will break on the next pump is \(\frac{1}{3}\). The
expected value of a pump is
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;2 / 3 * 19
&lt;/pre&gt;
&lt;/div&gt;

&lt;pre class=&quot;example&quot;&gt;
12.666666666666666
&lt;/pre&gt;


&lt;p&gt;
So still not worth it. Let's generalize a bit, if you have pumped \(i\)
times, there is a 1 in \(21 - i\) probability it will break on the next
try. Let's assume that after that try, it is not worth playing
on. Then the expected value of trying is \(\frac{20 - i}{21 - i}\cdot (i +
1)\), which you have to weigh against the immediate payoff of \(i\).
&lt;/p&gt;

&lt;div class=&quot;org-src-container&quot;&gt;
&lt;pre class=&quot;src src-ess-julia&quot;&gt;scatter(i -&amp;gt; (i + 1) * (20 - i) / (21 - i), 0:20, labels=&quot;Pump&quot;)
scatter!(i -&amp;gt; i, 0:20, labels=&quot;Cash&quot;)
&lt;/pre&gt;
&lt;/div&gt;


&lt;div id=&quot;org48c0d01&quot; class=&quot;figure&quot;&gt;
&lt;p&gt;&lt;img src=&quot;/assets/images/uniformpumpvcash.png&quot; alt=&quot;uniformpumpvcash.png&quot; /&gt;
&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;
From the graph, you can see that the break-even occurs at 10 pumps, as
intuitively makes sense. At that point, taking the risk and cashing in
have an equal expected payoff.
&lt;/p&gt;
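
&lt;p&gt;
The break-even point can also be checked exactly instead of being read off the graph. Below is a small Python sketch of the pump-versus-cash comparison. The helper names &lt;code&gt;pump_value&lt;/code&gt; and &lt;code&gt;cash_value&lt;/code&gt; are made up for this example, and it keeps the assumption from above that you cash in right after the extra pump.
&lt;/p&gt;

```python
from fractions import Fraction


def pump_value(i):
    """Expected value of pumping once more after i pumps, then cashing in."""
    return Fraction(20 - i, 21 - i) * (i + 1)


def cash_value(i):
    """Value of cashing in right away after i pumps."""
    return i


# Collect every pump count where both options are worth exactly the same.
break_even = [i for i in range(21) if pump_value(i) == cash_value(i)]

print(break_even)  # [10]
print(pump_value(19), pump_value(18))  # 10 38/3
```

&lt;p&gt;
The last line reproduces the two hand calculations from earlier in this section: 10 at 19 pumps and 38/3, roughly 12.67, at 18 pumps.
&lt;/p&gt;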
&lt;/div&gt;
&lt;/div&gt;

&lt;div id=&quot;outline-container-org9e5deb3&quot; class=&quot;outline-3&quot;&gt;
&lt;h3 id=&quot;org9e5deb3&quot;&gt;Summary&lt;/h3&gt;
&lt;div class=&quot;outline-text-3&quot; id=&quot;text-org9e5deb3&quot;&gt;
&lt;p&gt;
An interesting game, and there is a lot more to explore, even for
playing only a single balloon! For example, what is the expected value
of playing the game? And can we analyze other distributions, such as
the Poisson distribution? Thanks for reading this post! If you want to
reach out, post an issue to the &lt;a href=&quot;https://github.com/Gijs-Koot/Gijs-Koot.github.io&quot;&gt;GitHub repository of this website&lt;/a&gt; or
contact me on Twitter!
&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;</content><author><name></name></author><summary type="html"></summary></entry><entry><title type="html">Completing the Advent of code 2020 with Julia</title><link href="/programming/julia/2021/01/15/advent-julia-2020.html" rel="alternate" type="text/html" title="Completing the Advent of code 2020 with Julia" /><published>2021-01-15T08:30:00+00:00</published><updated>2021-01-15T08:30:00+00:00</updated><id>/programming/julia/2021/01/15/advent-julia-2020</id><content type="html" xml:base="/programming/julia/2021/01/15/advent-julia-2020.html">&lt;p&gt;I just finished all 25 puzzles for the https://adventofcode.com/2020/. It was fun, I learned things and I took the chance to use the &lt;a href=&quot;https://julialang.org/&quot;&gt;Julia programming language&lt;/a&gt;. Prior to this project, my experience with the language was working through a couple of chapters in &lt;a href=&quot;https://julia.quantecon.org&quot;&gt;QuantEcon&lt;/a&gt;. I had also not participated in the Advent of Code competition before.&lt;/p&gt;

&lt;h2 id=&quot;advent-of-code&quot;&gt;Advent of Code&lt;/h2&gt;

&lt;p&gt;This is a competition consisting of 25 puzzles, released at midnight EST/UTC-5 every day starting December 1st, until December 25th. It has been running for a couple of years now. This year, 130k participants sent in the answer to the first puzzle, and around 10% of them persevered and finished all of the puzzles. There is a competition where every morning, the first 100 solutions get points awarded. My personal goal was to just finish the puzzles; following the strict time schedule would be an additional challenge that I am not up for at this time.&lt;/p&gt;

&lt;p&gt;On December 2nd, it took “goffrie” 1 minute and 47 seconds to solve the second puzzle, this one https://adventofcode.com/2020/day/2. That’s pretty incredible, just understanding it takes me 5 minutes. However, most puzzles are tricky but not hard, and can be finished well within one hour.&lt;/p&gt;

&lt;h2 id=&quot;julia&quot;&gt;Julia&lt;/h2&gt;

&lt;p&gt;In my job, I am primarily a Python programmer, and I sometimes use R for visualization or model fitting specifically. I have been following and experimenting with Julia over the last two years, but I haven’t started using it professionally. Python and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pytorch&lt;/code&gt; have been very effective for us. After this experience, I have become even more convinced that Julia is a candidate for replacing Python as the most popular language for data science. In general, its advanced compilation system is an advantage. And as I’m learning and working with the language, it just feels very well designed to me.&lt;/p&gt;

&lt;p&gt;The combination of Advent puzzles with Julia worked out very well. I think it’s a great language for solving these problems, which is impressive, because Julia is designed for numerical and scientific computation. My solutions can be found on &lt;a href=&quot;https://github.com/Gijs-Koot/advent2020&quot;&gt;a GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Below are some additional thoughts.&lt;/p&gt;

&lt;h2 id=&quot;learning-path&quot;&gt;Learning path&lt;/h2&gt;

&lt;p&gt;As an example of my learning, the problems all start with reading and parsing some input. Day one required reading an input file with some numbers:&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;x&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;head&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;./&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;txt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;2000
50
1984
1600
1736
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the first day, I used this&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;io&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;io&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;input.txt&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;offset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ncodeunits&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;y&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;5-element Array{Int64,1}:
 2000
   50
 1984
 1600
 1736
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then I figured out a nice system with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;do&lt;/code&gt; keyword&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;lines&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;./input.txt&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;do&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;io&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;read&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;io&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\n&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;lines&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;5-element Array{SubString{String},1}:
 &quot;2000&quot;
 &quot;50&quot;
 &quot;1984&quot;
 &quot;1600&quot;
 &quot;1736&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After some more iterations, I figured out I could just do this&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;numbers&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readlines&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;input.txt&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;numbers&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;5-element Array{Int64,1}:
 2000
   50
 1984
 1600
 1736
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Nice!&lt;/p&gt;
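
&lt;p&gt;For completeness, here is a small sketch of why the offset arithmetic from my first attempt was never needed: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;parse&lt;/code&gt; accepts any string type, including the substrings returned by &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;split&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;readlines&lt;/code&gt; also works on an IO object. The string &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt; is a made-up stand-in for the puzzle input, so the example runs without a file.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical in-memory stand-in for input.txt
x = &quot;2000\n50\n1984\n&quot;

# parse accepts any AbstractString, including the SubStrings returned by split
a = parse.(Int, split(strip(x), &quot;\n&quot;))

# readlines works on any IO object, not only on a file name
b = parse.(Int, readlines(IOBuffer(x)))

a == b  # true
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;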

&lt;h3 id=&quot;workflow&quot;&gt;Workflow&lt;/h3&gt;

&lt;p&gt;I used &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;emacs&lt;/code&gt; with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;julia-repl-mode&lt;/code&gt; plugin. This worked well, except that sending functions to the REPL took a lot of Ctrl+Enters. I should just take the time to figure out a way to do that more efficiently. Compilation time was never a problem, although I occasionally had to restart the kernel because I wanted to redefine types.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# the testing macro worked great in this setup, I put in a lot of tests throughout the code&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Test&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; s&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;@test&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Test Passed
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
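
&lt;p&gt;Once the number of tests grows, related checks can be grouped with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;@testset&lt;/code&gt; from the same standard library, which reports a summary per group. A minimal sketch, reusing the toy function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;s&lt;/code&gt; from above; the test set name is just an example.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;using Test

# same toy function as above, in short form
s(a, b) = a + b

# all checks in the block run, and the set reports passes and failures together
@testset &quot;s adds its arguments&quot; begin
    @test s(2, 3) == 5
    @test s(-1, 1) == 0
    @test s(1.5, 2) == 3.5
end
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;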

&lt;h3 id=&quot;its-all-about-arrays&quot;&gt;It’s all about arrays&lt;/h3&gt;

&lt;p&gt;The big attraction of Julia over Python for me is that it allows me to write performant algorithms in the language itself. In Python, I always cringe a bit when combining &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numpy&lt;/code&gt; with loops, or using &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pd.Series.apply&lt;/code&gt; when working with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pandas&lt;/code&gt;. In Julia, I am allowed to write for loops without feeling guilty, and they are fast!&lt;/p&gt;

&lt;p&gt;In Python, there are many packages for fast linear algebra, such as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numpy&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch&lt;/code&gt;. However, these libraries don’t work well with other parts of the language or with other packages, unless they are specifically designed to work together. Things such as automatic differentiation have to be built on top of one ecosystem. I think of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;torch&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;numpy&lt;/code&gt; as sublanguages of Python, with their own data types: statically typed arrays. That is maybe not a problem in itself, and it will be very effective for the coming years, but I think that in the long run it can never match the possibilities in Julia. In the Python ecosystem, interoperability is organized with brilliant but cumbersome constructs like the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__array__&lt;/code&gt; interface. That one works well, but who is familiar with the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;__geo_interface__&lt;/code&gt; protocol? To me, these ideas are all efforts to get performance by using Python’s great flexibility to work around the incompatibility between dynamic typing and a real array type. In short, data comes in the form of big collections, and to work with those fast, I believe you need fixed-size data types. Julia has them.&lt;/p&gt;

&lt;h3 id=&quot;short-is-good&quot;&gt;Short is good&lt;/h3&gt;

&lt;p&gt;I really liked the conciseness of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.&lt;/code&gt; operator, which lets the function &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hello&lt;/code&gt; below operate on an array without additional work. I’m also getting used to omitting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;return&lt;/code&gt; statement. These are just small things that make me love a language. The first 10 days I stubbornly included a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;return&lt;/code&gt; statement, following the Zen of Python: “Explicit is better than implicit”. On the eleventh day I converted and left those prehistoric ideas behind me.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;nf&quot;&gt; hello&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&quot;Hello, &lt;/span&gt;&lt;span class=&quot;si&quot;&gt;$(str)&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;!&quot;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;end&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;x&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;Gijs&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Simon&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;Geert&quot;&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;hello&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;names&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;3-element Array{String,1}:
 &quot;Hello, Gijs!&quot;
 &quot;Hello, Simon!&quot;
 &quot;Hello, Geert!&quot;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
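
&lt;p&gt;The dot syntax is not limited to one-argument functions. A short sketch, assuming a hypothetical two-argument variant called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;greet&lt;/code&gt;: a plain string is treated as a scalar and reused for every element, while two arrays of equal length are paired element by element.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# hypothetical two-argument variant of hello
greet(greeting, name) = &quot;$greeting, $name!&quot;

names = [&quot;Gijs&quot;, &quot;Simon&quot;, &quot;Geert&quot;]

# a plain string counts as a scalar in broadcasting, so it is reused for every name
greet.(&quot;Hi&quot;, names)

# two arrays of equal length are paired element by element
greet.([&quot;Hi&quot;, &quot;Hey&quot;, &quot;Yo&quot;], names)

# the dot works for built-in functions and operators as well
length.(names)
(1:3) .* 2
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;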

&lt;h3 id=&quot;losing-explicit-imports&quot;&gt;Losing explicit imports&lt;/h3&gt;

&lt;p&gt;One thing that I like more in Python is its very explicit imports. Whenever you encounter a name in a Python source file, you can tell where it came from just by looking at the file. For example, when you want to use a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;defaultdict&lt;/code&gt; in your code, you have two standard ways of getting access to the class. One way would be&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;from collections import defaultdict

x = defaultdict(int)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;or, alternatively&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;import collections

x = collections.defaultdict(int)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Assuming you avoid &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from collections import *&lt;/code&gt;, which is recommended practice, you will be able to tell from which package &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;defaultdict&lt;/code&gt; was imported. But in Julia, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;using&lt;/code&gt; is more common than &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;import&lt;/code&gt;, and there is no direct link between the imports and the names.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataStructures&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StatsBase&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DefaultDict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DefaultDict
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In the case above, you can’t tell from the file whether &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DefaultDict&lt;/code&gt; comes from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Base&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;StatsBase&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DataStructures&lt;/code&gt;. In particular, the code will break if &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;StatsBase&lt;/code&gt; starts exporting its own &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DefaultDict&lt;/code&gt;. That is unlikely, but I like explicitness, and it makes it easier to look up definitions, both for me and for my editor.&lt;/p&gt;

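&lt;p&gt;In Julia, the standard practice is thus the other way around: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;using&lt;/code&gt; is the default. Explicit imports do exist and work, though.&lt;/p&gt;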

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataStructures&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataStructures&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DefaultDict&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;DefaultDict
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
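
&lt;p&gt;There is also a middle ground, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;using Module: name&lt;/code&gt;, which brings only the listed names into scope, much like Python’s &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;from ... import ...&lt;/code&gt;. The sketch below uses the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Statistics&lt;/code&gt; standard library so that it runs without installing anything; the same form should apply to &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;using DataStructures: DefaultDict&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# only `mean` is brought into scope, so its origin is visible in the file
using Statistics: mean

# only the module name is brought into scope; everything else needs the prefix
import Statistics

mean([1, 2, 3])
Statistics.median([1, 2, 3])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;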

&lt;p&gt;There was some interesting discussion on &lt;a href=&quot;https://www.reddit.com/r/Julia/comments/kxhtvb/why_do_almost_all_julia_examples_pollute_the/&quot;&gt;Reddit&lt;/a&gt; on this topic just this morning.&lt;/p&gt;

&lt;h3 id=&quot;splatted&quot;&gt;Splatted&lt;/h3&gt;

&lt;p&gt;I use the asterisk in Python all the time, even for something like this.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;x = [*range(5)]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I like it better than the ellipsis in Julia.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;45&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;&lt;span class=&quot;x&quot;&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;1035
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
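
&lt;p&gt;Splatting passes every element as a separate argument, which is fine for short collections. For longer arrays a reduction is, as far as I know, the more common idiom. A small sketch with the same &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;x&lt;/code&gt;, plus the closest Julia counterparts of the Python one-liner above:&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;x = collect(1:45)

sum(x)         # 1035, same as +(x...)
reduce(+, x)   # also 1035

# counterparts of Python's [*range(5)]
[(0:4)...]     # splat a range into an array literal
collect(0:4)   # same result without splatting
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;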

&lt;h2 id=&quot;multiple-dispatch&quot;&gt;Multiple dispatch&lt;/h2&gt;

&lt;p&gt;At first this may be mistaken for a small nicety, but I think it’s really powerful, and exactly what you need to create flexible and useful systems for data munging. For example, below is the initializer for a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DataFrame&lt;/code&gt; in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pandas&lt;/code&gt; library (&lt;a href=&quot;https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py&quot;&gt;source&lt;/a&gt;). It’s all type checking, and multiple dispatch would clean up so much of it.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;    def __init__(
        self,
        data=None,
        index: Optional[Axes] = None,
        columns: Optional[Axes] = None,
        dtype: Optional[Dtype] = None,
        copy: bool = False,
    ):
        if data is None:
            data = {}
        if dtype is not None:
            dtype = self._validate_dtype(dtype)

        if isinstance(data, DataFrame):
            data = data._mgr

        if isinstance(data, (BlockManager, ArrayManager)):
            if index is None and columns is None and dtype is None and copy is False:
                # GH#33357 fastpath
                NDFrame.__init__(self, data)
                return

            mgr = self._init_mgr(
                data, axes={&quot;index&quot;: index, &quot;columns&quot;: columns}, dtype=dtype, copy=copy
            )

        elif isinstance(data, dict):
            mgr = init_dict(data, index, columns, dtype=dtype)
        elif isinstance(data, ma.MaskedArray):
            import numpy.ma.mrecords as mrecords

            # masked recarray
            if isinstance(data, mrecords.MaskedRecords):
                mgr = masked_rec_array_to_mgr(data, index, columns, dtype, copy)

            # a masked array
            else:
                data = sanitize_masked_array(data)
                mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)

        elif isinstance(data, (np.ndarray, Series, Index)):
            if data.dtype.names:
                data_columns = list(data.dtype.names)
                data = {k: data[k] for k in data_columns}
                if columns is None:
                    columns = data_columns
                mgr = init_dict(data, index, columns, dtype=dtype)
            elif getattr(data, &quot;name&quot;, None) is not None:
                mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
            else:
                mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)

        # For data is list-like, or Iterable (will consume into list)
        elif is_list_like(data):
            if not isinstance(data, (abc.Sequence, ExtensionArray)):
                data = list(data)
            if len(data) &amp;gt; 0:
                if is_dataclass(data[0]):
                    data = dataclasses_to_dicts(data)
                if treat_as_nested(data):
                    arrays, columns, index = nested_data_to_arrays(
                        data, columns, index, dtype
                    )
                    mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
                else:
                    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
            else:
                mgr = init_dict({}, index, columns, dtype=dtype)
        # For data is scalar
        else:
            if index is None or columns is None:
                raise ValueError(&quot;DataFrame constructor not properly called!&quot;)

            if not dtype:
                dtype, _ = infer_dtype_from_scalar(data, pandas_dtype=True)

            # For data is a scalar extension dtype
            if is_extension_array_dtype(dtype):
                # TODO(EA2D): special case not needed with 2D EAs

                values = [
                    construct_1d_arraylike_from_scalar(data, len(index), dtype)
                    for _ in range(len(columns))
                ]
                mgr = arrays_to_mgr(values, columns, index, columns, dtype=None)
            else:
                values = construct_2d_arraylike_from_scalar(
                    data, len(index), len(columns), dtype, copy
                )

                mgr = init_ndarray(
                    values, index, columns, dtype=values.dtype, copy=False
                )

        # ensure correct Manager type according to settings
        manager = get_option(&quot;mode.data_manager&quot;)
        mgr = mgr_to_mgr(mgr, typ=manager)

        NDFrame.__init__(self, mgr)


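&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;For contrast, here is a rough sketch of how the same idea could look with multiple dispatch. The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Frame&lt;/code&gt; type and its default column names are made up for illustration and are not the DataFrames.jl API. Each input type gets its own small constructor method, and Julia picks the right one from the argument type, so the chain of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;isinstance&lt;/code&gt; checks disappears.&lt;/p&gt;

&lt;div class=&quot;language-julia highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;# Hypothetical toy type for illustration, not the DataFrames.jl API
struct Frame
    columns::Vector{String}
    data::Matrix{Float64}
end

# a Frame is passed through unchanged
Frame(other::Frame) = other

# a matrix gets generated column names x1, x2, ...
Frame(data::AbstractMatrix) = Frame([&quot;x$i&quot; for i in 1:size(data, 2)], Float64.(data))

# a vector becomes a single column and reuses the matrix method
Frame(data::AbstractVector) = Frame(reshape(data, :, 1))

# a dictionary of equally long vectors maps keys to columns, sorted by name
function Frame(data::AbstractDict)
    names = sort(collect(keys(data)))
    Frame(names, Float64.(hcat((data[n] for n in names)...)))
end

Frame([1 2; 3 4])
Frame([1, 2, 3])
Frame(Dict(&quot;a&quot; =&amp;gt; [1, 2], &quot;b&quot; =&amp;gt; [3, 4]))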

&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name></name></author><category term="programming" /><category term="julia" /><summary type="html">I just finished all 25 puzzles for the https://adventofcode.com/2020/. It was fun, I learned things and I took the chance to use the Julia programming language. Prior to this project, my experience with the language was working through a couple of chapters in Quantecon. I have also not participated in the Advent of Code competition before.</summary></entry><entry><title type="html">Ubuntu tracker ignore directories</title><link href="/foss,/ubuntu/2020/02/20/ubuntu-tracker-settings.html" rel="alternate" type="text/html" title="Ubuntu tracker ignore directories" /><published>2020-02-20T00:00:00+00:00</published><updated>2020-02-20T00:00:00+00:00</updated><id>/foss,/ubuntu/2020/02/20/ubuntu-tracker-settings</id><content type="html" xml:base="/foss,/ubuntu/2020/02/20/ubuntu-tracker-settings.html">&lt;p&gt;Today I noticed &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tracker-store&lt;/code&gt; was eating a lot of CPU on my machine. So I dug a little into this program to figure out what it’s doing. I knew nothing about this program, so here’s how I figured out some things.&lt;/p&gt;

&lt;p&gt;I figured the problem was with some datasets containing millions of files, which the indexer was at least looking at, although hopefully not reading. Then again, there were &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;jpg&lt;/code&gt; images in there, so perhaps the tracker would actually start indexing some of their metadata.&lt;/p&gt;

&lt;p&gt;So I wanted &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tracker&lt;/code&gt; to ignore all directories named &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt;, and for good measure, I wanted it to exclude &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;venv&lt;/code&gt; directories as well, because I have a lot of those for different projects, and they contain a lot of Python source files. After some googling, I found out that adding a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;.trackerignore&lt;/code&gt; file to a directory works, but I didn’t want to start adding this file to every &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;venv&lt;/code&gt; directory I will be creating in the future.&lt;/p&gt;

&lt;h2 id=&quot;1-the-tracker-tool&quot;&gt;1. The tracker tool&lt;/h2&gt;

&lt;p&gt;There is a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;tracker&lt;/code&gt; tool which can influence some things; for example, you can reset the index and pause the mining process.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;usage: tracker [--version] [--help]
               &amp;lt;command&amp;gt; [&amp;lt;args&amp;gt;]

Available tracker commands are:
   daemon    Start, stop, pause and list processes responsible for indexing content
   extract   Extract information from a file
   info      Show information known about local files or items indexed
   index     Backup, restore, import and (re)index by MIME type or file name
   reset     Reset or remove index and revert configurations to defaults
   search    Search for content indexed or show content by type
   sparql    Query and update the index using SPARQL or search, list and tree the ontology
   sql       Query the database at the lowest level using SQL
   status    Show the indexing progress, content statistics and index state
   tag       Create, list or delete tags for indexed content

See “tracker help &amp;lt;command&amp;gt;” to read about a specific subcommand.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But I didn’t really find what I was looking for here.&lt;/p&gt;

&lt;h2 id=&quot;2-finding-documentation&quot;&gt;2. Finding documentation&lt;/h2&gt;

&lt;p&gt;I listed all files installed by this package with the following command.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ dpkg -L tracker
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This shows many files and directories, among others:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;/usr/lib/tracker
...
/usr/share/doc/tracker/AUTHORS
/usr/share/doc/tracker/NEWS.gz
/usr/share/doc/tracker/README.md.gz
/usr/share/doc/tracker/copyright
...
/usr/share/glib-2.0/schemas/org.freedesktop.Tracker.gschema.xml
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Some documentation is available in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;README.md&lt;/code&gt; file, which also points to https://wiki.gnome.org/Projects/Tracker/Documentation/GettingStarted. On that page I found that you can view the settings with this one-liner.&lt;/p&gt;

&lt;h2 id=&quot;3-accessing-tracker-settings&quot;&gt;3. Accessing tracker settings&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ gsettings list-recursively | grep -i org.freedesktop.Tracker | sort | uniq
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This shows approximately 40 records, among others:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;org.freedesktop.Tracker.DB journal-chunk-size 50
...
org.freedesktop.Tracker.Miner.Files ignored-directories-with-content ['.trackerignore', '.git', '.hg', '.nomedia']
org.freedesktop.Tracker.Miner.Files ignored-files ['*~', '*.o', '*.la', '*.lo', '*.loT', '*.in', '*.csproj', '*.m4', '*.rej', '*.gmo', '*.orig', '*.pc', '*.omf', '*.aux', '*.tmp', '*.vmdk', '*.vm*', '*.nvram', '*.part', '*.rcore', '*.lzo', 'autom4te', 'conftest', 'confstat', 'Makefile', 'SCCS', 'ltmain.sh', 'libtool', 'config.status', 'confdefs.h', 'configure', '#*#', '~$*.doc?', '~$*.dot?', '~$*.xls?', '~$*.xlt?', '~$*.xlam', '~$*.ppt?', '~$*.pot?', '~$*.ppam', '~$*.ppsm', '~$*.ppsx', '~$*.vsd?', '~$*.vss?', '~$*.vst?', 'mimeapps.list', 'mimeinfo.cache', 'gnome-mimeapps.list', 'kde-mimeapps.list', '*.directory']
org.freedesktop.Tracker.Miner.Files index-on-battery-first-time true
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I wanted tracker to ignore all directories called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;data&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;venv&lt;/code&gt;, since these have many files, and they shouldn’t be indexed.&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ gsettings get org.freedesktop.Tracker.Miner.Files ignored-directories
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;['po', 'CVS', 'core-dumps', 'lost+found']
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So finally I added the new entries, and then reset the whole index with the following commands.&lt;/p&gt;

&lt;h2 id=&quot;4-tldr&quot;&gt;4. TL;DR&lt;/h2&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;$ gsettings set org.freedesktop.Tracker.Miner.Files ignored-directories &quot;['po', 'CVS', 'core-dumps', 'lost+found', 'data', 'venv']&quot;
$ tracker reset -r
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And I learned more about the wonderful programs that are running on my PC. Hope it helps you!&lt;/p&gt;</content><author><name></name></author><category term="foss," /><category term="ubuntu" /><summary type="html">Today I noticed tracker-store was eating a lot of CPU on my machine. So I dug a little into this program to figure out what it’s doing. I knew nothing about this program, so here’s how I figured out some things.</summary></entry></feed>