<h1>Old fashioned setup</h1>
<p>Ionel Cristian Mărieș, ionel's codelog (https://blog.ionelmc.ro/), 2023-01-09</p>
<p>Today's options for doing development on Windows <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a> in Dockerized projects:</p>
<ul class="simple">
<li>A virtual machine (ol' reliable)<ul>
<li>Cumbersome to set up, but it's the beaten path, and you have this guide!</li>
<li>There are other hypervisors like VirtualBox or VMware which have some advantages (easier networking - built-in NAT, port forwarding and
so on) but performance and integration are, I'd say, best in Hyper-V. Don't underestimate the importance of being able to shut down
Windows and not worry about your VM being stopped without a clean shutdown.</li>
</ul>
</li>
<li>Docker Desktop (Hyper-V mode)<ul>
<li>Using bind mounts becomes problematic as Docker will automatically create Windows shares for various directories, and those will be
mounted as SMB storage inside Docker's internal VM. Sometimes they break, they lack support for advanced filesystem features, and
permissions aren't 1:1. You'll have lots of problems with tooling that expects a typical Linux filesystem.</li>
</ul>
</li>
<li>Docker Desktop (WSL2 mode)<ul>
<li>It can run better than Hyper-V mode but you have to follow <a class="reference external" href="https://docs.docker.com/desktop/windows/wsl/">certain complicated guidelines</a>. If anything breaks you're left to debug a very complex technology stack.</li>
<li>Performance won't be great if your project is stored in Windows (and you'll have the same problems as with Docker Desktop in Hyper-V mode).
You'll have to move your project completely inside WSL. At that point you might as well just run a VM...</li>
</ul>
</li>
<li>Ubuntu <a class="reference external" href="https://multipass.run/">Multipass</a> (a Hyper-V VM management tool with cloud-init support)<ul>
<li>Easy to spin up lots of machines, preconfigured with cloud-init - useful for testing your Ansible playbooks for example.</li>
<li>You can't configure the network switch, and lots of other Hyper-V settings are arbitrary - not really useful for general purpose
development.</li>
<li>Being able to use cloud-init is nice, but you don't really need automation for something that you do once, do you?</li>
</ul>
</li>
</ul>
<div class="section" id="ol-reliable">
<h2>Ol' reliable<a class="headerlink" href="#ol-reliable" title="Permalink to this headline">
*</a></h2>
<div class="figure">
<img alt="Ol' reliable meme" src="https://blog.ionelmc.ro/2023/01/09/old-fashioned-setup/ol-reliable.jpg" />
</div>
<p>The classical setup. The best performance and UI come from running the apps (editors and tools) inside the VM and displaying them on the
host desktop via an X11 server running in the host OS.</p>
<p>This requires you to manually configure these:</p>
<ul class="simple">
<li><a class="reference internal" href="#network">Network</a></li>
<li><a class="reference internal" href="#virtual-machine">Virtual Machine</a></li>
<li><a class="reference internal" href="#ssh-client">SSH client</a></li>
<li><a class="reference internal" href="#desktop-support">Desktop support</a></li>
<li><a class="reference internal" href="#file-access">File access</a></li>
</ul>
</div>
<div class="section" id="network">
<h2>Network<a class="headerlink" href="#network" title="Permalink to this headline">
*</a></h2>
<p>Hyper-V comes with a built-in NAT switch, but it is severely lacking in features; notably, it doesn't have port forwarding.</p>
<p>However, Windows allows you to create your own NAT virtual network adapter. Run this in PowerShell (press <tt class="docutils literal">Start + X</tt>, <tt class="docutils literal">Run powershell (Admin)</tt>):</p>
<div class="highlight"><pre><span></span><span class="c"># Delete previous adapter, if any</span>
<span class="nb">Get-NetNat</span> <span class="p">|</span> <span class="nb">Remove-NetNat</span> <span class="n">-Confirm</span><span class="p">:</span><span class="nv">$false</span>
<span class="nb">Get-NetIPAddress</span> <span class="n">-InterfaceAlias</span> <span class="s2">"vEthernet (NAT)"</span> <span class="n">-ErrorAction</span> <span class="n">SilentlyContinue</span> <span class="p">|</span> <span class="nb">Remove-NetIPAddress</span> <span class="n">-Confirm</span><span class="p">:</span><span class="nv">$false</span>
<span class="c"># Create a new Hyper-V switch which in turn creates an adapter that we need to configure</span>
<span class="k">if</span> <span class="p">(!(</span><span class="nb">Get-VMSwitch</span> <span class="n">-SwitchName</span> <span class="s2">"NAT"</span> <span class="n">-ErrorAction</span> <span class="n">SilentlyContinue</span><span class="p">))</span> <span class="p">{</span>
<span class="nb">New-VMSwitch</span> <span class="n">-SwitchName</span> <span class="s2">"NAT"</span> <span class="n">-SwitchType</span> <span class="n">Internal</span>
<span class="p">}</span>
<span class="nb">Get-NetIPAddress</span> <span class="n">-InterfaceAlias</span> <span class="s2">"vEthernet (NAT)"</span> <span class="n">-ErrorAction</span> <span class="n">SilentlyContinue</span> <span class="p">|</span> <span class="nb">Remove-NetIPAddress</span> <span class="n">-Confirm</span><span class="p">:</span><span class="nv">$false</span>
<span class="nb">New-NetIPAddress</span> <span class="n">-IPAddress</span> <span class="n">10</span><span class="p">.</span><span class="n">0</span><span class="p">.</span><span class="n">0</span><span class="p">.</span><span class="n">1</span> <span class="n">-PrefixLength</span> <span class="n">24</span> <span class="n">-InterfaceAlias</span> <span class="s2">"vEthernet (NAT)"</span>
<span class="nb">New-NetNAT</span> <span class="n">-Name</span> <span class="s2">"NATNetwork"</span> <span class="n">-InternalIPInterfaceAddressPrefix</span> <span class="s2">"10.0.0.0/24"</span>
</pre></div>
<p>For port forwarding we have two options:</p>
<ul class="simple">
<li><em>NAT mappings</em> (which unfortunately expose your port on all interfaces)</li>
<li><em>port proxies</em> (which you can configure to only accept local connections)</li>
</ul>
<p>If you want your VM to be accessible to other computers use <em>NAT mappings</em>, otherwise use <em>port proxies</em>. Performance likely differs
between the two, but that's unlikely to matter for typical development.</p>
<p>To create <em>NAT mappings</em> (<strong>insecure</strong>):</p>
<div class="highlight"><pre><span></span><span class="nb">Get-NetNatStaticMapping</span> <span class="p">|</span> <span class="nb">Remove-NetNatStaticMapping</span> <span class="n">-Confirm</span><span class="p">:</span><span class="nv">$false</span>
<span class="nb">Add-NetNatStaticMapping</span> <span class="n">-NatName</span> <span class="s2">"NATNetwork"</span> <span class="n">-Protocol</span> <span class="n">TCP</span> <span class="n">-InternalIPAddress</span> <span class="n">10</span><span class="p">.</span><span class="n">0</span><span class="p">.</span><span class="n">0</span><span class="p">.</span><span class="n">10</span> <span class="n">-ExternalIPAddress</span> <span class="n">0</span><span class="p">.</span><span class="n">0</span><span class="p">.</span><span class="n">0</span><span class="p">.</span><span class="n">0</span> <span class="n">-ExternalPort</span> <span class="n">80</span> <span class="n">-InternalPort</span> <span class="n">80</span>
</pre></div>
<p>The alternative, <em>port proxies</em>, lets you specify the listen address. For example, this only allows local connections:</p>
<div class="highlight"><pre><span></span>netsh interface portproxy add v4tov4 listenport=80 listenaddress=127.0.0.1 connectport=80 connectaddress=10.0.0.10
</pre></div>
<p>And finally you can test your port:</p>
<div class="highlight"><pre><span></span><span class="nb">Get-NetTCPConnection</span> <span class="n">-State</span> <span class="n">Listen</span> <span class="n">-LocalPort</span> <span class="n">80</span>
<span class="nb">Test-NetConnection</span> <span class="n">-ComputerName</span> <span class="n">127</span><span class="p">.</span><span class="n">0</span><span class="p">.</span><span class="n">0</span><span class="p">.</span><span class="n">1</span> <span class="n">-Port</span> <span class="n">80</span>
</pre></div>
</div>
<div class="section" id="virtual-machine">
<h2>Virtual Machine<a class="headerlink" href="#virtual-machine" title="Permalink to this headline">
*</a></h2>
<p>You want to use something that runs the Hyper-V kernel modules for correct shutdown handling - Ubuntu or Fedora are tested to work correctly.</p>
<p>Ubuntu has an installation wizard:</p>
<div class="center docutils container">
<div class="figure">
<img alt="Installation wizard" src="https://blog.ionelmc.ro/2023/01/09/old-fashioned-setup/wizard.png" />
<p class="caption"><strong>Does too many wrong things.</strong></p>
</div>
</div>
<p>You should download and install a <strong>Server</strong> ISO <em>instead</em>.
<a class="reference external" href="https://releases.ubuntu.com/22.04.1/ubuntu-22.04.1-live-server-amd64.iso">Ubuntu</a> or
<a class="reference external" href="https://download.fedoraproject.org/pub/fedora/linux/releases/36/Server/x86_64/iso/Fedora-Server-netinst-x86_64-36-1.5.iso">Fedora</a>.</p>
<p>While this wizard gets your VM set up quickly, it has some disadvantages:</p>
<ul class="simple">
<li>It creates a dynamically expanding disk by default, something that you might find messy when you want to copy the VM to a different
machine. Static allocations should be used to get the most predictable results anyway.</li>
<li>It automatically installs a Desktop Ubuntu, something that you shouldn't use, and won't need to if you follow this guide.</li>
</ul>
<p>If you insist on using the wizard, at least give the VM a single-word name to avoid quoting in scripts and whatnot.</p>
<div class="section" id="recommended-install-process">
<h3>Recommended install process<a class="headerlink" href="#recommended-install-process" title="Permalink to this headline">
*</a></h3>
<p>The best way is not always the quickest or easiest:</p>
<div class="center docutils container">
<div class="figure">
<img alt="Manual VM creation" src="https://blog.ionelmc.ro/2023/01/09/old-fashioned-setup/manual-creation.png" />
<p class="caption">This is the way.</p>
</div>
</div>
<p>Recommendations:</p>
<ul class="simple">
<li>Don't put spaces or weird stuff in the name.</li>
<li>Use <strong>Generation 2</strong>.</li>
<li>Disable <strong>Dynamic Memory</strong> if you don't really need it.</li>
<li>Allocate at least 8GB memory. The more the better - you don't want your programs to run off swap.</li>
<li>Pick the "NAT" switch created before.</li>
<li>Pick the ISO you have downloaded in "Installation Options". The VM will boot from the ISO.</li>
</ul>
<p>Additional VM settings that you should do after creation:</p>
<div class="center docutils container">
<div class="figure">
<img alt="Automatic shutdown stop action" src="https://blog.ionelmc.ro/2023/01/09/old-fashioned-setup/shutdown.png" />
<p class="caption">Shutdown guest on host shutdown - avoids disk thrashing at shutdown, especially if you allocate a lot of memory to your VM.</p>
</div>
</div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="center docutils container">
<div class="figure">
<img alt="CPU allocation" src="https://blog.ionelmc.ro/2023/01/09/old-fashioned-setup/cpu-allocation.png" />
<p class="caption">Don't allocate all the cores. Don't allocate just a couple cores either. Aim for 70-80% of your total system resources.</p>
</div>
</div>
<div class="line-block">
<div class="line"><br /></div>
</div>
<div class="center docutils container">
<div class="figure">
<img alt="Secure Boot" src="https://blog.ionelmc.ro/2023/01/09/old-fashioned-setup/secure-boot.png" />
<p class="caption">You don't need or want this fancy "security" feature.</p>
</div>
</div>
</div>
<div class="section" id="networking-problems">
<h3>Networking problems<a class="headerlink" href="#networking-problems" title="Permalink to this headline">
*</a></h3>
<p>You'll notice in the installer that DHCP fails to get an IP. That's because the NAT switch created earlier doesn't have DHCP support at
all, so you have to configure a static address by hand. This is why this setup is not exactly easy to get into. But once you iron out all the kinks, believe me, this is the best.</p>
<div class="center docutils container">
<div class="figure">
<img alt="Network configuration" src="https://blog.ionelmc.ro/2023/01/09/old-fashioned-setup/network.png" />
<p class="caption">Since you won't be creating VMs all day long this is fine. One time ordeal.</p>
</div>
</div>
</div>
<div class="section" id="storage-layout">
<h3>Storage layout<a class="headerlink" href="#storage-layout" title="Permalink to this headline">
*</a></h3>
<p>You don't need fancy LVM. If you ever need to resize your storage it's simpler to just boot from a
<a class="reference external" href="https://gparted.org/download.php">GParted ISO</a> and resize a non-LVM partition.</p>
<div class="center docutils container">
<div class="figure">
<img alt="Storage layout" src="https://blog.ionelmc.ro/2023/01/09/old-fashioned-setup/storage-layout.png" />
<p class="caption">Avoid LVM. You don't need this abstraction.</p>
</div>
</div>
<p>LVM can do software RAID or encryption, features that you don't need or are best left to the host OS.</p>
</div>
<div class="section" id="ssh-access">
<h3>SSH access<a class="headerlink" href="#ssh-access" title="Permalink to this headline">
*</a></h3>
<p>Note that you can and should install the OpenSSH service right from the setup interface.</p>
<p>You can also import your SSH identity from GitHub. You should set up your VM with SSH key access for security and convenience anyway.
Passwords are a thing of the past.</p>
<p>You can load your <tt class="docutils literal">.ppk</tt> key in Pageant at login by making a shortcut to it in <tt class="docutils literal">shell:startup</tt>
(to open it either run that via <tt class="docutils literal">Start</tt> + <tt class="docutils literal">R</tt> or as a File Explorer address).</p>
</div>
<div class="section" id="ubuntu-notes">
<h3>Ubuntu notes<a class="headerlink" href="#ubuntu-notes" title="Permalink to this headline">
*</a></h3>
<p>You don't really need any of the suggested Snap packages, not even Docker. You always want the latest officially supported version, which
you can get from: <a class="reference external" href="https://docs.docker.com/engine/install/ubuntu/">https://docs.docker.com/engine/install/ubuntu/</a></p>
</div>
</div>
<div class="section" id="ssh-client">
<h2>SSH client<a class="headerlink" href="#ssh-client" title="Permalink to this headline">
*</a></h2>
<p>The simplest way is to install <a class="reference external" href="https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html">PuTTY</a>.</p>
<p>You won't be using the X11 forwarding feature from PuTTY. Other settings that you can try (your mileage may vary depending on your
preferred shell):</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">putty-256color</span></tt> for Terminal-type string (alternative: <tt class="docutils literal">xterm</tt>)</li>
<li><tt class="docutils literal">ESC[n~</tt> mode for Home and End keys</li>
</ul>
<p>You might need to try a bunch of different settings for the best special keys handling, all dependent on your shell.</p>
<p>Bash is the default shell. If you want to experiment you can try zsh (<a class="reference external" href="https://ohmyz.sh/">ohmyzsh</a> is a popular way to customize your
shell while <a class="reference external" href="https://github.com/zsh-users/antigen#installation">antigen</a> is the place to go when you want more than just a theme).
<strong>You should only start tweaking your shell after you got everything else working though!</strong></p>
<p>If you're new to Linux you should go over <a class="reference external" href="https://cheatography.com/davechild/cheat-sheets/linux-command-line/">this cheatsheet</a>.</p>
<div class="section" id="docker">
<h3>Docker<a class="headerlink" href="#docker" title="Permalink to this headline">
*</a></h3>
<p>When installing Docker, avoid the Docker packages provided by the Linux distro or snap.
Always use the official one from docker.com, e.g.: <a class="reference external" href="https://docs.docker.com/engine/install/ubuntu/">https://docs.docker.com/engine/install/ubuntu/</a></p>
<p>Don't forget to also read <a class="reference external" href="https://docs.docker.com/engine/install/linux-postinstall/">https://docs.docker.com/engine/install/linux-postinstall/</a> as you won't be using the root user for development, and
your user needs some extra privileges to access the Docker daemon without sudo.</p>
</div>
</div>
<div class="section" id="desktop-support">
<h2>Desktop support<a class="headerlink" href="#desktop-support" title="Permalink to this headline">
*</a></h2>
<p>This setup avoids running a full-blown desktop inside the VM for performance and better integration with the Windows desktop. Thus you need
to run an X11 server on the host OS (Windows).</p>
<p>There are two options for X11 servers:</p>
<ul class="simple">
<li><a class="reference external" href="https://www.microsoft.com/store/productId/9NLP712ZMN9Q">X410</a> ($20 at least in the Windows store, very easy to set up)</li>
<li><a class="reference external" href="https://sourceforge.net/projects/vcxsrv/">VcXsrv</a> (free, requires some hassle)</li>
</ul>
<p>VSOCKS are direct communication channels between the host machine and the guest. You can use them to forward the X11 server on the host to the guest
without PuTTY (and skip all the reliability issues that entails).</p>
<p>To put it differently: VSOCKS are the way to go. Even WSLg is using them.</p>
<div class="section" id="prereuisites">
<h3>Prerequisites<a class="headerlink" href="#prereuisites" title="Permalink to this headline">
*</a></h3>
<p>First, the apps running in the guest must be able to connect through these sockets, so we must create a proxy and set up the X11 env vars.</p>
<div class="section" id="inside-the-vm">
<h4>Inside the VM<a class="headerlink" href="#inside-the-vm" title="Permalink to this headline">
*</a></h4>
<p>Create <tt class="docutils literal">/etc/systemd/system/x11vsock.service</tt>:</p>
<div class="highlight"><pre><span></span><span class="k">[Unit]</span><span class="w"></span>
<span class="na">Description</span><span class="o">=</span><span class="s">X410 VSOCK Service</span><span class="w"></span>
<span class="na">After</span><span class="o">=</span><span class="s">network.target</span><span class="w"></span>
<span class="k">[Service]</span><span class="w"></span>
<span class="na">User</span><span class="o">=</span><span class="s">root</span><span class="w"></span>
<span class="na">Restart</span><span class="o">=</span><span class="s">always</span><span class="w"></span>
<span class="na">Type</span><span class="o">=</span><span class="s">simple</span><span class="w"></span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/usr/bin/socat -b65536 UNIX-LISTEN:/tmp/.X11-unix/X0,fork,mode=777 SOCKET-CONNECT:40:0:x0000x70170000x02000000x00000000</span><span class="w"></span>
<span class="k">[Install]</span><span class="w"></span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">multi-user.target</span><span class="w"></span>
</pre></div>
<p>Install socat if you don't have it already:</p>
<div class="highlight"><pre><span></span>sudo apt-get install socat
</pre></div>
<p>Then enable the service:</p>
<div class="highlight"><pre><span></span>sudo systemctl <span class="nb">enable</span> x11vsock.service
sudo systemctl start x11vsock.service
</pre></div>
<p>Make sure you check if it's actually running:</p>
<div class="highlight"><pre><span></span>sudo systemctl status x11vsock.service
</pre></div>
<p>Patch your environment to have correct X11 settings by creating <tt class="docutils literal">/etc/profile.d/x11vsock.sh</tt>:</p>
<div class="highlight"><pre><span></span><span class="k">if</span> <span class="o">[[</span> ! <span class="nv">$DISPLAY</span> <span class="o">&&</span> -S <span class="s2">"/tmp/.X11-unix/X0"</span> <span class="o">]]</span><span class="p">;</span> <span class="k">then</span>
<span class="nb">export</span> <span class="nv">DISPLAY</span><span class="o">=</span>:0.0
<span class="k">fi</span>
</pre></div>
<p>You should also test that these environment variables appear in your environment after logging in again. Try this in your shell to test:</p>
<div class="highlight"><pre><span></span>env <span class="p">|</span> grep DISPLAY
</pre></div>
<p>If it's not there, you might need to place these environment variables in a different file, e.g.:</p>
<ul class="simple">
<li><tt class="docutils literal"><span class="pre">~/.profile</span></tt></li>
<li><tt class="docutils literal"><span class="pre">~/.bashrc</span></tt></li>
<li><tt class="docutils literal"><span class="pre">~/.zshrc</span></tt></li>
<li><tt class="docutils literal">/etc/profile</tt></li>
</ul>
</div>
<div class="section" id="outside-the-vm-in-the-host">
<h4>Outside the VM (in the host)<a class="headerlink" href="#outside-the-vm-in-the-host" title="Permalink to this headline">
*</a></h4>
<p>On the host you need to add an obscure registry setting. Create an <tt class="docutils literal">x11.reg</tt> file and then run it:</p>
<div class="highlight"><pre><span></span>Windows Registry Editor Version 5.00<span class="w"></span>
<span class="k">[</span><span class="nb">HKEY_LOCAL_MACHINE</span><span class="k">\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization\GuestCommunicationServices\00001770-facb-11e6-bd58-64006a7986d3]</span><span class="w"></span>
<span class="na">"ElementName"</span><span class="o">=</span><span class="s">"X11 Display 0"</span><span class="w"></span>
</pre></div>
<p>You can read more about it <a class="reference external" href="https://x410.dev/cookbook/hyperv/opening-ubuntu-desktop-in-hyper-v-vm-on-x410-over-vsock/">here</a>.</p>
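<p>If the hex blob in the socat command looks arbitrary, here is a small Python sketch that decodes it. It assumes the blob follows the Linux <tt class="docutils literal">sockaddr_vm</tt> layout minus the leading family field (a 16-bit reserved field, then a 32-bit port and a 32-bit CID, both little-endian). The function name <tt class="docutils literal">decode_vsock_address</tt> is made up for this illustration:</p>
<div class="highlight"><pre><span></span># Hex blob copied from the SOCKET-CONNECT address in x11vsock.service
ADDRESS = "x0000x70170000x02000000x00000000"


def decode_vsock_address(address):
    """Split the blob into (reserved, port, cid), assuming the sockaddr_vm layout."""
    raw = bytes.fromhex(address.replace("x", ""))
    reserved = int.from_bytes(raw[0:2], "little")  # svm_reserved1
    port = int.from_bytes(raw[2:6], "little")      # svm_port
    cid = int.from_bytes(raw[6:10], "little")      # svm_cid
    return reserved, port, cid


reserved, port, cid = decode_vsock_address(ADDRESS)
print(port, cid)            # 6000 2
print(format(port, "08x"))  # 00001770
</pre></div>
<p>Under that assumption the service connects to port 6000 (the usual port for X11 display 0) on CID 2, which is the host. The same port written in hex, <tt class="docutils literal">00001770</tt>, is the first group of the registry key above, and the <tt class="docutils literal">40</tt> in the socat address is the Linux <tt class="docutils literal">AF_VSOCK</tt> family number.</p>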
</div>
</div>
<div class="section" id="x410-1">
<h3>X410<a class="headerlink" href="#x410-1" title="Permalink to this headline">
*</a></h3>
<p>X410 is pretty simple to set up but it's a paid solution: a one-time purchase, no subscriptions or anything weird.</p>
<div class="figure">
<img alt="X410 configuration" src="X410.jpg" />
<p class="caption">That's all there is to it!</p>
</div>
</div>
<div class="section" id="vcxsrv-1">
<h3>VcXsrv<a class="headerlink" href="#vcxsrv-1" title="Permalink to this headline">
*</a></h3>
<p>VcXsrv is a free X11 server distribution for Windows. Don't waste your time with Xming: it's advertised as free but you have to pay and wait at
least a working day to get the latest version.</p>
<p>First you need to create a <tt class="docutils literal">config.xlaunch</tt> file somewhere:</p>
<div class="highlight"><pre><span></span><span class="cp"><?xml version="1.0" encoding="UTF-8"?></span>
<span class="nt"><XLaunch</span>
<span class="na">WindowMode=</span><span class="s">"MultiWindow"</span>
<span class="na">ClientMode=</span><span class="s">"NoClient"</span>
<span class="na">LocalClient=</span><span class="s">"False"</span>
<span class="na">Display=</span><span class="s">"-1"</span>
<span class="na">LocalProgram=</span><span class="s">"xcalc"</span>
<span class="na">RemoteProgram=</span><span class="s">"xterm"</span>
<span class="na">RemotePassword=</span><span class="s">""</span>
<span class="na">PrivateKey=</span><span class="s">""</span>
<span class="na">RemoteHost=</span><span class="s">""</span>
<span class="na">RemoteUser=</span><span class="s">""</span>
<span class="na">XDMCPHost=</span><span class="s">""</span>
<span class="na">XDMCPBroadcast=</span><span class="s">"False"</span>
<span class="na">XDMCPIndirect=</span><span class="s">"False"</span>
<span class="na">Clipboard=</span><span class="s">"True"</span>
<span class="na">ClipboardPrimary=</span><span class="s">"False"</span>
<span class="na">ExtraParams=</span><span class="s">"-vmid {00000000-0000-0000-0000-000000000000} -vsockport 6000"</span>
<span class="na">Wgl=</span><span class="s">"True"</span>
<span class="na">DisableAC=</span><span class="s">"False"</span>
<span class="na">XDMCPTerminate=</span><span class="s">"False"</span>
<span class="nt">/></span>
</pre></div>
<p>Now you only need to load this configuration at login by making a shortcut to it in <tt class="docutils literal">shell:startup</tt>
(to open it either run that via <tt class="docutils literal">Start</tt> + <tt class="docutils literal">R</tt> or as a File Explorer address).</p>
</div>
<div class="section" id="testing">
<h3>Testing<a class="headerlink" href="#testing" title="Permalink to this headline">
*</a></h3>
<p>Before spinning up your favourite IDE it's best to test the X11 connection with something really simple that is easy to debug.</p>
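<p>Install and run xclock (on Ubuntu it ships in the <tt class="docutils literal"><span class="pre">x11-apps</span></tt> package):</p>
<div class="highlight"><pre><span></span>sudo apt-get install x11-apps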
xclock
</pre></div>
<p>An ugly analog clock application should appear in your taskbar if everything is working properly.</p>
<p>If you get an error then you need to make sure that the x11vsock service you created earlier is running, that the right <tt class="docutils literal">DISPLAY</tt>
environment variable is set and that your X11 server is actually running with Hyper-V VSOCK support enabled.</p>
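<p>The guest-side checks can be scripted. This is only a rough sketch: the helper name <tt class="docutils literal">diagnose_x11</tt> is hypothetical, it assumes the socket path used by the service above, and it cannot tell whether the X11 server on the Windows side is actually up:</p>
<div class="highlight"><pre><span></span>import os
import stat


def diagnose_x11(env, socket_path="/tmp/.X11-unix/X0"):
    """Return a list of problems found; an empty list means the basics look fine."""
    problems = []
    if not env.get("DISPLAY"):
        problems.append("DISPLAY is not set")
    try:
        mode = os.stat(socket_path).st_mode
    except OSError:
        problems.append(f"{socket_path} is missing or inaccessible (is x11vsock.service running?)")
    else:
        if not stat.S_ISSOCK(mode):
            problems.append(f"{socket_path} exists but is not a socket")
    return problems


if __name__ == "__main__":
    # Check the environment of the current login session
    for problem in diagnose_x11(os.environ):
        print(problem)
</pre></div>
<p>No output means both the variable and the proxy socket are in place, so any remaining failure is more likely on the host side.</p>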
</div>
</div>
<div class="section" id="file-access">
<h2>File access<a class="headerlink" href="#file-access" title="Permalink to this headline">
*</a></h2>
<p>Last but not least, you might want to access the files inside your VM from Windows. Of all the possible solutions, the least bad is
via SSH - use <a class="reference external" href="https://www.nsoftware.com/download/download.aspx?sku=NDX3-A&type=free">SFTP Drive</a>.</p>
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td><p class="first">Ionel, are you crazy? Why are you using Windows? Windows is bad, expensive and big bad Micro$oft is spying on you!</p>
<p class="last">Yes. Games and apps. Yes but you can <a class="reference external" href="https://github.com/farag2/Sophia-Script-for-Windows">fix</a>
<a class="reference external" href="https://winaero.com/winaero-tweaker/">it</a>. Micro$oft can't stop you from buying a $10 second-hand OEM license either.</p>
</td></tr>
</tbody>
</table>
</div>
<h1>How to run uWSGI</h1>
<p>Ionel Cristian Mărieș, published 2022-03-14, updated 2022-03-23</p>
<p>Given the <a class="reference external" href="https://uwsgi-docs.readthedocs.io/en/latest/Options.html">cornucopia</a> of options uWSGI offers it's really hard to figure
out what options and settings are good for your typical web app.</p>
<p>Normally you'd just balk and run something simpler with fewer knobs and dials, like mod_wsgi with Apache, but alas, uWSGI is so flexible
and has so many features that mod_wsgi lacks. If only it weren't so tricky to configure...</p>
<p>First off, hands down, this is the most important setting - you should always start your configuration in strict mode. This will save you
lots of pain and suffering if you ever fiddle with options.</p>
<div class="highlight"><pre><span></span><span class="k">[uwsgi]</span><span class="w"></span>
<span class="c1"># Error on unknown options (prevents typos)</span><span class="w"></span>
<span class="na">strict</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>In general the most reliable concurrency model is processes, with no threads:</p>
<div class="highlight"><pre><span></span><span class="c1"># Formula: cores * 2 + 2</span><span class="w"></span>
<span class="na">processes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">%(%k * 2 + 2)</span><span class="w"></span>
</pre></div>
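<p>To see what the formula gives on a particular machine, here is the same arithmetic as a Python sketch. It assumes <tt class="docutils literal">os.cpu_count()</tt> reports the same number of cores that uWSGI detects, which may not hold inside containers with CPU limits; <tt class="docutils literal">recommended_processes</tt> is a made-up helper name:</p>
<div class="highlight"><pre><span></span>import os


def recommended_processes(cores):
    # Same arithmetic as the uWSGI expression above: cores * 2 + 2
    return cores * 2 + 2


cores = os.cpu_count() or 1  # cpu_count() can return None
print(cores, "cores:", recommended_processes(cores), "processes")
</pre></div>
<p>For example, a 4-core machine ends up with 10 worker processes.</p>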
<p>You could enable threads (the <tt class="docutils literal">threads</tt> option) and use fewer <tt class="docutils literal">processes</tt> but that can be problematic for code that is CPU-bound or not
thread-safe. I wouldn't enable the <a class="reference external" href="https://uwsgi-docs.readthedocs.io/en/latest/Gevent.html">gevent plugin</a> - you're just asking for
trouble with all that monkey-patching. Essentially, with processes only you're using more memory to avoid certain problems.</p>
<p>Most of the useful uWSGI features rely on the master process, so it's a pretty mandatory option to have:</p>
<div class="highlight"><pre><span></span><span class="c1"># Most of uWSGI features depend on the master mode</span><span class="w"></span>
<span class="na">master</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>So now that we have a master process we can either load the application once in the master or load it in every worker process. If
your project has lots of imports and things going on at import time, loading it once in the master is worth considering, but you need to be wary of how you
manage external resources (like connections, locks and whatnot).</p>
<p>Basically each worker would be a copy of the master process. While the memory is copy-on-write the resources probably aren't.</p>
<p>You can deal with shared FDs by marking them as close-on-exec (COE). These options will make uWSGI mark all the FDs as COE before forking
a worker, and after forking uWSGI's internal FDs will also be COE (if you'd ever want to call fork() in your crazy app).</p>
<div class="highlight"><pre><span></span><span class="c1"># Close fds on fork (don't allow subprocess to mess with parent's fds)</span><span class="w"></span>
<span class="na">close-on-exec</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">close-on-exec2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
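<p>For reference, this is what the close-on-exec flag looks like at the OS level. The sketch below is plain Python on a Unix system, not uWSGI's internals, and the two helper names are made up. Python creates file descriptors with the flag already set, so the example clears it first to mimic a descriptor opened by C code:</p>
<div class="highlight"><pre><span></span>import fcntl
import os


def is_close_on_exec(fd):
    """Report whether FD_CLOEXEC is set on the descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)


def set_close_on_exec(fd):
    """Set FD_CLOEXEC while keeping any other descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


read_fd, write_fd = os.pipe()
os.set_inheritable(read_fd, True)  # clear the flag, like a plain pipe() call in C
print(is_close_on_exec(read_fd))   # False
set_close_on_exec(read_fd)
print(is_close_on_exec(read_fd))   # True
</pre></div>
<p>A descriptor with the flag set is closed automatically when the process calls exec(), so a spawned subprocess never gets to see it.</p>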
<p>Locks can't be dealt with automatically. Well, the stdlib tries to, and even though there have been many bug fixes for logging locks being
improperly shared after a fork, you can always get a very sticky surprise. So essentially you need to ask yourself what's more
important - speed or correctness.</p>
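<p>A minimal sketch of that surprise, assuming a Unix system (it uses <tt class="docutils literal">os.fork()</tt> directly rather than going through uWSGI). The lock is taken by the forking thread itself as a stand-in for some other thread holding it at the moment of the fork; <tt class="docutils literal">child_inherits_held_lock</tt> is a made-up name:</p>
<div class="highlight"><pre><span></span>import os
import threading

lock = threading.Lock()


def child_inherits_held_lock():
    """Fork while the lock is held and report whether the child sees it as locked."""
    lock.acquire()  # stands in for another thread holding the lock at fork time
    pid = os.fork()
    if pid == 0:
        # Child process: the lock was copied in its locked state and nothing
        # in this process will ever release it.
        got_it = lock.acquire(blocking=False)
        os._exit(0 if got_it else 1)
    lock.release()  # the parent's copy is released as usual
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


print(child_inherits_held_lock())
</pre></div>
<p>A blocking <tt class="docutils literal">acquire()</tt> in the child would simply hang forever, which is the kind of bug that loading the app separately in each worker sidesteps.</p>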
<p>If you're prepared to have health checks and rolling deployments, you shouldn't care so much about server boot time - I'm pretty sure
correctness is what you want, thus you should make uWSGI import your code after it has started all the workers. Slower but safer:</p>
<div class="highlight"><pre><span></span><span class="c1"># In case there's some bad global state (pointless to use with need-app = true)</span><span class="w"></span>
<span class="na">lazy-apps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>Otherwise, if you load the app before fork you might as well make the service fail outright if it can't load the app at all.
You can probably avoid implementing fancy health checks by just using this:</p>
<div class="highlight"><pre><span></span><span class="c1"># Exit if no app can be loaded (pointless to use with lazy-apps = true)</span><span class="w"></span>
<span class="na">need-app</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>You still need to have threading enabled most of the time, for example if you
use <a class="reference external" href="https://docs.sentry.io/clients/python/advanced/#a-note-on-uwsgi">Sentry</a>:</p>
<div class="highlight"><pre><span></span><span class="c1"># Enable threads for sentry, see:</span><span class="w"></span>
<span class="c1"># https://docs.sentry.io/clients/python/advanced/#a-note-on-uwsgi</span><span class="w"></span>
<span class="na">enable-threads</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>Assuming you want to run a single project, certain things can be disabled:</p>
<div class="highlight"><pre><span></span><span class="c1"># Avoid multiple interpreters (automatically created in case you need mounts)</span><span class="w"></span>
<span class="na">single-interpreter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>Even if you don't run your app in a Docker container this is a good thing to do. Strangely uWSGI doesn't do this by default - a consequence
of having too many features and use-cases I guess...</p>
<div class="highlight"><pre><span></span><span class="c1"># Respect SIGTERM and do shutdown instead of reload</span><span class="w"></span>
<span class="na">die-on-term</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>The preferred way to load your app should be <tt class="docutils literal">module</tt> as it forces you to get your application imported correctly.
If you want to keep the configuration file generic you can use an environment variable, for example:</p>
<div class="highlight"><pre><span></span><span class="c1"># WSGI module (application callable expected inside)</span><span class="w"></span>
<span class="na">module</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">$(DJANGO_PROJECT_NAME).wsgi</span><span class="w"></span>
</pre></div>
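<p>For reference, the <tt class="docutils literal">module</tt> option imports the named module and, by default, looks for a callable named
<tt class="docutils literal">application</tt> inside it. A minimal sketch of such a module, using a plain WSGI callable rather than
Django's generated <tt class="docutils literal">wsgi.py</tt> (the response text is made up for illustration):</p>

```python
def application(environ, start_response):
    """Minimal WSGI callable: answer every request with a plain-text body."""
    body = b"Hello from uWSGI\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    # WSGI expects an iterable of byte strings.
    return [body]
```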
<p>A bit of process management is necessary most of the time:</p>
<div class="highlight"><pre><span></span><span class="c1"># Respawn processes that take more than ... seconds</span><span class="w"></span>
<span class="na">harakiri</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">300</span><span class="w"></span>
<span class="na">harakiri-verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Respawn processes after serving ... requests</span><span class="w"></span>
<span class="na">max-requests</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">5000</span><span class="w"></span>
<span class="c1"># Respawn if processes are bloated</span><span class="w"></span>
<span class="na">reload-on-as</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">1024</span><span class="w"></span>
<span class="na">reload-on-rss</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">512</span><span class="w"></span>
<span class="c1"># We don't expect abuse so let's have the fastest respawn possible</span><span class="w"></span>
<span class="na">forkbomb-delay</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">0</span><span class="w"></span>
</pre></div>
<p>I wouldn't use the evil reload variants (<tt class="docutils literal"><span class="pre">evil-reload-on-rss</span></tt> and <tt class="docutils literal"><span class="pre">evil-reload-on-as</span></tt>) as they will kill your workers at unexpected
points and that job is better left to the Linux OOM killer anyway.</p>
<p>Assuming you'll have an Nginx frontend the best way to connect them is via a unix domain socket - it has the lowest overhead, and well, it's
better to have a file with the wrong perms than a port open on the wrong interface. Assuming you'll start uWSGI as root:</p>
<div class="highlight"><pre><span></span><span class="c1"># Assuming we start from root we need to create the socket way early</span><span class="w"></span>
<span class="na">shared-socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.uwsgi</span><span class="w"></span>
<span class="na">chmod-socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">666</span><span class="w"></span>
<span class="na">socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">=0</span><span class="w"></span>
<span class="c1"># Change user after binding the socket</span><span class="w"></span>
<span class="na">uid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">app</span><span class="w"></span>
<span class="na">gid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">app</span><span class="w"></span>
</pre></div>
<p>In Nginx all you need is something along these lines.</p>
<div class="highlight"><pre><span></span><span class="k">http</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Some fine-tuning</span>
<span class="w"> </span><span class="kn">client_max_body_size</span><span class="w"> </span><span class="mi">10m</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">client_body_buffer_size</span><span class="w"> </span><span class="mi">64k</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">large_client_header_buffers</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="mi">32k</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">server</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">location</span><span class="w"> </span><span class="s">/</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kn">include</span><span class="w"> </span><span class="s">/etc/nginx/uwsgi_params</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">uwsgi_pass</span><span class="w"> </span><span class="s">unix:/var/run/app.uwsgi</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">uwsgi_ignore_client_abort</span><span class="w"> </span><span class="no">on</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">uwsgi_next_upstream</span><span class="w"> </span><span class="no">off</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">uwsgi_read_timeout</span><span class="w"> </span><span class="mi">300</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Prevent nginx discarding large responses.</span>
<span class="w"> </span><span class="kn">uwsgi_buffering</span><span class="w"> </span><span class="no">on</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="c1"># Initial response size (practically headers size)</span>
<span class="w"> </span><span class="kn">uwsgi_buffer_size</span><span class="w"> </span><span class="mi">64k</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kn">uwsgi_buffers</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="mi">32k</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
<p>Why do we need all these buffer tweaks and limits, you wonder? Well, you should strive for compatibility and resilience:</p>
<ul class="simple">
<li>Allow requests with lots of cookies, should you need to have cookie session storage.
That means big headers thus we increase some buffer sizes.</li>
<li>Disallow really large uploads. Most apps don't need to take file uploads larger than 10MB so that's a good default.</li>
<li>Prevent getting DoS-ed by slow-client attacks like <a class="reference external" href="https://en.wikipedia.org/wiki/Slowloris_(computer_security)">Slowloris</a> or <a class="reference external" href="https://en.wikipedia.org/wiki/R-U-Dead-Yet">RUDY</a>.
That means the frontend needs to buffer the request body - an acceptable trade-off if we also have a request body limit.</li>
</ul>
<p>With those settings you should fare pretty well, but you should always test anyway -
<a class="reference external" href="https://github.com/shekyan/slowhttptest">slowhttptest</a> is available as a Fedora and Ubuntu package.</p>
<p>Note that each worker will access the socket directly (call <tt class="docutils literal">accept()</tt> on that socket) regardless of protocol (TCP or UDS) thus
some workloads won't be evenly distributed to the uWSGI workers. So if you have an application that has some slow views and some fast views
a good option to consider is this:</p>
<div class="highlight"><pre><span></span><span class="c1"># Enable an accept mutex for a more balanced worker load</span><span class="w"></span>
<span class="na">thunder-lock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>You're essentially trading a bit of throughput and minimum latency for way better maximum latency.
Read more about it <a class="reference external" href="https://uwsgi-docs.readthedocs.io/en/latest/articles/SerializingAccept.html">here</a>.</p>
<p>Other useful options:</p>
<div class="highlight"><pre><span></span><span class="c1"># Good for debugging/development</span><span class="w"></span>
<span class="na">auto-procname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">log-5xx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">log-zero</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">log-slow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">1000</span><span class="w"></span>
<span class="na">log-date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">[%%Y-%%m-%%d %%H:%%M:%%S]</span><span class="w"></span>
<span class="na">log-format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">%(ftime) "%(method) %(uri)" %(status) %(rsize)+%(hsize) in %(msecs)ms pid:%(pid) worker:%(wid) core:%(core)</span><span class="w"></span>
<span class="na">log-format-strftime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">[%%Y-%%m-%%d %%H:%%M:%%S]</span><span class="w"></span>
<span class="c1"># Enable the stats service for uwsgitop, pip install uwsgitop, and run:</span><span class="w"></span>
<span class="c1"># uwsgitop /var/run/app.stats</span><span class="w"></span>
<span class="na">stats</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.stats</span><span class="w"></span>
</pre></div>
<p>Another problem that you might care about, especially if you got used to <tt class="docutils literal">apachectl <span class="pre">-k</span> graceful</tt> is, well, waiting for pending requests
at shutdown. uWSGI just kills all the workers by default. You can enable graceful shutdown by having
<a class="reference external" href="https://github.com/unbit/uwsgi/issues/849#issuecomment-118869386">this hook</a>:</p>
<div class="highlight"><pre><span></span><span class="c1"># See: https://github.com/unbit/uwsgi/issues/849#issuecomment-118869386</span><span class="w"></span>
<span class="c1"># Note that SIGTERM is 15 not 1 :-)</span><span class="w"></span>
<span class="na">hook-master-start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">unix_signal:15 gracefully_kill_them_all</span><span class="w"></span>
</pre></div>
<p>Note that it would make uWSGI always do a graceful shutdown, and you should always have <tt class="docutils literal">harakiri</tt> enabled if you use this. Otherwise
shutdowns and restarts can get stuck.</p>
<p>Another way to do this is to use the master fifo and send a graceful shutdown command, e.g.:</p>
<div class="highlight"><pre><span></span><span class="c1"># For graceful shutdown you can run: echo q > /var/run/fifo.uwsgi</span><span class="w"></span>
<span class="na">master-fifo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/fifo.uwsgi</span><span class="w"></span>
</pre></div>
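<p>The same command can be sent from a deployment script. A minimal Python sketch, assuming the FIFO path configured above;
<tt class="docutils literal">send_fifo_command</tt> is a hypothetical helper, not part of uWSGI:</p>

```python
import os


def send_fifo_command(fifo_path, command):
    """Write a single-character command to a uWSGI master FIFO.

    O_NONBLOCK makes open() raise OSError (ENXIO) instead of hanging
    when no process, i.e. no uWSGI master, has the FIFO open for reading.
    """
    fd = os.open(fifo_path, os.O_WRONLY | os.O_NONBLOCK)
    try:
        os.write(fd, command.encode("ascii"))
    finally:
        os.close(fd)
```

<p>Calling <tt class="docutils literal">send_fifo_command("/var/run/fifo.uwsgi", "q")</tt> would then be the equivalent of the
<tt class="docutils literal">echo</tt> one-liner, with the difference that it fails fast if the server is not running.</p>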
<p>You can also use this method to do a brutal shutdown/restart and
<a class="reference external" href="https://uwsgi-docs.readthedocs.io/en/latest/MasterFIFO.html">other things</a>.</p>
<div class="section" id="but-what-if-i-don-t-want-to-run-nginx">
<h2>But what if I don't want to run Nginx?</h2>
<p>uWSGI certainly makes this possible but alas, it also makes it very hard to get right. Remember that we need the frontend to protect
the workers from abusive clients?</p>
<p>You'd think that running an <a class="reference external" href="https://uwsgi-docs.readthedocs.io/en/latest/HTTP.html">HTTP router</a> (the <tt class="docutils literal">http</tt> option) as opposed to
having the workers serve HTTP directly (the <tt class="docutils literal"><span class="pre">http-socket</span></tt> option) would protect from <a class="reference external" href="https://en.wikipedia.org/wiki/Slowloris_(computer_security)">Slowloris</a> or <a class="reference external" href="https://en.wikipedia.org/wiki/R-U-Dead-Yet">RUDY</a> (slow request body attack) but you'd be
very wrong.</p>
<p>You can easily test this by running <tt class="docutils literal">slowhttptest <span class="pre">-B</span></tt>. uWSGI's HTTP router fails quickly, while Nginx runs like a champ. So is there a way to solve
this? Or, how ugly is it? Funnily enough it's possible, and yes it's ugly and contrived:</p>
<div class="highlight"><pre><span></span><span class="c1"># Same setup as before, allow starting as root and changing user later by using a shared socket</span><span class="w"></span>
<span class="na">shared-socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.uwsgi</span><span class="w"></span>
<span class="na">chmod-socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">666</span><span class="w"></span>
<span class="na">uwsgi-socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">=0</span><span class="w"></span>
<span class="c1"># This is how a request runs with this setup:</span><span class="w"></span>
<span class="c1"># http request -> http router -> fastrouter -> worker</span><span class="w"></span>
<span class="na">http-to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.router</span><span class="w"></span>
<span class="na">http</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">:8000</span><span class="w"></span>
<span class="na">fastrouter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.router</span><span class="w"></span>
<span class="na">fastrouter-use-pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.uwsgi</span><span class="w"></span>
<span class="c1"># Buffer in-memory up to 64KB</span><span class="w"></span>
<span class="na">fastrouter-post-buffering</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">%(64 * 1024)</span><span class="w"></span>
<span class="c1"># 10MB request body limit</span><span class="w"></span>
<span class="na">limit-post</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">%(10 * 1024 * 1024)</span><span class="w"></span>
</pre></div>
<p>It can't be simpler because the <tt class="docutils literal"><span class="pre">post-buffering</span></tt> option (necessary to prevent the workers getting hosed up by slow requests) doesn't apply
to the http router - it applies to the worker. There's no <tt class="docutils literal"><span class="pre">http-post-buffering</span></tt> option thus the only choice is to have the fastrouter as
the buffering middleman.</p>
<p>Note that it's best to leave <tt class="docutils literal"><span class="pre">fastrouter-post-buffering</span></tt> to a small value as buffer handling
<a class="reference external" href="https://github.com/unbit/uwsgi/blob/2.0.20/core/buffer.c#L20">isn't very</a>
<a class="reference external" href="https://github.com/unbit/uwsgi/blob/2.0.20/plugins/http/http.c#L644">well done</a> in uWSGI.</p>
<p>Likely you'll need to serve static files as well:</p>
<div class="highlight"><pre><span></span><span class="na">static-map</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/static=/var/www/static</span><span class="w"></span>
<span class="c1"># Expire after 24h</span><span class="w"></span>
<span class="na">static-expires</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">.* %(24 * 60 * 60)</span><span class="w"></span>
<span class="na">static-gzip-all</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>The one tricky bit is the <tt class="docutils literal"><span class="pre">static-gzip-all</span></tt> option - uWSGI doesn't gzip on the fly - it expects .gz files around. There's a really easy
way to build them using <a class="reference external" href="https://pypi.org/project/whitenoise/">whitenoise</a>. Either run <tt class="docutils literal">python <span class="pre">-m</span> whitenoise.compress</tt> or use this
Django setting:</p>
<div class="highlight"><pre><span></span><span class="c1"># This automatically creates a .gz file for each static file</span>
<span class="n">STATICFILES_STORAGE</span> <span class="o">=</span> <span class="s2">"whitenoise.storage.CompressedStaticFilesStorage"</span>
</pre></div>
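<p>If you'd rather not add a dependency just for this, the idea is simple enough to sketch with the stdlib. This is an illustration of
precompression, not whitenoise's implementation; <tt class="docutils literal">precompress</tt> and the suffix list are assumptions:</p>

```python
import gzip
import shutil
from pathlib import Path


def precompress(static_root, suffixes=(".css", ".js", ".svg", ".html")):
    """Write a .gz sibling next to every matching file under static_root."""
    written = []
    # sorted() materializes the listing before any new .gz files are created.
    for path in sorted(Path(static_root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        target = path.with_name(path.name + ".gz")
        with path.open("rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(target)
    return written
```

<p>You would run it once after <tt class="docutils literal">collectstatic</tt>, e.g.
<tt class="docutils literal">precompress("/var/www/static")</tt>.</p>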
<p>Now you might wonder why not also gzip responses. There are two ways of doing it - <strong>both problematic</strong>:</p>
<ul>
<li><p class="first">Use <tt class="docutils literal"><span class="pre">http-auto-gzip</span></tt> like in this <a class="reference external" href="https://ugu.readthedocs.io/en/latest/compress.html">uWSGI guide</a>. Note that:</p>
<ul class="simple">
<li>You have to stop sending <tt class="docutils literal"><span class="pre">Content-Length</span></tt> from your application. You'll end up implementing middleware that removes the
<tt class="docutils literal"><span class="pre">Content-Length</span></tt> that <tt class="docutils literal">django.middleware.common.CommonMiddleware</tt> adds. No, you should not just remove <tt class="docutils literal">CommonMiddleware</tt> for
obvious reasons.</li>
<li>The <tt class="docutils literal"><span class="pre">uWSGI-Encoding</span></tt> header is not removable with this technique
(<tt class="docutils literal"><span class="pre">response-route-run</span> = <span class="pre">delheader:uWSGI-Encoding</span></tt> doesn't actually work).</li>
<li>You cannot tweak the compression level (it's hardcoded at <tt class="docutils literal">9</tt> - not really that efficient CPU-wise).</li>
</ul>
<p>Here's an example that would work in general, with the aforementioned tradeoffs:</p>
<div class="highlight"><pre><span></span><span class="c1"># I wouldn't copy this...</span><span class="w"></span>
<span class="na">http-auto-gzip</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">collect-header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">Content-Type RESPONSE_CONTENT_TYPE</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">equal:${RESPONSE_CONTENT_TYPE};application/json addheader:uWSGI-Encoding: gzip</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">startswith:${RESPONSE_CONTENT_TYPE};text/ addheader:uWSGI-Encoding: gzip</span><span class="w"></span>
</pre></div>
</li>
<li><p class="first">Use <a class="reference external" href="https://uwsgi-docs.readthedocs.io/en/latest/Transformations.html">transformations</a>.
Although this approach is a bit more flexible, you still cannot tweak the compression level
(same hardcode at <tt class="docutils literal">9</tt> - inefficient CPU-wise) and it's more complex as you can see:</p>
<div class="highlight"><pre><span></span><span class="na">collect-header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">Content-Type RESPONSE_CONTENT_TYPE</span><span class="w"></span>
<span class="na">collect-header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">Content-Length RESPONSE_CONTENT_LENGTH</span><span class="w"></span>
<span class="c1"># uWSGI internals are not that smart, thus no content-length means it's 0</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">empty:${RESPONSE_CONTENT_LENGTH} goto:no-length</span><span class="w"></span>
<span class="c1"># Don't bother compressing 1kb responses, not worth the trouble</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">islower:${RESPONSE_CONTENT_LENGTH};1024 last:</span><span class="w"></span>
<span class="na">response-route-label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">no-length</span><span class="w"></span>
<span class="c1"># Make sure the client actually wants gzip</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">contains:${HTTP_ACCEPT_ENCODING};gzip goto:check-response</span><span class="w"></span>
<span class="na">response-route-run</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">last:</span><span class="w"></span>
<span class="na">response-route-label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">check-response</span><span class="w"></span>
<span class="c1"># Don't bother compressing non-text stuff, usually not worth it</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">equal:${RESPONSE_CONTENT_TYPE};application/json goto:apply-gzip</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">startswith:${RESPONSE_CONTENT_TYPE};text/ goto:apply-gzip</span><span class="w"></span>
<span class="na">response-route-run</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">last:</span><span class="w"></span>
<span class="na">response-route-label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">apply-gzip</span><span class="w"></span>
<span class="na">response-route-run</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">gzip:</span><span class="w"></span>
<span class="c1"># Why apply this filter too you wonder? The gzip transformation is not smart</span><span class="w"></span>
<span class="c1"># enough to chunk the body or set a Content-Length, thus keepalive will be broken</span><span class="w"></span>
<span class="na">http-auto-chunked</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
</pre></div>
<p>Previously this blog post had <tt class="docutils literal"><span class="pre">response-route-run</span> = chunked:</tt> but it appears that <tt class="docutils literal"><span class="pre">http-auto-chunked</span></tt> performs better.</p>
</li>
</ul>
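<p>The goto/label rules in the transformations variant are hard to follow, so here is the same decision written as a plain Python
function. It is only a reading aid under my interpretation of the rules; <tt class="docutils literal">should_gzip</tt> is a
hypothetical name and nothing like it exists in uWSGI:</p>

```python
def should_gzip(content_type, content_length, accept_encoding, min_size=1024):
    """Mirror the routing rules: return True if the response would be gzipped."""
    # A known length below the threshold is not worth compressing; an
    # unknown (empty) length skips this check, like the no-length label.
    if content_length and int(content_length) < min_size:
        return False
    # The client has to advertise gzip support.
    if "gzip" not in (accept_encoding or ""):
        return False
    # Only JSON (exact match) and text/* responses are compressed.
    content_type = content_type or ""
    return content_type == "application/json" or content_type.startswith("text/")
```

<p>Note the exact match on <tt class="docutils literal">application/json</tt>: a content type with a charset parameter would not be
compressed, just like with the <tt class="docutils literal">equal:</tt> condition.</p>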
</div>
<div class="section" id="tl-dr">
<h2>TL;DR</h2>
<p>I just want to run uWSGI standalone, just give me my copy-pasta config or I'll copy something really bad from SO!</p>
<p>🙄</p>
<div class="highlight"><pre><span></span><span class="k">[uwsgi]</span><span class="w"></span>
<span class="c1"># Error on unknown options (prevents typos)</span><span class="w"></span>
<span class="na">strict</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Formula: cores * 2 + 2</span><span class="w"></span>
<span class="na">processes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">%(%k * 2 + 2)</span><span class="w"></span>
<span class="c1"># Most of uWSGI features depend on the master mode</span><span class="w"></span>
<span class="na">master</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Close fds on fork (don't allow subprocess to mess with parent's fds)</span><span class="w"></span>
<span class="na">close-on-exec</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">close-on-exec2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># In case there's some bad global state (pointless to use with need-app = true)</span><span class="w"></span>
<span class="na">lazy-apps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Enable threads for sentry, see:</span><span class="w"></span>
<span class="c1"># https://docs.sentry.io/clients/python/advanced/#a-note-on-uwsgi</span><span class="w"></span>
<span class="na">enable-threads</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Avoid multiple interpreters (automatically created in case you need mounts)</span><span class="w"></span>
<span class="na">single-interpreter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Respect SIGTERM and do shutdown instead of reload</span><span class="w"></span>
<span class="na">die-on-term</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># See: https://github.com/unbit/uwsgi/issues/849#issuecomment-118869386</span><span class="w"></span>
<span class="c1"># Note that SIGTERM is 15 not 1 :-)</span><span class="w"></span>
<span class="na">hook-master-start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">unix_signal:15 gracefully_kill_them_all</span><span class="w"></span>
<span class="c1"># All the commands: https://uwsgi-docs.readthedocs.io/en/latest/MasterFIFO.html</span><span class="w"></span>
<span class="na">master-fifo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.fifo</span><span class="w"></span>
<span class="c1"># Respawn processes that take more than ... seconds</span><span class="w"></span>
<span class="na">harakiri</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">300</span><span class="w"></span>
<span class="na">harakiri-verbose</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Respawn processes after serving ... requests</span><span class="w"></span>
<span class="na">max-requests</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">5000</span><span class="w"></span>
<span class="c1"># Respawn if processes are bloated</span><span class="w"></span>
<span class="na">reload-on-as</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">1024</span><span class="w"></span>
<span class="na">reload-on-rss</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">512</span><span class="w"></span>
<span class="c1"># We don't expect abuse so let's have the fastest respawn possible</span><span class="w"></span>
<span class="na">forkbomb-delay</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">0</span><span class="w"></span>
<span class="c1"># Enable an accept mutex for a more balanced worker load</span><span class="w"></span>
<span class="na">thunder-lock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Good for debugging/development</span><span class="w"></span>
<span class="na">auto-procname</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">log-5xx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">log-zero</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="na">log-slow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">1000</span><span class="w"></span>
<span class="na">log-date</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">[%%Y-%%m-%%d %%H:%%M:%%S]</span><span class="w"></span>
<span class="na">log-format</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">%(ftime) "%(method) %(uri)" %(status) %(rsize)+%(hsize) in %(msecs)ms pid:%(pid) worker:%(wid) core:%(core)</span><span class="w"></span>
<span class="na">log-format-strftime</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">[%%Y-%%m-%%d %%H:%%M:%%S]</span><span class="w"></span>
<span class="c1"># Enable the stats service for uwsgitop, pip install uwsgitop, and run:</span><span class="w"></span>
<span class="c1"># uwsgitop /var/run/app.stats</span><span class="w"></span>
<span class="na">stats</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.stats</span><span class="w"></span>
<span class="c1"># Same setup as before, allow starting as root and changing user later by using a shared socket</span><span class="w"></span>
<span class="na">shared-socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.uwsgi</span><span class="w"></span>
<span class="na">chmod-socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">666</span><span class="w"></span>
<span class="na">uwsgi-socket</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">=0</span><span class="w"></span>
<span class="c1"># Change user after binding the socket</span><span class="w"></span>
<span class="na">uid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">app</span><span class="w"></span>
<span class="na">gid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">app</span><span class="w"></span>
<span class="c1"># This is how a request runs with this setup:</span><span class="w"></span>
<span class="c1"># http request -> http router -> fastrouter -> worker</span><span class="w"></span>
<span class="na">http-to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.router</span><span class="w"></span>
<span class="na">http</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">:8000</span><span class="w"></span>
<span class="na">fastrouter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.router</span><span class="w"></span>
<span class="na">fastrouter-use-pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/var/run/app.uwsgi</span><span class="w"></span>
<span class="c1"># Buffer in-memory up to 64kb</span><span class="w"></span>
<span class="na">fastrouter-post-buffering</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">%(64 * 1024)</span><span class="w"></span>
<span class="c1"># 10Mb request body limit</span><span class="w"></span>
<span class="na">limit-post</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">%(10 * 1024 * 1024)</span><span class="w"></span>
<span class="na">static-map</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">/static=/var/www/static</span><span class="w"></span>
<span class="c1"># Expire after 24h</span><span class="w"></span>
<span class="na">static-expires</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">.* %(24 * 60 * 60)</span><span class="w"></span>
<span class="c1"># Don't forget to run python -m whitenoise.compress or similar!</span><span class="w"></span>
<span class="na">static-gzip-all</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">true</span><span class="w"></span>
<span class="c1"># Apply conditional gzip encoding</span><span class="w"></span>
<span class="na">collect-header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">Content-Type RESPONSE_CONTENT_TYPE</span><span class="w"></span>
<span class="na">collect-header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">Content-Length RESPONSE_CONTENT_LENGTH</span><span class="w"></span>
<span class="c1"># uWSGI internals are not that smart, thus no content-length means it's 0</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">empty:${RESPONSE_CONTENT_LENGTH} goto:no-length</span><span class="w"></span>
<span class="c1"># Don't bother compressing responses smaller than 1kb, not worth the trouble</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">islower:${RESPONSE_CONTENT_LENGTH};1024 last:</span><span class="w"></span>
<span class="na">response-route-label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">no-length</span><span class="w"></span>
<span class="c1"># Make sure the client actually wants gzip</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">contains:${HTTP_ACCEPT_ENCODING};gzip goto:check-response</span><span class="w"></span>
<span class="na">response-route-run</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">last:</span><span class="w"></span>
<span class="na">response-route-label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">check-response</span><span class="w"></span>
<span class="c1"># Don't bother compressing non-text stuff, usually not worth it</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">equal:${RESPONSE_CONTENT_TYPE};application/json goto:apply-gzip</span><span class="w"></span>
<span class="na">response-route-if</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">startswith:${RESPONSE_CONTENT_TYPE};text/ goto:apply-gzip</span><span class="w"></span>
<span class="na">response-route-run</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">last:</span><span class="w"></span>
<span class="na">response-route-label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">apply-gzip</span><span class="w"></span>
<span class="na">response-route-run</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">gzip:</span><span class="w"></span>
<span class="c1"># Why apply this filter too you wonder? The gzip transformation is not smart</span><span class="w"></span>
<span class="c1"># enough to chunk the body or set a Content-Length, thus keepalive will be broken</span><span class="w"></span>
<span class="na">response-route-run</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">chunked:</span><span class="w"></span>
</pre></div>
<div class="section" id="addendum">
<h3>Addendum</h3>
<p>Note that this example makes uWSGI create several files in <tt class="docutils literal">/var/run</tt>, so that directory should be writable by the <cite>app</cite> user.</p>
</div>
</div>
Speeding up Django pagination2020-02-02T00:00:00+02:002020-02-02T00:00:00+02:00Ionel Cristian Mărieștag:blog.ionelmc.ro,2020-02-02:/2020/02/02/speeding-up-django-pagination/<p>I assume you have already read <a class="reference external" href="https://hakibenita.com/optimizing-the-django-admin-paginator">Optimizing the Django Admin Paginator</a>. If
not, this is basically the take-away from that article:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">InfinityPaginator</span><span class="p">(</span><span class="n">Paginator</span><span class="p">):</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="mi">99999999999</span>


<span class="k">class</span> <span class="nc">MyAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">paginator</span> <span class="o">=</span> <span class="n">InfinityPaginator</span>
    <span class="n">show_full_result_count</span> <span class="o">=</span> <span class="kc">False</span>
</pre></div>
<p>Though the article has a trick with using a <a class="reference external" href="https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-STATEMENT-TIMEOUT">statement_timeout</a>, I think it's pointless. In the real world you
should expect to get that overt 99999999999 count all over the place. Unless you have some sort of toy project it's very likely your
database will be under load. Add some user/group filtering and you'll always hit the time limit.</p>
<p>What if you could make the count more realistic, but still cheap? Using a random number would be too inconsistent. Strangely enough someone
decided that it's a good idea to put a <a class="reference external" href="https://wiki.postgresql.org/wiki/Count_estimate">count estimate</a> idea in the PostgreSQL wiki and,
for reasons, I decided to see how hard it is to implement it in Django, in a somewhat generalized fashion.</p>
<p>From a series of "<a class="reference external" href="https://blog.ionelmc.ro/presentations/just-because/">Just because you can, you have to try it!</a>", behold
<a class="footnote-reference" href="#the-drawback" id="footnote-reference-1">[1]</a>:</p>
<div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">logging</span>

<span class="kn">from</span> <span class="nn">django.db</span> <span class="kn">import</span> <span class="n">connections</span><span class="p">,</span> <span class="n">models</span>
<span class="kn">from</span> <span class="nn">django.db.models</span> <span class="kn">import</span> <span class="n">Count</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="o">.</span><span class="n">getLogger</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span>


<span class="k">class</span> <span class="nc">EstimatedQuerySet</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">QuerySet</span><span class="p">):</span>
    <span class="n">estimate_bias</span> <span class="o">=</span> <span class="mf">1.2</span>
    <span class="n">estimate_threshold</span> <span class="o">=</span> <span class="mi">100</span>

    <span class="k">def</span> <span class="nf">estimated_count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">_result_cache</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="n">qs</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">_base_manager</span><span class="o">.</span><span class="n">all</span><span class="p">()</span>
            <span class="n">compiler</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">query</span><span class="o">.</span><span class="n">get_compiler</span><span class="p">(</span><span class="s1">'default'</span><span class="p">)</span>
            <span class="n">where</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">compiler</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">query</span><span class="o">.</span><span class="n">where</span><span class="p">)</span>
            <span class="n">qs</span> <span class="o">=</span> <span class="n">qs</span><span class="o">.</span><span class="n">extra</span><span class="p">(</span><span class="n">where</span><span class="o">=</span><span class="p">[</span><span class="n">where</span><span class="p">]</span> <span class="k">if</span> <span class="n">where</span> <span class="k">else</span> <span class="kc">None</span><span class="p">,</span> <span class="n">params</span><span class="o">=</span><span class="n">params</span><span class="p">)</span>
            <span class="n">cursor</span> <span class="o">=</span> <span class="n">connections</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">db</span><span class="p">]</span><span class="o">.</span><span class="n">cursor</span><span class="p">()</span>
            <span class="n">query</span> <span class="o">=</span> <span class="n">qs</span><span class="o">.</span><span class="n">query</span><span class="o">.</span><span class="n">clone</span><span class="p">()</span>
            <span class="n">query</span><span class="o">.</span><span class="n">add_annotation</span><span class="p">(</span><span class="n">Count</span><span class="p">(</span><span class="s1">'*'</span><span class="p">),</span> <span class="n">alias</span><span class="o">=</span><span class="s1">'__count'</span><span class="p">,</span> <span class="n">is_summary</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
            <span class="n">query</span><span class="o">.</span><span class="n">clear_ordering</span><span class="p">(</span><span class="kc">True</span><span class="p">)</span>
            <span class="n">query</span><span class="o">.</span><span class="n">select_for_update</span> <span class="o">=</span> <span class="kc">False</span>
            <span class="n">query</span><span class="o">.</span><span class="n">select_related</span> <span class="o">=</span> <span class="kc">False</span>
            <span class="n">query</span><span class="o">.</span><span class="n">select</span> <span class="o">=</span> <span class="p">[]</span>
            <span class="n">query</span><span class="o">.</span><span class="n">default_cols</span> <span class="o">=</span> <span class="kc">False</span>
            <span class="n">sql</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="n">query</span><span class="o">.</span><span class="n">sql_with_params</span><span class="p">()</span>
            <span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">'Running EXPLAIN </span><span class="si">%s</span><span class="s1">'</span><span class="p">,</span> <span class="n">sql</span><span class="p">)</span>
            <span class="n">cursor</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">"EXPLAIN </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="n">sql</span><span class="p">,</span> <span class="n">params</span><span class="p">)</span>
            <span class="n">lines</span> <span class="o">=</span> <span class="n">cursor</span><span class="o">.</span><span class="n">fetchall</span><span class="p">()</span>
            <span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">'Got EXPLAIN result:</span><span class="se">\n</span><span class="s1">> </span><span class="si">%s</span><span class="s1">'</span><span class="p">,</span>
                        <span class="s1">'</span><span class="se">\n</span><span class="s1">> '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">line</span> <span class="k">for</span> <span class="n">line</span><span class="p">,</span> <span class="ow">in</span> <span class="n">lines</span><span class="p">))</span>
            <span class="n">marker</span> <span class="o">=</span> <span class="s1">' on </span><span class="si">%s</span><span class="s1"> '</span> <span class="o">%</span> <span class="bp">self</span><span class="o">.</span><span class="n">model</span><span class="o">.</span><span class="n">_meta</span><span class="o">.</span><span class="n">db_table</span>
            <span class="k">for</span> <span class="n">line</span><span class="p">,</span> <span class="ow">in</span> <span class="n">lines</span><span class="p">:</span>
                <span class="k">if</span> <span class="n">marker</span> <span class="ow">in</span> <span class="n">line</span><span class="p">:</span>
                    <span class="k">for</span> <span class="n">part</span> <span class="ow">in</span> <span class="n">line</span><span class="o">.</span><span class="n">split</span><span class="p">():</span>
                        <span class="k">if</span> <span class="n">part</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="s1">'rows='</span><span class="p">):</span>
                            <span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">'Found size (</span><span class="si">%s</span><span class="s1">) estimate in query EXPLAIN: </span><span class="si">%s</span><span class="s1">'</span><span class="p">,</span>
                                        <span class="n">part</span><span class="p">,</span> <span class="n">line</span><span class="p">)</span>
                            <span class="n">count</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="n">part</span><span class="p">[</span><span class="mi">5</span><span class="p">:])</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">estimate_bias</span><span class="p">)</span>
                            <span class="k">if</span> <span class="n">count</span> <span class="o"><</span> <span class="bp">self</span><span class="o">.</span><span class="n">estimate_threshold</span><span class="p">:</span>
                                <span class="c1"># Unreliable, will make views with lots of filtering</span>
                                <span class="c1"># output confusing results.</span>
                                <span class="c1"># Just do normal count, shouldn't be that slow.</span>
                                <span class="c1"># (well, not much slower than the actual query)</span>
                                <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
                            <span class="k">else</span><span class="p">:</span>
                                <span class="k">return</span> <span class="n">count</span>
            <span class="k">return</span> <span class="n">qs</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
        <span class="k">except</span> <span class="ne">Exception</span> <span class="k">as</span> <span class="n">exc</span><span class="p">:</span>
            <span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="s2">"Failed to estimate queryset count: </span><span class="si">%s</span><span class="s2">"</span><span class="p">,</span> <span class="n">exc</span><span class="p">)</span>
            <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">count</span><span class="p">()</span>
</pre></div>
</pre></div>
<p>Because the normal count method is unchanged you can use that QuerySet everywhere.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">MyModel</span><span class="p">(</span><span class="n">models</span><span class="o">.</span><span class="n">Model</span><span class="p">):</span>
    <span class="o">...</span>
    <span class="n">objects</span> <span class="o">=</span> <span class="n">EstimatedQuerySet</span><span class="o">.</span><span class="n">as_manager</span><span class="p">()</span>
</pre></div>
<p>Now using the <tt class="docutils literal">estimated_count</tt> in the paginator will uncover a problem: sometimes it will underestimate. You can play with the
<tt class="docutils literal">estimate_bias</tt> but it will never work well with edge-cases (like heavy filtering).</p>
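<p>To make the bias and the threshold concrete, here is a minimal, database-free sketch of just the parsing step. The plan lines are hand-written for illustration (they are not output from a real database), and <tt class="docutils literal">estimate_from_plan</tt> is a hypothetical helper name; it mirrors the <tt class="docutils literal">rows=</tt> extraction from the method above and returns <tt class="docutils literal">None</tt> wherever that method would fall back to a real count.</p>

```python
def estimate_from_plan(lines, table, bias=1.2, threshold=100):
    """Return a biased row estimate parsed from EXPLAIN text, or None.

    None means "run a real COUNT instead": either no plan line mentioned
    the table, or the biased estimate fell below the threshold.
    """
    marker = ' on %s ' % table
    for line in lines:
        if marker not in line:
            continue
        for part in line.split():
            if part.startswith('rows='):
                count = int(int(part[len('rows='):]) * bias)
                return count if count >= threshold else None
    return None


# Hand-written plan lines, shaped like PostgreSQL's text EXPLAIN output.
sample_plan = [
    'Aggregate  (cost=1793.00..1793.01 rows=1 width=8)',
    '  ->  Seq Scan on app_mymodel  (cost=0.00..1668.00 rows=50000 width=0)',
]

# A large table: the planner's guess is scaled up by the bias.
big = estimate_from_plan(sample_plan, 'app_mymodel')

# A heavily filtered query: the estimate is too small to be trusted.
small = estimate_from_plan(
    ['  ->  Seq Scan on app_mymodel  (cost=0.00..12.00 rows=40 width=0)'],
    'app_mymodel',
)
print(big, small)
```

<p>The bias only scales the planner's guess; if the planner is off by more than that factor, the paginator still ends up with too few pages.</p>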
<p>A good compromise is to tune it for the general case and for everything else trick the pagination to always increment the page count when
you're looking at the last page.</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">EstimatedPaginator</span><span class="p">(</span><span class="n">Paginator</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">validate_number</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">number</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">number</span> <span class="o">>=</span> <span class="bp">self</span><span class="o">.</span><span class="n">num_pages</span><span class="p">:</span>
            <span class="c1"># noinspection PyPropertyAccess</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">num_pages</span> <span class="o">=</span> <span class="n">number</span> <span class="o">+</span> <span class="mi">1</span>
        <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">EstimatedPaginator</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">validate_number</span><span class="p">(</span><span class="n">number</span><span class="p">)</span>

    <span class="nd">@cached_property</span>
    <span class="k">def</span> <span class="nf">count</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">object_list</span><span class="o">.</span><span class="n">estimated_count</span><span class="p">()</span>


<span class="k">class</span> <span class="nc">MyAdmin</span><span class="p">(</span><span class="n">admin</span><span class="o">.</span><span class="n">ModelAdmin</span><span class="p">):</span>
    <span class="n">paginator</span> <span class="o">=</span> <span class="n">EstimatedPaginator</span>
    <span class="n">show_full_result_count</span> <span class="o">=</span> <span class="kc">False</span>
</pre></div>
<p>If you think that <tt class="docutils literal"># noinspection PyPropertyAccess</tt> is funny it's because it is - <tt class="docutils literal">num_pages</tt> is a <tt class="docutils literal">cached_property</tt> and the following
line destroys PyCharm's assumptions about how <a class="reference external" href="https://docs.python.org/3/reference/datamodel.html#invoking-descriptors">non-data descriptors</a> should work.</p>
<p>It also goes against sane practices like not having unexpected side-effects. But alas, it gets worse. There's another problem there: there's
always going to be a next page even if the current page is empty (or not full). To fix that we mess again with the internals:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">_get_page</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">objects</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
    <span class="c1"># If page ain't full it means that it's the real last page, remove the extra.</span>
    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">objects</span><span class="p">)</span> <span class="o"><</span> <span class="bp">self</span><span class="o">.</span><span class="n">per_page</span><span class="p">:</span>
        <span class="c1"># noinspection PyPropertyAccess</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">num_pages</span> <span class="o">-=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="nb">super</span><span class="p">(</span><span class="n">EstimatedPaginator</span><span class="p">,</span> <span class="bp">self</span><span class="p">)</span><span class="o">.</span><span class="n">_get_page</span><span class="p">(</span><span class="n">objects</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
</pre></div>
<p>One could still input an out-of-bounds page number through the URL but I think it's pointless to handle that.</p>
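<p>The page-count bookkeeping is easier to follow without Django in the way. The sketch below is a deliberately simplified stand-in: <tt class="docutils literal">ToyPaginator</tt> and its arguments are made up for illustration, and it ignores Django's own slicing and orphan handling. It only shows how the two <tt class="docutils literal">num_pages</tt> adjustments interact when the estimate is too low.</p>

```python
class ToyPaginator:
    """Django-free stand-in that mimics the two num_pages adjustments."""

    def __init__(self, items, per_page, estimated_count):
        self.items = items
        self.per_page = per_page
        # Ceiling division: derive the page count from the (estimated) count.
        self.num_pages = max(1, -(-estimated_count // per_page))

    def page(self, number):
        # Looking at (or past) the last known page: pretend there is one more.
        if number >= self.num_pages:
            self.num_pages = number + 1
        start = (number - 1) * self.per_page
        objects = self.items[start:start + self.per_page]
        # A short page is the real last page, so drop the extra one again.
        if len(objects) < self.per_page:
            self.num_pages -= 1
        return objects


items = list(range(25))  # 25 real rows
paginator = ToyPaginator(items, per_page=10, estimated_count=12)  # underestimate

first = paginator.page(1)
pages_after_first = paginator.num_pages
second = paginator.page(2)
pages_after_second = paginator.num_pages
third = paginator.page(3)
pages_after_third = paginator.num_pages
print(pages_after_first, pages_after_second, pages_after_third)
```

<p>In this toy version the page count grows as you walk forward and settles once a short page shows up. If the real row count is an exact multiple of the page size, the last full page is still followed by one empty page.</p>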
<div class="section" id="what-about-that-pypropertyaccess">
<h2>What about that PyPropertyAccess?</h2>
<p>Suppose you have this:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">cached_property</span><span class="p">:</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">func</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">func</span> <span class="o">=</span> <span class="n">func</span>

    <span class="k">def</span> <span class="fm">__get__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">instance</span><span class="p">,</span> <span class="bp">cls</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">instance</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">self</span>
        <span class="n">res</span> <span class="o">=</span> <span class="n">instance</span><span class="o">.</span><span class="vm">__dict__</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">func</span><span class="o">.</span><span class="vm">__name__</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">func</span><span class="p">(</span><span class="n">instance</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">res</span>


<span class="k">class</span> <span class="nc">Foobar</span><span class="p">:</span>
    <span class="nd">@cached_property</span>
    <span class="k">def</span> <span class="nf">foo</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="s2">"bar"</span>
</pre></div>
<p>Because <tt class="docutils literal">cached_property</tt> doesn't implement a <tt class="docutils literal">__set__</tt>, assignments will be made through the instance's <tt class="docutils literal">__dict__</tt>:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">x</span> <span class="o">=</span> <span class="n">Foobar</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">x</span><span class="o">.</span><span class="n">foo</span> <span class="o">=</span> <span class="s1">'123'</span>
<span class="gp">>>> </span><span class="n">x</span><span class="o">.</span><span class="n">foo</span>
<span class="go">'123'</span>
<span class="gp">>>> </span><span class="n">y</span> <span class="o">=</span> <span class="n">Foobar</span><span class="p">()</span>
<span class="gp">>>> </span><span class="n">y</span><span class="o">.</span><span class="n">foo</span> <span class="o">+=</span> <span class="s1">'123'</span>
<span class="gp">>>> </span><span class="n">y</span><span class="o">.</span><span class="n">foo</span>
<span class="go">'bar123'</span>
</pre></div>
<p>I suspect that PyCharm doesn't discern data vs non-data descriptors at all. Or perhaps it's a subtle hint that it's a bad idea to assign to
something that doesn't implement a setter?</p>
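<p>For contrast, here is a sketch of what the data-descriptor flavour would look like. <tt class="docutils literal">readonly_cached_property</tt> is a hypothetical name, not something Django or PyCharm provides; the only point is that defining <tt class="docutils literal">__set__</tt> makes the descriptor take precedence over the instance's <tt class="docutils literal">__dict__</tt>, so assignment can be refused.</p>

```python
class readonly_cached_property:
    """Hypothetical variant: defining __set__ makes this a data descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, instance, cls=None):
        if instance is None:
            return self
        # A data descriptor wins over instance.__dict__ on lookup, so the
        # cached value has to be fetched from there explicitly.
        cache = instance.__dict__
        if self.name not in cache:
            cache[self.name] = self.func(instance)
        return cache[self.name]

    def __set__(self, instance, value):
        raise AttributeError("can't set attribute %r" % self.name)


class Foobar:
    @readonly_cached_property
    def foo(self):
        return "bar"


x = Foobar()
value = x.foo
try:
    x.foo = '123'
except AttributeError as exc:
    error = str(exc)
else:
    error = None
print(value, error)
```

<p>With this variant the <tt class="docutils literal">num_pages</tt> trick from earlier would simply raise, which is arguably the behaviour the inspection assumes.</p>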
<table class="docutils footnote" frame="void" id="the-drawback" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td>Though you should be wondering if you want to take a look at this hard-to-test method every time you upgrade Django ...</td></tr>
</tbody>
</table>
</div>
Is there anything safe in python?2020-01-20T00:00:00+02:002020-01-22T00:00:00+02:00Ionel Cristian Mărieștag:blog.ionelmc.ro,2020-01-20:/2020/01/20/is-there-anything-safe-in-python/<p>In the process of working on <a class="reference external" href="https://github.com/ionelmc/python-hunter/">Hunter</a> I have found many strange things from merely trying to
do a <tt class="docutils literal">repr</tt> on objects that are passed around. Code blowing up with an exception is the least of your concerns. Take a look at this:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">lazy</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">fun</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_fun</span> <span class="o">=</span> <span class="n">fun</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_args</span> <span class="o">=</span> <span class="n">args</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_kwargs</span> <span class="o">=</span> <span class="n">kwargs</span>

    <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">evaluate</span><span class="p">()</span>

    <span class="k">def</span> <span class="nf">evaluate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_fun</span><span class="p">(</span><span class="o">*</span><span class="bp">self</span><span class="o">.</span><span class="n">_args</span><span class="p">,</span> <span class="o">**</span><span class="bp">self</span><span class="o">.</span><span class="n">_kwargs</span><span class="p">)</span>

    <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">repr</span><span class="p">(</span><span class="bp">self</span><span class="p">())</span>
</pre></div>
<p>Simply doing a <tt class="docutils literal">repr</tt> on that will change the flow of the program, exactly what you don't want a debugging tool to do!</p>
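<p>To make the side effect concrete, here is a minimal sketch that reuses the <tt class="docutils literal">lazy</tt> class from above. The <tt class="docutils literal">expensive</tt> function and the <tt class="docutils literal">calls</tt> list are made up for illustration: they only record when the wrapped function actually runs.</p>

```python
calls = []  # hypothetical log of when the wrapped function actually runs


def expensive(value):
    # Stand-in for anything with side effects: a query, a request, a lock...
    calls.append(value)
    return value * 2


class lazy(object):
    def __init__(self, fun, *args, **kwargs):
        self._fun = fun
        self._args = args
        self._kwargs = kwargs

    def __call__(self):
        return self.evaluate()

    def evaluate(self):
        return self._fun(*self._args, **self._kwargs)

    def __repr__(self):
        return repr(self())


thing = lazy(expensive, 21)
calls_before = list(calls)  # nothing has run yet
text = repr(thing)          # "just looking" at the object...
calls_after = list(calls)   # ...has now run expensive(21)
```

<p>A tracer that prints its arguments would trigger the evaluation at a point where the traced program never asked for it.</p>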
<p>So then I tried something like:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">rudimentary_repr</span><span class="p">(</span><span class="n">obj</span><span class="p">):</span>
    <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">dict</span><span class="p">):</span>
        <span class="o">...</span>
    <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="nb">list</span><span class="p">):</span>
        <span class="o">...</span>
    <span class="k">elif</span> <span class="o">...</span> <span class="c1"># goes on for a while</span>
        <span class="o">...</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="c1"># give the not very useful</span>
        <span class="c1"># '<Something object at 0x123>'</span>
        <span class="k">return</span> <span class="nb">object</span><span class="o">.</span><span class="fm">__repr__</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
</pre></div>
<p>Add a simple depth check to deal with deep or infinite recursion and you're good, right? I went for a simple depth check instead of <a class="reference external" href="https://docs.python.org/3/library/pprint.html">pprint</a>'s recursion checker (which stores the
<a class="reference external" href="https://docs.python.org/3/library/functions.html#id">id</a> of objects):</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">rudimentary_repr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">maxdepth</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">maxdepth</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'...'</span>
    <span class="n">newdepth</span> <span class="o">=</span> <span class="n">maxdepth</span> <span class="o">-</span> <span class="mi">1</span>
    <span class="c1"># then pass around newdepth, easy-peasy</span>
</pre></div>
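<p>A runnable version of that idea, cut down to a handful of types, might look like the sketch below. It only shows the depth check at work on a list that contains itself; it still leans on <tt class="docutils literal">isinstance</tt> and the particular branches are my own pick for the example.</p>

```python
def rudimentary_repr(obj, maxdepth=5):
    if not maxdepth:
        return '...'
    newdepth = maxdepth - 1
    if isinstance(obj, dict):
        return '{%s}' % ', '.join(
            '%s: %s' % (rudimentary_repr(k, newdepth), rudimentary_repr(v, newdepth))
            for k, v in obj.items()
        )
    elif isinstance(obj, list):
        return '[%s]' % ', '.join(rudimentary_repr(i, newdepth) for i in obj)
    elif isinstance(obj, (int, str)):
        return repr(obj)
    else:
        # everything else gets the not very useful default
        return object.__repr__(obj)


cycle = [1, 2]
cycle.append(cycle)  # the list now contains itself
shallow = rudimentary_repr(cycle, maxdepth=2)
deep = rudimentary_repr(cycle)  # terminates too, just with more nesting
```

<p>With <tt class="docutils literal">maxdepth=2</tt> the result is <tt class="docutils literal">[1, 2, [..., ..., ...]]</tt>: once the budget runs out everything is elided, scalars included. No bookkeeping of already seen objects is needed.</p>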
<p>At this point I thought the only real problem was how to reduce the number of branches and figure out on which objects it's safe to call
<tt class="docutils literal">repr</tt> (to avoid reimplementing <tt class="docutils literal">__repr__</tt> of everything interesting).</p>
<p>Then I added this, hoping this would save me lots of typing:</p>
<div class="highlight"><pre><span></span><span class="k">elif</span> <span class="ow">not</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="s1">'__dict__'</span><span class="p">):</span>
    <span class="k">return</span> <span class="nb">repr</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
</pre></div>
<p>No <tt class="docutils literal">__dict__</tt> doesn't necessarily mean no state, but I hoped no one would do crummy stuff in <tt class="docutils literal">__repr__</tt> if they have a dict-less
object.</p>
<p>But then I found this little fella:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">ApiModule</span><span class="p">(</span><span class="n">ModuleType</span><span class="p">):</span>
    <span class="nd">@property</span>
    <span class="k">def</span> <span class="nf">__dict__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c1"># force all the content of the module</span>
        <span class="c1"># to be loaded when __dict__ is read</span>
        <span class="o">...</span>
</pre></div>
<p>So I doubled down on the terrible idea of checking for a <tt class="docutils literal">__dict__</tt> (instead of
<tt class="docutils literal">hasattr(obj, '__dict__')</tt> I'd use <tt class="docutils literal">hasdict(type(obj), obj)</tt>):</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">hasdict</span><span class="p">(</span><span class="n">obj_type</span><span class="p">,</span> <span class="n">obj</span><span class="p">,</span> <span class="n">tolerance</span><span class="o">=</span><span class="mi">25</span><span class="p">):</span>
    <span class="sd">"""</span>
<span class="sd">    A contrived mess to check that object</span>
<span class="sd">    doesn't have a __dict__ but avoid checking</span>
<span class="sd">    it if any ancestor is evil enough to</span>
<span class="sd">    explicitly define __dict__</span>
<span class="sd">    """</span>
    <span class="n">ancestor_types</span> <span class="o">=</span> <span class="n">deque</span><span class="p">()</span>
    <span class="k">while</span> <span class="n">obj_type</span> <span class="ow">is</span> <span class="ow">not</span> <span class="nb">type</span> <span class="ow">and</span> <span class="n">tolerance</span><span class="p">:</span>
        <span class="n">ancestor_types</span><span class="o">.</span><span class="n">appendleft</span><span class="p">(</span><span class="n">obj_type</span><span class="p">)</span>
        <span class="n">obj_type</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="n">obj_type</span><span class="p">)</span>
        <span class="n">tolerance</span> <span class="o">-=</span> <span class="mi">1</span>
    <span class="k">for</span> <span class="n">ancestor</span> <span class="ow">in</span> <span class="n">ancestor_types</span><span class="p">:</span>
        <span class="vm">__dict__</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">ancestor</span><span class="p">,</span> <span class="s1">'__dict__'</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
        <span class="k">if</span> <span class="vm">__dict__</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
            <span class="k">if</span> <span class="s1">'__dict__'</span> <span class="ow">in</span> <span class="vm">__dict__</span><span class="p">:</span>
                <span class="k">return</span> <span class="kc">True</span>
    <span class="k">return</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="s1">'__dict__'</span><span class="p">)</span>
</pre></div>
<p>I used that for a while until I came to the sad realization that you can't really trust anything. Behold:</p>
<div class="highlight"><pre><span></span><span class="k">class</span> <span class="nc">LazyObject</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="c1"># Need to pretend to be the wrapped class, for the sake of objects that</span>
    <span class="c1"># care about this (especially in equality tests)</span>
    <span class="vm">__class__</span> <span class="o">=</span> <span class="nb">property</span><span class="p">(</span><span class="n">new_method_proxy</span><span class="p">(</span><span class="n">operator</span><span class="o">.</span><span class="n">attrgetter</span><span class="p">(</span><span class="s2">"__class__"</span><span class="p">)))</span>
</pre></div>
<p>What exactly is going on there? A simplified example to illustrate the problem:</p>
<div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="k">class</span> <span class="nc">Surprise</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
<span class="gp">... </span>    <span class="nd">@property</span>
<span class="gp">... </span>    <span class="k">def</span> <span class="nf">__class__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
<span class="gp">... </span>        <span class="nb">print</span><span class="p">(</span><span class="s1">'Boom!'</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">>>> </span><span class="n">p</span> <span class="o">=</span> <span class="n">Surprise</span><span class="p">()</span>
<span class="gp">>>> </span><span class="nb">isinstance</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="nb">dict</span><span class="p">)</span>
<span class="go">Boom!</span>
<span class="go">False</span>
</pre></div>
<p>At this point it became clear that the <tt class="docutils literal">hasdict</tt> idea wasn't going to fly for long so I ripped that out as well.</p>
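<p>For comparison, a plain <tt class="docutils literal">type()</tt> call reads the real type of the object and never goes through the <tt class="docutils literal">__class__</tt> attribute. A small sketch, assuming CPython (the <tt class="docutils literal">accessed</tt> list is only there to record property hits):</p>

```python
accessed = []  # hypothetical log of property hits


class Surprise(object):
    @property
    def __class__(self):
        accessed.append('__class__')
        return Surprise


p = Surprise()

# isinstance() falls back to p.__class__ when the real type doesn't match,
# so the property runs
is_dict = isinstance(p, dict)
hits_after_isinstance = len(accessed)

# type() reads the real type directly, no attribute lookup involved
exact_match = type(p) is dict
hits_after_type = len(accessed)
```

<p>Both checks answer <tt class="docutils literal">False</tt>, but only the <tt class="docutils literal">isinstance</tt> one ran code that belongs to the object being inspected.</p>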
<p>New plan:</p>
<ul class="simple">
<li>Don't bother showing details for subclasses of builtin types (like <tt class="docutils literal">dict</tt>, <tt class="docutils literal">list</tt> etc).
Subclasses could do any of the crazy things shown above.</li>
<li>Use <tt class="docutils literal">type</tt> instead of <tt class="docutils literal">isinstance</tt>. For example: to check if it's an <tt class="docutils literal">Exception</tt> instance just check if <tt class="docutils literal">BaseException</tt> is in
the type's MRO. As I'm typing this I realise someone could stick a descriptor into the <tt class="docutils literal">args</tt> attribute, damn it. Perhaps
<a class="reference external" href="https://docs.python.org/3/library/inspect.html#inspect.getattr_static">getattr_static</a> would solve it.</li>
<li>Use <tt class="docutils literal">repr</tt> only on objects deemed to have a safe builtin type. Start with <tt class="docutils literal">builtins</tt>, <tt class="docutils literal">io</tt>, <tt class="docutils literal">socket</tt>, <tt class="docutils literal">_socket</tt>.</li>
</ul>
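<p>A rough sketch of how <tt class="docutils literal">getattr_static</tt> could help with the <tt class="docutils literal">args</tt> worry. The <tt class="docutils literal">Sneaky</tt> class and the <tt class="docutils literal">touched</tt> list are invented for the example, and the final check is just one possible policy, not something Hunter is known to do:</p>

```python
import inspect

touched = []  # hypothetical log of descriptor hits


class Sneaky(Exception):
    @property
    def args(self):
        touched.append('args')
        return ('surprise',)


sneaky = Sneaky('hello')
plain = ValueError('hello')

# MRO check instead of isinstance(): no __class__ or __instancecheck__ involved
is_exception = BaseException in type(sneaky).__mro__

# getattr_static hands back whatever is stored on the class, without calling it
static_sneaky = inspect.getattr_static(sneaky, 'args')
static_plain = inspect.getattr_static(plain, 'args')

# possible policy: only read .args when the descriptor is BaseException's own
builtin_args = BaseException.__dict__['args']
sneaky_is_safe = static_sneaky is builtin_args
plain_is_safe = static_plain is builtin_args
```

<p>Note that for the well-behaved exception <tt class="docutils literal">getattr_static</tt> returns the builtin descriptor rather than the tuple, so it can tell you whether reading <tt class="docutils literal">args</tt> is safe but it doesn't read it for you.</p>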
<p>What I've got now:</p>
<div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">safe_repr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="n">maxdepth</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="n">maxdepth</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'...'</span>
    <span class="n">obj_type</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
    <span class="n">obj_type_type</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="n">obj_type</span><span class="p">)</span>
    <span class="n">newdepth</span> <span class="o">=</span> <span class="n">maxdepth</span> <span class="o">-</span> <span class="mi">1</span>
    <span class="c1"># only represent exact builtins</span>
    <span class="c1"># (subclasses can have side-effects due to __class__ being</span>
    <span class="c1"># a property, __instancecheck__, __subclasscheck__ etc)</span>
    <span class="k">if</span> <span class="n">obj_type</span> <span class="ow">is</span> <span class="nb">dict</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'{</span><span class="si">%s</span><span class="s1">}'</span> <span class="o">%</span> <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s1">'</span><span class="si">%s</span><span class="s1">: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span>
            <span class="n">safe_repr</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">maxdepth</span><span class="p">),</span>
            <span class="n">safe_repr</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span>
        <span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">obj</span><span class="o">.</span><span class="n">items</span><span class="p">())</span>
    <span class="k">elif</span> <span class="n">obj_type</span> <span class="ow">is</span> <span class="nb">list</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'[</span><span class="si">%s</span><span class="s1">]'</span> <span class="o">%</span> <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span>
            <span class="n">safe_repr</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">obj</span>
        <span class="p">)</span>
    <span class="k">elif</span> <span class="n">obj_type</span> <span class="ow">is</span> <span class="nb">tuple</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'(</span><span class="si">%s%s</span><span class="s1">)'</span> <span class="o">%</span> <span class="p">(</span>
            <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">safe_repr</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">obj</span><span class="p">),</span>
            <span class="s1">','</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span> <span class="k">else</span> <span class="s1">''</span>
        <span class="p">)</span>
    <span class="k">elif</span> <span class="n">obj_type</span> <span class="ow">is</span> <span class="nb">set</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'{</span><span class="si">%s</span><span class="s1">}'</span> <span class="o">%</span> <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span>
            <span class="n">safe_repr</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">obj</span>
        <span class="p">)</span>
    <span class="k">elif</span> <span class="n">obj_type</span> <span class="ow">is</span> <span class="nb">frozenset</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">({</span><span class="si">%s</span><span class="s1">})'</span> <span class="o">%</span> <span class="p">(</span>
            <span class="n">obj_type</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span>
            <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">safe_repr</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">obj</span><span class="p">)</span>
        <span class="p">)</span>
    <span class="k">elif</span> <span class="n">obj_type</span> <span class="ow">is</span> <span class="n">deque</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">([</span><span class="si">%s</span><span class="s1">])'</span> <span class="o">%</span> <span class="p">(</span>
            <span class="n">obj_type</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span>
            <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">safe_repr</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">obj</span><span class="p">)</span>
        <span class="p">)</span>
    <span class="k">elif</span> <span class="n">obj_type</span> <span class="ow">in</span> <span class="p">(</span><span class="n">Counter</span><span class="p">,</span> <span class="n">OrderedDict</span><span class="p">,</span> <span class="n">defaultdict</span><span class="p">):</span>
        <span class="k">return</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">({</span><span class="si">%s</span><span class="s1">})'</span> <span class="o">%</span> <span class="p">(</span>
            <span class="n">obj_type</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span>
            <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="s1">'</span><span class="si">%s</span><span class="s1">: </span><span class="si">%s</span><span class="s1">'</span> <span class="o">%</span> <span class="p">(</span>
                <span class="n">safe_repr</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">maxdepth</span><span class="p">),</span>
                <span class="n">safe_repr</span><span class="p">(</span><span class="n">v</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span>
            <span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">obj</span><span class="o">.</span><span class="n">items</span><span class="p">())</span>
        <span class="p">)</span>
    <span class="k">elif</span> <span class="n">obj_type</span> <span class="ow">is</span> <span class="n">types</span><span class="o">.</span><span class="n">MethodType</span><span class="p">:</span> <span class="c1"># noqa</span>
        <span class="bp">self</span> <span class="o">=</span> <span class="n">obj</span><span class="o">.</span><span class="vm">__self__</span>
        <span class="n">name</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">obj</span><span class="p">,</span> <span class="s1">'__qualname__'</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">name</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
            <span class="n">name</span> <span class="o">=</span> <span class="n">obj</span><span class="o">.</span><span class="vm">__name__</span>
        <span class="k">return</span> <span class="s1">'<</span><span class="si">%s</span><span class="s1">bound method </span><span class="si">%s</span><span class="s1"> of </span><span class="si">%s</span><span class="s1">>'</span> <span class="o">%</span> <span class="p">(</span>
            <span class="s1">'un'</span> <span class="k">if</span> <span class="bp">self</span> <span class="ow">is</span> <span class="kc">None</span> <span class="k">else</span> <span class="s1">''</span><span class="p">,</span>
            <span class="n">name</span><span class="p">,</span>
            <span class="n">safe_repr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span>
        <span class="p">)</span>
    <span class="k">elif</span> <span class="n">obj_type_type</span> <span class="ow">is</span> <span class="nb">type</span> <span class="ow">and</span> <span class="ne">BaseException</span> <span class="ow">in</span> <span class="n">obj_type</span><span class="o">.</span><span class="vm">__mro__</span><span class="p">:</span>
        <span class="k">return</span> <span class="s1">'</span><span class="si">%s</span><span class="s1">(</span><span class="si">%s</span><span class="s1">)'</span> <span class="o">%</span> <span class="p">(</span>
            <span class="n">obj_type</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span>
            <span class="s1">', '</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">safe_repr</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">newdepth</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">obj</span><span class="o">.</span><span class="n">args</span><span class="p">)</span>
        <span class="p">)</span>
    <span class="k">elif</span> <span class="n">obj_type_type</span> <span class="ow">is</span> <span class="nb">type</span> <span class="ow">and</span> \
            <span class="n">obj_type</span> <span class="ow">is</span> <span class="ow">not</span> <span class="n">InstanceType</span> <span class="ow">and</span> \
            <span class="n">obj_type</span><span class="o">.</span><span class="vm">__module__</span> <span class="ow">in</span> <span class="p">(</span><span class="n">builtins</span><span class="o">.</span><span class="vm">__name__</span><span class="p">,</span> <span class="s1">'io'</span><span class="p">,</span> <span class="s1">'socket'</span><span class="p">,</span> <span class="s1">'_socket'</span><span class="p">):</span>
        <span class="c1"># hardcoded list of safe things. note that isinstance ain't used</span>
        <span class="c1"># (and we don't trust subclasses to do the right thing in __repr__)</span>
        <span class="k">return</span> <span class="nb">repr</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="k">return</span> <span class="nb">object</span><span class="o">.</span><span class="fm">__repr__</span><span class="p">(</span><span class="n">obj</span><span class="p">)</span>
</pre></div>
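<p>To see what that buys, here is a trimmed-down Python 3 sketch of the same shape, applied to a lazy wrapper and to a <tt class="docutils literal">list</tt> subclass. It is not the full function: only three containers are handled, and a hardcoded tuple of exact scalar types (<tt class="docutils literal">SAFE_SCALARS</tt>, my own name) stands in for the module whitelist. <tt class="docutils literal">Tagged</tt> and <tt class="docutils literal">calls</tt> are made up for the demo.</p>

```python
calls = []  # hypothetical log, filled only if user code runs during repr


class lazy(object):
    def __init__(self, fun):
        self._fun = fun

    def __repr__(self):
        return repr(self._fun())


class Tagged(list):
    def __repr__(self):
        calls.append('Tagged.__repr__')
        return 'Tagged(%s)' % list.__repr__(self)


# stand-in for the module whitelist used in the full version
SAFE_SCALARS = (int, float, str, bytes, bool, type(None))


def safe_repr(obj, maxdepth=5):
    if not maxdepth:
        return '...'
    obj_type = type(obj)
    newdepth = maxdepth - 1
    # exact type checks only: subclasses fall through to the last branch
    if obj_type is dict:
        return '{%s}' % ', '.join(
            '%s: %s' % (safe_repr(k, maxdepth), safe_repr(v, newdepth))
            for k, v in obj.items()
        )
    elif obj_type is list:
        return '[%s]' % ', '.join(safe_repr(i, newdepth) for i in obj)
    elif obj_type is tuple:
        return '(%s%s)' % (
            ', '.join(safe_repr(i, newdepth) for i in obj),
            ',' if len(obj) == 1 else ''
        )
    elif obj_type in SAFE_SCALARS:
        return repr(obj)
    else:
        return object.__repr__(obj)


plain = safe_repr({'a': [1, (2,)]})
wrapped = safe_repr([lazy(lambda: calls.append('evaluated')), Tagged([1])])
```

<p>Plain containers come out looking like the regular <tt class="docutils literal">repr</tt>, while the wrapper and the subclass degrade to the boring <tt class="docutils literal">&lt;... object at 0x...&gt;</tt> form and <tt class="docutils literal">calls</tt> stays empty.</p>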
<p>The problematic code examples are taken out of popular projects like Celery, Pytest and Django but I don't think it matters who does it.
What do you think?</p>
Proxy issues with Apache2018-03-25T00:00:00+02:002018-03-25T00:00:00+02:00Ionel Cristian Mărieștag:blog.ionelmc.ro,2018-03-25:/2018/03/25/proxy-issues-with-apache/<p>A few days ago I had a very weird bug with an Apache httpd SSL terminator setup: basically
just a proxy to Nginx+uWSGI serving plain HTTP. Do not ask why such a contrived setup, it's beyond reason.</p>
<p>Sometimes httpd would give this sort of error on big file uploads:</p>
<div class="term docutils container">
<pre class="literal-block">
[proxy:error] [pid 18:tid 140393219815168] (32)Broken pipe: [client 10.0.2.2:55239] AH01084: pass request body failed to 172.21.0.9:80 (web), referer: https://localhost/upload/
[proxy_http:error] [pid 18:tid 140393219815168] [client 10.0.2.2:55239] AH01097: pass request body failed to 172.21.0.9:80 (web) from 10.0.2.2 (), referer: https://localhost/upload/
</pre>
</div>
<p>Trying a few requests locally didn't reproduce the problem.</p>
<p>Considering it only happened with large files, I suspected there was some sort of boundary or timeout being hit.
Firefox has a slow connection simulator in its Web Developer tools. Open up Responsive Design Mode and there are the
throttling settings:</p>
<img alt="A screenshot of Firefox connection throttling options" src="https://blog.ionelmc.ro/2018/03/25/proxy-issues-with-apache/ff-throttling.png" />
<p>Note the funny "Good 3G" choice. As if someone would say "good devils" or "nice jerks".</p>
<p>Fortunately for me this reproduced the problem, and after a few tries it turned out it also reproduced with small files. So not a
boundary, but a timeout problem.</p>
<p>I tried tweaking various Nginx settings, to no avail:</p>
<pre class="literal-block">
proxy_read_timeout 600s;
client_body_timeout 600s;
client_header_timeout 600s;
send_timeout 600s;
</pre>
<p>And in the Apache conf:</p>
<pre class="literal-block">
ProxyTimeout 600
</pre>
<p>So basically playing with these settings was a bit of <a class="reference external" href="https://blog.ionelmc.ro/2016/02/18/notes-on-debugging/#find-the-cause-first">shotgun debugging</a>. Time to get serious and break out <a class="reference external" href="http://www.sysdig.org/">sysdig</a>:</p>
<div class="term docutils container">
<pre class="literal-block">
sysdig container.name=app_ssl_1 or container.name=app_web_1 -A -s 1000 > log
</pre>
</div>
<ul class="simple">
<li>I had two containers, thus the <tt class="docutils literal">container.name</tt> filters. If you don't have containers then probably you'll want to <a class="reference external" href="https://github.com/draios/sysdig/wiki/Sysdig-User-Guide#user-content-filtering">filter
by process name</a>, as the output is already
very verbose.</li>
<li>The <tt class="docutils literal"><span class="pre">-A</span></tt> (or <tt class="docutils literal"><span class="pre">--print-ascii</span></tt>) is to strip non-ascii stuff from output.</li>
<li>The <tt class="docutils literal"><span class="pre">-s</span> 1000</tt> is to see 1000 bytes from the various strings being passed around.</li>
</ul>
<p>So at this point I have a 30 MB log file. Grepping <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a> backwards through the output for <tt class="docutils literal">close</tt> or connections seems like a
good start. It turned up:</p>
<div class="term docutils container">
<pre class="literal-block">
21:48:04.525468894 2 httpd (17039) < close res=0
21:48:04.525577726 2 httpd (17039) > close fd=10(<f>/tmp/modproxy.tmp.pmo53y)
21:48:04.525578150 2 httpd (17039) < close res=0
21:48:09.498690963 1 httpd (17064) > close fd=11(<4t>10.0.2.2:58997->172.21.0.8:443)
21:48:09.498691856 1 httpd (17064) < close res=0
21:48:46.935441278 1 nginx (25919) > close fd=18(<4t>172.21.0.8:35834->172.21.0.9:80)
21:48:46.935442465 1 nginx (25919) < close res=0
</pre>
</div>
<p>So basically Apache closes some weird file and then the SSL socket. Then Nginx closes the connection from Apache. Not very
helpful. At least I know which port Apache connects to Nginx from, so I'll grep for <tt class="docutils literal">:35834</tt>? Turns out this is a bogus
connection, not my POST file upload. When looking at syscalls it's very important to not get bogged down in irrelevant
details.</p>
<p>So now I search backwards for <tt class="docutils literal">POST /upload</tt>. This turned up something very promising:</p>
<div class="term docutils container">
<pre class="literal-block">
21:48:04.525366305 2 httpd (17039) > writev fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=882
21:48:04.525401643 2 httpd (17039) < writev res=882 data=
POST /upload/ HTTP/1.1
Host: web
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://localhost/upload/
Content-Type: multipart/form-data; boundary=---------------------------156582199825391
Cookie: ...
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
X-Forwarded-For: 10.0.2.2
X-Forwarded-Host: localhost
X-Forwarded-Server: localhost
21:48:04.525407958 2 httpd (17039) > mmap addr=0 length=4194304 prot=1(PROT_READ) flags=1(MAP_SHARED) fd=10(<f>/tmp/modproxy.tmp.pmo53y) offset=0
21:48:04.525417756 2 httpd (17039) < mmap res=7FE14C0AF000 vm_size=770340 vm_rss=14720 vm_swap=0
21:48:04.525420158 2 httpd (17039) > mmap addr=0 length=3093534 prot=1(PROT_READ) flags=1(MAP_SHARED) fd=10(<f>/tmp/modproxy.tmp.pmo53y) offset=4194304
21:48:04.525422012 2 httpd (17039) < mmap res=7FE13E4F9000 vm_size=773364 vm_rss=14720 vm_swap=0
21:48:04.525422762 2 httpd (17039) > writev fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=7304273
21:48:04.525424849 2 httpd (17039) < writev res=-32(EPIPE) data=
Connection: Keep-Alive
Content-Length: 7304222
-----------------------------156582199825391
... tons of form data ...
</pre>
</div>
<p>Very interesting, the <tt class="docutils literal">writev</tt> syscall error matches the initial error I've seen in the Apache logs (<tt class="docutils literal">Broken pipe: [client
10.0.2.2:55239] AH01084: pass request body failed to 172.21.0.9:80</tt>). Can't be a simple coincidence.</p>
<p>So at this point it makes sense to look at the whole lifecycle of <tt class="docutils literal"><span class="pre">fd=13(<4t>172.21.0.8:35836->172.21.0.9:80)</span></tt>. Now because
FDs can easily collide (they get recycled, and other processes can have an FD with the same number) it's easier to just grep
for the host and port (<tt class="docutils literal">172.21.0.8:35836</tt>):</p>
<div class="term docutils container">
<pre class="literal-block">
21:46:45.783868277 0 nginx (25919) > sendfile out_fd=19(<4t>172.21.0.8:35836->172.21.0.9:80) in_fd=20(<f>/var/app/static/bower_components/jquery-ui/themes/smoothness/images/ui-icons_cd0a0a_256x240.png) offset=0 size=4599
21:46:45.783927701 0 nginx (25919) > recvfrom fd=19(<4t>172.21.0.8:35836->172.21.0.9:80) size=1024
21:46:45.783936625 0 httpd (17039) > read fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=8000
21:46:45.790029511 0 httpd (17039) > writev fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=843
21:46:45.790072392 0 httpd (17039) > read fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=8000
21:46:45.790090824 0 nginx (25919) > recvfrom fd=19(<4t>172.21.0.8:35836->172.21.0.9:80) size=1024
21:46:45.790214052 0 nginx (25919) > writev fd=19(<4t>172.21.0.8:35836->172.21.0.9:80) size=323
21:46:45.790219039 0 nginx (25919) > sendfile out_fd=19(<4t>172.21.0.8:35836->172.21.0.9:80) in_fd=20(<f>/var/app/static/bower_components/jquery-ui/themes/smoothness/images/ui-icons_888888_256x240.png) offset=0 size=7092
21:46:45.790298595 0 nginx (25919) > recvfrom fd=19(<4t>172.21.0.8:35836->172.21.0.9:80) size=1024
21:46:45.790309355 0 httpd (17039) > read fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=8000
21:47:50.791539134 0 nginx (25919) > close fd=19(<4t>172.21.0.8:35836->172.21.0.9:80)
21:48:04.525366305 2 httpd (17039) > writev fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=882
21:48:04.525422762 2 httpd (17039) > writev fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=7304273
21:48:04.525467926 2 httpd (17039) > close fd=13(<4t>172.21.0.8:35836->172.21.0.9:80)
</pre>
</div>
<p>So two things to learn from this:</p>
<ul>
<li><p class="first">Nginx suddenly decides to close the connection, 65 seconds after the last activity on it. Note that it closes the connection before Apache tries to send the
request.</p>
<div class="term docutils container">
<pre class="literal-block">
21:46:45.790298595 0 nginx (25919) > recvfrom fd=19(<4t>172.21.0.8:35836->172.21.0.9:80) size=1024
21:47:50.791539134 0 nginx (25919) > close fd=19(<4t>172.21.0.8:35836->172.21.0.9:80)
21:48:04.525366305 2 httpd (17039) > writev fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=882
21:48:04.525422762 2 httpd (17039) > writev fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=7304273
21:48:04.525467926 2 httpd (17039) > close fd=13(<4t>172.21.0.8:35836->172.21.0.9:80)
</pre>
</div>
<p>Apache will do two writes before realising the connection is busted; something is funny there (looks like a bug, or
some unrelenting optimization).</p>
</li>
<li><p class="first">Apache is <a class="reference external" href="https://tools.ietf.org/html/rfc2616#section-8.1.2.2">pipelining</a> the proxied requests (note the <tt class="docutils literal">sendfile</tt>
for static files). Basically multiple HTTP requests and responses are sent/received on the same connection - an often tricky
feature of HTTP 1.1.</p>
</li>
</ul>
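<p>The 65 second figure can be double-checked mechanically instead of by squinting at timestamps. Below is a rough sketch that picks out the events for one connection and reports the longest pause between two consecutive ones. The regex is my guess at the line layout, based only on the lines shown in this post (time, CPU, process, pid, direction, syscall), and it doesn't handle logs that cross midnight.</p>

```python
import re

# Start of a sysdig line as it appears in this post:
# time, CPU, process name, (pid), direction, syscall
EVENT_RE = re.compile(
    r'^(\d{2}):(\d{2}):(\d{2})\.(\d{9}) \d+ (\S+) \(\d+\) [<>] (\w+)'
)


def parse_event(line):
    """Return (nanoseconds since midnight, process, syscall) or None."""
    match = EVENT_RE.match(line)
    if match is None:
        return None
    hours, minutes, seconds, nanos, process, syscall = match.groups()
    whole = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return whole * 10**9 + int(nanos), process, syscall


def largest_idle_gap(lines, endpoint):
    """Longest pause between consecutive events mentioning the endpoint."""
    events = [parse_event(line) for line in lines if endpoint in line]
    events = [event for event in events if event is not None]
    best = None
    for previous, current in zip(events, events[1:]):
        gap = current[0] - previous[0]
        if best is None or gap > best[0]:
            best = (gap, previous, current)
    return best


# four lines copied from the log above
sample = """\
21:46:45.790298595 0 nginx (25919) > recvfrom fd=19(<4t>172.21.0.8:35836->172.21.0.9:80) size=1024
21:46:45.790309355 0 httpd (17039) > read fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=8000
21:47:50.791539134 0 nginx (25919) > close fd=19(<4t>172.21.0.8:35836->172.21.0.9:80)
21:48:04.525366305 2 httpd (17039) > writev fd=13(<4t>172.21.0.8:35836->172.21.0.9:80) size=882
""".splitlines()

gap, before, after = largest_idle_gap(sample, '172.21.0.8:35836')
gap_seconds = gap / 10**9
```

<p>On those four lines the longest pause is a hair over 65 seconds and it ends with Nginx's <tt class="docutils literal">close</tt>, which matches the reading above.</p>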
<p>So now, a wtf-moment.</p>
<p>In a situation like this the only way forward is to peel layers from the onion: <a class="reference external" href="https://blog.ionelmc.ro/2016/02/18/notes-on-debugging/#rooting-it-out">remove code, disable components etc</a>.</p>
<p>Since <tt class="docutils literal">mod_ssl</tt> is the most annoying component around (we don't see any meaningful data in the sysdig logs because it's
encrypted) we could try to disable that first. Easy enough, just comment out the SSL stuff and leave the port:</p>
<pre class="literal-block">
<VirtualHost *:443>
# SSLEngine On
# SSLCertificateFile ...
# SSLCertificateKeyFile ...
...
</pre>
<p>Now we can send plain HTTP test requests to <tt class="docutils literal"><span class="pre">http://localhost:443/</span></tt>. It works (why wouldn't it?), and it also
reproduces the problem.</p>
<p>Digging again in the logs (grep for <tt class="docutils literal">POST /upload</tt>) yields:</p>
<div class="term docutils container">
<pre class="literal-block">
09:22:24.700375109 0 httpd (653) > read fd=10(<4t>10.0.2.2:57454->172.20.0.6:443) size=8000
09:22:24.700381289 0 httpd (653) < read res=1460 data=
POST /upload/ HTTP/1.1
Host: localhost:443
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://localhost:443/upload/
Content-Type: multipart/form-data; boundary=---------------------------8182195208398
Content-Length: 8243581
Cookie: csrftoken=...
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
-----------------------------8182195208398
...
09:22:24.700635979 0 httpd (653) > read fd=10(<4t>10.0.2.2:57454->172.20.0.6:443) size=8000
09:22:24.700638437 0 httpd (653) < read res=-11(EAGAIN) data=
09:22:24.700642986 0 httpd (653) > read fd=10(<4t>10.0.2.2:57454->172.20.0.6:443) size=8000
09:22:24.700644098 0 httpd (653) < read res=-11(EAGAIN) data=
09:22:24.700645959 0 httpd (653) > poll fds=10:41 timeout=20875
</pre>
</div>
<p>Note the two funny consecutive reads failing with EAGAIN - another unrelenting optimization?</p>
<p>And some time later (depending on throttling settings) Apache decides it's time to send the headers:</p>
<div class="term docutils container">
<pre class="literal-block">
09:23:49.386532541 3 httpd (653) > writev fd=11(<4t>172.20.0.6:56980->172.20.0.5:80) size=1132
09:23:49.386624313 3 httpd (653) < writev res=1132 data=
POST /upload/ HTTP/1.1
Host: web
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://localhost:443/upload/
Content-Type: multipart/form-data; boundary=---------------------------8182195208398
Cookie: csrftoken=...
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
Proxy-Secret: 17610e1815bed35364ea65a8c723e10101b64e0c2e824fca6f67eb120b7b7fc6
X-Forwarded-For: 10.0.2.2
X-Forwarded-Host: localhost:443
X-Forwarded-Server: localhost
09:23:49.386660855 3 httpd (653) > writev fd=11(<4t>172.20.0.6:56980->172.20.0.5:80) size=8243632
09:23:49.386664671 3 httpd (653) < writev res=-32(EPIPE) data=
</pre>
</div>
<p>So basically what happens here is that our keepalive connection times out. Turns out there is an Nginx setting for this, the
default being:</p>
<pre class="literal-block">
keepalive_timeout 75s;
</pre>
<p>Now you don't really want to increase that: you'd end up with lots of sockets and fds tied up in dead connections.</p>
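<p>To see the failure in isolation, here's a minimal sketch with plain sockets: a server that drops idle connections and a client
that sits on an open connection for too long before writing. Everything in it is an assumption for illustration (the names, the
0.2 second <tt class="docutils literal">IDLE_TIMEOUT</tt> standing in for the keepalive timeout, the loopback setup); it is not how Apache or Nginx are
implemented.</p>

```python
import socket
import threading
import time

IDLE_TIMEOUT = 0.2  # stand-in for the keepalive timeout, shortened for the demo


def idle_closing_server(listener):
    """Accept one connection and close it if nothing arrives in time."""
    conn, _ = listener.accept()
    conn.settimeout(IDLE_TIMEOUT)
    try:
        conn.recv(1024)
    except socket.timeout:
        pass  # idle for too long: give up on the connection
    finally:
        conn.close()


def late_sender():
    """Connect, stay idle past the timeout, then try to write twice."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    server = threading.Thread(target=idle_closing_server, args=(listener,))
    server.start()
    results = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        time.sleep(IDLE_TIMEOUT * 3)  # pretend we're busy buffering a request body
        for payload in (b"headers", b"body"):
            try:
                client.sendall(payload)
                results.append("ok")
            except ConnectionError as exc:
                results.append(type(exc).__name__)
            time.sleep(0.1)  # give the peer's reset time to arrive
    server.join()
    listener.close()
    return results


results = late_sender()
print(results)
```

<p>On the Linux boxes I'd expect this to run on, the first write is accepted locally and only the second one fails. That might also
be why the traces show one successful <tt class="docutils literal">writev</tt> followed by <tt class="docutils literal">EPIPE</tt>, but take that as a guess rather than a
confirmed explanation.</p>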
<p>So the mystery remains: why does Apache only send out the request after buffering the whole request body?</p>
<p>Unfortunately this is only clarified by looking at the not exactly short <a class="reference external" href="https://httpd.apache.org/docs/2.4/mod/mod_proxy.html#request-bodies">mod_proxy documentation</a>. The fix ...</p>
<pre class="literal-block">
SetEnv proxy-sendchunked
</pre>
<p>And now I wasted your time too 👿</p>
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td><p class="first">Pro tip: use <cite>rg</cite> instead of <cite>grep</cite>. You probably know the impressive silversearcher (<cite>ag</cite>) but
<a class="reference external" href="https://github.com/BurntSushi/ripgrep">https://github.com/BurntSushi/ripgrep</a> buries everything else.</p>
<p class="last">However, having the output already in a smallish file I've just used <tt class="docutils literal">less</tt>. Just press <tt class="docutils literal">/</tt> (search) or <tt class="docutils literal">></tt> (jump
to end) and <tt class="docutils literal">?</tt> (search backwards) and fire away your regex.</p>
</td></tr>
</tbody>
</table>
Rehashing the src layout
2017-09-25T00:00:00+03:00
2017-11-20T00:00:00+02:00
Ionel Cristian Mărieș
tag:blog.ionelmc.ro,2017-09-25:/2017/09/25/rehashing-the-src-layout/
<p>Looking back at my <a class="reference external" href="https://blog.ionelmc.ro/2014/05/25/python-packaging/">Python packaging</a> post, now it got
<a class="reference external" href="https://hynek.me/articles/testing-packaging/#src">more</a>
<a class="reference external" href="https://docs.pytest.org/en/latest/goodpractices.html?highlight=src#choosing-a-test-layout-import-rules">popular</a>.
Even <a class="reference external" href="https://github.com/twisted/twisted">Twisted</a> is using it now but
there's still confusion about it.</p>
<p>I mean, places that you'd expect to use a <tt class="docutils literal">src</tt>-layout don't. Take <a class="reference external" href="http://flit.readthedocs.io/">flit</a> for instance: I
wrote <a class="reference external" href="https://blog.ionelmc.ro/2015/02/24/the-problem-with-packaging-in-python/">a rant on packaging</a> almost 1 year later, and
that <a class="reference external" href="https://www.reddit.com/r/Python/comments/3pasan/the_problem_with_packaging_in_python/cw51tjy/">prompted</a> Thomas
Kluyver to make flit, but strangely enough it uses the ad-hoc layout <a class="footnote-reference" href="#footnote-1" id="footnote-reference-1">[1]</a>.</p>
<p>And there are plenty of guides on the Internet with the ol' ad-hoc layout.</p>
<blockquote class="highlights">
So people are wondering: "I don't understand this packaging mumbo-jumbo, what should I use?"</blockquote>
<p>It's not that complicated, just have these thoughts in mind when choosing:</p>
<ul>
<li><p class="first">If you use a project template (e.g.: <a class="reference external" href="https://cookiecutter.readthedocs.io/en/latest/readme.html#python">cookiecutter</a>) the
choice of layout is not so important. The template should solve all the annoying packaging problems.</p>
<p>You should use a template anyway.</p>
</li>
<li><p class="first">If you already have a project using the ad-hoc layout, and you don't have problems with testing or packaging then there's
little value in switching over to a <tt class="docutils literal">src</tt>-layout.</p>
</li>
<li><p class="first">The ad-hoc layout is more convenient and that's why there's high resistance to <tt class="docutils literal">src</tt>-layouts.</p>
<p>You need tooling to use the <tt class="docutils literal">src</tt>-layout effectively. If you don't use a tool like
<a class="reference external" href="https://tox.readthedocs.io/">Tox</a> to manage virtualenvs and install your project then it's going to be painful.</p>
</li>
<li><p class="first">Any layout will work. Mistakes can be corrected.</p>
<p>If <a class="reference external" href="https://blog.ionelmc.ro/2014/05/25/python-packaging/">this</a> doesn't make
sense (I agree it's a bit heavy on jargon and technicalities) and you don't want to use a template then perhaps you're
looking to learn packaging the hard way. Just flip a coin and start with something.</p>
</li>
</ul>
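<p>The convenience of the ad-hoc layout and the tooling requirement of the <tt class="docutils literal">src</tt>-layout both boil down to what is importable
straight from a checkout. Here's a minimal sketch of that, using throwaway directories and a made-up package name
(<tt class="docutils literal">demo_package</tt>); the helpers are hypothetical and only poke at the import machinery, nothing gets installed.</p>

```python
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def make_project(root, layout):
    """Create a one-package project; `layout` is 'adhoc' or 'src'."""
    base = root / "src" if layout == "src" else root
    package = base / "demo_package"
    package.mkdir(parents=True)
    (package / "__init__.py").write_text("")
    return root


def findable(directory):
    """Would `import demo_package` work with only `directory` on the import path?"""
    return PathFinder.find_spec("demo_package", [str(directory)]) is not None


with tempfile.TemporaryDirectory() as tmp:
    adhoc = make_project(Path(tmp) / "adhoc_project", "adhoc")
    src = make_project(Path(tmp) / "src_project", "src")

    # Ad-hoc layout: the package is importable right from the project root.
    adhoc_from_root = findable(adhoc)
    # src layout: nothing importable from the root ...
    src_from_root = findable(src)
    # ... only from the src directory, which is what installing takes care of.
    src_from_src_dir = findable(src / "src")

print(adhoc_from_root, src_from_root, src_from_src_dir)
```

<p>That's the whole trade-off in three booleans: with the <tt class="docutils literal">src</tt>-layout you have to install the project (or have a tool do it)
before anything can import it.</p>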
<table class="docutils footnote" frame="void" id="footnote-1" rules="none">
<colgroup><col class="label" /><col /></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="#footnote-reference-1">[1]</a></td><td><p class="first">Now people have already asked for flit to support the <tt class="docutils literal">src</tt>-layout. I have even added my thoughts on
<a class="reference external" href="https://github.com/takluyver/flit/issues/115">the issue</a>.</p>
<p class="last">Don't go over there and spam the issue with comments, I don't think there's anything new to add.</p>
</td></tr>
</tbody>
</table>