Fast 6x12 Connected Components using bit-optimized BFS

1,656 views

Open Source Your Knowledge, Become a Contributor

Technology knowledge has to be shared and made accessible for free. Join the movement.

Fast 6x12 Connected Components using bit-optimized BFS

This is pretty standard BFS, just bit-optimized and with a lookup table for neighbours. Performance varies a lot depending on how densely populated the bitboards are, and that in turn depends a lot on the random seed they are initialized with. But you can expect 2-5 millions of iterations per 100ms, suitable for Smash the Code. Performance seems to be about same with and without global target set to sse+avx, but I do have an avx2 target locally around popcount code which improves performance by about 50%. For historical reasons I had some code that didn't optimize well with AVX on older gcc than 12, that's why I chose to enable it only for critical sections by default. Global default is without avx target. Enough talk, here's to the code:

#pragma GCC optimize("Ofast","unroll-loops","omit-frame-pointer","inline")
#pragma GCC option("arch=native","tune=native","no-zeroupper") //Enable AVX
// #pragma GCC target("avx2")
// #pragma GCC target("popcnt") //Enable popcount
// #pragma GCC target("sse,sse2,sse3,ssse3,sse4,popcnt,abm,mmx,avx,avx2") //Enable AVX
#include <x86intrin.h> //SSE Extensions
#include <bits/stdc++.h> //All main STD libraries
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <chrono>
#include <bitset>
#define USE_AVX 1
typedef __uint128_t BOARD_T;
using namespace std;
using time_interval_t = std::chrono::microseconds;
using myClock = std::chrono::high_resolution_clock;
const unsigned int rowCount = 12;
const unsigned int colCount = 8;
const unsigned int realColCount = colCount;
const unsigned int twoRowsCellCount = 2 * colCount;
const unsigned int colorCount = 6;
const unsigned int cellCount = rowCount * colCount;
const unsigned int backgroundCount = 1;
const unsigned int maxComponents = backgroundCount + cellCount / 2;
const unsigned int maxComponentsPerRow = 3;
const unsigned int lookupTableSizeFor1Row = 1 << (1 * realColCount);
const unsigned int lookupTableSizeFor2Rows = 1 << (2 * realColCount);
union converter8
{
  uint8_t num;
  struct
  {
      uint8_t num8[1];
  } bytes;
};
inline converter8 initializeConverter(uint8_t num) {
  converter8 converter;
  converter.num = num;
  return converter;
}
union converter16

Open Source Your Knowledge: become a Contributor and help others learn. Create New Content

Open Source Your Knowledge, Become a Contributor

Fast 6x12 Connected Components using bit-optimized BFS

Fast Connected Components for 6x12 bitboard

How to randomize bits and integers efficiently in C++

BFS and Voronoi diagrams using bit-shift operations

Optimizing breadth first search