
Inside a better CUDA-based scrypt miner

A blog about computer science, systems, technology, and sometimes vegan cooking.


for i := 0; i < vecsize; i++ {
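On a GPU, each iteration of a loop like this becomes its own thread. The minimal Python sketch below emulates that mapping serially. `launch_1d`, `double_all`, the block size of 256, and the doubling body are hypothetical stand-ins. Only the index arithmetic mirrors CUDA's `blockIdx.x * blockDim.x + threadIdx.x` together with its usual bounds check.

```python
def launch_1d(kernel, n, block_dim=256):
    # Serial emulation of a 1-D CUDA launch; on a GPU these calls run concurrently.
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover all n elements
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx)


def double_all(vec):
    out = [0] * len(vec)

    def kernel(block_idx, block_dim, thread_idx):
        # Global index, analogous to blockIdx.x * blockDim.x + threadIdx.x in CUDA C.
        i = block_idx * block_dim + thread_idx
        if i < len(vec):         # the last block may extend past the end of the vector
            out[i] = 2 * vec[i]  # hypothetical loop body standing in for the real work

    launch_1d(kernel, len(vec))
    return out
```

Each "thread" touches only its own index `i`, which is what lets the hardware run them all at once.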

  1. Don’t use too many variables (registers) per thread (vector entry). The GPU has a large but still limited number. Using more than 32 or 64 variables per thread can start to slow things down by restricting the number of threads that can execute concurrently.
  2. Don’t use too much memory bandwidth. The GPU has a lot (from 80 to 300 GB/sec), but it’s not infinite.
     • Use your memory bandwidth well: if each thread reads from a totally random location, your code will be slow. If, instead, neighboring threads read adjacent locations so that the overall read is one big sequential access to memory, you will get a lot of bandwidth.
  3. Keep the GPU busy while waiting on the host to do things.
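To make the "use your bandwidth well" rule concrete, here is a small Python model of the loads issued by one warp. It makes two assumptions:

- There are 32 threads per warp, which is true of NVIDIA GPUs.
- Memory is fetched in 32-byte segments. This granularity is an assumption; the exact transaction size varies by architecture.

`segments_touched` is a hypothetical helper.

```python
SEGMENT_BYTES = 32  # assumed memory-transaction granularity
WARP_SIZE = 32      # threads that issue a load together on NVIDIA GPUs


def segments_touched(addresses, width=4):
    """Count distinct SEGMENT_BYTES-sized chunks covered by one warp's loads.

    Each address is the start of a width-byte load by one thread.
    """
    segments = set()
    for addr in addresses:
        first = addr // SEGMENT_BYTES
        last = (addr + width - 1) // SEGMENT_BYTES
        segments.update(range(first, last + 1))
    return len(segments)


# Adjacent: thread t reads the 4-byte word at offset 4*t (one contiguous 128-byte span).
adjacent = [4 * t for t in range(WARP_SIZE)]
# Scattered: thread t reads from its own 4 KiB region, as if each had a private buffer.
scattered = [4096 * t for t in range(WARP_SIZE)]

print(segments_touched(adjacent), segments_touched(scattered))  # prints: 4 32
```

Both patterns deliver 128 useful bytes. Under this model, however, the scattered pattern pulls in 32 segments (1024 bytes) instead of 4, an eightfold waste. Per-thread scratch buffers read at data-dependent offsets can easily end up in the scattered case.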
Let’s see how we can tackle all of these. I’ll start with the third one first: keep the GPU busy. CUDA provides a lot of mechanisms for, e.g., overlapping copies to and from the GPU with the execution of the parallel code (called a “kernel”). This is a good idea, and it’s what CudaMiner did. I decided not to. Instead, I moved all of the mining functionality into the GPU so that I can invoke it, have it run for a long time, and then very quickly invoke it again with the next (small) job description. This required implementing more on the GPU, but it simplified the architecture.

    // hashed_key is 1024 bits of state data.
    State[i] = hashed_key  // State is 128KB.
    which_state = low-order 10 bits of hashed_key
    hashed_key = XOR(State[which_state], hashed_key)
    final_key := PBKDF2_SHA256_finish(out_key, saved_keystate)
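For reference, here is a runnable version of the whole scrypt computation these fragments come from. It is a minimal Python sketch specialized to N = 1024, r = 1, p = 1, the parameters implied by a 1024-bit `hashed_key` and a 128KB `State`. Some caveats:

- It follows the scrypt specification (RFC 7914: Salsa20/8, BlockMix, ROMix) rather than the author's CUDA code.
- The function names are illustrative.
- Unlike the `saved_keystate` shortcut above, it recomputes the final PBKDF2 from scratch.
- Per the spec, `which_state` comes from the first word of the second 64-byte half of the block.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF


def rotl(v, c):
    # Rotate a 32-bit word left by c bits.
    return ((v << c) & MASK) | (v >> (32 - c))


def quarter(x, a, b, c, d):
    # Salsa20 quarter-round on words a, b, c, d of the 16-word state.
    x[b] ^= rotl((x[a] + x[d]) & MASK, 7)
    x[c] ^= rotl((x[b] + x[a]) & MASK, 9)
    x[d] ^= rotl((x[c] + x[b]) & MASK, 13)
    x[a] ^= rotl((x[d] + x[c]) & MASK, 18)


def salsa20_8(block):
    # Salsa20 core reduced to 8 rounds: 64 bytes in, 64 bytes out.
    orig = struct.unpack("<16I", block)
    x = list(orig)
    for _ in range(4):  # 4 double rounds = 8 rounds
        for q in ((0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11)):
            quarter(x, *q)  # column round
        for q in ((0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)):
            quarter(x, *q)  # row round
    return struct.pack("<16I", *((x[i] + orig[i]) & MASK for i in range(16)))


def xor_bytes(a, b):
    return bytes(p ^ q for p, q in zip(a, b))


def blockmix_r1(b):
    # BlockMix with r = 1 on a 128-byte block B0 || B1.
    y0 = salsa20_8(xor_bytes(b[64:], b[:64]))
    y1 = salsa20_8(xor_bytes(y0, b[64:]))
    return y0 + y1


def romix(hashed_key, n=1024):
    state = []                 # n entries of 128 bytes: 128 KiB for n = 1024
    x = hashed_key
    for _ in range(n):         # fill phase: sequential writes
        state.append(x)        # State[i] = hashed_key
        x = blockmix_r1(x)
    for _ in range(n):         # lookup phase: data-dependent reads
        which_state = struct.unpack_from("<I", x, 64)[0] % n
        x = blockmix_r1(xor_bytes(state[which_state], x))
    return x


def scrypt_1024_1_1(password, salt, dklen=32):
    hashed_key = hashlib.pbkdf2_hmac("sha256", password, salt, 1, 128)
    out_key = romix(hashed_key)
    # Final PBKDF2 step, recomputed rather than resumed from saved HMAC state.
    return hashlib.pbkdf2_hmac("sha256", password, out_key, 1, dklen)
```

`scrypt_1024_1_1(b"password", b"NaCl")` should match `hashlib.scrypt` called with the same parameters. The two loops also make the bandwidth rule concrete:

- The fill phase writes 1024 × 128 bytes, and the lookup phase reads the same amount back at data-dependent positions.
- Each hash therefore moves about 256 KiB of state.
- At the 80 to 300 GB/sec quoted earlier, that gives a rough ceiling of about 0.3 to 1.1 million hashes per second. This ignores caches and any time-memory tradeoff.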

    // This loop is usually done for many nonces in parallel
    for nonce := 0; nonce < max_nonce && !found; nonce++ {
        hash := do_scrypt(coin_data, nonce)
        if hash < target {
            printf("Found a valid hash for nonce %d\n", nonce)
            found = true
        }
    }
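Here is a runnable Python sketch of the nonce loop above. It assumes Litecoin-style proof of work:

- The block header is 80 bytes, and its last 4 bytes are the little-endian nonce.
- The header is hashed with scrypt (N = 1024, r = 1, p = 1), using the header as both password and salt.
- The 32-byte result is read as a little-endian integer and compared to the target.

`search`, `header76`, and the nonce range are hypothetical names. The early `return` plays the role of the `found` flag.

```python
import hashlib
import struct


def scrypt_pow(header80):
    # scrypt(N=1024, r=1, p=1) with the 80-byte header as both password and salt.
    return hashlib.scrypt(header80, salt=header80, n=1024, r=1, p=1, dklen=32)


def search(header76, target, start_nonce, count):
    """Try nonces in [start_nonce, start_nonce + count); return the first winner or None."""
    for nonce in range(start_nonce, start_nonce + count):
        header = header76 + struct.pack("<I", nonce)       # nonce: last 4 bytes, little-endian
        h = int.from_bytes(scrypt_pow(header), "little")   # hash as a 256-bit number
        if h < target:
            return nonce
    return None
```

In the design described earlier, one kernel invocation would play the role of `search` over a large nonce range spread across threads. The host then calls it again with the next small job description: header, target, and starting nonce.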

  1. One key per core.
  2. Spread one key’s work across multiple cores.

The advantage of option #1 is that it’s simple. The existing CudaMiner code, for example, follows this pattern. You can almost literally take simple CPU-oriented code, copy it into CUDA, and it will work. This is exactly how I did PBKDF2. It’s not amazing, but it’s fast enough.
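The two options differ mainly in how a thread index maps to work. The Python sketch below shows that mapping for a 1024-bit (32-word) scrypt block. The interleaved split in `option2` and the `threads_per_key=4` default are hypothetical choices for illustration, not necessarily how any particular miner divides the work.

```python
WORDS_PER_BLOCK = 32  # a 1024-bit scrypt block is 32 x 32-bit words


def option1(tid):
    # One key per core: thread tid owns key tid and all 32 words of its block.
    return tid, list(range(WORDS_PER_BLOCK))


def option2(tid, threads_per_key=4):
    # Spread one key across threads_per_key threads (hypothetical interleaved split):
    # thread tid works on key tid // threads_per_key and owns every
    # threads_per_key-th word of that key's block, starting at its lane.
    key, lane = divmod(tid, threads_per_key)
    return key, list(range(lane, WORDS_PER_BLOCK, threads_per_key))
```

With option 1, a 32-thread warp handles 32 keys, but every thread must keep all 32 words of its block live. That alone reaches the 32-register range mentioned earlier. With four threads per key, each thread holds 8 words and a warp handles 8 keys.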

First, mix within each column. Example from column 1:
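Here is a minimal Python sketch of the column-1 step. It assumes the 16 state words are laid out row-major as a 4×4 matrix, so column 1 is x[0], x[4], x[8], x[12]. It uses the quarter-round from the Salsa20 specification, which scrypt's Salsa20/8 core reuses. `mix_column_1` is a hypothetical name.

```python
MASK = 0xFFFFFFFF


def rotl(v, c):
    # Rotate a 32-bit word left by c bits.
    return ((v << c) & MASK) | (v >> (32 - c))


def mix_column_1(x):
    # Column 1 is words x[0], x[4], x[8], x[12]; each step adds two words
    # of the column, rotates the sum, and XORs it into the next word.
    x[4] ^= rotl((x[0] + x[12]) & MASK, 7)
    x[8] ^= rotl((x[4] + x[0]) & MASK, 9)
    x[12] ^= rotl((x[8] + x[4]) & MASK, 13)
    x[0] ^= rotl((x[12] + x[8]) & MASK, 18)
    return x
```

Each update depends on the one before it, so the four steps within a column are sequential. The four columns touch disjoint words, however, so they are independent of one another.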

Then mix within each row. Example from row 1:
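Here is the matching sketch for row 1, words x[0], x[1], x[2], x[3] under the same row-major assumption. It uses the same Salsa20 quarter-round; `mix_row_1` is a hypothetical name.

```python
MASK = 0xFFFFFFFF


def rotl(v, c):
    # Rotate a 32-bit word left by c bits.
    return ((v << c) & MASK) | (v >> (32 - c))


def mix_row_1(x):
    # Row 1 is words x[0], x[1], x[2], x[3]; same add-rotate-XOR pattern as the column step.
    x[1] ^= rotl((x[0] + x[3]) & MASK, 7)
    x[2] ^= rotl((x[1] + x[0]) & MASK, 9)
    x[3] ^= rotl((x[2] + x[1]) & MASK, 13)
    x[0] ^= rotl((x[3] + x[2]) & MASK, 18)
    return x
```

Row 1 draws one word from each of the four columns. If each thread owned one column, the row step would therefore require threads to exchange data between the column and row phases.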
