Node.js Byte and String

By Meabed on 2018-09-16 #nodejs #text #string #byte #binary #wsse #wsse-soap-header

In Summary:

  • Byte - Binary: a sequence of bytes that does not necessarily have a Unicode representation; converting it to text requires an encoding
  • String - Text: human-readable text ABCDEFGHabcdefgh0123456789-
  • Bytes are the machine-readable form used internally; strings are the human-readable form.
  • Since bytes are machine readable, they can be stored on disk directly, whereas strings must be encoded into bytes before they can be stored on disk.
A binary to Text Character looks like

A Byte:

A byte is 8 bits of binary data, for example "01000010".

Strictly speaking, a byte is not guaranteed to be 8 bits. That is the de facto standard today, but historically it has not always been the case: en.wikipedia.org/wiki/Byte

A byte array is an array of bytes ([]byte). You could use a byte array to store any collection of binary data, for example:

  • the contents of any file of any type, like .mp3, .dat, etc.
  • a normal Unicode string such as abcdef12345
  • image content like .jpeg, .png, etc.
  • anything else that is not text, such as encrypted data or hash output (RSA, SHA, etc.), because encryption does not produce only text characters

Why Binary and not String?

Unlike a character string which usually contains text data, a binary string is used to hold non-traditional data such as pictures. The length of a binary string is the number of bytes in the sequence.
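
To make the difference between character count and byte count concrete, here is a minimal sketch using Node's built-in Buffer. The sample text is an arbitrary choice for illustration, not something from the original example.

```javascript
// Character count vs. byte count for the same text.
const text = 'Æ♥i';

// String.length counts UTF-16 code units, not bytes.
const charCount = text.length;

// Buffer.byteLength counts the bytes the text occupies in a given encoding.
// In UTF-8: Æ takes 2 bytes, ♥ takes 3 bytes, i takes 1 byte.
const byteCount = Buffer.byteLength(text, 'utf8');

console.log(charCount, byteCount); // 3 6
```

The two numbers only agree when every character fits in a single byte, which in UTF-8 means plain ASCII.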

A String:

A string is stored as a byte or an array of bytes ([]Byte), for example []Byte("A"), []Byte("Æ"), []Byte("Ø").

en.wikipedia.org/wiki/Character

You can convert bytes to a string and read them only if the binary code maps to Unicode characters in the chosen encoding; otherwise, converting from bytes to a string results in corrupted characters.
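
The corruption described above can be sketched in Node.js as follows. The byte values are picked for illustration: the first group is the UTF-8 encoding of ♥, the second is a pair of bytes that is not valid UTF-8.

```javascript
// Bytes that form valid UTF-8 decode cleanly.
const valid = Buffer.from([0xe2, 0x99, 0xa5]);
const validText = valid.toString('utf8'); // '♥'

// Bytes that are not valid UTF-8 decode to the replacement character U+FFFD.
const invalid = Buffer.from([0xff, 0xfe]);
const invalidText = invalid.toString('utf8');

// The original bytes cannot be recovered from the corrupted text.
const roundTripOk = Buffer.from(invalidText, 'utf8').equals(invalid); // false

// Hex and Base64 are lossless text forms for arbitrary bytes.
const asHex = invalid.toString('hex'); // 'fffe'
const asBase64 = invalid.toString('base64'); // '//4='

console.log(validText, roundTripOk, asHex, asBase64);
```

This is why binary values such as hashes are usually sent as hex or Base64 rather than decoded as UTF-8 text.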

♥ = 3 Bytes | UTF-8 (decimal) = 226 153 165 | Binary = 11100010 10011001 10100101
Ø = 2 Bytes | UTF-8 (decimal) = 195 152 | Binary = 11000011 10011000
Æ = 2 Bytes | UTF-8 (decimal) = 195 134 | Binary = 11000011 10000110
i = 1 Byte | UTF-8 (decimal) = 105 | Binary = 01101001
u = 1 Byte | UTF-8 (decimal) = 117 | Binary = 01110101

Unicode is a standard that specifies, amongst other things, what characters are available. UTF-8 is a character encoding that specifies how these characters shall be physically encoded in 1s and 0s. UTF-8 can use 1 byte for ASCII (<= 127) and up to 4 bytes to represent other Unicode characters.
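
The byte values listed above can be reproduced with a short sketch. `describeChar` is a hypothetical helper name introduced here for illustration only.

```javascript
// Hypothetical helper: returns the UTF-8 bytes of a character
// in decimal and binary notation.
function describeChar(ch) {
  const bytes = Buffer.from(ch, 'utf8');
  return {
    length: bytes.length,
    decimal: [...bytes].join(' '),
    binary: [...bytes].map((b) => b.toString(2).padStart(8, '0')).join(' '),
  };
}

for (const ch of ['♥', 'Ø', 'Æ', 'i', 'u']) {
  const { length, decimal, binary } = describeChar(ch);
  console.log(`${ch} = ${length} byte(s) | decimal = ${decimal} | binary = ${binary}`);
}
```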


JavaScript is not very clear about data types, and some distinctions are confusing, like

  • array vs object
  • string vs numeric
  • null vs undefined

One more addition to the list is byte vs string, which is explained above.

So in JavaScript (Node.js), generating a hash value (MD5, SHA1, SHA2, etc.) produces an array of bytes ([]Byte, a Buffer in Node.js) unless you ask digest() for a string encoding. You cannot use the returned array of bytes directly in string operations such as comparison, sending via an API, or storing on disk as text. Instead, you have to convert the []Byte to a string data type using Buffer:

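import crypto from 'crypto';

// Two SHA-1 digests, returned as hex strings
const sha1Ex1 = crypto.createHash('sha1').update('string_test_1').digest('hex');
const sha1Ex2 = crypto.createHash('sha1').update('string_test_2').digest('hex');

// Decode the hex strings back into raw bytes and join them
const textBuff = Buffer.concat([
  Buffer.from(sha1Ex1, 'hex'),
  Buffer.from(sha1Ex2, 'hex'),
]);

// Hash the joined bytes; 'binary' (latin1) yields one character per byte
const pwHash = crypto.createHash('sha1').update(textBuff).digest('binary');

// Read those characters back as bytes and encode them as Base64 text
const PasswordDigest = Buffer.from(pwHash, 'binary').toString('base64');

  • To convert a hash (bytes) to a string, wrap it in a Buffer and call toString() with an encoding, e.g. Buffer.from(bytes).toString('base64')
  • To concatenate bytes, use Buffer.concat([Buffer.from(var1), Buffer.from(var2), Buffer.from(var3)])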
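
As a minimal sketch of the two points above, assuming only Node's built-in crypto module: calling digest() without an encoding returns the raw bytes as a Buffer, and the string forms are just different encodings of those same bytes. The input string is taken from the example above.

```javascript
const crypto = require('crypto');

// No encoding argument: digest() returns the raw bytes as a Buffer.
const raw = crypto.createHash('sha1').update('string_test_1').digest();

// The same bytes rendered as text in two common encodings.
const hex = raw.toString('hex');
const base64 = raw.toString('base64');

// Passing the encoding to digest() is a shortcut for the same conversion.
const direct = crypto.createHash('sha1').update('string_test_1').digest('hex');

console.log(Buffer.isBuffer(raw), raw.length); // true 20
console.log(hex === direct); // true
console.log(base64);
```

A SHA-1 digest is always 20 bytes, so its hex form is 40 characters and its Base64 form is 28 characters.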

A practical example of this is comparing an MD5 hash to the string value of the hash:

const crypto = require('crypto');

// HMAC-MD5 is a keyed hash, not encryption. With no encoding argument,
// digest() returns the raw bytes as a Buffer ([]Byte)
const encryptedDataByte = crypto
  .createHmac('md5', 'secret key')
  .update('data to encrypt')
  .digest();

// []Byte to hex string via Buffer 4323afd34f95da835aad5bfe86670c7e
const ByteToStringMD5Hash = encryptedDataByte.toString('hex');
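
The comparison itself could look like the sketch below. `hmacMatchesHex` is a hypothetical helper, not part of the original example. It decodes the expected hex string into bytes and compares Buffers with crypto.timingSafeEqual, which is one reasonable choice when the value being checked is a secret.

```javascript
const crypto = require('crypto');

// Hypothetical helper: checks whether the HMAC-MD5 of `data` under `key`
// equals the digest given as a hex string.
function hmacMatchesHex(data, key, expectedHex) {
  const actual = crypto.createHmac('md5', key).update(data).digest(); // Buffer
  const expected = Buffer.from(expectedHex, 'hex');
  // timingSafeEqual throws if the lengths differ, so check them first.
  return actual.length === expected.length && crypto.timingSafeEqual(actual, expected);
}

const goodHex = crypto
  .createHmac('md5', 'secret key')
  .update('data to encrypt')
  .digest('hex');

console.log(hmacMatchesHex('data to encrypt', 'secret key', goodHex)); // true
console.log(hmacMatchesHex('other data', 'secret key', goodHex)); // false
```

Comparing bytes to bytes avoids mismatches caused by letter case or encoding differences between two string forms of the same hash.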

Another example of this is implementing the WS-Security SOAP header (WSSE security headers):

const crypto = require('crypto');

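const getWSSE = ({ username, password, nonce, created }) => {

  // Use the caller's timestamp when given, otherwise the current time
  const Created = created || new Date().toISOString();
  const rawNonce = nonce || Created + '-' + Math.random().toString();
  const NonceB64 = Buffer.from(rawNonce).toString('base64');

  // SHA-1 of the password as a hex string
  let passwordSha1 = crypto.createHash('sha1').update(password).digest('hex');

  // Join the raw bytes of the nonce, the timestamp, and the password hash
  let textBuff = Buffer.concat([
    Buffer.from(NonceB64, 'utf8'),
    Buffer.from(Created, 'utf8'),
    Buffer.from(passwordSha1, 'hex'),
  ]);

  // 'binary' (latin1) keeps one character per digest byte
  let pwHash = crypto.createHash('sha1').update(textBuff).digest('binary');

  const PasswordDigest = Buffer.from(pwHash, 'binary').toString('base64');

  // drop references to the intermediate values (optional: locals are
  // garbage-collected after the function returns)
  passwordSha1 = null;
  textBuff = null;
  pwHash = null;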

  return {
    Username: username,
    Password: PasswordDigest,
    RawNonce: rawNonce,
    NonceB64: NonceB64,
    CreatedAt: Created
  };
};

var token = getWSSE({ username: 'bob', password: 'taadtaadpstcsm' });
console.log(token)

// example output
{
    Username: 'bob',
    Password: 'Yn7seAYwKWaBSaevJ4dzx39mmNY=',
    RawNonce: '2018-09-27T11:07:13.748Z-0.4117718081719923',
    NonceB64: 'MjAxOC0wOS0yN1QxMTowNzoxMy43NDhaLTAuNDExNzcxODA4MTcxOTkyMw==',
    CreatedAt: '2018-09-27T11:07:13.748Z'
}

You could try it out on https://repl.it/@meabed/nodejs-wsse
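
For illustration, a receiver that knows the password could recompute the digest from the nonce and timestamp in the token and compare the bytes. This is only a sketch under stated assumptions: `wsseDigest` and `verifyWSSE` are hypothetical names, and the recipe simply mirrors the getWSSE function above rather than any particular service's requirements.

```javascript
const crypto = require('crypto');

// Hypothetical helper: same recipe as getWSSE above, written with raw
// Buffers so no hex or 'binary' round trip is needed.
function wsseDigest(nonceB64, created, password) {
  const passwordSha1 = crypto.createHash('sha1').update(password).digest();
  const joined = Buffer.concat([
    Buffer.from(nonceB64, 'utf8'),
    Buffer.from(created, 'utf8'),
    passwordSha1,
  ]);
  return crypto.createHash('sha1').update(joined).digest('base64');
}

// Hypothetical check: recompute the digest and compare it byte for byte.
function verifyWSSE(token, password) {
  const expected = Buffer.from(wsseDigest(token.NonceB64, token.CreatedAt, password), 'base64');
  const received = Buffer.from(token.Password, 'base64');
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}

// Fixed sample values so the result is reproducible.
const sampleToken = {
  NonceB64: Buffer.from('sample-nonce').toString('base64'),
  CreatedAt: '2018-09-27T11:07:13.748Z',
};
sampleToken.Password = wsseDigest(sampleToken.NonceB64, sampleToken.CreatedAt, 'taadtaadpstcsm');

console.log(verifyWSSE(sampleToken, 'taadtaadpstcsm')); // true
console.log(verifyWSSE(sampleToken, 'wrong password')); // false
```

Passing 'base64' straight to digest() gives the same result as digesting to 'binary' and re-encoding through Buffer, because both are encodings of the same 20 bytes.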

Extra read:

...