05 Jan 2025
Introduction
In this post, we will build a simple DOS .EXE file from scratch. We will use
the NASM assembler and the OpenWatcom C compiler to create a simple program
that prints “Hello, World!” to the console.
This post is part of a series in which I am trying to recreate the environment
that I used in the 1990s. Stay tuned for more posts on this topic.
Prerequisites
You will need the following tools to follow along with this post:
- NASM assembler
- OpenWatcom C compiler (for its linker, wlink)
- FreeDOS (running in VirtualBox)
- A text editor (FED)
We’ll use NASM to assemble the code for our program, OpenWatcom's wlink to link
the resulting object file into an executable, and FreeDOS to run the resulting
.EXE file. (This was not as easily done in 1992, or in 2025.)
Setting up the environment
The above software is available on the FreeDOS bonus CD, which you can download
from the FreeDOS website. You can install the additional software using the
FDIMPLES or FDUPDATE commands in FreeDOS. (Nice and easy.)
Building the program
First, create a new file called hello.asm and add the following code:
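section .text
global _start
_start:
    mov ax, seg msg     ; an .EXE starts with DS pointing at the PSP,
    mov ds, ax          ; so point DS at the segment that holds msg
    mov ah, 9           ; DOS function 09h: print a '$'-terminated string
    mov dx, msg         ; DS:DX -> string
    int 21h
    mov ah, 4ch         ; DOS function 4Ch: terminate the program
    int 21h
section .data
msg db 'Hello, World!', '$'   ; function 09h stops at '$', not at 0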
NOTE: ChatGPT was terrible at this; I had to fix the code and the compile and
link commands to get it to work. Most of the time its output would build but not
function correctly. With misaligned data segments and the like, it was a hot mess.
Next, we need to assemble the code with NASM to produce the object file
hello.obj, for example with nasm -f obj hello.asm -o hello.obj.
Then we need to link the object file using OpenWatcom's wlink:
wlink sys dos name hello.exe start=_start file hello.obj
Breaking this down a bit, the sys dos option tells the linker to create a
DOS executable file. The name hello.exe option specifies the name of the
output file. The start=_start option tells the linker to use the _start
symbol as the entry point of the program. The file hello.obj option tells
the linker which object file to link into the executable.
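One way to sanity-check what the linker produced is to decode the header of the resulting file. The sketch below is not part of the original build; it parses the fixed 28-byte DOS "MZ" header with Python's struct module. The helper name parse_mz_header and the sample header values are hypothetical, and the sample is built in memory so the sketch runs without a real hello.exe.

```python
import struct

# Field names of the 28-byte DOS "MZ" header, in on-disk order.
# Every field after the 2-byte signature is a little-endian 16-bit word.
MZ_FIELDS = (
    "bytes_in_last_page", "pages", "relocations", "header_paragraphs",
    "min_alloc", "max_alloc", "ss", "sp", "checksum", "ip", "cs",
    "reloc_table_offset", "overlay_number",
)


def parse_mz_header(data: bytes) -> dict:
    """Decode the fixed part of a DOS MZ executable header."""
    if len(data) < 28:
        raise ValueError("too short to contain an MZ header")
    signature, *words = struct.unpack("<2s13H", data[:28])
    if signature not in (b"MZ", b"ZM"):
        raise ValueError("not a DOS MZ executable")
    return dict(zip(MZ_FIELDS, words))


# Hypothetical header values, packed in memory for illustration only.
# For a real check, read the first 28 bytes of hello.exe instead.
sample = struct.pack(
    "<2s13H", b"MZ", 0x40, 1, 0, 2, 0, 0xFFFF, 0, 0x100, 0, 0, 0, 0x1C, 0
)
header = parse_mz_header(sample)
print(f"entry point CS:IP = {header['cs']:04X}:{header['ip']:04X}")
```

The cs and ip fields give the entry point relative to the start of the loaded image, which is where the symbol named by start=_start should end up.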
Running the program
To run the program, simply run it from the command line. You should see the
message “Hello, World!” printed to the console.
Conclusion
In this post, we built a simple DOS .EXE file from scratch using the NASM
assembler and the OpenWatcom C compiler. We used FreeDOS to run the resulting
.EXE file and saw the message “Hello, World!” printed to the console. This
demonstrates how to create a simple DOS program using tools that were commonly
used in 1992.
I hope you found this post helpful. If you have any questions or comments,
please feel free to contact me.
19 Aug 2023
Processing and transforming large text datasets efficiently is a common challenge in various data processing pipelines, ETL workflows, and text analysis applications. To address this challenge, the Godspeed IO project offers a powerful solution that prioritizes memory efficiency while providing an easy-to-use interface for developers of all skill levels. In this blog post, we’ll dive into the key features of Godspeed IO and provide a step-by-step example of how to use it to process large text files.
Introducing Godspeed IO
Godspeed IO is a memory-efficient stream processing library for Python that’s designed to tackle the challenges posed by processing large text datasets. Whether you’re dealing with real-time data streams, massive text files, or scenarios where memory consumption is a concern, Godspeed IO has you covered.
Key Features
- Memory Efficiency: One of the primary focuses of Godspeed IO is memory efficiency. This makes it well-suited for processing large text datasets without overloading your system’s memory resources.
- Stream Processing: The core functionality of Godspeed IO revolves around processing text streams. Instead of loading the entire content into memory, you can process the data line by line or in chunks. This approach minimizes memory usage and allows you to handle datasets that would otherwise be too large to fit in memory.
- Flexible Transformation: With Godspeed IO, you can define custom transformation functions that process the text data as it flows through the system. This flexibility allows you to perform various operations such as text manipulation, data extraction, and filtering.
- User-Friendly API: The API provided by Godspeed IO is designed to be user-friendly, making it accessible to developers with varying levels of experience. The library’s intuitive design enables you to focus on the processing logic rather than dealing with complex memory management.
- Integration: Godspeed IO seamlessly integrates into a wide range of data processing pipelines, ETL workflows, and text analysis applications. Its versatility allows you to incorporate it into your existing projects without major modifications.
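To make the stream-processing idea concrete, here is a minimal sketch in plain Python that does not use Godspeed IO at all. It only illustrates the general pattern of transforming one line at a time with a generator; the name stream_lines is hypothetical, and io.StringIO stands in for a large file.

```python
import io


def stream_lines(handle, transform):
    """Yield transformed lines one at a time instead of reading the whole file."""
    for line in handle:  # file objects iterate lazily, line by line
        yield transform(line)


# io.StringIO stands in for a large file opened with open(path)
source = io.StringIO("alpha\nbeta\ngamma\n")
result = list(stream_lines(source, str.upper))
print(result)
```

Because the generator only ever holds the current line, memory use stays roughly constant regardless of file size, as long as the consumer does not collect everything into a list the way this small demo does.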
Installation
Getting started with Godspeed IO is as simple as installing it using pip.
Once the library is installed, you’re ready to harness its power for processing large text datasets.
Example: Ensuring Equal Columns in a CSV File
To illustrate how Godspeed IO works, let’s walk through a practical example. Imagine you have a large CSV file where each row represents a record with varying numbers of columns. Your goal is to ensure that all rows have the same number of columns by padding them with separators if necessary.
Step 1: Defining the Transformation Function
The first step is to define a custom transformation function that takes a line of text as input and returns the transformed line. In this case, the function should ensure that each row has a specified width (number of columns). Here’s what the function might look like:
from godspeedio import processor

@processor(order=1)
def ensure_equal_columns(chunk, width=10, sep=","):
    """Ensure that all rows have the same number of columns"""
    chunk = chunk.rstrip("\n")
    # A row with `width` columns contains `width - 1` separators
    missing = (width - 1) - chunk.count(sep)
    if missing > 0:
        chunk += sep * missing
    # Always restore the newline, whether or not padding was added
    return chunk + "\n"
In this function, the @processor(order=1) decorator indicates that this transformation should be applied first. The function takes three parameters: chunk (a line of text), width (desired number of columns), and sep (separator used in the CSV file). It ensures that the line has the desired number of columns by adding separators as needed.
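The padding logic can be checked on its own, without the decorator or the library. The following is a sketch under the assumption that "width" means the number of columns, so a row needs width - 1 separators; pad_row is a hypothetical stand-alone helper, not part of Godspeed IO.

```python
def pad_row(line, width=10, sep=","):
    """Pad a CSV line with separators until it has `width` columns."""
    row = line.rstrip("\n")
    missing = (width - 1) - row.count(sep)  # n columns need n - 1 separators
    if missing > 0:
        row += sep * missing
    return row + "\n"


short = pad_row("a,b\n", width=4)      # two columns, padded to four
full = pad_row("a,b,c,d\n", width=4)   # already four columns, unchanged
print(repr(short), repr(full))
```

Counting separators is a simplification: it will miscount rows where a quoted field contains the separator itself, in which case a real CSV parser would be the safer choice.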
Step 2: Processing the Stream
Now that we have the transformation function, let’s see how to use Godspeed IO to process the text stream efficiently:
from godspeedio import godspeed

file_path = "large_file.csv"  # Replace with your file path
# Open the file and process the stream using Godspeed IO
with open(file_path) as file:
    with godspeed(file) as f:
        for chunk in f:
            # Process the transformed chunk here (post processing)
            pass
In this code snippet, we open the CSV file using the with statement to ensure proper resource management. Inside the context, we use the godspeed function to create a processing stream from the file object. The processing stream (f in this case) allows us to iterate over the file’s content efficiently, processing each chunk according to the transformation function defined earlier.
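How ordered transformations might be chained over a stream can be sketched in plain Python. This is only an illustration of the idea behind an order parameter, not how Godspeed IO is implemented; run_pipeline and the (order, function) pairs are hypothetical.

```python
def run_pipeline(lines, processors):
    """Apply (order, function) pairs to each line, lowest order first."""
    ordered = [func for _, func in sorted(processors, key=lambda pair: pair[0])]
    for line in lines:
        for func in ordered:
            line = func(line)
        yield line


# Registered out of order on purpose: order=1 must still run first
processors = [
    (2, str.upper),
    (1, lambda line: line.rstrip("\n") + ",x\n"),
]
output = list(run_pipeline(["a,b\n", "c,d\n"], processors))
print(output)
```

Here the column is appended before the line is upper-cased, so the appended value is upper-cased too; swapping the order values would leave it in lower case.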
Conclusion
Godspeed IO offers a memory-efficient and user-friendly solution for processing large text datasets in Python. Its stream processing approach, combined with custom transformation functions, allows you to tackle complex text data processing tasks without worrying about memory limitations. By breaking down the processing into smaller, manageable steps, you can easily manipulate and transform data in a way that’s both efficient and maintainable.
If you’re dealing with large text files, real-time data streams, or any scenario where memory efficiency is critical, consider integrating Godspeed IO into your projects. Its seamless integration, intuitive API, and flexibility make it a valuable tool in your data processing toolkit. To get started, install Godspeed IO using pip and explore its capabilities firsthand.
09 Dec 2022
Bayes’ theorem is a mathematical formula used to determine the probability of an event based on prior knowledge of
conditions that might be related to the event. It is a widely used tool in statistics and probability theory.
The theorem is named after Thomas Bayes, an 18th-century English statistician and Presbyterian minister, who developed a
method for calculating probabilities based on the concept of conditional probability. Conditional probability is the
probability of an event occurring given that another event has already occurred.
Laplace is credited with being the first to use Bayes’ theorem in the field of statistics. Before Laplace, Bayes’
theorem was primarily used in the field of theology to try to infer the existence and nature of God from observed
phenomena.
Laplace recognized the potential of Bayes’ theorem for solving statistical problems and developed a mathematical
framework for using it in statistical inference. He showed that the theorem could be used to calculate the probability
of an event based on prior knowledge and observed data, which allowed a more objective and rigorous approach to
statistical inference than was previously possible. His work was instrumental in establishing Bayes’ theorem as a
cornerstone of statistical theory.
Bayes’ theorem is typically expressed as follows:
P(A|B) = P(B|A) * P(A) / P(B)
where P(A|B) is the conditional probability of event A occurring given that event B has already occurred, P(B|A) is the
conditional probability of event B occurring given that event A has already occurred, P(A) is the probability of event A
occurring, and P(B) is the probability of event B occurring.
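The formula translates directly into a few lines of Python. The sketch below is an illustration with a made-up card example that is not from the text: A is "the card is a king" and B is "the card is a face card" in a standard 52-card deck. The function name bayes_posterior is hypothetical, and fractions are used so the arithmetic stays exact.

```python
from fractions import Fraction


def bayes_posterior(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return p_b_given_a * p_a / p_b


p_a = Fraction(4, 52)         # P(A): 4 kings in the deck
p_b = Fraction(12, 52)        # P(B): 12 face cards in the deck
p_b_given_a = Fraction(1)     # P(B|A): every king is a face card

p_a_given_b = bayes_posterior(p_b_given_a, p_a, p_b)
print(p_a_given_b)
```

Knowing that the card is a face card raises the probability of a king from 4/52 to 1/3, which is exactly the update the theorem describes.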
The theorem is used in a variety of applications, such as medical testing, where it can be used to calculate the
probability that a person has a certain disease based on the results of a test. It is also used in machine learning,
where it can be used to make predictions about future events based on past data.
One example of the use of Bayes’ theorem in medicine is in the context of HIV testing. HIV, or human immunodeficiency
virus, is a virus that can lead to acquired immunodeficiency syndrome (AIDS). HIV infection is relatively rare, with an
estimated prevalence of 0.3% in the general population.
Suppose a person undergoes an HIV test and the test comes back positive. What is the probability that the person
actually has HIV? This is where Bayes’ theorem can be used.
In this case, event A is the person having HIV, and event B is the person testing positive for HIV. P(A) is the
probability that a person has HIV, which is 0.3%. P(B|A) is the probability that a person with HIV will test positive
for the disease, which is high (assuming the test is accurate). P(B) is the probability that a person will test positive
for HIV, which includes both those with HIV who test positive and those among the remaining 99.7% who test positive
anyway (false positives, a small percentage of that group).
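Using Bayes’ theorem, and expanding P(B) over people with and without HIV, we can calculate the probability that a
person who tests positive for HIV actually has the disease:
P(A|B) = P(B|A) * P(A) / P(B)
       = P(B|A) * 0.003 / (P(B|A) * 0.003 + P(B|not A) * 0.997)
where P(B|not A) is the false positive rate of the test. The result is high only when the false positive rate is well
below the 0.3% prevalence. Otherwise, false positives from the much larger uninfected group make up a substantial share
of all positive results, and P(A|B) can be surprisingly low even for an accurate test.
Thus, Bayes’ theorem can be used to calculate the probability that a person who tests positive for HIV actually has the
disease, taking into account both the accuracy of the test and the overall prevalence of HIV in the population.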
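Plugging in numbers shows how strongly the answer depends on the false positive rate. The sketch below keeps the 0.3% prevalence from the text, but the sensitivity and the false positive rates are assumptions chosen purely for illustration, not figures for any real HIV test. The function name positive_predictive_value is likewise hypothetical.

```python
def positive_predictive_value(prevalence, sensitivity, false_positive_rate):
    """P(A|B): probability of infection given a positive test."""
    true_pos = sensitivity * prevalence                  # P(B|A) * P(A)
    false_pos = false_positive_rate * (1 - prevalence)   # P(B|not A) * P(not A)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 0.003    # 0.3%, the figure used in the text
SENSITIVITY = 0.99    # assumed for illustration only

for fpr in (0.01, 0.001, 0.0001):  # assumed false positive rates
    ppv = positive_predictive_value(PREVALENCE, SENSITIVITY, fpr)
    print(f"false positive rate {fpr:.2%}: P(A|B) = {ppv:.1%}")
```

With the assumed 1% false positive rate, fewer than one in four positive results corresponds to an actual infection, while a rate of 0.01% pushes the probability above 95%.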
Here is an example of Bayes’ theorem using NumPy. Keep in mind that this is just one possible way to implement Bayes’ theorem using NumPy and that there may be other ways to do it as well.
First, let’s define some variables that we will use in our implementation:
import numpy as np

# Prior probabilities of each hypothesis
prior = np.array([0.2, 0.3, 0.5])
# Likelihood of each evidence (rows) under each hypothesis (columns)
likelihood = np.array([
    [0.5, 0.1, 0.9],  # Evidence 1
    [0.3, 0.5, 0.2],  # Evidence 2
    [0.2, 0.4, 0.3],  # Evidence 3
])
# Evidence: 1 if the piece of evidence was observed, 0 if not
evidence = np.array([1, 0, 1])
In this example, we have three hypotheses (represented by the prior array) and three pieces of evidence. The evidence
array marks which of them were observed (1) and which were not (0). The likelihood array represents the likelihood of
each evidence under each hypothesis.
Next, we can use Bayes’ theorem to compute the posterior probabilities of each hypothesis given the evidence:
# Compute the posterior probabilities using Bayes' theorem
posterior = np.zeros_like(prior)
for i in range(len(prior)):
    # Only observed evidence contributes, because likelihood ** 0 == 1
    posterior[i] = prior[i] * np.prod(likelihood[:, i] ** evidence)
# Normalize the posterior probabilities
posterior /= np.sum(posterior)
print(posterior)  # approximately [0.120 0.072 0.808]
The resulting posterior array gives the probabilities of each hypothesis given the evidence. In this case, the highest
probability is for the third hypothesis, which means that it is the most likely given the evidence.
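The loop over hypotheses can also be written without an explicit loop by using NumPy broadcasting. This is just one alternative sketch of the same calculation, repeating the arrays from the example so that it runs on its own; the intermediate name joint is introduced here for illustration.

```python
import numpy as np

prior = np.array([0.2, 0.3, 0.5])
likelihood = np.array([
    [0.5, 0.1, 0.9],  # Evidence 1
    [0.3, 0.5, 0.2],  # Evidence 2
    [0.2, 0.4, 0.3],  # Evidence 3
])
evidence = np.array([1, 0, 1])

# Raise each evidence row to its 0/1 flag so unobserved rows become 1,
# then multiply down the rows to get one likelihood per hypothesis.
joint = prior * np.prod(likelihood ** evidence[:, None], axis=0)
posterior = joint / joint.sum()
print(posterior)
```

Indexing with evidence[:, None] turns the flags into a column so that each flag applies to a whole row of the likelihood array, which matches the row-per-evidence layout used above.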
Overall, Bayes’ theorem is a valuable tool in probability and statistics. It allows for the calculation of probabilities
based on prior knowledge of related events, and can provide valuable insights into the likelihood of future events.