A Coding Implementation for an Agentic AI Framework that Performs Literature Analysis, Hypothesis Generation, Experimental Planning, Simulation, and Scientific Reporting

In this tutorial, we build a complete scientific discovery agent step by step and examine how each component works together to form a cohesive research workflow. We start by loading our literature collection, creating the retrieval and LLM modules, and then assembling agents that search papers, generate hypotheses, design experiments, and produce structured reports. Through the code below, we see how an agent pipeline emerges naturally, allowing us to explore a scientific question from initial curiosity to full analysis within a single integrated system.

import sys, subprocess


def install_deps():
   pkgs = ["transformers", "scikit-learn", "numpy"]
   subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)


try:
   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
   from sklearn.feature_extraction.text import TfidfVectorizer
   from sklearn.metrics.pairwise import cosine_similarity
   import numpy as np
except ImportError:
   install_deps()
   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
   from sklearn.feature_extraction.text import TfidfVectorizer
   from sklearn.metrics.pairwise import cosine_similarity
   import numpy as np


from dataclasses import dataclass
from typing import List, Dict, Any


np.random.seed(42)


LITERATURE = [
   {"id": "P1","title": "Self-Supervised Protein Language Models for Structure Prediction","field": "computational biology",
    "abstract": "We explore transformer-based protein language models trained on millions of sequences. The models learn residue-level embeddings that improve secondary structure prediction and stability estimation."},
   {"id": "P2","title": "CRISPR Off-Target Detection Using Deep Learning","field": "genome editing",
    "abstract": "We propose a convolutional neural network architecture for predicting CRISPR-Cas9 off-target effects directly from genomic sequences, achieving state-of-the-art accuracy on GUIDE-seq datasets."},
   {"id": "P3","title": "Foundation Models for Scientific Equation Discovery","field": "scientific ML",
    "abstract": "Large language models are combined with symbolic regression to recover governing equations from noisy experimental observations in physics and fluid dynamics."},
   {"id": "P4","title": "Active Learning for Materials Property Optimization","field": "materials science",
    "abstract": "We integrate Bayesian optimization with graph neural networks to actively select candidate materials that maximize target properties while reducing experimental cost."},
   {"id": "P5","title": "Graph-Based Retrieval for Cross-Domain Literature Review","field": "NLP for science",
    "abstract": "We construct a heterogeneous citation and concept graph over multi-domain scientific papers and show that graph-aware retrieval improves cross-domain literature exploration."},
]


corpus_texts = [p["abstract"] + " " + p["title"] for p in LITERATURE]
vectorizer = TfidfVectorizer(stop_words="english")
corpus_matrix = vectorizer.fit_transform(corpus_texts)


MODEL_NAME = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)


def generate_text(prompt: str, max_new_tokens: int = 256) -> str:
   inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
   outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=4, early_stopping=True)
   return tokenizer.decode(outputs[0], skip_special_tokens=True)

We’ve laid the foundation for our scientific agent by loading the libraries, setting up the literature collection, and configuring our language model. We fit a TF-IDF vectorizer over the merged titles and abstracts so that relevant papers can be retrieved later. With the model loaded and the data organized, we have the computational backbone for everything that follows.
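
Before wiring up the agents, it helps to sanity-check the retrieval index and the generator. The short sketch below is illustrative and not part of the original pipeline; it reuses the vectorizer, corpus_matrix, LITERATURE, and generate_text objects defined above, and the sample query string is an arbitrary placeholder.

# Optional sanity check (illustrative; reuses the objects defined above).
sample_query = "protein language models for CRISPR off-target prediction"
q_vec = vectorizer.transform([sample_query])        # 1 x vocab TF-IDF vector
sims = cosine_similarity(q_vec, corpus_matrix)[0]   # similarity to each paper
for p, s in sorted(zip(LITERATURE, sims), key=lambda t: -t[1]):
   print(f"{p['id']}  {s:.3f}  {p['title']}")

print(generate_text("Summarize: transformer embeddings improve protein structure prediction.", max_new_tokens=32))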

@dataclass
class PaperHit:
   paper: Dict[str, Any]
   score: float


class LiteratureAgent:
   def __init__(self, vectorizer, corpus_matrix, papers: List[Dict[str, Any]]):
       self.vectorizer = vectorizer
       self.corpus_matrix = corpus_matrix
       self.papers = papers


   def search(self, query: str, k: int = 3) -> List[PaperHit]:
       q_vec = self.vectorizer.transform([query])
       sims = cosine_similarity(q_vec, self.corpus_matrix)[0]
       idxs = np.argsort(-sims)[:k]
       hits = [PaperHit(self.papers[i], float(sims[i])) for i in idxs]
       return hits

We implement the literature-search component of our agent. We project user queries into the TF-IDF vector space and identify the most relevant scientific papers using cosine similarity. By doing this, we give our system the ability to ground its reasoning in the most closely related prior work.
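
As a quick check, the retrieval agent can also be exercised on its own before it is embedded in the pipeline. The snippet below is a minimal usage sketch that reuses the vectorizer, corpus_matrix, and LITERATURE objects from above; the query string is just an example.

# Illustrative usage of the LiteratureAgent on its own.
lit_agent = LiteratureAgent(vectorizer, corpus_matrix, LITERATURE)
for hit in lit_agent.search("deep learning for CRISPR off-target effects", k=2):
   print(f"{hit.paper['id']}  score={hit.score:.3f}  {hit.paper['title']}")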

@dataclass
class ExperimentPlan:
   system: str
   hypothesis: str
   variables: Dict[str, Any]
   protocol: List[str]


@dataclass
class ExperimentResult:
   plan: ExperimentPlan
   metrics: Dict[str, float]


class ExperimentAgent:
   def design_experiment(self, question: str, hypothesis: str, hits: List[PaperHit]) -> ExperimentPlan:
       top_field = hits[0].paper["field"] if hits else "computational science"
       protocol = [
           f"Construct dataset combining ideas from: {', '.join(h.paper['id'] for h in hits)}.",
           "Split data into train/validation/test.",
           "Compare baseline model vs. augmented model implementing the hypothesis.",
           "Evaluate using appropriate metrics and perform ablation analysis.",
       ]
       variables = {
           "baseline_model": "sequence CNN",
           "augmented_model": "protein language model + CNN",
           "n_train_samples": 5000,
           "n_validation_samples": 1000,
           "metric": "AUROC",
       }
       system = f"{top_field} system related to: {question}"
       return ExperimentPlan(system=system, hypothesis=hypothesis, variables=variables, protocol=protocol)


   def run_experiment(self, plan: ExperimentPlan) -> ExperimentResult:
       base = 0.78 + 0.02 * np.random.randn()
       gain = abs(0.05 + 0.01 * np.random.randn())
       metrics = {
           "baseline_AUROC": round(base, 3),
           "augmented_AUROC": round(base + gain, 3),
           "estimated_gain": round(gain, 3),
       }
       return ExperimentResult(plan=plan, metrics=metrics)

We design and simulate experiments based on the retrieved literature and the generated hypothesis. We automatically define the variables, draft a protocol, and produce synthetic metrics that mimic the dynamics of real scientific evaluation. This allows us to move from a theoretical idea to an executable pilot plan.
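
To see the experiment stage in isolation, we can feed it hits retrieved earlier together with a placeholder question and hypothesis. The sketch below is illustrative only; the question and hypothesis strings are made up for the example, and the metrics it prints are synthetic by design.

# Illustrative sketch: exercise the ExperimentAgent with retrieved hits.
exp_agent = ExperimentAgent()
hits = LiteratureAgent(vectorizer, corpus_matrix, LITERATURE).search("CRISPR off-target prediction", k=2)
plan = exp_agent.design_experiment(
   question="Can protein language model embeddings improve off-target prediction?",
   hypothesis="Pretrained protein embeddings add context beyond raw sequence features.",
   hits=hits,
)
result = exp_agent.run_experiment(plan)
print(plan.system)
print(result.metrics)  # synthetic numbers, e.g. baseline vs. augmented AUROC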

class ReportAgent:
   def write_report(self, question: str, hits: List[PaperHit], plan: ExperimentPlan, result: ExperimentResult) -> str:
       related_work = "\n".join(f"- {h.paper['title']} ({h.paper['field']})" for h in hits)
       protocol_str = "\n".join(f"- {step}" for step in plan.protocol)
       prompt = f"""
You are an AI research assistant writing a concise research-style report.


Research question:
{question}


Hypothesis:
{plan.hypothesis}


Relevant prior work:
{related_work}


Planned experiment:
System: {plan.system}
Variables: {plan.variables}
Protocol:
{protocol_str}


Simulated results:
{result.metrics}


Write a clear report with the following sections:
1. Background
2. Proposed Approach
3. Experimental Setup
4. Results and Discussion
5. Limitations and Future Work
"""
       return generate_text(prompt.strip(), max_new_tokens=320)

We generate a full research-style report using the LLM. We compile the hypothesis, protocol, results, and related work into a structured prompt and ask the model for a document with clearly defined sections. This allows us to transform the raw outputs of the pipeline into polished scientific communication.
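
One practical caveat: flan-t5-small accepts only a limited number of input tokens, so long report prompts are silently truncated by tokenizer(..., truncation=True) inside generate_text. The helper below is an optional sketch, not part of the original code, for checking how much of a prompt actually fits; it compares the token count against tokenizer.model_max_length.

# Optional check (an assumption, not part of the original pipeline): count the
# tokens a prompt would use before truncation.
def prompt_token_count(prompt: str) -> int:
   return len(tokenizer(prompt, truncation=False)["input_ids"])

# Example: compare prompt_token_count(some_prompt) with tokenizer.model_max_length
# to see whether the report prompt will be truncated before generation.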

class ScientificAgent:
   def __init__(self):
       self.lit_agent = LiteratureAgent(vectorizer, corpus_matrix, LITERATURE)
       self.exp_agent = ExperimentAgent()
       self.report_agent = ReportAgent()


   def propose_hypothesis(self, question: str, hits: List[PaperHit]) -> str:
       context = " ".join(h.paper["abstract"] for h in hits)
       prompt = f"""
You are an AI scientist. Given a research question and related abstracts,
propose a single, testable hypothesis in 2-3 sentences.


Research question:
{question}


Related abstracts:
{context}
"""
       return generate_text(prompt.strip(), max_new_tokens=96)


   def run_pipeline(self, question: str) -> str:
       hits = self.lit_agent.search(question, k=3)
       hypothesis = self.propose_hypothesis(question, hits)
       plan = self.exp_agent.design_experiment(question, hypothesis, hits)
       result = self.exp_agent.run_experiment(plan)
       report = self.report_agent.write_report(question, hits, plan, result)
       return report


if __name__ == "__main__":
   research_question = (
       "How can protein language model embeddings improve CRISPR off-target "
       "prediction compared to sequence-only CNN baselines?"
   )
   agent = ScientificAgent()
   final_report = agent.run_pipeline(research_question)
   print(final_report)

We coordinate the entire pipeline: search the literature, generate a hypothesis, design the experiment, run the simulation, and write the report. We then run the system on a concrete research question and observe the full workflow in action. This step brings all the modules together into a unified scientific agent.
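
Beyond the end-to-end call, the sub-agents can also be invoked individually to inspect intermediate artifacts such as the retrieved papers, the hypothesis, and the experiment plan. The sketch below is illustrative; the research question it uses is a made-up example.

# Illustrative sketch: inspect intermediate artifacts of the pipeline.
agent = ScientificAgent()
q = "Can graph-based retrieval improve equation discovery from noisy observations?"
hits = agent.lit_agent.search(q, k=3)
hypothesis = agent.propose_hypothesis(q, hits)
plan = agent.exp_agent.design_experiment(q, hypothesis, hits)
print([h.paper["id"] for h in hits])
print(hypothesis)
print(plan.protocol)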

In conclusion, we see how the combined codebase can evolve into an effective AI co-researcher capable of searching, reasoning, simulating, and summarizing. We understand how each component contributes to the full pipeline and how the agents amplify each other when combined. We are also well positioned to extend the agent with richer literature sources, more realistic simulation models, and more sophisticated empirical reasoning, pushing our scientific exploration further with each iteration.




