AI (Artificial Intelligence aka Imitation Intelligence) aka Slop

Things are moving fast, Stable Diffusion, ChatGPT etc.

Slop is the new name for unwanted AI-generated content https://simonwillison.net/2024/May/8/slop/

Criticism

AI will have eaten all our hobbies long before it fired us from our job. -- The End of Writing

Gemini Nano in Chrome

API is WIP and the following links might be outdated!

2024-11-12 Getting started with window.ai in Chrome Canary — in browser Gemini LLM – OUseful.Info, the blog…

explainers-by-googlers/prompt-api: A proposal for a web API for prompting browser-provided language models

Getting Started: window.ai in Chrome | by Chris McKenzie | Medium

OpenAI API

2024-05-20 - they changed a lot in billing and API key setup (needs projects and organisations and stuff)

When using the OpenAI API, user input doesn't end up in OpenAI training data (https://openai.com/policies/api-data-usage-policies). This is different from ChatGPT, where user input might be used for training the model!

Starting on March 1, 2023, we are making two changes to our data usage and retention policies:

  • OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.
  • Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).

Perplexity AI

https://www.perplexity.ai

Better at knowledge retrival and less a content creator.

Anthropic AI Claude

Chat: https://claude.ai/ Developer: https://console.anthropic.com/dashboard

OpenRouter

A unified interface for LLMs - Find the best models & prices for your prompts

https://openrouter.ai/

MS Copilot

Use it in Edge for work related Office 365 stuff. That seems to be the most appropriate.

Obsidian and Open AI

Smart Connections Plugin

Smart Connections is an AI-powered plugin designed to democratize access to AI technology and empower individuals with innovative tools. It features features like Smart View and Smart Chat, making it easier than ever to stay organized and uncover hidden connections between notes.

Smart Connection uses OpenAI API Embeddings for search and make connections

VSCode

Sourcegraph Cody

Using this right now (2024-06-14) and it got better. Using Claude 3 and GPT-4o is alright and it's a bit cheaper than other services. Chat is pretty good. New context source too.

Phind

Started as ChatGPT interface for coding questions, became a VSCode extension to interact with the chat so it's no autocomplete and more a glorified chat interface with some sort of context awareness (referencing open files in the project). They have a usage limit for their GPT-4 model and they have their own model. And I have to say - this came up with the best solutions so far if I used it as a rubberduck.

https://www.phind.com/

Rubberduck

Just an integration of Open AI API, fine-tuned for coding, covering the usual tasks like documenting, explaining, chat and generating code. The latter all without context or only the limited context you can sent over. Therefore, it's a rubberduck to bounce of ideas in the chat and and getting some extra help here and there to write self-containing code snippets.

Github Copilot

Can't login for some reason.

Tab 9

Unlimited free version with very short autocompletion and 15 USD with more complex autocompletion and chat and all that jazz.

  • Used the paid version for a bit. Autocomplete wasn't overwhelming, often accurate, but no mind reading here either.
  • The chat was quite disappointing and often very basic answers. I expected more since Tab 9 is indexing the codebase. Maybe I used it wrong!
  • Nevertheless offers Tab 9 some good things and privacy for the code

Rift

Not tested yet

https://github.com/morph-labs/rift

Continue

Not tested yet

https://github.com/continuedev/continue

Code GPT

Not tested, probably never will

https://www.codegpt.co/

Prompts

Cleanup OCR

async def process_chunk(chunk: str, prev_context: str, chunk_index: int, total_chunks: int, reformat_as_markdown: bool, suppress_headers_and_page_numbers: bool) -> Tuple[str, str]:
    logging.info(f"Processing chunk {chunk_index + 1}/{total_chunks} (length: {len(chunk):,} characters)")

    # Step 1: OCR Correction
    ocr_correction_prompt = f"""Correct OCR-induced errors in the text, ensuring it flows coherently with the previous context. Follow these guidelines:

1. Fix OCR-induced typos and errors:
   - Correct words split across line breaks
   - Fix common OCR errors (e.g., 'rn' misread as 'm')
   - Use context and common sense to correct errors
   - Only fix clear errors, don't alter the content unnecessarily
   - Do not add extra periods or any unnecessary punctuation

2. Maintain original structure:
   - Keep all headings and subheadings intact

3. Preserve original content:
   - Keep all important information from the original text
   - Do not add any new information not present in the original text
   - Remove unnecessary line breaks within sentences or paragraphs
   - Maintain paragraph breaks

4. Maintain coherence:
   - Ensure the content connects smoothly with the previous context
   - Handle text that starts or ends mid-sentence appropriately

IMPORTANT: Respond ONLY with the corrected text. Preserve all original formatting, including line breaks. Do not include any introduction, explanation, or metadata.

Previous context:
{prev_context[-500:]}

Current chunk to process:
{chunk}

Corrected text:
"""

    ocr_corrected_chunk = await generate_completion(ocr_correction_prompt, max_tokens=len(chunk) + 500)

    processed_chunk = ocr_corrected_chunk

    # Step 2: Markdown Formatting (if requested)
    if reformat_as_markdown:
        markdown_prompt = f"""Reformat the following text as markdown, improving readability while preserving the original structure. Follow these guidelines:
1. Preserve all original headings, converting them to appropriate markdown heading levels (# for main titles, ## for subtitles, etc.)
   - Ensure each heading is on its own line
   - Add a blank line before and after each heading
2. Maintain the original paragraph structure. Remove all breaks within a word that should be a single word (for example, "cor- rect" should be "correct")
3. Format lists properly (unordered or ordered) if they exist in the original text
4. Use emphasis (*italic*) and strong emphasis (**bold**) where appropriate, based on the original formatting
5. Preserve all original content and meaning
6. Do not add any extra punctuation or modify the existing punctuation
7. Remove any spuriously inserted introductory text such as "Here is the corrected text:" that may have been added by the LLM and which is obviously not part of the original text.
8. Remove any obviously duplicated content that appears to have been accidentally included twice. Follow these strict guidelines:
   - Remove only exact or near-exact repeated paragraphs or sections within the main chunk.
   - Consider the context (before and after the main chunk) to identify duplicates that span chunk boundaries.
   - Do not remove content that is simply similar but conveys different information.
   - Preserve all unique content, even if it seems redundant.
   - Ensure the text flows smoothly after removal.
   - Do not add any new content or explanations.
   - If no obvious duplicates are found, return the main chunk unchanged.
9. {"Identify but do not remove headers, footers, or page numbers. Instead, format them distinctly, e.g., as blockquotes." if not suppress_headers_and_page_numbers else "Carefully remove headers, footers, and page numbers while preserving all other content."}

Text to reformat:

{ocr_corrected_chunk}

Reformatted markdown:
"""
        processed_chunk = await generate_completion(markdown_prompt, max_tokens=len(ocr_corrected_chunk) + 500)
    new_context = processed_chunk[-1000:]  # Use the last 1000 characters as context for the next chunk
    logging.info(f"Chunk {chunk_index + 1}/{total_chunks} processed. Output length: {len(processed_chunk):,} characters")
    return processed_chunk, new_context

https://github.com/Dicklesworthstone/llm_aided_ocr/blob/main/llm_aided_ocr.py