[01:06] *** Stiletto has joined #archiveteam-bs
[01:08] *** Stilett0 has quit IRC (Read error: Operation timed out)
[01:13] *** Stilett0 has joined #archiveteam-bs
[01:17] *** Stiletto has quit IRC (Read error: Operation timed out)
[01:33] *** ndiddy has quit IRC (Read error: Operation timed out)
[01:41] *** ndiddy has joined #archiveteam-bs
[01:54] <robogoat> bithippo: Might also look at https://hackage.haskell.org/package/github-backup
[01:54] <bithippo> :thumbs up:
[02:09] *** JAA_ has joined #archiveteam-bs
[02:16] *** JAA_ has quit IRC (leaving)
[02:19] *** JAA_ has joined #archiveteam-bs
[02:19] *** JAA sets mode: +o JAA_
[02:22] *** JAA has quit IRC (leaving)
[02:22] *** JAA_ is now known as JAA
[02:36] *** ndiddy has quit IRC (Ping timeout: 492 seconds)
[03:10] *** jacketcha has joined #archiveteam-bs
[03:31] <JAA> https://twitter.com/textfiles/status/970448544271233024 lol, looks like saying "guys, don't archive this!" is a good way of finding new archivists for that particular content niche. We should keep that in mind. ;-)
[03:48] <Despatche> you wouldn't download a website
[04:12] *** qw3rty113 has joined #archiveteam-bs
[04:14] <superkuh> It sure is great how twitter auto-detects when JS is disabled and redirects you to a version of their site that hides all the post content. Super usable.
[04:18] *** qw3rty112 has quit IRC (Read error: Operation timed out)
[05:05] *** dashcloud has quit IRC (Read error: Operation timed out)
[05:08] *** dashcloud has joined #archiveteam-bs
[06:10] *** odemg has quit IRC (Read error: Operation timed out)
[06:21] *** odemg has joined #archiveteam-bs
[06:35] *** Pixi` has quit IRC (Quit: Pixi`)
[06:35] *** Pixi has joined #archiveteam-bs
[06:59] *** wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
[07:06] *** odemg has quit IRC (Read error: Connection reset by peer)
[07:08] *** dashcloud has quit IRC (Read error: Operation timed out)
[07:22] *** odemg has joined #archiveteam-bs
[07:23] *** h3x has quit IRC (Read error: Operation timed out)
[07:24] *** h3x has joined #archiveteam-bs
[07:37] *** schbirid has joined #archiveteam-bs
[07:47] *** odemg has quit IRC (Ping timeout: 252 seconds)
[08:00] *** odemg has joined #archiveteam-bs
[10:21] *** BlueMax has quit IRC (Read error: Connection reset by peer)
[10:33] *** Mateon1 has quit IRC (Read error: Operation timed out)
[10:33] *** Mateon1 has joined #archiveteam-bs
[11:06] *** odemg has quit IRC (Read error: Operation timed out)
[11:14] *** wp494 has joined #archiveteam-bs
[11:20] *** odemg has joined #archiveteam-bs
[11:55] *** odemg has quit IRC (Read error: Connection reset by peer)
[12:15] *** odemg has joined #archiveteam-bs
[13:21] *** odemg has quit IRC (Read error: Connection reset by peer)
[13:36] *** odemg has joined #archiveteam-bs
[15:25] *** odemg has quit IRC (Read error: Connection reset by peer)
[15:52] *** odemg has joined #archiveteam-bs
[15:52] *** odemg has quit IRC (Connection closed)
[15:53] *** Dimtree has quit IRC (Peace)
[15:56] *** odemg has joined #archiveteam-bs
[15:56] *** odemg has quit IRC (Connection closed)
[15:56] *** Dimtree has joined #archiveteam-bs
[15:59] *** odemg has joined #archiveteam-bs
[16:07] *** odemg has quit IRC (Read error: Connection reset by peer)
[16:18] *** odemg has joined #archiveteam-bs
[17:30] *** ld1 has quit IRC (Read error: Connection reset by peer)
[17:31] *** ld1 has joined #archiveteam-bs
[17:46] *** Dimtree has quit IRC (Peace)
[17:51] *** Dimtree has joined #archiveteam-bs
[17:59] *** Pixi has quit IRC (Quit: Pixi)
[18:03] *** jschwart has joined #archiveteam-bs
[18:09] <SketchCow> JAA: Running
[18:09] <JAA> Thanks
[18:10] <SketchCow> 599c8fa2311eff5cfc358407fd262642e3db4034af6e10241e5af28a392e536f  charlierose.com-videos-00000.warc.gz
[18:11] <JAA> Sweet, thank you. :-)
[18:14] <SketchCow> Do I delete this now
[18:17] <JAA> No
[18:17] <JAA> But it means that I can delete WARC 00000 on my machine and continue grabbing.
[18:18] <JAA> And upload WARCs 1 through 43 and then delete those as well, etc.
[18:18] <JAA> I'll let you know when it's done and ready for transfer to IA.
[18:18] <JAA> Will probably take a week or so in total.
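The exchange above is a checksum handoff: SketchCow posts the SHA-256 digest of the received WARC, and JAA deletes the local copy once it matches. A minimal sketch of that check follows. The function names are hypothetical and the log does not show the tooling either side actually used.

```python
import hashlib

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def safe_to_delete(local_path, remote_digest):
    """True only if the local file hashes to the digest reported by the receiver."""
    return sha256_of_file(local_path) == remote_digest.strip().lower()
```

For example, `safe_to_delete("charlierose.com-videos-00000.warc.gz", "599c8fa2...")` would be called with the full digest pasted from the channel before removing the local file.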
[18:19] *** Pixi has joined #archiveteam-bs
[18:20] *** SynMonger has joined #archiveteam-bs
[18:29] *** JAA sets mode: +o SketchCow
[18:45] *** K4k has joined #archiveteam-bs
[18:58] *** powerKitt has quit IRC (Quit: powerKitt)
[19:11] *** powerKitt has joined #archiveteam-bs
[19:24] *** fsr has joined #archiveteam-bs
[19:27] *** odemg has quit IRC (Read error: Connection reset by peer)
[19:27] *** h3x has quit IRC (Read error: Connection reset by peer)
[19:27] <fsr> Is there currently any work done on archiving the Wii Shop Channel? There is an article in the wiki, anything else?
[19:28] <powerKitt> Yeah, guy called Larsenv has been working to archive stuff
[19:29] <powerKitt> He hasn't been able to get the secret word though, so he doesn't have an AT wiki account.
[19:29] <fsr> Ah, yes, I read his post on RiiConnect. I was going to ask if archiveteam and the RiiConnect people are working together. Seems like a yes then. :)
[19:29] <powerKitt> If anyone knows what the current one is, PM me it and I'll pass it on to him.
[19:29] <fsr> the "secret word"?
[19:30] <powerKitt> yeah, you need a passphrase from the IRC to sign up on archiveteam.org
[19:30] <fsr> sign up for the wiki?
[19:30] <powerKitt> yeah
[19:31] <fsr> I see, thanks. What would be the best way to get in touch with larsenv? I'd be interested to know what he plans to do and what he has already done.
[19:32] <powerKitt> do you have a discord account?
[19:32] <fsr> nope
[19:34] <fsr> I have one now.
[19:40] *** fsr has quit IRC ()
[19:47] *** odemg has joined #archiveteam-bs
[19:55] <riking> How do I get wpull to rewrite http://tinypic.com/images/404.gif to a 404 response?
[19:56] <riking> currently have a workaround in my later-running 404 detection/download-rescued-from-wayback
[20:07] *** odemg has quit IRC (Ping timeout: 255 seconds)
[20:10] <bithippo> @riking: You're probably better off using grab-site to perform that transform while archiving
[20:10] <bithippo> https://github.com/ludios/grab-site
[20:11] <bithippo> --custom-hooks=PY_SCRIPT: Copy PY_SCRIPT to DIR/custom_hooks.py, then exec DIR/custom_hooks.py on startup and every time it changes. The script gets a wpull_hook global that can be used to change crawl behavior. See update_custom_hooks in libgrabsite/wpull_hooks.py and custom_hooks_sample.py.
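riking's later-running 404 detection is not shown in the log. The sketch below is one way such a post-processing check might look, assuming missing images end up redirected to the placeholder URL riking mentions. It does not use the wpull or grab-site hook API, and `effective_status` and `_normalize` are hypothetical names.

```python
from urllib.parse import urlsplit

# Placeholder image named in the log. Treating a redirect to it as
# "not found" is an assumption about how the site signals missing images.
PLACEHOLDER_URL = "http://tinypic.com/images/404.gif"

def _normalize(url):
    """Compare URLs by lowercased host and path, ignoring the scheme."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path)

def effective_status(requested_url, final_url, status_code):
    """Return 404 when a request for some other URL ended on the placeholder image."""
    ended_on_placeholder = _normalize(final_url) == _normalize(PLACEHOLDER_URL)
    asked_for_placeholder = _normalize(requested_url) == _normalize(PLACEHOLDER_URL)
    if ended_on_placeholder and not asked_for_placeholder:
        return 404
    return status_code
```

A direct request for the placeholder itself keeps its original status, so the image can still be archived as a normal resource.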
[20:13] <riking> ... okay and now I'm dealing with a wpull crash.
[20:13] <riking> https://paste.ubuntu.com/p/wVdhRH7CMf/
[20:14] <riking> it really doesn't like that blob:. The browser doesn't either but
[20:23] *** bsmith094 has quit IRC (Ping timeout: 252 seconds)
[20:23] *** odemg has joined #archiveteam-bs
[21:07] *** RichardG has quit IRC (Read error: Connection reset by peer)
[21:09] *** RichardG has joined #archiveteam-bs
[21:31] *** bsmith093 has joined #archiveteam-bs
[21:37] *** ranav has joined #archiveteam-bs
[21:42] <riking> bithippo: hm, well i'm already doing a ton of post-processing so I guess i'll just stick with what I have.
[21:42] *** ranavalon has quit IRC (Ping timeout: 633 seconds)
[21:43] *** BlueMax has joined #archiveteam-bs
[21:43] <bithippo> Sorry I couldn't be more help.
[21:43] <riking> I wonder if I should be segregating the non-pristine fetches into a separate WARC so that the original can be used? (e.g. I download an image from Wayback, insert into .warc with stomped WARC-Target-URI: header -> put that in a separate warc file)
[21:44] <bithippo> That's a question for someone more familiar with how the Wayback Machine and CDX server work.
[21:44] <riking> right now I'm uploading the WARC files to archive items, not to Wayback proper.
[21:44] <riking> but if someone tries to throw my warc files into wayback, those records are almost certainly going to mess it up
[21:45] <riking> eh sure might as well.
[21:49] <riking> up to 2700 lines, 67Kbytes of script. heh.
[21:51] <bithippo> When in doubt, WARCs should not have their content or headers adulterated.
[21:53] <riking> Here's a screenshot of what it's doing right now https://usercontent.irccloud-cdn.com/file/ZPQcNv29/image.png
[21:56] <riking> followed by the image as received from that request
[21:58] *** BlueMax has quit IRC (Leaving)
[22:00] *** jschwart has quit IRC (Quit: Konversation terminated!)
[22:08] <bithippo> @riking: What is the eventual target? Wayback? Or something else?
[22:08] <riking> Target is this stuff https://ia801505.us.archive.org/12/items/MSPFA_204/view.html?s=204&p=1
[22:09] <riking> if you open devtools you can see it using Range: requests to pick retrieved files out of the warc
[22:11] <bithippo> Archiving individual MS Paint Art stories as individual items?
[22:11] <riking> so you can see why having it all in one file is nice, but.
[22:11] <riking> Yeah, with all necessary subresources included.
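The Range: technique riking describes can be sketched as follows. The player in the log runs in the browser; this Python version only illustrates the idea, assuming the .warc.gz stores each record as its own gzip member and that the byte offset and length of each record are known from an index. `range_header` and `fetch_record` are hypothetical names.

```python
import gzip
import urllib.request

def range_header(offset, length):
    """HTTP Range header value covering `length` bytes starting at `offset`."""
    return "bytes={}-{}".format(offset, offset + length - 1)

def fetch_record(warc_url, offset, length):
    """Fetch one gzip member from a remote .warc.gz and return the decompressed record."""
    request = urllib.request.Request(
        warc_url, headers={"Range": range_header(offset, length)}
    )
    with urllib.request.urlopen(request) as response:
        # Assumes the server honors the Range header and returns only the
        # requested slice, which is a complete gzip member.
        return gzip.decompress(response.read())
```

Because each record is compressed independently, a single file can serve many small reads without the client downloading the whole WARC.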
[22:12] <bithippo> So, naively, it seems like the archiving operation isn't the issue, but that you need a more robust player to interpret the archived content for "playback"
[22:13] <riking> That sounds right
[22:13] <bithippo> I apologize if I'm being obtuse, I assure you I'm attempting to be helpful :D
[22:13] <riking> I can update the script to have varying warc filenames. Just wanted to make sure it was worth doing so first
[22:15] <bithippo> Want the whole site ripped?
[22:15] <riking> A previous version saved all the images as individual files. Then I noticed both the text on the help page and the reason for it: "please do not have more than 1000 files per item"
[22:15] <riking> the devs are "currently" working on a version that doesn't use a js app for page advances
[22:15] <riking> but yes that's the idea
[22:16] <riking> Right now I'm slowly marching up through the story IDs, finding things that trip up my script - either the archiver or the player
[22:17] <bithippo> Total stories you need to walk?
[22:18] <riking> most recent created ID is 24778
[22:19] <riking> minus deleted entries
[22:21] *** BlueMax has joined #archiveteam-bs
[22:32] *** alex___ has joined #archiveteam-bs
[22:34] <bithippo> @riking: You might consider setting up an ArchiveTeam project for this
[22:34] <riking> oh right
[22:36] *** schbirid has quit IRC (Quit: Leaving)
[22:43] <powerKitt> riking: are you doing mspfa?
[22:43] <riking> Yeah
[22:43] <powerKitt> Nice.
[22:44] <riking> when the story uses photobucket, the archived items are now the best place to read it
[22:45] <powerKitt> Yeah, Photo
[22:45] <powerKitt> *Photobucket is fucking terrible
[23:10] *** Mayonaise has quit IRC (Read error: Connection reset by peer)
[23:32] <bithippo> @powerkitt: https://github.com/bibanon/PB_Spade for those PB shenanigans