#archiveteam-bs 2018-03-05,Mon

Time Nickname Message
01:06 🔗 Stiletto has joined #archiveteam-bs
01:08 🔗 Stilett0 has quit IRC (Read error: Operation timed out)
01:13 🔗 Stilett0 has joined #archiveteam-bs
01:17 🔗 Stiletto has quit IRC (Read error: Operation timed out)
01:33 🔗 ndiddy has quit IRC (Read error: Operation timed out)
01:41 🔗 ndiddy has joined #archiveteam-bs
01:54 🔗 robogoat bithippo: Might also look at https://hackage.haskell.org/package/github-backup
01:54 🔗 bithippo :thumbs up:
02:09 🔗 JAA_ has joined #archiveteam-bs
02:16 🔗 JAA_ has quit IRC (leaving)
02:19 🔗 JAA_ has joined #archiveteam-bs
02:19 🔗 JAA sets mode: +o JAA_
02:22 🔗 JAA has quit IRC (leaving)
02:22 🔗 JAA_ is now known as JAA
02:36 🔗 ndiddy has quit IRC (Ping timeout: 492 seconds)
03:10 🔗 jacketcha has joined #archiveteam-bs
03:31 🔗 JAA https://twitter.com/textfiles/status/970448544271233024 lol, looks like saying "guys, don't archive this!" is a good way of finding new archivists for that particular content niche. We should keep that in mind. ;-)
03:48 🔗 Despatche you wouldn't download a website
04:12 🔗 qw3rty113 has joined #archiveteam-bs
04:14 🔗 superkuh It sure is great how twitter auto-detects when JS is disabled and redirects you to a version of their site that hides all the post content. Super usable.
04:18 🔗 qw3rty112 has quit IRC (Read error: Operation timed out)
05:05 🔗 dashcloud has quit IRC (Read error: Operation timed out)
05:08 🔗 dashcloud has joined #archiveteam-bs
06:10 🔗 odemg has quit IRC (Read error: Operation timed out)
06:21 🔗 odemg has joined #archiveteam-bs
06:35 🔗 Pixi` has quit IRC (Quit: Pixi`)
06:35 🔗 Pixi has joined #archiveteam-bs
06:59 🔗 wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES)
07:06 🔗 odemg has quit IRC (Read error: Connection reset by peer)
07:08 🔗 dashcloud has quit IRC (Read error: Operation timed out)
07:22 🔗 odemg has joined #archiveteam-bs
07:23 🔗 h3x has quit IRC (Read error: Operation timed out)
07:24 🔗 h3x has joined #archiveteam-bs
07:37 🔗 schbirid has joined #archiveteam-bs
07:47 🔗 odemg has quit IRC (Ping timeout: 252 seconds)
08:00 🔗 odemg has joined #archiveteam-bs
10:21 🔗 BlueMax has quit IRC (Read error: Connection reset by peer)
10:33 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
10:33 🔗 Mateon1 has joined #archiveteam-bs
11:06 🔗 odemg has quit IRC (Read error: Operation timed out)
11:14 🔗 wp494 has joined #archiveteam-bs
11:20 🔗 odemg has joined #archiveteam-bs
11:55 🔗 odemg has quit IRC (Read error: Connection reset by peer)
12:15 🔗 odemg has joined #archiveteam-bs
13:21 🔗 odemg has quit IRC (Read error: Connection reset by peer)
13:36 🔗 odemg has joined #archiveteam-bs
15:25 🔗 odemg has quit IRC (Read error: Connection reset by peer)
15:52 🔗 odemg has joined #archiveteam-bs
15:52 🔗 odemg has quit IRC (Connection closed)
15:53 🔗 Dimtree has quit IRC (Peace)
15:56 🔗 odemg has joined #archiveteam-bs
15:56 🔗 odemg has quit IRC (Connection closed)
15:56 🔗 Dimtree has joined #archiveteam-bs
15:59 🔗 odemg has joined #archiveteam-bs
16:07 🔗 odemg has quit IRC (Read error: Connection reset by peer)
16:18 🔗 odemg has joined #archiveteam-bs
17:30 🔗 ld1 has quit IRC (Read error: Connection reset by peer)
17:31 🔗 ld1 has joined #archiveteam-bs
17:46 🔗 Dimtree has quit IRC (Peace)
17:51 🔗 Dimtree has joined #archiveteam-bs
17:59 🔗 Pixi has quit IRC (Quit: Pixi)
18:03 🔗 jschwart has joined #archiveteam-bs
18:09 🔗 SketchCow JAA: Running
18:09 🔗 JAA Thanks
18:10 🔗 SketchCow 599c8fa2311eff5cfc358407fd262642e3db4034af6e10241e5af28a392e536f charlierose.com-videos-00000.warc.gz
18:11 🔗 JAA Sweet, thank you. :-)
18:14 🔗 SketchCow Do I delete this now
18:17 🔗 JAA No
18:17 🔗 JAA But it means that I can delete WARC 00000 on my machine and continue grabbing.
18:18 🔗 JAA And upload WARCs 1 through 43 and then delete those as well, etc.
18:18 🔗 JAA I'll let you know when it's done and ready for transfer to IA.
18:18 🔗 JAA Will probably take a week or so in total.
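
The 64-hex-digit value SketchCow posted at 18:10 has the shape of a SHA-256 digest, although the log does not name the algorithm. Assuming it is SHA-256, a minimal sketch of how the uploader could compute the same value locally before deleting a WARC is shown below. The function name `sha256_file` and the chunk size are illustrative choices, not anything taken from the log.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks.

    Chunked reading keeps memory use flat for multi-gigabyte WARCs.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the result for the local copy against the digest reported from the receiving side is what makes it safe to delete the local file and keep grabbing.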
18:19 🔗 Pixi has joined #archiveteam-bs
18:20 🔗 SynMonger has joined #archiveteam-bs
18:29 🔗 JAA sets mode: +o SketchCow
18:45 🔗 K4k has joined #archiveteam-bs
18:58 🔗 powerKitt has quit IRC (Quit: powerKitt)
19:11 🔗 powerKitt has joined #archiveteam-bs
19:24 🔗 fsr has joined #archiveteam-bs
19:27 🔗 odemg has quit IRC (Read error: Connection reset by peer)
19:27 🔗 h3x has quit IRC (Read error: Connection reset by peer)
19:27 🔗 fsr Is there currently any work done on archiving the Wii Shop Channel? There is an article in the wiki, anything else?
19:28 🔗 powerKitt Yeah, guy called Larsenv has been working to archive stuff
19:29 🔗 powerKitt He hasn't been able to get the secret word though, so he doesn't have an AT wiki account.
19:29 🔗 fsr Ah, yes, I read his post on RiiConnect. I was going to ask if archiveteam and the RiiConnect people are working together. Seems like a yes then. :)
19:29 🔗 powerKitt If anyone knows what the current one is, PM me it and I'll pass it on to him.
19:29 🔗 fsr the "secret word"?
19:30 🔗 powerKitt yeah, you need a passphrase from the IRC to sign up on archiveteam.org
19:30 🔗 fsr sign up for the wiki?
19:30 🔗 powerKitt yeah
19:31 🔗 fsr I see, thanks. What would be the best way to get in touch with larsenv? I'd be interested to know what he plans to do and what he has already done.
19:32 🔗 powerKitt do you have a discord account?
19:32 🔗 fsr nope
19:34 🔗 fsr I have one now.
19:40 🔗 fsr has quit IRC ()
19:47 🔗 odemg has joined #archiveteam-bs
19:55 🔗 riking How do I get wpull to rewrite http://tinypic.com/images/404.gif to a 404 response?
19:56 🔗 riking currently have a workaround in my later-running 404 detection/download-rescued-from-wayback
20:07 🔗 odemg has quit IRC (Ping timeout: 255 seconds)
20:10 🔗 bithippo @riking: You're probably better off using grab-site to perform that transform while archiving
20:10 🔗 bithippo https://github.com/ludios/grab-site
20:11 🔗 bithippo --custom-hooks=PY_SCRIPT: Copy PY_SCRIPT to DIR/custom_hooks.py, then exec DIR/custom_hooks.py on startup and every time it changes. The script gets a wpull_hook global that can be used to change crawl behavior. See update_custom_hooks in libgrabsite/wpull_hooks.py and custom_hooks_sample.py.
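
The workaround riking mentions at 19:56, detecting the placeholder image after the crawl, could be sketched roughly as follows. This is library-independent Python and deliberately does not use the wpull or grab-site hook API. It assumes the final URL of each fetch (after any redirects) is available to the post-processing step. `SOFT_404_URLS`, `is_soft_404` and `effective_status` are hypothetical names.

```python
from urllib.parse import urlsplit

# Placeholder images a host serves in place of a real 404 response.
# Only the tinypic URL comes from the log; add others as they are found.
SOFT_404_URLS = {("tinypic.com", "/images/404.gif")}


def is_soft_404(final_url):
    """True if the fetch ended on a known 'not found' placeholder image."""
    parts = urlsplit(final_url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return (host, parts.path) in SOFT_404_URLS


def effective_status(status, final_url):
    """Treat a soft 404 as a real 404; otherwise keep the server's status."""
    return 404 if is_soft_404(final_url) else status
```

A fetch classified as 404 this way can then be handed to the same rescue-from-Wayback path as a genuine 404. Other subdomains would need their own entries in the set.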
20:13 🔗 riking ... okay and now I'm dealing with a wpull crash.
20:13 🔗 riking https://paste.ubuntu.com/p/wVdhRH7CMf/
20:14 🔗 riking it really doesn't like that blob:. The browser doesn't either but
20:23 🔗 bsmith094 has quit IRC (Ping timeout: 252 seconds)
20:23 🔗 odemg has joined #archiveteam-bs
21:07 🔗 RichardG has quit IRC (Read error: Connection reset by peer)
21:09 🔗 RichardG has joined #archiveteam-bs
21:31 🔗 bsmith093 has joined #archiveteam-bs
21:37 🔗 ranav has joined #archiveteam-bs
21:42 🔗 riking bithippo: hm, well i'm already doing a ton of post-processing so I guess i'll just stick with what I have.
21:42 🔗 ranavalon has quit IRC (Ping timeout: 633 seconds)
21:43 🔗 BlueMax has joined #archiveteam-bs
21:43 🔗 bithippo Sorry I couldn't be more help.
21:43 🔗 riking I wonder if I should be segregating the non-pristine fetches into a separate WARC so that the original can be used? (e.g. I download an image from Wayback, insert into .warc with stomped WARC-Target-URI: header -> put that in a separate warc file)
21:44 🔗 bithippo That's a question for someone more familiar with how the Wayback Machine and CDX server work.
21:44 🔗 riking right now I'm uploading the WARC files to archive items, not to Wayback proper.
21:44 🔗 riking but if someone tries to throw my warc files into wayback, those records are almost certainly going to mess it up
21:45 🔗 riking eh sure might as well.
21:49 🔗 riking up to 2700 lines, 67Kbytes of script. heh.
21:51 🔗 bithippo When in doubt, WARCs should not have their content or headers adulterated.
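
The split riking proposes at 21:43 comes down to choosing an output file per record, based on whether the bytes really came from the URI they are filed under. A minimal sketch of that routing decision follows. The `Fetch` class, the `-rescued` suffix and the file naming are assumptions for illustration; only the `MSPFA_204` item identifier appears in the log.

```python
from dataclasses import dataclass


@dataclass
class Fetch:
    target_uri: str    # URI the record will be filed under (WARC-Target-URI)
    fetched_from: str  # URI the bytes were actually downloaded from


def is_pristine(fetch):
    """A record is pristine when it was fetched from its own target URI."""
    return fetch.target_uri == fetch.fetched_from


def warc_name(item, fetch):
    """Route rewritten ('stomped') records to a separate WARC file."""
    suffix = "" if is_pristine(fetch) else "-rescued"
    return f"{item}{suffix}.warc.gz"
```

With this split, the main WARC holds only unmodified captures, and anything pulled from Wayback and re-labelled sits in a file that can be left out of a later Wayback ingest.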
21:53 🔗 riking Here's a screenshot of what it's doing right now https://usercontent.irccloud-cdn.com/file/ZPQcNv29/image.png
21:56 🔗 riking followed by the image as received from that request
21:58 🔗 BlueMax has quit IRC (Leaving)
22:00 🔗 jschwart has quit IRC (Quit: Konversation terminated!)
22:08 🔗 bithippo @riking: What is the eventual target? Wayback? Or something else?
22:08 🔗 riking Target is this stuff https://ia801505.us.archive.org/12/items/MSPFA_204/view.html?s=204&p=1
22:09 🔗 riking if you open devtools you can see it using Range: requests to pick retrieved files out of the warc
22:11 🔗 bithippo Archiving individual MS Paint Art stories as individual items?
22:11 🔗 riking so you can see why having it all in one file is nice, but.
22:11 🔗 riking Yeah, with all necessary subresources included.
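
The Range: approach riking describes at 22:09 works when each record in the .warc.gz is compressed as its own gzip member, so a byte slice of the file can be decompressed on its own. The sketch below assumes that layout and assumes the player has an index of (offset, length) pairs; the log shows neither. It uses two fake in-memory records rather than a real WARC, and `range_header` and `read_member` are hypothetical names.

```python
import gzip


def range_header(offset, length):
    """Build the HTTP header for one record; byte ranges are inclusive."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def read_member(blob):
    """Decompress a single gzip member sliced out of a .warc.gz file."""
    return gzip.decompress(blob)


# Two stand-in records, each compressed separately and then concatenated,
# mimicking the per-record layout of a .warc.gz file.
records = [b"WARC/1.0\r\nfirst record", b"WARC/1.0\r\nsecond record"]
members = [gzip.compress(r) for r in records]
warc_gz = b"".join(members)

# Position of the second record, as an index would store it.
offset, length = len(members[0]), len(members[1])
header = range_header(offset, length)

# A server honouring the header returns exactly this slice.
chunk = warc_gz[offset:offset + length]
second = read_member(chunk)
```

This is why keeping everything in one file per item stays practical: the player never downloads the whole WARC, only the slices it needs.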
22:12 🔗 bithippo So, naively, it seems like the archiving operation isn't the issue, but that you need a more robust player to interpret the archived content for "playback"
22:13 🔗 riking That sounds right
22:13 🔗 bithippo I apologize if I'm being obtuse, I assure you I'm attempting to be helpful :D
22:13 🔗 riking I can update the script to have varying warc filenames. Just wanted to make sure it was worth doing so first
22:15 🔗 bithippo Want the whole site ripped?
22:15 🔗 riking A previous version saved all the images as individual files. Then I noticed both the text on the help page and the reason for, "please do not have more than 1000 files per item"
22:15 🔗 riking the devs are "currently" working on a version that doesn't use a js app for page advances
22:15 🔗 riking but yes that's the idea
22:16 🔗 riking Right now I'm slowly marching up through the story IDs, finding things that trip up my script - either the archiver or the player
22:17 🔗 bithippo Total stories you need to walk?
22:18 🔗 riking most recent created ID is 24778
22:19 🔗 riking minus deleted entries
22:21 🔗 BlueMax has joined #archiveteam-bs
22:32 🔗 alex___ has joined #archiveteam-bs
22:34 🔗 bithippo @riking: You might consider setting up an ArchiveTeam project for this
22:34 🔗 riking oh right
22:36 🔗 schbirid has quit IRC (Quit: Leaving)
22:43 🔗 powerKitt riking: are you doing mspfa?
22:43 🔗 riking Yeah
22:43 🔗 powerKitt Nice.
22:44 🔗 riking when the story uses photobucket, the archived items are now the best place to read it
22:45 🔗 powerKitt Yeah, Photo
22:45 🔗 powerKitt *Photobucket is fucking terrible
23:10 🔗 Mayonaise has quit IRC (Read error: Connection reset by peer)
23:32 🔗 bithippo @powerkitt: https://github.com/bibanon/PB_Spade for those PB shenanigans
