Time |
Nickname |
Message |
01:06
🔗
|
|
Stiletto has joined #archiveteam-bs |
01:08
🔗
|
|
Stilett0 has quit IRC (Read error: Operation timed out) |
01:13
🔗
|
|
Stilett0 has joined #archiveteam-bs |
01:17
🔗
|
|
Stiletto has quit IRC (Read error: Operation timed out) |
01:33
🔗
|
|
ndiddy has quit IRC (Read error: Operation timed out) |
01:41
🔗
|
|
ndiddy has joined #archiveteam-bs |
01:54
🔗
|
robogoat |
bithippo: Might also look at https://hackage.haskell.org/package/github-backup |
01:54
🔗
|
bithippo |
:thumbs up: |
02:09
🔗
|
|
JAA_ has joined #archiveteam-bs |
02:16
🔗
|
|
JAA_ has quit IRC (leaving) |
02:19
🔗
|
|
JAA_ has joined #archiveteam-bs |
02:19
🔗
|
|
JAA sets mode: +o JAA_ |
02:22
🔗
|
|
JAA has quit IRC (leaving) |
02:22
🔗
|
|
JAA_ is now known as JAA |
02:36
🔗
|
|
ndiddy has quit IRC (Ping timeout: 492 seconds) |
03:10
🔗
|
|
jacketcha has joined #archiveteam-bs |
03:31
🔗
|
JAA |
https://twitter.com/textfiles/status/970448544271233024 lol, looks like saying "guys, don't archive this!" is a good way of finding new archivists for that particular content niche. We should keep that in mind. ;-) |
03:48
🔗
|
Despatche |
you wouldn't download a website |
04:12
🔗
|
|
qw3rty113 has joined #archiveteam-bs |
04:14
🔗
|
superkuh |
It sure is great how twitter auto-detects when JS is disabled and redirects you to a version of their site that hides all the post content. Super usable. |
04:18
🔗
|
|
qw3rty112 has quit IRC (Read error: Operation timed out) |
05:05
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
05:08
🔗
|
|
dashcloud has joined #archiveteam-bs |
06:10
🔗
|
|
odemg has quit IRC (Read error: Operation timed out) |
06:21
🔗
|
|
odemg has joined #archiveteam-bs |
06:35
🔗
|
|
Pixi` has quit IRC (Quit: Pixi`) |
06:35
🔗
|
|
Pixi has joined #archiveteam-bs |
06:59
🔗
|
|
wp494 has quit IRC (LOUD UNNECESSARY QUIT MESSAGES) |
07:06
🔗
|
|
odemg has quit IRC (Read error: Connection reset by peer) |
07:08
🔗
|
|
dashcloud has quit IRC (Read error: Operation timed out) |
07:22
🔗
|
|
odemg has joined #archiveteam-bs |
07:23
🔗
|
|
h3x has quit IRC (Read error: Operation timed out) |
07:24
🔗
|
|
h3x has joined #archiveteam-bs |
07:37
🔗
|
|
schbirid has joined #archiveteam-bs |
07:47
🔗
|
|
odemg has quit IRC (Ping timeout: 252 seconds) |
08:00
🔗
|
|
odemg has joined #archiveteam-bs |
10:21
🔗
|
|
BlueMax has quit IRC (Read error: Connection reset by peer) |
10:33
🔗
|
|
Mateon1 has quit IRC (Read error: Operation timed out) |
10:33
🔗
|
|
Mateon1 has joined #archiveteam-bs |
11:06
🔗
|
|
odemg has quit IRC (Read error: Operation timed out) |
11:14
🔗
|
|
wp494 has joined #archiveteam-bs |
11:20
🔗
|
|
odemg has joined #archiveteam-bs |
11:55
🔗
|
|
odemg has quit IRC (Read error: Connection reset by peer) |
12:15
🔗
|
|
odemg has joined #archiveteam-bs |
13:21
🔗
|
|
odemg has quit IRC (Read error: Connection reset by peer) |
13:36
🔗
|
|
odemg has joined #archiveteam-bs |
15:25
🔗
|
|
odemg has quit IRC (Read error: Connection reset by peer) |
15:52
🔗
|
|
odemg has joined #archiveteam-bs |
15:52
🔗
|
|
odemg has quit IRC (Connection closed) |
15:53
🔗
|
|
Dimtree has quit IRC (Peace) |
15:56
🔗
|
|
odemg has joined #archiveteam-bs |
15:56
🔗
|
|
odemg has quit IRC (Connection closed) |
15:56
🔗
|
|
Dimtree has joined #archiveteam-bs |
15:59
🔗
|
|
odemg has joined #archiveteam-bs |
16:07
🔗
|
|
odemg has quit IRC (Read error: Connection reset by peer) |
16:18
🔗
|
|
odemg has joined #archiveteam-bs |
17:30
🔗
|
|
ld1 has quit IRC (Read error: Connection reset by peer) |
17:31
🔗
|
|
ld1 has joined #archiveteam-bs |
17:46
🔗
|
|
Dimtree has quit IRC (Peace) |
17:51
🔗
|
|
Dimtree has joined #archiveteam-bs |
17:59
🔗
|
|
Pixi has quit IRC (Quit: Pixi) |
18:03
🔗
|
|
jschwart has joined #archiveteam-bs |
18:09
🔗
|
SketchCow |
JAA: Running |
18:09
🔗
|
JAA |
Thanks |
18:10
🔗
|
SketchCow |
599c8fa2311eff5cfc358407fd262642e3db4034af6e10241e5af28a392e536f charlierose.com-videos-00000.warc.gz |
18:11
🔗
|
JAA |
Sweet, thank you. :-) |
18:14
🔗
|
SketchCow |
Do I delete this now |
18:17
🔗
|
JAA |
No |
18:17
🔗
|
JAA |
But it means that I can delete WARC 00000 on my machine and continue grabbing. |
18:18
🔗
|
JAA |
And upload WARCs 1 through 43 and then delete those as well, etc. |
18:18
🔗
|
JAA |
I'll let you know when it's done and ready for transfer to IA. |
18:18
🔗
|
JAA |
Will probably take a week or so in total. |
18:19
🔗
|
|
Pixi has joined #archiveteam-bs |
18:20
🔗
|
|
SynMonger has joined #archiveteam-bs |
18:29
🔗
|
|
JAA sets mode: +o SketchCow |
18:45
🔗
|
|
K4k has joined #archiveteam-bs |
18:58
🔗
|
|
powerKitt has quit IRC (Quit: powerKitt) |
19:11
🔗
|
|
powerKitt has joined #archiveteam-bs |
19:24
🔗
|
|
fsr has joined #archiveteam-bs |
19:27
🔗
|
|
odemg has quit IRC (Read error: Connection reset by peer) |
19:27
🔗
|
|
h3x has quit IRC (Read error: Connection reset by peer) |
19:27
🔗
|
fsr |
Is there currently any work done on archiving the Wii Shop Channel? There is an article in the wiki, anything else? |
19:28
🔗
|
powerKitt |
Yeah, guy called Larsenv has been working to archive stuff |
19:29
🔗
|
powerKitt |
He hasn't been able to get the secret word though, so he doesn't have an AT wiki account. |
19:29
🔗
|
fsr |
Ah, yes, I read his post on RiiConnect. I was going to ask if archiveteam and the RiiConnect people are working together. Seems like a yes then. :) |
19:29
🔗
|
powerKitt |
If anyone knows what the current one is, PM me it and I'll pass it on to him. |
19:29
🔗
|
fsr |
the "secret word"? |
19:30
🔗
|
powerKitt |
yeah, you need a passphrase from the IRC to sign up on archiveteam.org |
19:30
🔗
|
fsr |
sign up for the wiki? |
19:30
🔗
|
powerKitt |
yeah |
19:31
🔗
|
fsr |
I see, thanks. What would be the best way to get in touch with larsenv? I'd be interested to know what he plans to do and what he has already done. |
19:32
🔗
|
powerKitt |
do you have a discord account? |
19:32
🔗
|
fsr |
nope |
19:34
🔗
|
fsr |
I have one now. |
19:40
🔗
|
|
fsr has quit IRC () |
19:47
🔗
|
|
odemg has joined #archiveteam-bs |
19:55
🔗
|
riking |
How do I get wpull to rewrite http://tinypic.com/images/404.gif to a 404 response? |
19:56
🔗
|
riking |
currently have a workaround in my later-running 404 detection/download-rescued-from-wayback |
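[Editor's note] The post-processing workaround riking describes might look roughly like the sketch below. It does not use wpull's API. It assumes each fetch result records the requested URL, the final URL after redirects, and the HTTP status, and that a missing image ends up at the placeholder URL; all of these are assumptions, and `effective_status` is a hypothetical helper.

```python
# Placeholder image URL quoted in the question above.
PLACEHOLDER_URL = "http://tinypic.com/images/404.gif"


def effective_status(requested_url, final_url, status):
    """Treat a fetch that ended at the placeholder image as a 404."""
    # A direct request for the placeholder itself is a genuine 200.
    if final_url == PLACEHOLDER_URL and requested_url != PLACEHOLDER_URL:
        return 404
    return status
```

A later pass could then look up every URL whose effective status is 404 in the Wayback Machine instead of keeping the placeholder image.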
20:07
🔗
|
|
odemg has quit IRC (Ping timeout: 255 seconds) |
20:10
🔗
|
bithippo |
@riking: You're probably better off using grab-site to perform that transform while archiving |
20:10
🔗
|
bithippo |
https://github.com/ludios/grab-site |
20:11
🔗
|
bithippo |
--custom-hooks=PY_SCRIPT: Copy PY_SCRIPT to DIR/custom_hooks.py, then exec DIR/custom_hooks.py on startup and every time it changes. The script gets a wpull_hook global that can be used to change crawl behavior. See update_custom_hooks in libgrabsite/wpull_hooks.py and custom_hooks_sample.py. |
20:13
🔗
|
riking |
... okay and now I'm dealing with a wpull crash. |
20:13
🔗
|
riking |
https://paste.ubuntu.com/p/wVdhRH7CMf/ |
20:14
🔗
|
riking |
it really doesn't like that blob:. The browser doesn't either but |
20:23
🔗
|
|
bsmith094 has quit IRC (Ping timeout: 252 seconds) |
20:23
🔗
|
|
odemg has joined #archiveteam-bs |
21:07
🔗
|
|
RichardG has quit IRC (Read error: Connection reset by peer) |
21:09
🔗
|
|
RichardG has joined #archiveteam-bs |
21:31
🔗
|
|
bsmith093 has joined #archiveteam-bs |
21:37
🔗
|
|
ranav has joined #archiveteam-bs |
21:42
🔗
|
riking |
bithippo: hm, well i'm already doing a ton of post-processing so I guess i'll just stick with what I have. |
21:42
🔗
|
|
ranavalon has quit IRC (Ping timeout: 633 seconds) |
21:43
🔗
|
|
BlueMax has joined #archiveteam-bs |
21:43
🔗
|
bithippo |
Sorry I couldn't be more help. |
21:43
🔗
|
riking |
I wonder if I should be segregating the non-pristine fetches into a separate WARC so that the original can be used? (e.g. I download an image from Wayback, insert into .warc with stomped WARC-Target-URI: header -> put that in a separate warc file) |
21:44
🔗
|
bithippo |
That's a question for someone more familiar with how the Wayback Machine and CDX server work. |
21:44
🔗
|
riking |
right now I'm uploading the WARC files to archive items, not to Wayback proper. |
21:44
🔗
|
riking |
but if someone tries to throw my warc files into wayback, those records are almost certainly going to mess it up |
21:45
🔗
|
riking |
eh sure might as well. |
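[Editor's note] One way to keep rescued records apart from pristine ones is to route each record to a different output file by its origin. The sketch below only covers the naming and grouping; the `-rescued` suffix, the `origin` field and the helper names are hypothetical, and only the `MSPFA_<id>` item naming comes from the log.

```python
def warc_name(story_id, pristine):
    """Pick the output WARC file name for a record of the given story."""
    # Records fetched live go to the main file; records copied from the
    # Wayback Machine (with a rewritten WARC-Target-URI) go to a second one.
    suffix = "" if pristine else "-rescued"
    return f"MSPFA_{story_id}{suffix}.warc.gz"


def group_by_warc(story_id, records):
    """Group (url, origin) pairs by the WARC file they should be written to."""
    groups = {}
    for url, origin in records:
        name = warc_name(story_id, pristine=(origin == "live"))
        groups.setdefault(name, []).append(url)
    return groups
```

With this split, the main file holds only unmodified captures, so it stays safe to ingest elsewhere.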
21:49
🔗
|
riking |
up to 2700 lines, 67Kbytes of script. heh. |
21:51
🔗
|
bithippo |
When in doubt, WARCs should not have their content or headers adulterated. |
21:53
🔗
|
riking |
Here's a screenshot of what it's doing right now https://usercontent.irccloud-cdn.com/file/ZPQcNv29/image.png |
21:56
🔗
|
riking |
followed by the image as received from that request |
21:58
🔗
|
|
BlueMax has quit IRC (Leaving) |
22:00
🔗
|
|
jschwart has quit IRC (Quit: Konversation terminated!) |
22:08
🔗
|
bithippo |
@riking: What is the eventual target? Wayback? Or something else? |
22:08
🔗
|
riking |
Target is this stuff https://ia801505.us.archive.org/12/items/MSPFA_204/view.html?s=204&p=1 |
22:09
🔗
|
riking |
if you open devtools you can see it using Range: requests to pick retrieved files out of the warc |
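[Editor's note] The Range-based lookup riking mentions can be sketched as follows. This assumes the `.warc.gz` is compressed one record per gzip member (the usual convention) and that the player already knows each record's byte offset and length from some index; the function names are hypothetical and this is not the viewer's actual code.

```python
import gzip
import urllib.request


def build_range_request(warc_url, offset, length):
    """Build an HTTP request for bytes [offset, offset + length) of a file."""
    request = urllib.request.Request(warc_url)
    # HTTP byte ranges are inclusive on both ends.
    request.add_header("Range", f"bytes={offset}-{offset + length - 1}")
    return request


def read_member(raw_bytes):
    """Decompress the gzip member(s) in raw_bytes, e.g. a single WARC record."""
    return gzip.decompress(raw_bytes)
```

Passing the request to `urllib.request.urlopen` and the response body to `read_member` would yield the WARC record headers followed by the archived HTTP response.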
22:11
🔗
|
bithippo |
Archiving individual MS Paint Art stories as individual items? |
22:11
🔗
|
riking |
so you can see why having it all in one file is nice, but. |
22:11
🔗
|
riking |
Yeah, with all necessary subresources included. |
22:12
🔗
|
bithippo |
So, naively, it seems like the archiving operation isn't the issue, but that you need a more robust player to interpret the archived content for "playback" |
22:13
🔗
|
riking |
That sounds right |
22:13
🔗
|
bithippo |
I apologize if I'm being obtuse, I assure you I'm attempting to be helpful :D |
22:13
🔗
|
riking |
I can update the script to have varying warc filenames. Just wanted to make sure it was worth doing so first |
22:15
🔗
|
bithippo |
Want the whole site ripped? |
22:15
🔗
|
riking |
A previous version saved all the images as individual files. Then I noticed both the text on the help page and the reason for, "please do not have more than 1000 files per item" |
22:15
🔗
|
riking |
the devs are "currently" working on a version that doesn't use a js app for page advances |
22:15
🔗
|
riking |
but yes that's the idea |
22:16
🔗
|
riking |
Right now I'm slowly marching up through the story IDs, finding things that trip up my script - either the archiver or the player |
22:17
🔗
|
bithippo |
Total stories you need to walk? |
22:18
🔗
|
riking |
most recent created ID is 24778 |
22:19
🔗
|
riking |
minus deleted entries |
22:21
🔗
|
|
BlueMax has joined #archiveteam-bs |
22:32
🔗
|
|
alex___ has joined #archiveteam-bs |
22:34
🔗
|
bithippo |
@riking: You might consider setting up an ArchiveTeam project for this |
22:34
🔗
|
riking |
oh right |
22:36
🔗
|
|
schbirid has quit IRC (Quit: Leaving) |
22:43
🔗
|
powerKitt |
riking: are you doing mspfa? |
22:43
🔗
|
riking |
Yeah |
22:43
🔗
|
powerKitt |
Nice. |
22:44
🔗
|
riking |
when the story uses photobucket, the archived items are now the best place to read it |
22:45
🔗
|
powerKitt |
Yeah, Photo |
22:45
🔗
|
powerKitt |
*Photobucket is fucking terrible |
23:10
🔗
|
|
Mayonaise has quit IRC (Read error: Connection reset by peer) |
23:32
🔗
|
bithippo |
@powerkitt: https://github.com/bibanon/PB_Spade for those PB shenanigans |