[00:05] astrid: "Lengthy Archive Team and >>>archive discussions<<< here"
[00:06] *** astrid changes topic to: Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | SketchCow: your porn tapes are getting digitized right
[00:07] there
[00:07] bleh
[00:07] why
[00:07] because ola_norsk wouldn't shut up
[00:08] why not make a #archiveteam-superimportantstuff and turn #archiveteam into the actual archiveteam discussion channel then
[00:08] i didn't make the decision, SketchCow did
[00:08] take it up with him
[00:08] and keep -bs as the random stuff channel it was
[00:08] i dont know you anyways
[00:09] what are you talking about i've been here for _years_
[00:09] only noticed you in action with moderation actions
[00:09] i also use the nick 'xmc'
[00:10] i host the tracker for warrior projects
[00:10] oh
[00:11] sorry :D
[00:11] that nick obviously rings many bells
[00:11] heh
[00:12] * schbirid hugs and goes to bed
[00:13] astrid: The quote got cut off at the end ("... now"). :-(
[00:13] ugh
[00:13] *** astrid changes topic to: Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | SketchCow: your porn tapes are getting digitized rn
[00:14] Yeah, EFNet's limits are ridiculous.
[00:14] also schbirid, n.b. the topic in #archiveteam: "Lengthy discussions in #archiveteam-bs | Offtopic in #archiveteam-ot"
[00:16] *** ola_norsk has joined #archiveteam-bs
[00:20] uhm, the IA Arcade mame can run c64? or?
[00:22] nvm, i misread an item's description
[00:30] "dearest IA, please implement Emscripten VICE some day" :) https://github.com/rjanicek/vice.js/
[00:31] *** ola_norsk has quit IRC (cya in '18)
[00:33] *** schbirid has quit IRC (Ping timeout: 255 seconds)
[00:33] *** schbirid has joined #archiveteam-bs
[00:45] *** schbirid has quit IRC (Ping timeout: 255 seconds)
[00:56] *** schbirid has joined #archiveteam-bs
[01:16] Can't ArchiveBot be used as a tweet quoting bot?
[01:54] I _finally_ managed to get my parsed Miiverse database onto Azure, so now I can start testing my site for real.
[01:55] It's at https://archiverse.guide . It has basic auth on it right now while I stress test the database to make sure it can last under load (and I need to work on a FAQ and something from the home page)
[01:56] If anyone wants to take a look and give me feedback, Username: archiverse Password: miiworse
[02:10] *** schbirid2 has joined #archiveteam-bs
[02:15] *** schbirid has quit IRC (Read error: Operation timed out)
[02:20] If Archiverse goes down, and we need to archive it, what do we call the next one? ArchivihcrA?
[02:25] Nah, you see, I thought of that
[02:26] I'm open sourcing the site and the database
[02:26] The database is being uploaded right now to IA. 25 gigs compressed.
[02:26] *** kristian_ has quit IRC (Quit: Leaving)
[02:27] now it's an even bigger question
[02:27] If the IA goes down, is its archive called the Internet Archive Archive?
[02:27] Or just the Internet Archive because the Internet Archive is a part of the internet?
[02:37] jacketcha: https://www.archiveteam.org/index.php?title=INTERNETARCHIVE.BAK
[02:42] *** RETNUHWEJ has joined #archiveteam-bs
[02:43] INTERNETARCHIVE.BAK Archive.BAK
[02:44] oh, that actually looks interesting
[02:45] i have a couple of drives lying around I could contribute
[02:52] *** balrog has joined #archiveteam-bs
[02:52] *** swebb sets mode: +o balrog
[02:53] *** jacketcha has quit IRC (Read error: Connection reset by peer)
[02:55] *** jacketcha has joined #archiveteam-bs
[03:04] hey
[03:04] someone brought something to my attention
[03:04] there is basically zero archives of any 4chan post on /v/ around july 2015
[03:04] does anybody know anything about that?
[03:11] jacketcha: are there 4chan archives aside from that? it crossed my mind recently
[03:11] nope
[03:12] I just looked through 3 pages of google, and about 11 different archives
[03:12] 303712123 can't be found
[03:12] (the post im looking for)
[03:15] yes, the archives cut out suddenly mid-june and then restart in 2015/10/24 and none of the archives (absolutely none of them) have anything in between from that time period
[03:25] OK, the date the missing archives start is from 2015/06/11 and then start again in 2015/10/24 (both dates yyyy/mm/dd)
[03:26] I am looking for some posts in between that and would appreciate very much any aid I can get
[03:40] *** pizzaiolo has quit IRC (Remote host closed the connection)
[03:58] *** jacketcha has quit IRC (Read error: Connection reset by peer)
[03:58] *** jacketcha has joined #archiveteam-bs
[03:59] *** jacketcha has quit IRC (Read error: Connection reset by peer)
[03:59] *** jacketcha has joined #archiveteam-bs
[04:33] *** MrDignity has quit IRC (Ping timeout: 248 seconds)
[04:33] *** Rai-chan has quit IRC (Ping timeout: 248 seconds)
[04:33] *** medowar has quit IRC (Ping timeout: 248 seconds)
[04:34] *** ZexaronS has quit IRC (Ping timeout: 248 seconds)
[04:34] *** HCross2 has quit IRC (Ping timeout: 248 seconds)
[04:34] *** purplebot has quit IRC (Ping timeout: 248 seconds)
[04:35] *** ZexaronS has joined #archiveteam-bs
[04:36] *** MrDignity has joined #archiveteam-bs
[04:36] *** i0npulse has quit IRC (Ping timeout: 248 seconds)
[04:40] *** i0npulse has joined #archiveteam-bs
[04:46] *** qw3rty115 has joined #archiveteam-bs
[04:46] *** medowar has joined #archiveteam-bs
[04:46] *** purplebot has joined #archiveteam-bs
[04:47] *** HCross2 has joined #archiveteam-bs
[04:51] *** Rai-chan has joined #archiveteam-bs
[04:52] *** qw3rty114 has quit IRC (Read error: Operation timed out)
[04:53] *** jacketcha has quit IRC (Read error: Connection reset by peer)
[04:53] *** jacketcha has joined #archiveteam-bs
[05:00] *** jacketcha has quit IRC (Read error: Connection reset by peer)
[05:00] *** jacketcha has joined #archiveteam-bs
[05:10] Does anybody use WARCreate?
[05:12] It looks like it could be really useful, but it seems like it hasn't been updated in a while
[05:14] And it breaks roughly 60% of the time
[06:49] jacketcha wbradley: bibanon might know more about 4chan archives
[08:44] *** jacketcha has quit IRC (Leaving)
[08:44] *** jacketcha has joined #archiveteam-bs
[09:23] *** jschwart has joined #archiveteam-bs
[09:50] Alright, so I have a proposal for a secondary ArchiveBot which should be way easier to set up. Here's the basic idea: The pipelines are replaced by users with a chrome extension, and instead of WARCs it uses liveweb. For those of you that don't know how liveweb works, what it does is make a HTTP request to a URL, and then replaces the URLs inside of the response with Wayback URLs. This is how it will work: A
[09:50] user requests a website to be archived through the IRC. Then, a control node looks for chrome extension installations, which register themselves when they are installed, and the control node chooses the one with the least load. The chrome extension takes the URL and crawls it for any URLs it can find, whether they be files or pages, but they don't archive them. After that, when there is a completed list of
[09:50] URLs, the extension fires off HTTP requests to the liveweb system, or to the URL https://web.archive.org/save/[URL you want to archive]. That's all. There are two main advantages in this system over the currently existing one: accessibility and the lack of a need for client-side storage. Due to the way liveweb works, all the archiving happens over at the Internet Archive, not on the pipeline, so the largest
[09:50] thing the pipeline will need to store is a list of URLs. Also, for the accessibility part, since this will be in chrome extension format, it should be cross platform, and extremely easy to install. On top of that, if required, since the chrome extension will be programmed in JavaScript, Android phones will also be compatible with the application. Any feedback?
[10:01] *** BlueMaxim has quit IRC (Leaving)
[10:58] *** pizzaiolo has joined #archiveteam-bs
[11:44] *** drumstick has quit IRC (Read error: Operation timed out)
[12:11] how do you make 100% sure that the user's personal data does not end up in a grab?
[12:16] schbirid2: liveweb takes care of that already.
[12:18] nice
[12:19] try it on a page you are logged into
[12:20] and note that it will grab the logged out version
[12:21] !a http://blog.whyanimalsdothething.com
[12:21] whoops, wrong channel
[12:23] jacketcha: I like your idea -- please implement it!
[12:26] Alright, didn't hear a no
[12:27] JSBot is on the way
[12:32] jacketcha: yay!
[12:33] *** RETNUHWEJ has quit IRC (Ping timeout: 263 seconds)
[12:35] I think I'm going to set up the control node in nodejs and then give it a better API than just IRC
[12:41] Originally, I was going to add WARCs to it, but then I tried using WARCreate and realized that the relationship between JavaScript and WARCs is one way
[13:01] Sounds like a decent idea. The main downside I can think of is that the archives will not be downloadable (liveweb WARCs are private). liveweb is also quite inefficient in my experience compared to a browser + warcprox setup. But the cross-platform and distributed aspects sound nice.
[13:05] You know, what if ArchiveTeam hosted a liveweb copy? I am going to guess crawling is half the load
[13:06] Because I do understand that liveweb can be glitchy
[13:06] or at least the hosted one
[13:06] especially under high loads
[13:07] take for example all the times my twitter has been captured https://web.archive.org/web/*/https://twitter.com/_jacketchan_
[13:07] there is an obvious variation in quality between captures
[13:07] some are perfectly fine
[13:07] some are brokenish
[13:07] and some are literally just white screens
[13:11] *** kimmer1 has joined #archiveteam-bs
[13:13] *** pikhq has quit IRC (Ping timeout: 245 seconds)
[13:15] wait
[13:16] JAA: Couldn't you just WARC the wayback copy and switch the links out?
[13:17] *** Mateon1 has quit IRC (Read error: Operation timed out)
[13:17] *** Mateon1 has joined #archiveteam-bs
[13:26] jacketcha: You can get close, but you won't be able to reconstruct the exact original data sent by the server.
[13:26] ... which is what WARC's all about.
[13:26] So it's kind of pointless to try that.
[13:26] hmm
[13:27] can you upload to the archive via post?
[13:30] ?
[13:30] is there an api besides archive-it
[13:30] You can upload WARCs to IA, and they get included in the WM (after an IA admin verifies them).
[13:31] That's what ArchiveBot does.
[13:31] Oh wait
[13:31] ArchiveBot puts the warcs into the fortress of solitude, and then the FOS puts it in the IA, right?
[13:32] Yeah
[13:32] and I am going to guess that there is a standing API of sorts for the FOS
[13:32] great
[13:32] Uploads to FOS are just rsync.
[13:32] A few pipelines upload their data directly to IA.
[13:32] wait, so the IA does have an API for uploads?
[13:33] Of course it does.
[13:33] thank god
[13:33] Look at the internetarchive Python package.
[13:33] oh yeah
[13:33] I keep forgetting github and open source projects are a thing
[13:33] It has a CLI tool "ia" and can be used from within Python.
[13:36] And the S3-like interface can obviously also be implemented in anything else.
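[editor's note] A minimal sketch of what "implemented in anything else" could look like for the S3-like interface, using only the Python standard library. The endpoint host, the "LOW key:secret" authorization form, and the x-archive-* header names follow IA's S3-like API documentation as the editor understands it; the item name, file name, and credentials are placeholders, and build_upload_request is a hypothetical helper, not part of any IA tooling. The request is only built here, never sent.

```python
from urllib.parse import quote
from urllib.request import Request

# Assumption: IA's S3-like endpoint host; check the official docs before relying on it.
S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(item, filename, data, access_key, secret_key, meta=None):
    """Build (but do not send) an S3-style PUT for one file in an IA item.

    Hypothetical helper for illustration only.
    """
    url = f"{S3_ENDPOINT}/{quote(item)}/{quote(filename)}"
    headers = {
        # IA uses a simple "LOW accesskey:secret" scheme, not AWS signatures.
        "Authorization": f"LOW {access_key}:{secret_key}",
        # Ask IA to create the item if it does not exist yet.
        "x-archive-auto-make-bucket": "1",
    }
    # Item metadata travels as x-archive-meta-<field> headers.
    for field, value in (meta or {}).items():
        headers[f"x-archive-meta-{field}"] = value
    return Request(url, data=data, headers=headers, method="PUT")


# Placeholder item, file, and credentials.
req = build_upload_request(
    "example-item", "crawl.warc.gz", b"...", "ACCESS", "SECRET",
    meta={"title": "Example crawl"},
)
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would perform the upload. In practice the internetarchive package mentioned above wraps the same interface, so this is mainly useful for seeing what goes over the wire.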
[13:36] https://archive.org/help/abouts3.txt
[13:36] great
[13:36] all I need is the end points
[13:38] oh wow it's 3:38 in the morning
[13:38] Should probably get to sleep before four
[13:40] I'll check this in the morning
[13:41] gn/gm\
[13:41] *gn/gm
[13:59] *** jschwart has quit IRC (Konversation terminated!)
[14:07] *** jschwart has joined #archiveteam-bs
[14:37] *** jacketcha has quit IRC (Read error: Connection reset by peer)
[14:37] *** jacketcha has joined #archiveteam-bs
[14:41] *** kimmer12 has joined #archiveteam-bs
[14:45] *** kimmer13 has joined #archiveteam-bs
[14:48] *** kimmer1 has quit IRC (Read error: Operation timed out)
[14:51] *** kimmer1 has joined #archiveteam-bs
[14:52] *** kimmer12 has quit IRC (Ping timeout: 633 seconds)
[14:55] *** kimmer12 has joined #archiveteam-bs
[14:55] *** kimmer13 has quit IRC (Ping timeout: 633 seconds)
[15:02] *** kimmer1 has quit IRC (Ping timeout: 633 seconds)
[15:24] *** Ceryn has quit IRC (Read error: Operation timed out)
[15:25] *** Ceryn has joined #archiveteam-bs
[15:32] *** Gfy has quit IRC (Quit: I'll be back!)
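[editor's note] Looking back at the JSBot proposal from 09:50, the two decisions it describes (the control node picks the least-loaded registered extension, and the extension turns its crawled URL list into save requests) can be sketched roughly as below. This is an illustration under assumptions, not jacketcha's implementation: Worker, pick_worker, and save_urls are hypothetical names, the load counter is a guess at how "least load" might be tracked, and only the https://web.archive.org/save/ prefix comes from the log. Python is used here for brevity even though the proposal targets JavaScript.

```python
from dataclasses import dataclass

# URL prefix quoted in the proposal.
SAVE_PREFIX = "https://web.archive.org/save/"


@dataclass
class Worker:
    """A registered extension install (hypothetical structure)."""
    worker_id: str
    load: int = 0  # number of jobs currently assigned


def pick_worker(workers):
    """Control node: choose the registered worker with the least load."""
    return min(workers, key=lambda w: w.load)


def save_urls(crawled):
    """Worker: map crawled URLs to save requests, dropping duplicates in order."""
    unique = dict.fromkeys(crawled)  # dicts preserve insertion order
    return [SAVE_PREFIX + url for url in unique]


workers = [Worker("a", load=3), Worker("b", load=1)]
chosen = pick_worker(workers)
chosen.load += 1  # the control node records the new assignment

jobs = save_urls([
    "http://example.com/",
    "http://example.com/a.png",
    "http://example.com/",  # duplicate found during the crawl
])
print(chosen.worker_id, jobs)
```

Nothing is fetched here. The worker would still need to issue one HTTP request per entry in jobs, and, as noted at 13:01, the resulting captures would not be downloadable as WARCs.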
[15:33] *** Gfy has joined #archiveteam-bs
[15:59] *** kimmer1 has joined #archiveteam-bs
[16:07] *** kimmer12 has quit IRC (Ping timeout: 633 seconds)
[16:08] *** kimmer13 has joined #archiveteam-bs
[16:08] *** kimmer1 has quit IRC (Read error: Operation timed out)
[16:12] *** kimmer1 has joined #archiveteam-bs
[16:15] *** kimmer13 has quit IRC (Read error: Operation timed out)
[16:20] *** LastNinja has quit IRC (Read error: Connection reset by peer)
[16:36] *** dashcloud has quit IRC (Read error: Connection reset by peer)
[16:37] *** dashcloud has joined #archiveteam-bs
[17:53] *** pikhq has joined #archiveteam-bs
[18:11] i'm up to 2017-10-31 for kpfa stuff
[18:15] *** icedice has joined #archiveteam-bs
[18:21] *** kimmer1 has quit IRC (Read error: Connection reset by peer)
[18:21] *** kimmer12 has joined #archiveteam-bs
[18:28] *** svchost03 has quit IRC (Ping timeout: 360 seconds)
[19:07] *** C4K3_ has joined #archiveteam-bs
[19:09] *** C4K3 has quit IRC (Read error: Operation timed out)
[19:13] *** icedice has quit IRC (Read error: Connection reset by peer)
[19:27] so my new tapes i bought came
[19:27] a lot of pbs and tlc stuff on these tapes
[19:57] *** C4K3_ is now known as C4K3
[20:32] *** svchost03 has joined #archiveteam-bs
[20:32] *** svchfoo1 sets mode: +o svchost03
[20:55] jrwr: svchost02 seems to be broken, doesn't respond to invites.
[21:07] Could someone please tell me the deal with vidme
[21:07] 21G vidme4
[21:07] 355G vidme5
[21:07] I have these two things clogging up FOS, I'd like to know if I add them or not.
[21:16] arkiver ^ I know vidme5 is definitely *good* data, not sure about vidme4
[21:54] *** MrDignity has quit IRC (Remote host closed the connection)
[21:54] *** MrDignity has joined #archiveteam-bs
[21:59] Well, I'd like to know, I'm trying to push all the data off of FOS so it's not riding at 50% capacity
[22:00] Also, I'm down to the last half-terabyte of Manga so that's good
[22:00] P.S. I am sick of fuckin' manga
[22:30] *** dashcloud has quit IRC (Ping timeout: 250 seconds)
[22:32] *** dashcloud has joined #archiveteam-bs
[22:34] *** MrDignity has quit IRC (Ping timeout: 490 seconds)
[22:48] Did someone ask for more manga?
[22:50] it sounds like we're good for the moment
[22:57] We're more than good
[23:00] Especially after this last .5 terabyte
[23:01] I am watching the entire run of The Prisoner and I'm shocked at how few people know about The Prisoner
[23:07] Also, in other -bs news, one of my most popular blog posts ever, randomly shot up in the charts
[23:07] My blog has been getting 5 reads an hour average, because I've been elsewhere
[23:07] And someone on reddit linked to an entry
[23:07] 3,300 reads in one hour
[23:07] Reddit, man
[23:11] Wow
[23:12] Filesystem Size Used Avail Use% Mounted on
[23:12] dev/md1 13T 7.3T 5.3T 59% /2
[23:12] dev/md0 3.6T 2.4T 1.3T 66% /1
[23:12] One day
[23:20] godane: There are 11 shows of WoW Insider in the godaneinbox that lack any mp3s.
[23:21] *** MrDignity has joined #archiveteam-bs
[23:25] *** MrDignity has quit IRC (Remote host closed the connection)
[23:25] *** MrDignity has joined #archiveteam-bs
[23:25] *** BlueMaxim has joined #archiveteam-bs
[23:29] *** drumstick has joined #archiveteam-bs
[23:34] SketchCow: example url please?
[23:37] Joystiq_WoW_Insider_Show_2165
[23:37] Joystiq_WoW_Insider_Show_2175
[23:37] Joystiq_WoW_Insider_Show_1925
[23:37] Joystiq_WoW_Insider_Show_1815
[23:37] Joystiq_WoW_Insider_Show_1805
[23:37] Joystiq_WoW_Insider_Show_1795
[23:37] Joystiq_WoW_Insider_Show_1785
[23:37] Joystiq_WoW_Insider_Show_1775
[23:37] Joystiq_WoW_Insider_Show_1765
[23:37] Joystiq_WoW_Insider_Show_1755
[23:37] Joystiq_WoW_Insider_Show_2095
[23:39] that maybe cause of the brute force
[23:40] so nothing is missing
[23:40] i hope
[23:40] i can't check right now since i'm digitizing a tape anyways
[23:42] That's fine
[23:42] Just wanted you aware
[23:56] astrid: I was told to keep yahooanswers texts for you
[23:56] hm, i'm not sure why but i guess?
[23:56] i'll happily take them i guess
[23:56] how big, what format, etc
[23:57] I don't know why I was supposed to
[23:58] They were a glorious pain in the ass to deal with
[23:58] well then in that case i absolutely must have them
[23:59] I'm finishing up the upload.
[23:59] it will be https://archive.org/details/archiveteam_yahooanswers_gathering
[23:59] yay
[23:59] I uploaded some directory by mistake, have to clean
[23:59] we all make mistakes