#archiveteam-bs 2017-12-30,Sat


Time Nickname Message
00:05 🔗 schbirid astrid: "Lengthy Archive Team and >>>archive discussions<<< here"
00:06 🔗 astrid changes topic to: Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | <godane> SketchCow: your porn tapes are getting digitized right
00:07 🔗 astrid there
00:07 🔗 schbirid bleh
00:07 🔗 schbirid why
00:07 🔗 astrid because ola_norsk wouldn't shut up
00:08 🔗 schbirid why not make a #archiveteam-superimportantstuff and turn #archiveteam into the actual archiveteam discussion channel then
00:08 🔗 astrid i didn't make the decision, SketchCow did
00:08 🔗 astrid take it up with him
00:08 🔗 schbirid and keep -bs as the random stuff channel it was
00:08 🔗 schbirid i dont know you anyways
00:09 🔗 astrid what are you talking about i've been here for _years_
00:09 🔗 schbirid only noticed you in action with moderation actions
00:09 🔗 astrid i also use the nick 'xmc'
00:10 🔗 astrid i host the tracker for warrior projects
00:10 🔗 schbirid oh
00:11 🔗 schbirid sorry :D
00:11 🔗 schbirid that nick obviously rings many bells
00:11 🔗 astrid heh
00:12 🔗 * schbirid hugs and goes to bed
00:13 🔗 JAA astrid: The quote got cut off at the end ("... now"). :-(
00:13 🔗 astrid ugh
00:13 🔗 astrid changes topic to: Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | <godane> SketchCow: your porn tapes are getting digitized rn
00:14 🔗 JAA Yeah, EFNet's limits are ridiculous.
00:14 🔗 astrid also schbirid, n.b. the topic in #archiveteam: "Lengthy discussions in #archiveteam-bs | Offtopic in #archiveteam-ot"
00:16 🔗 ola_norsk has joined #archiveteam-bs
00:20 🔗 ola_norsk uhm, the IA Arcade mame can run c64? or?
00:22 🔗 ola_norsk nvm, i misread an item's description
00:30 🔗 ola_norsk "dearest IA, please implement Emscripten VICE some day" :) https://github.com/rjanicek/vice.js/
00:31 🔗 ola_norsk has quit IRC (cya in '18)
00:33 🔗 schbirid has quit IRC (Ping timeout: 255 seconds)
00:33 🔗 schbirid has joined #archiveteam-bs
00:45 🔗 schbirid has quit IRC (Ping timeout: 255 seconds)
00:56 🔗 schbirid has joined #archiveteam-bs
01:16 🔗 jacketcha Can't ArchiveBot be used as a tweet quoting bot?
01:54 🔗 DrasticAc I _finally_ managed to get my parsed Miiverse database onto Azure, so now I can start testing my site for real.
01:55 🔗 DrasticAc It's at https://archiverse.guide . It has basic auth on it right now while I stress test the database to make sure it can last under load (and I need to work on a FAQ and something from the home page)
01:56 🔗 DrasticAc If anyone wants to take a look and give me feedback, Username: archiverse Password: miiworse
02:10 🔗 schbirid2 has joined #archiveteam-bs
02:15 🔗 schbirid has quit IRC (Read error: Operation timed out)
02:20 🔗 jacketcha If Archiverse goes down, and we need to archive it, what do we call the next one? ArchivihcrA?
02:25 🔗 DrasticAc Nah, you see, I thought of that
02:26 🔗 DrasticAc I'm open sourcing the site and the database
02:26 🔗 DrasticAc The database is being uploaded right now to IA. 25 gigs compressed.
02:26 🔗 kristian_ has quit IRC (Quit: Leaving)
02:27 🔗 jacketcha now it's an even bigger question
02:27 🔗 jacketcha If the IA goes down, is its archive called the Internet Archive Archive?
02:27 🔗 jacketcha Or just the Internet Archive because the Internet Archive is a part of the internet?
02:37 🔗 joepie91 jacketcha: https://www.archiveteam.org/index.php?title=INTERNETARCHIVE.BAK
02:42 🔗 RETNUHWEJ has joined #archiveteam-bs
02:43 🔗 jacketcha INTERNETARCHIVE.BAK Archive.BAK
02:44 🔗 jacketcha oh, that actually looks interesting
02:45 🔗 jacketcha i have a couple of drives lying around I could contribute
02:52 🔗 balrog has joined #archiveteam-bs
02:52 🔗 swebb sets mode: +o balrog
02:53 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
02:55 🔗 jacketcha has joined #archiveteam-bs
03:04 🔗 jacketcha hey
03:04 🔗 jacketcha someone brought something to my attention
03:04 🔗 jacketcha there are basically zero archives of any 4chan post on /v/ around july 2015
03:04 🔗 jacketcha does anybody know anything about that?
03:11 🔗 wbradley jacketcha: are there 4chan archives aside from that? it crossed my mind recently
03:11 🔗 jacketcha nope
03:12 🔗 jacketcha I just looked through 3 pages of google, and about 11 different archives
03:12 🔗 jacketcha 303712123 can't be found
03:12 🔗 jacketcha (the post im looking for)
03:15 🔗 RETNUHWEJ yes, the archives cut out suddenly mid-june and then restart in 2015/10/24 and none of the archives (absolutely none of them) have anything in between from that time period
03:25 🔗 RETNUHWEJ OK, the date the missing archives start is from 2015/06/11 and then start again in 2015/10/24 (both dates yyyy/mm/dd)
03:26 🔗 RETNUHWEJ I am looking for some posts in between that and would appreciate very much any aid I can get
03:40 🔗 pizzaiolo has quit IRC (Remote host closed the connection)
03:58 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
03:58 🔗 jacketcha has joined #archiveteam-bs
03:59 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
03:59 🔗 jacketcha has joined #archiveteam-bs
04:33 🔗 MrDignity has quit IRC (Ping timeout: 248 seconds)
04:33 🔗 Rai-chan has quit IRC (Ping timeout: 248 seconds)
04:33 🔗 medowar has quit IRC (Ping timeout: 248 seconds)
04:34 🔗 ZexaronS has quit IRC (Ping timeout: 248 seconds)
04:34 🔗 HCross2 has quit IRC (Ping timeout: 248 seconds)
04:34 🔗 purplebot has quit IRC (Ping timeout: 248 seconds)
04:35 🔗 ZexaronS has joined #archiveteam-bs
04:36 🔗 MrDignity has joined #archiveteam-bs
04:36 🔗 i0npulse has quit IRC (Ping timeout: 248 seconds)
04:40 🔗 i0npulse has joined #archiveteam-bs
04:46 🔗 qw3rty115 has joined #archiveteam-bs
04:46 🔗 medowar has joined #archiveteam-bs
04:46 🔗 purplebot has joined #archiveteam-bs
04:47 🔗 HCross2 has joined #archiveteam-bs
04:51 🔗 Rai-chan has joined #archiveteam-bs
04:52 🔗 qw3rty114 has quit IRC (Read error: Operation timed out)
04:53 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
04:53 🔗 jacketcha has joined #archiveteam-bs
05:00 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
05:00 🔗 jacketcha has joined #archiveteam-bs
05:10 🔗 jacketcha Does anybody use WARCreate?
05:12 🔗 jacketcha It looks like it could be really useful, but it seems like it hasn't been updated in a while
05:14 🔗 jacketcha And it breaks roughly 60% of the time
06:49 🔗 hook54321 jacketcha wbradley: bibanon might know more about 4chan archives
08:44 🔗 jacketcha has quit IRC (Leaving)
08:44 🔗 jacketcha has joined #archiveteam-bs
09:23 🔗 jschwart has joined #archiveteam-bs
09:50 🔗 jacketcha Alright, so I have a proposal for a secondary ArchiveBot which should be way easier to set up. Here's the basic idea: The pipelines are replaced by users with a chrome extension, and instead of WARCS it uses liveweb. For those of you that don't know how liveweb works, what it does is make a HTTP request to a URL, and then replaces the URLs inside of the response with Wayback URLs. This is how it will work: A
09:50 🔗 jacketcha user requests a website to be archived through the IRC. Then, a control node looks for chrome extension installations, which register themselves when they are installed, and the control node chooses the one with the least load. The chrome extension takes the URL and crawls it for any URLs it can find, whether they be files or pages, but it doesn't archive them. After that, when there is a completed list of
09:50 🔗 jacketcha URLs, the extension fires off HTTP requests to the liveweb system, or to the URL https://web.archive.org/save/[URL you want to archive]. That's all. There are two main advantages in this system over the currently existing one: accessibility and the lack of a need for client-side storage. Due to the way liveweb works, all the archiving happens over at the Internet Archive, not on the pipeline, so the largest
09:50 🔗 jacketcha thing the pipeline will need to store is a list of URLs. Also, for the accessibility part, since this will be in chrome extension format, it should be cross platform, and extremely easy to install. On top of that, if required, since the chrome extension will be programmed in JavaScript, Android phones will also be compatible with the application. Any feedback?
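The proposal above (pick the least-loaded worker, collect every URL on a page without archiving it, then fire one request per URL at the save endpoint) can be sketched roughly as follows. This is only an illustration in Python, not the JavaScript extension jacketcha describes; the names `LinkCollector`, `extract_links`, `save_request_url`, and `pick_least_loaded` are hypothetical, and the only detail taken from the log is the `https://web.archive.org/save/` prefix. The sketch builds the request URLs but does not send anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag

# Save endpoint as quoted in the proposal; the target URL is appended to it.
SAVE_PREFIX = "https://web.archive.org/save/"


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from href/src attributes (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links and drop "#fragment" parts.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_web = absolute.startswith(("http://", "https://"))
                if is_web and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the de-duplicated list of page and file URLs found in html."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def save_request_url(url):
    """Build the URL a worker would request to ask for a capture."""
    return SAVE_PREFIX + url


def pick_least_loaded(workers):
    """Given {worker_id: current_load}, return the id with the smallest load."""
    return min(workers, key=workers.get)
```

A control node along these lines would call `pick_least_loaded` on its registered workers, hand the chosen worker the start URL, and the worker would map `save_request_url` over the result of `extract_links`. Rate limiting and retrying failed captures are left out of the sketch.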
10:01 🔗 BlueMaxim has quit IRC (Leaving)
10:58 🔗 pizzaiolo has joined #archiveteam-bs
11:44 🔗 drumstick has quit IRC (Read error: Operation timed out)
12:11 🔗 schbirid2 how do you make 100% sure that the user's personal data does not end up in a grab?
12:16 🔗 Somebody2 schbirid2: liveweb takes care of that already.
12:18 🔗 schbirid2 nice
12:19 🔗 Somebody2 try it on a page you are logged into
12:20 🔗 Somebody2 and note that it will grab the logged out version
12:21 🔗 Somebody2 !a http://blog.whyanimalsdothething.com
12:21 🔗 Somebody2 whoops, wrong channel
12:23 🔗 Somebody2 jacketcha: I like your idea -- please implement it!
12:26 🔗 jacketcha Alright, didn't hear a no
12:27 🔗 jacketcha JSBot is on the way
12:32 🔗 Somebody2 jacketcha: yay!
12:33 🔗 RETNUHWEJ has quit IRC (Ping timeout: 263 seconds)
12:35 🔗 jacketcha I think I'm going to set up the control node in nodejs and then give it a better API than just IRC
12:41 🔗 jacketcha Originally, I was going to add WARCs to it, but then I tried using WARCreate and realized that the relationship between JavaScript and WARCs is one way
13:01 🔗 JAA Sounds like a decent idea. The main downside I can think of is that the archives will not be downloadable (liveweb WARCs are private). liveweb is also quite inefficient in my experience compared to a browser + warcprox setup. But the cross-platform and distributed aspects sound nice.
13:05 🔗 jacketcha You know, what if ArchiveTeam hosted a liveweb copy? I am going to guess crawling is half the load
13:06 🔗 jacketcha Because I do understand that liveweb can be glitchy
13:06 🔗 jacketcha or at least the hosted one
13:06 🔗 jacketcha especially under high loads
13:07 🔗 jacketcha take for example all the times my twitter has been captured https://web.archive.org/web/*/https://twitter.com/_jacketchan_
13:07 🔗 jacketcha there is an obvious variation in quality between captures
13:07 🔗 jacketcha some are perfectly fine
13:07 🔗 jacketcha some are brokenish
13:07 🔗 jacketcha and some are literally just white screens
13:11 🔗 kimmer1 has joined #archiveteam-bs
13:13 🔗 pikhq has quit IRC (Ping timeout: 245 seconds)
13:15 🔗 jacketcha wait
13:16 🔗 jacketcha JAA: Couldn't you just WARC the wayback copy and switch the links out?
13:17 🔗 Mateon1 has quit IRC (Read error: Operation timed out)
13:17 🔗 Mateon1 has joined #archiveteam-bs
13:26 🔗 JAA jacketcha: You can get close, but you won't be able to reconstruct the exact original data sent by the server.
13:26 🔗 JAA ... which is what WARC's all about.
13:26 🔗 JAA So it's kind of pointless to try that.
13:26 🔗 jacketcha hmm
13:27 🔗 jacketcha can you upload to the archive via post?
13:30 🔗 JAA ?
13:30 🔗 jacketcha is there an api besides archive-it
13:30 🔗 JAA You can upload WARCs to IA, and they get included in the WM (after an IA admin verifies them).
13:31 🔗 JAA That's what ArchiveBot does.
13:31 🔗 jacketcha Oh wait
13:31 🔗 jacketcha ArchiveBot puts the warcs into the fortress of solitude, and then the FOS puts it in the IA, right?
13:32 🔗 JAA Yeah
13:32 🔗 jacketcha and I am going to guess that there is a standing API of sorts for the FOS
13:32 🔗 jacketcha great
13:32 🔗 JAA Uploads to FOS are just rsync.
13:32 🔗 JAA A few pipelines upload their data directly to IA.
13:32 🔗 jacketcha wait, so the IA does have an API for uploads?
13:33 🔗 JAA Of course it does.
13:33 🔗 jacketcha thank god
13:33 🔗 JAA Look at the internetarchive Python package.
13:33 🔗 jacketcha oh yeah
13:33 🔗 jacketcha I keep forgetting github and open source projects are a thing
13:33 🔗 JAA It has a CLI tool "ia" and can be used from within Python.
13:36 🔗 JAA And the S3-like interface can obviously also be implemented in anything else.
13:36 🔗 JAA https://archive.org/help/abouts3.txt
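As a rough illustration of what implementing the S3-like interface "in anything else" could look like, here is a minimal Python sketch that only constructs an upload request and never sends it. The endpoint host, the `LOW accesskey:secret` authorization format, and the `x-amz-auto-make-bucket` and `x-archive-meta-*` headers are written from memory of the linked abouts3.txt and should be checked against it; `build_ias3_put`, the item identifier, the file name, and the `mediatype` value are hypothetical choices made for this example.

```python
import urllib.request

# Assumed S3-like endpoint; verify against abouts3.txt before relying on it.
IAS3_ENDPOINT = "http://s3.us.archive.org"


def build_ias3_put(identifier, filename, data, access_key, secret_key):
    """Build (but do not send) a PUT request uploading one file to an item."""
    return urllib.request.Request(
        url=f"{IAS3_ENDPOINT}/{identifier}/{filename}",
        data=data,
        method="PUT",
        headers={
            # Assumed auth scheme: the literal word LOW, then key:secret.
            "authorization": f"LOW {access_key}:{secret_key}",
            # Assumed header asking for the item to be created if missing.
            "x-amz-auto-make-bucket": "1",
            # Assumed metadata header; "web" is an illustrative mediatype.
            "x-archive-meta-mediatype": "web",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the upload. The `internetarchive` package JAA mentions wraps this kind of request and is probably the easier route from Python.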
13:36 🔗 jacketcha great
13:36 🔗 jacketcha all I need is the end points
13:38 🔗 jacketcha oh wow it's 3:38 in the morning
13:38 🔗 jacketcha Should probably get to sleep before four
13:40 🔗 jacketcha I'll check this in the morning
13:41 🔗 jacketcha gn/gm\
13:41 🔗 jacketcha *gn/gm
13:59 🔗 jschwart has quit IRC (Konversation terminated!)
14:07 🔗 jschwart has joined #archiveteam-bs
14:37 🔗 jacketcha has quit IRC (Read error: Connection reset by peer)
14:37 🔗 jacketcha has joined #archiveteam-bs
14:41 🔗 kimmer12 has joined #archiveteam-bs
14:45 🔗 kimmer13 has joined #archiveteam-bs
14:48 🔗 kimmer1 has quit IRC (Read error: Operation timed out)
14:51 🔗 kimmer1 has joined #archiveteam-bs
14:52 🔗 kimmer12 has quit IRC (Ping timeout: 633 seconds)
14:55 🔗 kimmer12 has joined #archiveteam-bs
14:55 🔗 kimmer13 has quit IRC (Ping timeout: 633 seconds)
15:02 🔗 kimmer1 has quit IRC (Ping timeout: 633 seconds)
15:24 🔗 Ceryn has quit IRC (Read error: Operation timed out)
15:25 🔗 Ceryn has joined #archiveteam-bs
15:32 🔗 Gfy has quit IRC (Quit: I'll be back!)
15:33 🔗 Gfy has joined #archiveteam-bs
15:59 🔗 kimmer1 has joined #archiveteam-bs
16:07 🔗 kimmer12 has quit IRC (Ping timeout: 633 seconds)
16:08 🔗 kimmer13 has joined #archiveteam-bs
16:08 🔗 kimmer1 has quit IRC (Read error: Operation timed out)
16:12 🔗 kimmer1 has joined #archiveteam-bs
16:15 🔗 kimmer13 has quit IRC (Read error: Operation timed out)
16:20 🔗 LastNinja has quit IRC (Read error: Connection reset by peer)
16:36 🔗 dashcloud has quit IRC (Read error: Connection reset by peer)
16:37 🔗 dashcloud has joined #archiveteam-bs
17:53 🔗 pikhq has joined #archiveteam-bs
18:11 🔗 godane i'm up to 2017-10-31 for kpfa stuff
18:15 🔗 icedice has joined #archiveteam-bs
18:21 🔗 kimmer1 has quit IRC (Read error: Connection reset by peer)
18:21 🔗 kimmer12 has joined #archiveteam-bs
18:28 🔗 svchost03 has quit IRC (Ping timeout: 360 seconds)
19:07 🔗 C4K3_ has joined #archiveteam-bs
19:09 🔗 C4K3 has quit IRC (Read error: Operation timed out)
19:13 🔗 icedice has quit IRC (Read error: Connection reset by peer)
19:27 🔗 godane so my new tapes i bought came
19:27 🔗 godane a lot of pbs and tlc stuff on these tapes
19:57 🔗 C4K3_ is now known as C4K3
20:32 🔗 svchost03 has joined #archiveteam-bs
20:32 🔗 svchfoo1 sets mode: +o svchost03
20:55 🔗 JAA jrwr: svchost02 seems to be broken, doesn't respond to invites.
21:07 🔗 SketchCow Could someone please tell me the deal with vidme
21:07 🔗 SketchCow 21G vidme4
21:07 🔗 SketchCow 355G vidme5
21:07 🔗 SketchCow I have these two things clogging up FOS, I'd like to know if I add them or not.
21:16 🔗 Kaz arkiver ^ I know vidme5 is definitely *good* data, not sure about vidme4
21:54 🔗 MrDignity has quit IRC (Remote host closed the connection)
21:54 🔗 MrDignity has joined #archiveteam-bs
21:59 🔗 SketchCow Well, I'd like to know, I'm trying to push all the data off of FOS so it's not riding at 50% capacity
22:00 🔗 SketchCow Also, I'm down to the last half-terabyte of Manga so that's good
22:00 🔗 SketchCow P.S. I am sick of fuckin' manga
22:30 🔗 dashcloud has quit IRC (Ping timeout: 250 seconds)
22:32 🔗 dashcloud has joined #archiveteam-bs
22:34 🔗 MrDignity has quit IRC (Ping timeout: 490 seconds)
22:48 🔗 jacketcha Did someone ask for more manga?
22:50 🔗 astrid it sounds like we're good for the moment
22:57 🔗 SketchCow We're more than good
23:00 🔗 SketchCow Especially after this last .5 terabyte
23:01 🔗 SketchCow I am watching the entire run of The Prisoner and I'm shocked at how few people know about The Prisoner
23:07 🔗 SketchCow Also, in other -bs news, one of my most popular blog posts ever, randomly shot up in the charts
23:07 🔗 SketchCow My blog has been getting 5 reads an hour average, because I've been elsewhere
23:07 🔗 SketchCow And someone on reddit linked to an entry
23:07 🔗 SketchCow 3,300 reads in one hour
23:07 🔗 SketchCow Reddit, man
23:11 🔗 jacketcha Wow
23:12 🔗 SketchCow Filesystem Size Used Avail Use% Mounted on
23:12 🔗 SketchCow dev/md1 13T 7.3T 5.3T 59% /2
23:12 🔗 SketchCow dev/md0 3.6T 2.4T 1.3T 66% /1
23:12 🔗 SketchCow One day
23:20 🔗 SketchCow godane: There are 11 shows of WoW Insider in the godaneinbox that lack any mp3s.
23:21 🔗 MrDignity has joined #archiveteam-bs
23:25 🔗 MrDignity has quit IRC (Remote host closed the connection)
23:25 🔗 MrDignity has joined #archiveteam-bs
23:25 🔗 BlueMaxim has joined #archiveteam-bs
23:29 🔗 drumstick has joined #archiveteam-bs
23:34 🔗 godane SketchCow: example url please?
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_2165
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_2175
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_1925
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_1815
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_1805
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_1795
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_1785
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_1775
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_1765
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_1755
23:37 🔗 SketchCow Joystiq_WoW_Insider_Show_2095
23:39 🔗 godane that may be because of the brute force
23:40 🔗 godane so nothing is missing
23:40 🔗 godane i hope
23:40 🔗 godane i can't check right now since i'm digitizing a tape anyways
23:42 🔗 SketchCow That's fine
23:42 🔗 SketchCow Just wanted you aware
23:56 🔗 SketchCow astrid: I was told to keep yahooanswers texts for you
23:56 🔗 astrid hm, i'm not sure why but i guess?
23:56 🔗 astrid i'll happily take them i guess
23:56 🔗 astrid how big, what format, etc
23:57 🔗 SketchCow I don't know why I was supposed to
23:58 🔗 SketchCow They were a glorious pain in the ass to deal with
23:58 🔗 astrid well then in that case i absolutely must have them
23:59 🔗 SketchCow I'm finishing up the upload.
23:59 🔗 SketchCow it will be https://archive.org/details/archiveteam_yahooanswers_gathering
23:59 🔗 astrid yay
23:59 🔗 SketchCow I uploaded some directory by mistake, have to clean
23:59 🔗 astrid we all make mistakes
