#archiveteam-bs 2017-12-30,Sat


Who What When
schbiridastrid: "Lengthy Archive Team and >>>archive discussions<<< here" [00:05]
***astrid changes topic to: Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | <godane> SketchCow: your porn tapes are getting digitized right [00:06]
astridthere [00:07]
schbiridbleh
why
[00:07]
astridbecause ola_norsk wouldn't shut up [00:07]
schbiridwhy not make a #archiveteam-superimportantstuff and turn #archiveteam into the actual archiveteam discussion channel then [00:08]
astridi didn't make the decision, SketchCow did
take it up with him
[00:08]
schbiridand keep -bs as the random stuff channel it was
i dont know you anyways
[00:08]
astridwhat are you talking about i've been here for _years_ [00:09]
schbiridonly noticed you in action with moderation actions [00:09]
astridi also use the nick 'xmc'
i host the tracker for warrior projects
[00:09]
schbiridoh
sorry :D
that nick obviously rings many bells
[00:10]
astridheh [00:11]
schbiridschbirid hugs and goes to bed [00:12]
JAAastrid: The quote got cut off at the end ("... now"). :-( [00:13]
astridugh [00:13]
***astrid changes topic to: Lengthy Archive Team related discussions here | General archiving & offtopic: #archiveteam-ot | <godane> SketchCow: your porn tapes are getting digitized rn [00:13]
JAAYeah, EFNet's limits are ridiculous. [00:14]
astridalso schbirid, n.b. the topic in #archiveteam: "Lengthy discussions in #archiveteam-bs | Offtopic in #archiveteam-ot" [00:14]
***ola_norsk has joined #archiveteam-bs [00:16]
ola_norskuhm, the IA Arcade mame can run c64? or?
nvm, i misread an item's description
[00:20]
"dearest IA, please implement Emscripten VICE some day" :) https://github.com/rjanicek/vice.js/ [00:30]
***ola_norsk has quit IRC (cya in '18)
schbirid has quit IRC (Ping timeout: 255 seconds)
schbirid has joined #archiveteam-bs
[00:31]
schbirid has quit IRC (Ping timeout: 255 seconds) [00:45]
schbirid has joined #archiveteam-bs [00:56]
..... (idle for 20mn)
jacketchaCan't ArchiveBot be used as a tweet quoting bot? [01:16]
........ (idle for 38mn)
DrasticAcI _finally_ managed to get my parsed Miiverse database onto Azure, so now I can start testing my site for real.
It's at https://archiverse.guide . It has basic auth on it right now while I stress test the database to make sure it can last under load (and I need to work on a FAQ and something from the home page)
If anyone wants to take a look and give me feedback, Username: archiverse Password: miiworse
[01:54]
***schbirid2 has joined #archiveteam-bs [02:10]
schbirid has quit IRC (Read error: Operation timed out) [02:15]
jacketchaIf Archiverse goes down, and we need to archive it, what do we call the next one? ArchivihcrA? [02:20]
DrasticAcNah, you see, I thought of that
I'm open sourcing the site and the database
The database is being uploaded right now to IA. 25 gigs compressed.
[02:25]
***kristian_ has quit IRC (Quit: Leaving) [02:26]
jacketchanow it's an even bigger question
If the IA goes down, is its archive called the Internet Archive Archive?
Or just the Internet Archive because the Internet Archive is a part of the internet?
[02:27]
joepie91jacketcha: https://www.archiveteam.org/index.php?title=INTERNETARCHIVE.BAK [02:37]
***RETNUHWEJ has joined #archiveteam-bs [02:42]
jacketchaINTERNETARCHIVE.BAK Archive.BAK
oh, that actually looks interesting
i have a couple of drives lying around I could contribute
[02:43]
***balrog has joined #archiveteam-bs
swebb sets mode: +o balrog
jacketcha has quit IRC (Read error: Connection reset by peer)
jacketcha has joined #archiveteam-bs
[02:52]
jacketchahey
someone brought something to my attention
there are basically zero archives of any 4chan posts on /v/ around july 2015
does anybody know anything about that?
[03:04]
wbradleyjacketcha: are there 4chan archives aside from that? it crossed my mind recently [03:11]
jacketchanope
I just looked through 3 pages of google, and about 11 different archives
303712123 can't be found
(the post im looking for)
[03:11]
RETNUHWEJyes, the archives cut out suddenly mid-june and then restart on 2015/10/24 and none of the archives (absolutely none of them) have anything from the period in between [03:15]
OK, the gap in the archives starts on 2015/06/11 and they pick up again on 2015/10/24 (both dates yyyy/mm/dd)
I am looking for some posts in between that and would appreciate very much any aid I can get
[03:25]
***pizzaiolo has quit IRC (Remote host closed the connection) [03:40]
.... (idle for 18mn)
jacketcha has quit IRC (Read error: Connection reset by peer)
jacketcha has joined #archiveteam-bs
jacketcha has quit IRC (Read error: Connection reset by peer)
jacketcha has joined #archiveteam-bs
[03:58]
....... (idle for 34mn)
MrDignity has quit IRC (Ping timeout: 248 seconds)
Rai-chan has quit IRC (Ping timeout: 248 seconds)
medowar has quit IRC (Ping timeout: 248 seconds)
ZexaronS has quit IRC (Ping timeout: 248 seconds)
HCross2 has quit IRC (Ping timeout: 248 seconds)
purplebot has quit IRC (Ping timeout: 248 seconds)
ZexaronS has joined #archiveteam-bs
MrDignity has joined #archiveteam-bs
i0npulse has quit IRC (Ping timeout: 248 seconds)
i0npulse has joined #archiveteam-bs
[04:33]
qw3rty115 has joined #archiveteam-bs
medowar has joined #archiveteam-bs
purplebot has joined #archiveteam-bs
HCross2 has joined #archiveteam-bs
Rai-chan has joined #archiveteam-bs
qw3rty114 has quit IRC (Read error: Operation timed out)
jacketcha has quit IRC (Read error: Connection reset by peer)
jacketcha has joined #archiveteam-bs
[04:46]
jacketcha has quit IRC (Read error: Connection reset by peer)
jacketcha has joined #archiveteam-bs
[05:00]
jacketchaDoes anybody use WARCreate?
It looks like it could be really useful, but it seems like it hasn't been updated in a while
And it breaks roughly 60% of the time
[05:10]
.................... (idle for 1h35mn)
hook54321jacketcha wbradley: bibanon might know more about 4chan archives [06:49]
........................ (idle for 1h55mn)
***jacketcha has quit IRC (Leaving)
jacketcha has joined #archiveteam-bs
[08:44]
........ (idle for 39mn)
jschwart has joined #archiveteam-bs [09:23]
...... (idle for 27mn)
jacketchaAlright, so I have a proposal for a secondary ArchiveBot which should be way easier to set up. Here's the basic idea: The pipelines are replaced by users with a chrome extension, and instead of WARCs it uses liveweb. For those of you that don't know how liveweb works, what it does is make an HTTP request to a URL, and then replace the URLs inside of the response with Wayback URLs. This is how it will work: A
user requests a website to be archived through IRC. Then, a control node looks for chrome extension installations, which register themselves when they are installed, and the control node chooses the one with the least load. The chrome extension takes the URL and crawls it for any URLs it can find, whether they are files or pages, but it doesn't archive them. After that, when there is a completed list of
URLs, the extension fires off HTTP requests to the liveweb system, or to the URL https://web.archive.org/save/[URL you want to archive]. That's all. There are two main advantages of this system over the existing one: accessibility and the lack of a need for client-side storage. Due to the way liveweb works, all the archiving happens over at the Internet Archive, not on the pipeline, so the largest
thing the pipeline will need to store is a list of URLs. Also, for the accessibility part, since this will be in chrome extension format, it should be cross platform, and extremely easy to install. On top of that, if required, since the chrome extension will be programmed in JavaScript, Android phones will also be compatible with the application. Any feedback?
[09:50]
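The flow jacketcha describes (choose the least-loaded client, collect links without archiving them, then turn each link into a save request) could be sketched roughly as below. This is only an illustration in Python, not the proposed JavaScript extension: the client ids, the load numbers, and the helper names `pick_least_loaded`, `discover_urls` and `save_requests` are all made up here, and only the `https://web.archive.org/save/` prefix comes from the message itself. The sketch builds the request URLs and does not send anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Prefix quoted in the proposal; everything else in this sketch is hypothetical.
SAVE_PREFIX = "https://web.archive.org/save/"


class LinkCollector(HTMLParser):
    """Collects raw href/src attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def pick_least_loaded(clients):
    """Return the id of the registered client with the smallest load.

    `clients` maps a hypothetical client id to its current job count.
    """
    return min(clients, key=clients.get)


def discover_urls(base_url, html):
    """Return absolute http(s) URLs found in `html`, in order, without duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for link in collector.links:
        absolute = urljoin(base_url, link)
        if urlparse(absolute).scheme in ("http", "https") and absolute not in found:
            found.append(absolute)
    return found


def save_requests(urls):
    """Turn each discovered URL into the save URL the client would request."""
    return [SAVE_PREFIX + url for url in urls]
```

A real client would also need rate limiting and retries, since the captures themselves happen on the Internet Archive's side.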
***BlueMaxim has quit IRC (Leaving) [10:01]
............ (idle for 57mn)
pizzaiolo has joined #archiveteam-bs [10:58]
.......... (idle for 46mn)
drumstick has quit IRC (Read error: Operation timed out) [11:44]
...... (idle for 27mn)
schbirid2how do you make 100% sure that the user's personal data does not end up in a grab? [12:11]
Somebody2schbirid2: liveweb takes care of that already. [12:16]
schbirid2nice [12:18]
Somebody2try it on a page you are logged into
and note that it will grab the logged out version
!a http://blog.whyanimalsdothething.com
whoops, wrong channel
jacketcha: I like your idea -- please implement it!
[12:19]
jacketchaAlright, didn't hear a no
JSBot is on the way
[12:26]
Somebody2jacketcha: yay! [12:32]
***RETNUHWEJ has quit IRC (Ping timeout: 263 seconds) [12:33]
jacketchaI think I'm going to set up the control node in nodejs and then give it a better API than just IRC [12:35]
Originally, I was going to add WARCs to it, but then I tried using WARCreate and realized that the relationship between JavaScript and WARCs is one way [12:41]
..... (idle for 20mn)
JAASounds like a decent idea. The main downside I can think of is that the archives will not be downloadable (liveweb WARCs are private). liveweb is also quite inefficient in my experience compared to a browser + warcprox setup. But the cross-platform and distributed aspects sound nice. [13:01]
jacketchaYou know, what if ArchiveTeam hosted a liveweb copy? I am going to guess crawling is half the load
Because I do understand that liveweb can be glitchy
or at least the hosted one
especially under high loads
take for example all the times my twitter has been captured https://web.archive.org/web/*/https://twitter.com/_jacketchan_
there is an obvious variation in quality between captures
some are perfectly fine
some are brokenish
and some are literally just white screens
[13:05]
***kimmer1 has joined #archiveteam-bs
pikhq has quit IRC (Ping timeout: 245 seconds)
[13:11]
jacketchawait
JAA: Couldn't you just WARC the wayback copy and switch the links out?
[13:15]
***Mateon1 has quit IRC (Read error: Operation timed out)
Mateon1 has joined #archiveteam-bs
[13:17]
JAAjacketcha: You can get close, but you won't be able to reconstruct the exact original data sent by the server.
... which is what WARC's all about.
So it's kind of pointless to try that.
[13:26]
jacketchahmm
can you upload to the archive via post?
[13:26]
JAA? [13:30]
jacketchais there an api besides archive-it [13:30]
JAAYou can upload WARCs to IA, and they get included in the WM (after an IA admin verifies them).
That's what ArchiveBot does.
[13:30]
jacketchaOh wait
ArchiveBot puts the warcs into the fortress of solitude, and then the FOS puts it in the IA, right?
[13:31]
JAAYeah [13:32]
jacketchaand I am going to guess that there is a standing API of sorts for the FOS
great
[13:32]
JAAUploads to FOS are just rsync.
A few pipelines upload their data directly to IA.
[13:32]
jacketchawait, so the IA does have an API for uploads? [13:32]
JAAOf course it does. [13:33]
jacketchathank god [13:33]
JAALook at the internetarchive Python package. [13:33]
jacketchaoh yeah
I keep forgetting github and open source projects are a thing
[13:33]
JAAIt has a CLI tool "ia" and can be used from within Python.
And the S3-like interface can obviously also be implemented in anything else.
https://archive.org/help/abouts3.txt
[13:33]
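For the S3-like interface JAA links to, an upload is essentially an HTTP PUT to `s3.us.archive.org/<identifier>/<filename>` with an `authorization: LOW accesskey:secret` header, as described in abouts3.txt. A minimal sketch that only builds such a request, without sending it, might look like this. The helper name `build_upload_request`, the example identifier and the default `mediatype` value are assumptions made for illustration.

```python
import urllib.request

# Endpoint host from the S3-like API description linked above.
S3_ENDPOINT = "https://s3.us.archive.org"


def build_upload_request(identifier, filename, data, access_key, secret_key,
                         mediatype="web"):
    """Build (but do not send) a PUT request for one file in an IA item.

    `identifier` is the item name and `filename` the name the file gets
    inside the item. The default mediatype is an assumption for WARC uploads.
    """
    headers = {
        # IA's S3-like API uses "LOW key:secret" instead of AWS signing.
        "authorization": f"LOW {access_key}:{secret_key}",
        # Ask IA to create the item if it does not exist yet.
        "x-amz-auto-make-bucket": "1",
        # Item-level metadata is passed as x-archive-meta-* headers.
        "x-archive-meta-mediatype": mediatype,
    }
    url = f"{S3_ENDPOINT}/{identifier}/{filename}"
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Passing the returned object to `urllib.request.urlopen` would perform the actual upload. The `internetarchive` package JAA mentions wraps the same interface (for example `ia upload <identifier> <file>` on the command line), which is probably the easier route in practice.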
jacketchagreat
all I need is the end points
oh wow it's 3:38 in the morning
Should probably get to sleep before four
I'll check this in the morning
gn/gm\
*gn/gm
[13:36]
.... (idle for 18mn)
***jschwart has quit IRC (Konversation terminated!) [13:59]
jschwart has joined #archiveteam-bs [14:07]
....... (idle for 30mn)
jacketcha has quit IRC (Read error: Connection reset by peer)
jacketcha has joined #archiveteam-bs
kimmer12 has joined #archiveteam-bs
kimmer13 has joined #archiveteam-bs
kimmer1 has quit IRC (Read error: Operation timed out)
kimmer1 has joined #archiveteam-bs
kimmer12 has quit IRC (Ping timeout: 633 seconds)
kimmer12 has joined #archiveteam-bs
kimmer13 has quit IRC (Ping timeout: 633 seconds)
[14:37]
kimmer1 has quit IRC (Ping timeout: 633 seconds) [15:02]
..... (idle for 22mn)
Ceryn has quit IRC (Read error: Operation timed out)
Ceryn has joined #archiveteam-bs
[15:24]
Gfy has quit IRC (Quit: I'll be back!)
Gfy has joined #archiveteam-bs
[15:32]
...... (idle for 26mn)
kimmer1 has joined #archiveteam-bs [15:59]
kimmer12 has quit IRC (Ping timeout: 633 seconds)
kimmer13 has joined #archiveteam-bs
kimmer1 has quit IRC (Read error: Operation timed out)
kimmer1 has joined #archiveteam-bs
kimmer13 has quit IRC (Read error: Operation timed out)
[16:07]
LastNinja has quit IRC (Read error: Connection reset by peer) [16:20]
.... (idle for 16mn)
dashcloud has quit IRC (Read error: Connection reset by peer)
dashcloud has joined #archiveteam-bs
[16:36]
................ (idle for 1h16mn)
pikhq has joined #archiveteam-bs [17:53]
.... (idle for 18mn)
godanei'm up to 2017-10-31 for kpfa stuff [18:11]
***icedice has joined #archiveteam-bs [18:15]
kimmer1 has quit IRC (Read error: Connection reset by peer)
kimmer12 has joined #archiveteam-bs
[18:21]
svchost03 has quit IRC (Ping timeout: 360 seconds) [18:28]
........ (idle for 39mn)
C4K3_ has joined #archiveteam-bs
C4K3 has quit IRC (Read error: Operation timed out)
icedice has quit IRC (Read error: Connection reset by peer)
[19:07]
godaneso my new tapes i bought came
a lot of pbs and tlc stuff on these tapes
[19:27]
....... (idle for 30mn)
***C4K3_ is now known as C4K3 [19:57]
........ (idle for 35mn)
svchost03 has joined #archiveteam-bs
svchfoo1 sets mode: +o svchost03
[20:32]
..... (idle for 23mn)
JAAjrwr: svchost02 seems to be broken, doesn't respond to invites. [20:55]
SketchCowCould someone please tell me the deal with vidme
21G vidme4
355G vidme5
I have these two things clogging up FOS, I'd like to know if I add them or not.
[21:07]
Kazarkiver ^ I know vidme5 is definitely *good* data, not sure about vidme4 [21:16]
........ (idle for 38mn)
***MrDignity has quit IRC (Remote host closed the connection)
MrDignity has joined #archiveteam-bs
[21:54]
SketchCowWell, I'd like to know, I'm trying to push all the data off of FOS so it's not riding at 50% capacity
Also, I'm down to the last half-terabyte of Manga so that's good
P.S. I am sick of fuckin' manga
[21:59]
....... (idle for 30mn)
***dashcloud has quit IRC (Ping timeout: 250 seconds)
dashcloud has joined #archiveteam-bs
MrDignity has quit IRC (Ping timeout: 490 seconds)
[22:30]
jacketchaDid someone ask for more manga? [22:48]
astridit sounds like we're good for the moment [22:50]
SketchCowWe're more than good
Especially after this last .5 terabyte
I am watching the entire run of The Prisoner and I'm shocked at how few people know about The Prisoner
[22:57]
Also, in other -bs news, one of my most popular blog posts ever, randomly shot up in the charts
My blog has been getting 5 reads an hour average, because I've been elsewhere
And someone on reddit linked to an entry
3,300 reads in one hour
Reddit, man
[23:07]
jacketchaWow [23:11]
SketchCowFilesystem Size Used Avail Use% Mounted on
dev/md1 13T 7.3T 5.3T 59% /2
dev/md0 3.6T 2.4T 1.3T 66% /1
One day
[23:12]
godane: There are 11 shows of WoW Insider in the godaneinbox that lack any mp3s. [23:20]
***MrDignity has joined #archiveteam-bs
MrDignity has quit IRC (Remote host closed the connection)
MrDignity has joined #archiveteam-bs
BlueMaxim has joined #archiveteam-bs
drumstick has joined #archiveteam-bs
[23:21]
godaneSketchCow: example url please? [23:34]
SketchCowJoystiq_WoW_Insider_Show_2165
Joystiq_WoW_Insider_Show_2175
Joystiq_WoW_Insider_Show_1925
Joystiq_WoW_Insider_Show_1815
Joystiq_WoW_Insider_Show_1805
Joystiq_WoW_Insider_Show_1795
Joystiq_WoW_Insider_Show_1785
Joystiq_WoW_Insider_Show_1775
Joystiq_WoW_Insider_Show_1765
Joystiq_WoW_Insider_Show_1755
Joystiq_WoW_Insider_Show_2095
[23:37]
godanethat may be because of the brute force
so nothing is missing
i hope
i can't check right now since i'm digitizing a tape anyways
[23:39]
SketchCowThat's fine
Just wanted you aware
[23:42]
astrid: I was told to keep yahooanswers texts for you [23:56]
astridhm, i'm not sure why but i guess?
i'll happily take them i guess
how big, what format, etc
[23:56]
SketchCowI don't know why I was supposed to
They were a glorious pain in the ass to deal with
[23:57]
astridwell then in that case i absolutely must have them [23:58]
SketchCowI'm finishing up the upload.
it will be https://archive.org/details/archiveteam_yahooanswers_gathering
[23:59]
astridyay [23:59]
SketchCowI uploaded some directory by mistake, have to clean [23:59]
astridwe all make mistakes [23:59]
