#archiveteam-bs 2017-10-07,Sat


Who | What | When
godane | i'm uploading The Place For No Story as 'The Place For No Story 1973 Timecode'
cause there is timecode burned into the video
[00:07]
Asparagir | JAA: Of course we're interested, why would you even ask. :-) [00:14]
***Asparagir has quit IRC (Asparagir) [00:16]
.... (idle for 16mn)
vitzli has joined #archiveteam-bs [00:32]
....... (idle for 34mn)
Baljem has quit IRC (Read error: Operation timed out) [01:06]
godane | i'm capturing the 36th tape from Laughing Squid
SketchCow: at this rate i may have all tapes digitized in about a week
[01:06]
***Odd0002_ has joined #archiveteam-bs
Odd0002 has quit IRC (Ping timeout: 600 seconds)
Odd0002_ is now known as Odd0002
[01:21]
dashcloud | @Stiletto Amazingly, I was able to find the article again: http://www.vintagecomputing.com/index.php/archives/1063/bringing-prodigy-back-from-the-dead [01:26]
Stiletto | thanks so much :D [01:30]
dashcloud | I can't wait to see all of the cool stuff you've found [01:31]
***username1 has joined #archiveteam-bs
schbirid2 has quit IRC (Read error: Operation timed out)
[01:34]
.......... (idle for 45mn)
pizzaiolo has quit IRC (Quit: pizzaiolo) [02:22]
..................... (idle for 1h42mn)
vitzli has quit IRC (Quit: Leaving) [04:04]
.... (idle for 19mn)
ranma | http://techreport.com/news/32659/pour-one-out-for-aol-instant-messenger [04:23]
***Sk1d has quit IRC (Ping timeout: 250 seconds) [04:24]
Sk1d has joined #archiveteam-bs [04:31]
..... (idle for 22mn)
superkuh | Yeah. I tried logging on to AIM just now for kicks. SSL error on login.
Can't seem to get online.
ICQ still works fine though. We'll always have ICQ.
[04:53]
ranma | hopefully [05:03]
..... (idle for 22mn)
fie | zino, godane: I have a torrent site for "home-recordings" and "odd stuff like travel tapes"
superkuh, yeah you need an up-to-date client
[05:25]
superkuh | Makes sense. I'm using Pidgin 2.6.6 which is pretty old. [05:28]
fie | I was afraid they were only going to allow the official AIM client, but new pidgin works
gf just said they are shutting down now? wtf
damn you facebook
Why can't mozilla take it over or something
someone named Mental Elf messaged me...
nobody on my buddy list is ever signed on
[05:29]
godane | fie: is it societyglitch?
i have an account there
[05:47]
.......... (idle for 47mn)
***Stiletto has quit IRC () [06:34]
fie | godane, yes [06:47]
....... (idle for 31mn)
Just don't know where I would source home movies and odd stuff... probably not ebay. [07:18]
.... (idle for 16mn)
***Stilett0- has joined #archiveteam-bs [07:34]
.......... (idle for 49mn)
TheLovina has quit IRC (Ping timeout: 370 seconds) [08:23]
................ (idle for 1h19mn)
brayden has quit IRC (Ping timeout: 255 seconds)
brayden has joined #archiveteam-bs
swebb sets mode: +o brayden
[09:42]
................... (idle for 1h31mn)
JAA | Alright, so about wordpress.com: they have a link shortener, wp.me. The shortcode can have various different formats for linking to specific pages of a blog (e.g. directly to a post or an image attached to a post etc.). The format of main interest in this context, however, is simply the blog ID encoded in base62 ([0-9a-zA-Z]).
This shortening is provided by Jetpack, a Wordpress plugin installed and activated by default on all wordpress.com blogs (including free ones).
It seems that the maximum blog ID is currently somewhere just below 9g000, i.e. on the order of 135M shortcodes need to be scanned (9 * 62^4 + 16 * 62^3).
That's also the order of magnitude of how many blogs there are.
We could do this through URLTeam and then figure out what to do with it later.
[11:14]
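A minimal sketch of the arithmetic above, assuming the base62 alphabet is ordered exactly as written in the message (digits, then lowercase, then uppercase). The helper names `b62_decode` and `b62_encode` are hypothetical and not part of any wp.me or Jetpack API.

```python
import string

# Assumed alphabet order, taken from "[0-9a-zA-Z]" in the message above.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def b62_decode(code: str) -> int:
    """Turn a base62 shortcode into the integer blog ID it encodes."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n


def b62_encode(n: int) -> str:
    """Turn a non-negative integer into its base62 shortcode."""
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


# Upper bound quoted in the chat: 9 * 62^4 + 16 * 62^3 ('g' is digit 16).
upper_bound = b62_decode("9g000")
print(upper_bound)  # 136800272, i.e. on the order of 135M
```

Enumerating `b62_encode(i)` for `i` below `upper_bound` would give the candidate shortcodes to scan under these assumptions.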
...... (idle for 27mn)
***dd0a13f37 has joined #archiveteam-bs [11:47]
dd0a13f37 | Where do I report security issues for archive.org? [11:47]
username1 | info@archive.org [11:59]
***dashcloud has quit IRC (Read error: Connection reset by peer) [11:59]
username1 | they can either forward it for you or give you direct contact [12:00]
dd0a13f37 | alright, thanks [12:00]
***dashcloud has joined #archiveteam-bs [12:00]
username1 | thank YOU [12:01]
dd0a13f37 | Not sure it's anything major, but better safe than sorry I guess [12:07]
godane | SketchCow: you're getting a showtime airing of Road To Wellville cause that's in the case of tapes
plus side is it got the most out of having 10000k setting being at 6.4gb
based on the Outer Limits preview, it aired on the week of 1996-04-05
it was a preview for the episode called "The Refuge" with actor M. Emmet Walsh
[12:09]
***icedice has joined #archiveteam-bs [12:21]
dd0a13f37 | JAA: It's very close, closer than just an order of magnitude. Converting the latest shortlink to decimal and using it together with https://wordpress.com/activity/ to get an estimate
for posts/blog gives a result close to https://en.blog.wordpress.com/2015/01/06/2014-in-review/
Or the other way around: estimate the number of blogs from the 2014 posts/blog stats, convert to b64, note that it's close
b62*
[12:34]
JAA | Yeah, I ran a test with the two-character codes and almost all of them existed. [12:35]
dd0a13f37 | According to that, there should be (base62) 08 57 30 52 59 blogs (131913987)
Which is close to 9g000
[12:36]
JAA | Yup [12:37]
dd0a13f37 | Although it's not exact - if you manipulate the POST request from the stats page you can get a chart for the number of blogs which gives 125452778 (08 30 23 61 56) as total
Or maybe they subtract deleted blogs, in which case it makes perfect sense
[12:44]
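The digit groups quoted above can be checked with a short sketch. The function name `to_base62_digits` and the zero-padded, space-separated formatting are assumptions chosen to mirror how the numbers were written in the chat.

```python
def to_base62_digits(n: int, width: int = 5) -> list[int]:
    """Return the base62 digits of n, most significant first, left-padded with zeros."""
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(r)
    # Pad with leading zeros so short numbers still show `width` groups.
    digits.extend([0] * (width - len(digits)))
    return digits[::-1]


def show(n: int) -> str:
    """Format the digits as two-character decimal groups, e.g. '08 57 30 52 59'."""
    return " ".join(f"{d:02d}" for d in to_base62_digits(n))


print(show(131913987))  # 08 57 30 52 59
print(show(125452778))  # 08 30 23 61 56
```

Both results have a leading digit of 8, which is consistent with the estimate sitting just below a maximum ID of 9g000.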
4 billion posts, that's actually not a whole lot [12:50]
***K4k has joined #archiveteam-bs
BlueMaxim has quit IRC (Read error: Connection reset by peer)
Mateon1 has quit IRC (Ping timeout: 255 seconds)
Mateon1 has joined #archiveteam-bs
[12:56]
........... (idle for 51mn)
dd0a13f37 | I don't think any of the libgen collections on IA are complete unless the logs have been tampered with. Should I upload it again? [13:51]
............... (idle for 1h11mn)
***dashcloud has quit IRC (Read error: Operation timed out) [15:02]
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 260 seconds)
icedice2 has quit IRC (Client Quit)
icedice has joined #archiveteam-bs
[15:15]
JAA | Just to confirm: is gawker.com archived, and where can the archives be found? I saw several mentions of it in the logs, but I can't find it on IA. (Via: https://www.reddit.com/r/Archiveteam/comments/73xszd/has_gawker_been_fully_archived/ ) [15:28]
***dashcloud has joined #archiveteam-bs [15:31]
dd0a13f37 | nvm, i found the real collection, up to r_2092000 is archived [15:37]
***username1 is now known as schbirid [15:51]
schbirid | anyone know how to strip all formatting from a $msg in irssi perl scripting? [15:52]
***Rai-chan has joined #archiveteam-bs [15:57]
RichardG_ has joined #archiveteam-bs
RichardG has quit IRC (Read error: Connection reset by peer)
[16:08]
dd0a13f37 | Sci-mag is archived up to 64099999, foreignfiction up to 1600000
Foreignfiction goes up to 1890000, sci-mag torrents are down so not sure exactly how far they go
[16:18]
schbirid | are they surely fully archived? or just 50% stalled torrents? [16:20]
dd0a13f37 | The torrents are seeded, so I think they're archived. The ones that I checked were, at least [16:23]
schbirid | i meant to grab all of scimag but iirc about 25% of the ones i tried were not fully seeded :( [16:25]
dd0a13f37 | Ask on forums for reseed then [16:34]
***loadup has joined #archiveteam-bs [16:42]
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 250 seconds)
kepler45 has quit IRC (Quit: Leaving)
[16:54]
..... (idle for 21mn)
pizzaiolo has joined #archiveteam-bs [17:20]
Asparagir has joined #archiveteam-bs
svchfoo1 sets mode: +o Asparagir
icedice2 has quit IRC (Quit: Leaving)
icedice has joined #archiveteam-bs
TC01 has quit IRC (Remote host closed the connection)
[17:27]
icedice2 has joined #archiveteam-bs [17:38]
icedice has quit IRC (Read error: Operation timed out) [17:44]
.... (idle for 16mn)
Asparagir has quit IRC (Asparagir) [18:00]
....... (idle for 31mn)
icedice2 has quit IRC (Ping timeout: 255 seconds) [18:31]
JAA | For the record, we're now grabbing wp.me in URLTeam. :-) [18:31]
***icedice has joined #archiveteam-bs [18:33]
dd0a13f37 | So will you archive the whole of WP? [18:46]
Somebody2 | dd0a13f37: just the URLs, not their contents (at this point, at least) [18:46]
dd0a13f37 | In URLteam, yes, but for the WP project
Are they having any problems?
[18:47]
JAA | Not that I know of. [18:47]
Somebody2 | Not that I know of, but it's good to have a backup [18:47]
JAA | But I figured, why the hell not? [18:47]
Somebody2 | heh, jinx [18:47]
dd0a13f37 | 4 bil posts, 1/4 have images, 1 bil images, 1 MB each, 1 PB
Large endeavour
[18:48]
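The back-of-the-envelope estimate above works out as follows. This sketch assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes) and takes the chat's rough figures at face value; none of the inputs are measured values.

```python
# Rough inputs from the chat, not measurements.
posts = 4_000_000_000          # "4 bil posts"
images = posts // 4            # "1/4 have images" -> 1 billion images
bytes_per_image = 1_000_000    # "1 MB each", decimal megabyte assumed

total_bytes = images * bytes_per_image
total_pb = total_bytes / 10**15  # decimal petabytes

print(images)    # 1000000000
print(total_pb)  # 1.0
```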
***dd0a13f37 has quit IRC (Ping timeout: 268 seconds) [18:53]
icedice2 has joined #archiveteam-bs
dd0a13f37 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 506 seconds)
[18:58]
icedice has joined #archiveteam-bs
dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
dd0a has joined #archiveteam-bs
dd0a is now known as dd0a13f37
[19:14]
icedice2 has quit IRC (Ping timeout: 506 seconds) [19:21]
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 245 seconds)
icedice has joined #archiveteam-bs
dd0a13f37 has quit IRC (Ping timeout: 268 seconds)
ajshell1 has quit IRC (Quit: Leaving)
icedice2 has quit IRC (Read error: Operation timed out)
atrocity has joined #archiveteam-bs
Atros has quit IRC (Ping timeout: 246 seconds)
[19:27]
icedice2 has joined #archiveteam-bs [19:44]
icedice has quit IRC (Read error: Operation timed out) [19:50]
dd0a13f37 has joined #archiveteam-bs [19:56]
dd0a13f37 | It sure is some improvement over proxy+webirc [19:57]
***ajshell1 has joined #archiveteam-bs [20:04]
atrocity has quit IRC (Read error: Connection reset by peer)
atrocity has joined #archiveteam-bs
[20:15]
.... (idle for 19mn)
ajshell1 has quit IRC (Quit: Leaving) [20:35]
godane | so i have this on tape from the box: https://en.wikipedia.org/wiki/Heat_and_Sunlight
digitizing it now
[20:45]
***ajshell1 has joined #archiveteam-bs
ajshell1 has quit IRC (Client Quit)
[20:48]
ajshell1 has joined #archiveteam-bs
Stilett0- has quit IRC (Ping timeout: 260 seconds)
[20:56]
dashcloud has quit IRC (Remote host closed the connection)
dashcloud has joined #archiveteam-bs
[21:02]
TC01 has joined #archiveteam-bs
ajshell1 has quit IRC (Quit: Leaving)
[21:13]
.... (idle for 15mn)
ajshell1 has joined #archiveteam-bs [21:32]
ajshell1 has quit IRC (Quit: Leaving) [21:45]
kepler45 has joined #archiveteam-bs [21:51]
..... (idle for 22mn)
ajshell1 has joined #archiveteam-bs [22:13]
ajshell1 has quit IRC (Quit: Leaving) [22:19]
kepler45 has quit IRC (Quit: Leaving)
odemg has quit IRC (Read error: Operation timed out)
[22:25]
odemg has joined #archiveteam-bs [22:32]
dd0a13f37 | A stripped-down version of archivebot for !ao, now that would be something. You could make it run much much faster if you can disregard certain constraints [22:38]
JAA | You can't really ignore that much though. You still need to process images, stylesheets, scripts, etc.
An internet where everyone conforms to standards so we don't have to use parsers which are slowed down by all kinds of odd special cases, now that would be something.
[22:39]
dd0a13f37 | Not always. And you could use another parser, like myhtml
Myhtml is fast, but there are no Python bindings
[22:41]
Somebody2 | !ao jobs don't seem to be much of a bottleneck [22:43]
JAA | Indeed, we rarely have a queue of !ao jobs.
And that's with only one !ao-only pipeline...
[22:43]
dd0a13f37 | JAA: I don't think they parse inline scripts [22:45]
JAA | dd0a13f37: wpull does not actually parse scripts, but it does process it and tries to extract links from it.
s/it/them/
Same with CSS, I believe.
Only HTML is parsed properly.
[22:45]
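To illustrate the distinction drawn above between parsing HTML properly and merely scanning script text for links, here is a rough sketch using only the Python standard library. It is not wpull's code: the class `LinkParser`, the function `extract_links`, and the `URL_IN_STRING` pattern are hypothetical, and the regex is a deliberately crude heuristic.

```python
import re
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collect href/src attributes via real HTML parsing, plus raw script text."""

    def __init__(self):
        super().__init__()
        self.links = []        # URLs found in parsed attributes
        self.scripts = []      # unparsed contents of <script> elements
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts.append(data)


# Heuristic only: quoted absolute http(s) URLs inside script source.
URL_IN_STRING = re.compile(r"""["'](https?://[^"'\s]+)["']""")


def extract_links(html: str):
    """Return (links from parsed HTML, links guessed from script text)."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    script_text = "\n".join(parser.scripts)
    return parser.links, URL_IN_STRING.findall(script_text)


sample = (
    '<a href="/about">About</a><img src="logo.png">'
    '<script>var u = "http://example.com/a.js";</script>'
)
parsed, guessed = extract_links(sample)
print(parsed)
print(guessed)
```

The script branch never builds a syntax tree; it only pattern-matches strings, which is cheap but can miss URLs assembled at runtime and can pick up strings that are not links.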
***zino has quit IRC (Read error: Connection reset by peer)
zino has joined #archiveteam-bs
[23:00]
ajshell1 has joined #archiveteam-bs
ajshell1 has quit IRC (Client Quit)
[23:08]
icedice has joined #archiveteam-bs
icedice2 has quit IRC (Ping timeout: 260 seconds)
[23:18]
ajshell1 has joined #archiveteam-bs [23:27]
icedice2 has joined #archiveteam-bs
icedice has quit IRC (Ping timeout: 260 seconds)
dashcloud has quit IRC (Read error: Operation timed out)
dashcloud has joined #archiveteam-bs
[23:41]
icedice has joined #archiveteam-bs [23:58]
