Time |
Nickname |
Message |
00:30
π
|
kyan |
I've downloaded all of the Insurgency Wiki that is on the porfusion website. Is there something I should do with the data? |
00:32
π
|
arkhive |
Might be widely known.. I just found out though.. Bebo's old Bebo pages are now under the .archive.bebo.com/ |
00:33
π
|
arkhive |
and http://archive.bebo.com/Profile.jsp?MemberId=4134490647 |
00:33
π
|
arkhive |
example |
00:33
π
|
arkhive |
http://www.bebo.com/#faq Your old photos and blog posts are safe. They will be available for download in a couple of months. Other things (skins, quizzes, wall posts, games etc…) unfortunately will all be retired. |
00:34
π
|
arkhive |
We’re just as sentimental as you are (ok, probably more), and have left all public profiles visible for now. Private profiles are also saved, but not visible at the moment. |
00:35
π
|
arkhive |
It might be time to put in the AT Warrior and start a tracker and save/grab what we can |
00:44
π
|
arkhive |
Lol. but i am not that experienced to do so, yet. If I can help any way along with my bandwidth I'd love to. I had a Bebo years ago. Heh. |
03:47
π
|
chfoo |
i made the wretch wiki page: http://archiveteam.org/index.php?title=Wretch |
04:15
π
|
chfoo |
i made the bebo wiki page: http://archiveteam.org/index.php?title=Bebo |
04:39
π
|
SketchCow |
Wheeee |
04:39
π
|
SketchCow |
Did zapd die? |
04:43
π
|
SketchCow |
Nope, still there. |
04:43
π
|
yipdw |
hasn't been zapd out of existence yet |
04:44
π
|
link343 |
So an old music Blog / Aggregator is shutting down |
04:44
π
|
link343 |
http://pitchfork.com/news/52578-music-blog-aggregator-elbows-shuts-down/ |
04:44
π
|
link343 |
End of November. Do you think it's worth archiving? |
04:50
π
|
bsmith093 |
omf_: at some point the rsync died and now i get this: "sending incremental file list". it hung on this for **at least** 3 hours, after i tried to re-run the rsync |
05:09
π
|
chfoo |
link343: since no one else has answered yet, i'll say yes. i'll keep an eye on it. |
05:10
π
|
link343 |
ok |
06:39
π
|
Nemo_bis |
SketchCow: could you move all wikimediacommons* items in wikimedia-other collection to wikimediacommons collection? |
06:39
π
|
Nemo_bis |
(the collection was somehow broken but seems to work now) |
06:42
π
|
Nemo_bis |
are these on archive.org? http://www.bl.uk/bibliographic/download.html |
07:25
π
|
SketchCow |
http://archive.org/details/BritishLibraryRdf earlier one |
07:37
π
|
SketchCow |
I'm going to make a new one. |
07:37
π
|
SketchCow |
yay |
07:38
π
|
Nemo_bis |
:) |
07:38
π
|
Nemo_bis |
and the new book deriver got rid of some of my redrows, sweet |
08:14
π
|
ersi |
God damn it, Yahoo! - what the fuck is your problem. |
08:14
π
|
ersi |
I might have some friends who can read Traditional Chinese (used in, for example, Taiwan/Republic of China) - haven't talked to them in ages though. |
08:25
π
|
Nemo_bis |
ersi: should be easy to find some if it's a quick task |
08:29
π
|
ersi |
It's for wretch.cc. We'll need it for finding important structure and for content verification |
08:38
π
|
Nemo_bis |
hmm, not so quick, I have only 1 option then |
08:39
π
|
SketchCow |
guy: I have Amiga files not in TOSEC. Would like to add them. |
08:39
π
|
SketchCow |
me: cool! here's where to go to TOSEC to contribute |
08:39
π
|
SketchCow |
guy: Oh, that's requiring to make a login, I'm not gonna do that |
08:40
π
|
SketchCow |
So I think he wants me to put the spoon in AND move his chin so he chews it |
08:59
π
|
godane |
SketchCow: Move these to a tekzilla-daily collection when you can: http://archive.org/search.php?query=collection%3Atekzilla%20AND%20subject%3A%22tekzilla%20daily%20tip%22&sort=-date |
08:59
π
|
godane |
this is more to keep the tekzilla collection just for full episodes of tekzilla |
09:02
π
|
godane |
since there is 1500+ tekzilla daily episodes |
09:09
π
|
SketchCow |
tekzilla-daily now created and you own it |
09:10
π
|
godane |
thanks |
09:18
π
|
SketchCow |
http://archive.org/details/BritishLibraryRdf-2013-09 |
09:22
π
|
Nemo_bis |
SketchCow: https://twitter.com/BLMetadata/status/387145272951701504 if you want to announce it to them |
09:23
π
|
godane |
looks like something is wrong with this item: http://archive.org/details/Tekzilla_Daily_36 |
09:30
π
|
SketchCow |
It's set dark. |
09:30
π
|
SketchCow |
I'll find out why tomorrow. |
09:36
π
|
godane |
it's patrick talking about pricewatch.com |
09:36
π
|
godane |
so unless they think that one was spam i don't see why it would go dark |
09:40
π
|
godane |
also looks like episodes 41 and 42 of oneoff episodes may be in revision3 bestof collection |
09:41
π
|
godane |
there was 3 episodes of e3 2009 live streaming |
09:43
π
|
godane |
SketchCow: thought i'd point out that i have 33 more episodes of geekbeat.tv to be added to the collection: http://archive.org/search.php?query=subject%3A%22GeekBeat.TV%22%20AND%20collection%3Aopensource_movies&sort=-date |
10:30
π
|
Schbirid |
grab it while you can http://www-users.cs.umn.edu/~sarwat/foursquaredata/ |
10:44
π
|
GLaDOS |
And anarchive has a copy. |
12:23
π
|
Keni |
hi |
12:24
π
|
Schbirid |
hello3 |
12:26
π
|
Schbirid |
and the dataset is gone :D |
12:26
π
|
Schbirid |
for the better |
12:27
π
|
Keni |
oh |
12:28
π
|
Keni |
I feel 24hour like 150~200hour |
12:34
π
|
joepie91 |
Schbirid: lol |
12:34
π
|
* |
joepie91 has a copy |
12:34
π
|
Schbirid |
it looked like quite the gross privacy violation so it's for the better to be gone |
12:34
π
|
joepie91 |
Schbirid: privacy violation? this is data that users have put on foursquare themselves publicly, no? |
12:35
π
|
kyan |
Hi yall, I'm working on downloading elbo.ws by the way |
12:35
π
|
Schbirid |
there is a difference between putting data on fq and mass aggregating it |
12:35
π
|
joepie91 |
Schbirid: hardly |
12:35
π
|
ersi |
Take that discussion somewhere else |
12:35
π
|
ersi |
In this channel: Grabbing is GO and OK for whatever reason. |
12:35
π
|
Schbirid |
but let's not get into that discussion again, last time people showed to have a different understanding of privacy than me |
12:35
π
|
Schbirid |
aye |
12:36
π
|
ersi |
Feel free to talk about privacy/downloading moral in #archiveteam-bs though |
12:39
π
|
Keni |
okay, sry both of you. Really sorry. |
12:39
π
|
ersi |
Keni: No worries, I'm saying it because everyone needs a reminder occasionally. And we're many people, so if we drift off-topic in this channel, something important might get lost. |
12:40
π
|
norbert79 |
ersi: Use simple English... He is Japanese. |
12:40
π
|
norbert79 |
ersi: He is having hard time understanding... |
12:40
π
|
Keni |
thx but I'ts allright |
12:41
π
|
Keni |
This is so GAP than learn to school. |
12:42
π
|
Keni |
So don't mind that thx. |
12:42
π
|
norbert79 |
sure |
12:42
π
|
norbert79 |
:) |
15:02
π
|
SketchCow |
I want that dataset. |
15:13
π
|
SketchCow |
I wish I was awake sooner. |
15:14
π
|
SketchCow |
The SECOND datasets like that appear, grab. |
15:24
π
|
fz |
I am not sure if Archivebot is still working on Silk Road Forums, but the site is still squirming |
15:24
π
|
fz |
was just able to load it a few minutes ago. |
15:26
π
|
omf_ |
SketchCow, GLaDOS grabbed a copy of that foursquare data |
16:10
π
|
godane |
SketchCow: i failed you on that one |
16:11
π
|
godane |
but i found another smaller dataset |
16:39
π
|
joepie91 |
SketchCow: check your PM |
17:06
π
|
yipdw |
it works well enough at this point |
17:06
π
|
yipdw |
!status |
17:06
π
|
ATBot |
yipdw: Job status: 5039 completed, 14 aborted, 3 in progress, 0 pending |
17:06
π
|
yipdw |
yep |
19:27
π
|
Nemo_bis |
https://www.mediawiki.org/w/index.php?title=Language_portal&diff=prev&oldid=797446 |
19:40
π
|
Schbirid |
anyone got some good wget --reject-regexp for blogspot sites to reduce duplicates and search result shite? |
19:46
π
|
omf_ |
Schbirid, let us know if you find anything, that sounds really useful |
19:56
π
|
Nemo_bis |
Schbirid: and remember to update http://archiveteam.org/index.php?title=Blogger |
19:57
π
|
Schbirid |
nice, thanks |
19:58
π
|
omf_ |
Do you have url lists from a few sites? |
20:00
π
|
Schbirid |
nope |
20:04
π
|
Schbirid |
any idea what the "*\\?*,*@*" is supposed to reject? |
21:15
π
|
Nemo_bis |
He went away, but I guess any URL with parameters? |
21:29
π
|
Nemo_bis |
Funny http://oami.europa.eu/robots.txt |
22:02
π
|
diffalot |
CAT SIGNAL ACTIVATED: blip.tv is deleting years of vloggers videos, can you help? |
22:03
π
|
diffalot |
i'm checking that archive.org is willing to ingest it all |
22:05
π
|
omf_ |
diffalot, start by giving us a link to the announcement page |
22:06
π
|
diffalot |
no page, this is something blip is quietly doing, see tweets from https://twitter.com/schlomo , quirk, and trine |
22:07
π
|
diffalot |
here's a news story: http://www.zennie62blog.com/2013/10/08/blip-tv-er-blip-networks-sacks-ceo-kelly-day-shortest-exec-career-since-john-paul-i-24113/ |
22:08
π
|
diffalot |
archive.org says, "hell yes" https://twitter.com/tracey_pooh/status/387700340176351233 |
22:12
π
|
omf_ |
Well this is an interesting problem. How to find the vloggers they are going to erase |
22:15
π
|
omf_ |
and I already want to shit on the heads of the developers of blip.tv |
22:16
π
|
omf_ |
good we already have a page http://archiveteam.org/index.php?title=Blip.tv |
22:19
π
|
SketchCow |
It's 30 days. |
22:19
π
|
SketchCow |
We have 30 days. |
22:20
π
|
diffalot |
perhaps blip would provide a list? or we create an opt-in form? i'm not seeing any mediaRSS feeds on the user profile pages in question |
22:22
π
|
diffalot |
i'm ok with phantomJS and jsdom, so i'll see what i can do |
22:24
π
|
joepie91 |
I guess that the last ditch effort would be |
22:25
π
|
joepie91 |
"just archive all of blip and we'll figure out what's gone later" |
22:28
π
|
diffalot |
i'm looking for an example of a past scraper the team has used, any recommendations? |
22:29
π
|
diffalot |
iirc, y'all have some sophisticated turnkey solutions ;) |
22:33
π
|
omf_ |
While I am adding info I find to the wiki about blip.tv I would like to remind everyone we had serious server problems during backing up zapd and frankly blip.tv is going to require bigger metal to suck down that much data |
22:33
π
|
SketchCow |
what sort of server problems. |
22:34
π
|
omf_ |
we went down, we ran out of space, the usual |
22:34
π
|
SketchCow |
Well, that's because the same people aren't using the tracker - we'll have the use of FOS for a dump. |
22:34
π
|
omf_ |
the hosting company randomly turns off or reboots the server |
22:39
π
|
SketchCow |
That's because people take over the project and use their own central servers instead of internet archive. |
23:04
π
|
omf_ |
I am searching the commoncrawl index for urls |
23:07
π
|
diffalot |
ah ha: http://blip.tv/schlomo/rss/ |
23:07
π
|
diffalot |
(must be turned on by the producer?) |
23:15
π
|
diffalot |
WHAT FORSOOTH, PRITHEE TELL ME THE SECRET WORD |
23:17
π
|
omf_ |
THY SECRET WORD is "yahoosucks" GO FORTH AND IMPART THY KNOWLEDGE |
23:17
π
|
* |
diffalot kneels and accepts the mantle |
23:17
π
|
omf_ |
SketchCow, we need a cool name to make an irc channel |
23:19
π
|
bsmith093 |
bloop |
23:25
π
|
kyan |
Has anyone here looked into working with the Majestic-12 project to find usernames for websites and such? It seems like it could be a really valuable source of data (they have ~2.7 trillion URLs in their databases)… |
23:27
π
|
omf_ |
kyan, url? |
23:27
π
|
kyan |
omf_: this is their "real" website: http://www.majestic12.co.uk/ This is their commercial website: https://www.majesticseo.com/ |
23:39
π
|
diffalot |
contacting the public relations team at blip (http://annieisms.com/about/), good idea or bad idea? |
23:40
π
|
Cameron_D |
kyan: I do a lot of MJ12 crawling, I've considered asking them in the past, but due to the commercial nature of what they do I doubt they'd work with us. |
23:41
π
|
diffalot |
the question would be: can we get a list of the shows that are being deleted? |
23:41
π
|
kyan |
Cameron_D, that would be understandable |
23:41
π
|
omf_ |
and yet they use the public to do the bulk of the work |
23:43
π
|
kyan |
AT wouldn't be using the data for profit… might be worth asking |
23:43
π
|
Cameron_D |
yeah, maybe |
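Regarding the 20:04 question about "*\\?*,*@*": a minimal sketch, assuming the string is a comma-separated list of shell-style glob patterns (the form wget's -R/--reject option accepts) and that the doubled backslash is quoting that collapses to `\?`, meaning a literal question mark. Under those assumptions the list rejects anything containing "?" or "@", which fits the guess at 21:15 about URLs with parameters. The helper names and example URLs are hypothetical, and Python's fnmatch only approximates glob matching; it does not reproduce which part of a URL wget actually tests.

```python
from fnmatch import fnmatchcase

# Assumed form of the list after shell/wiki quoting collapses "\\?" to "\?".
REJECT_LIST = r"*\?*,*@*"


def split_reject_list(reject_list):
    """Split a comma-separated reject list into individual glob patterns."""
    return [p for p in reject_list.split(",") if p]


def to_fnmatch(pattern):
    """Translate an escaped question mark into fnmatch's literal form.

    fnmatch has no backslash escape, so a literal "?" is written "[?]".
    """
    return pattern.replace("\\?", "[?]")


def is_rejected(url, reject_list=REJECT_LIST):
    """Return True if any pattern in the list matches the whole URL string."""
    return any(
        fnmatchcase(url, to_fnmatch(p)) for p in split_reject_list(reject_list)
    )


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in (
        "http://example.blogspot.com/2013/10/some-post.html",
        "http://example.blogspot.com/search?updated-max=2013-10-01",
        "http://user@example.blogspot.com/",
    ):
        print(is_rejected(url), url)
```

If this reading is right, the first pattern drops search and pagination URLs (anything with a query string) and the second drops URLs containing "@".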